Alright everyone, here is the test I promised. This time I used the extracted game pack "Data00.packed" from the game "Raiders of the Broken Planet - Wardog Fury". The unpacked resource is 5.33gb in size and contains many thousands of different files, including txt, lua scripts, dds, cm3, msh, fb, fsb, json, actclass, actgroup and whatnot. It does not contain any compressed ogg audio, which I wanted to avoid. First I am going to dump all the results and then go through them step by step:
Code:
data(_srt).flat 5.33gb
data.flat.srep 4.49gb
data_srt.flat.srep 4.49gb
FA -m5(-mc:4x4/4x4:lzma:lc8):
-data.flat 1144mb_ram 5:10min 2.51gb
-data.flat.srep 1143mb_ram 5:10min 2.46gb
-data_srt.flat 1226mb_ram 5:20min 2.49gb
-data_srt.flat.srep 1218mb_ram 5:15min 2.46gb | Decmp: cpu100% 38sec
FA -mx:
-data_srt.flat 1884mb_ram 23:28min 2.46gb
-data_srt.flat.srep ^1950mb_ram 22:35min 2.45gb | Decmp: cpu25% 2:22min
7z -ultra(d:64m:word=64):
-data_srt.flat.srep 2158mb_ram 12:03min 2.41gb | Decmp: cpu25% 1:48min
7z(a -txz -mcrc=0 -mf=off -myx=0 -mmt=4 -m0=lzma2:d=64m:fb=16:a=0):
-data_srt.flat 3155mb_ram 7:32min 2.59gb
-data_srt.flat.srep 3199mb_ram 6:57min 2.57gb | Decmp: cpu25% 1:48min
same_lzma2_settings(cpu100%, data_srt.flat.srep):
-7z(lzma2:d96m:a=1:mc=32:fb=32:mf=hc4): 3967mb_ram 16:40min 2.43gb | Decmp: cpu25% 1:50min
-FA -m5(no custom string, same as 7z^): 1542mb_ram 6:19min 2.49gb | Decmp: cpu100% 38sec
skip_cmp_data_test(on data_srt.flat.srep.7z):
-7z(^): 8:09min
-FA -m5: 16sec
7z_PPMd(data_srt.flat.srep):
-d1g:word=32 46:00min 2.59gb
-d1g:word=8 35:00min 2.60gb
-d1g:word=4 32:40min 2.68gb
-d256m:word=4 24:20min 2.66gb
Ok, so first I tarred all files into a single file, using both 7z (data.flat) and FA (data_srt.flat). FA used its special sorting in GES (group-extension-size) format. This is only to demonstrate the effectiveness of grouping and sorting for our case. Obviously both files have the same size as the extracted directory, 5.33gb.
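For anyone wondering what a group-extension-size ordering amounts to, here is a minimal sketch. The GROUPS table is made up for illustration (the real grouping rules FA uses are not listed in this post), so treat it as an assumption, not as FA's actual behavior.

```python
import os

# Hypothetical extension-to-group table, only for illustration.
GROUPS = {
    ".txt": "text", ".lua": "text", ".json": "text",
    ".dds": "texture",
    ".fsb": "audio",
}

def ges_key(entry):
    """Sort key for a (path, size) pair: group first, then extension, then size."""
    path, size = entry
    ext = os.path.splitext(path)[1].lower()
    return (GROUPS.get(ext, "other"), ext, size)

def ges_sort(entries):
    """Return the entries ordered so that similar files end up next to each other."""
    return sorted(entries, key=ges_key)

files = [("b.dds", 4096), ("a.lua", 120), ("c.txt", 900),
         ("d.dds", 1024), ("e.json", 300), ("f.fb", 50)]
for path, size in ges_sort(files):
    print(path, size)
```

The point of such an ordering is that similar data lands inside the same dictionary window, which is why it can help a little when no long-range pass runs first.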
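Then I ran srep -m3f on each; both came out at the same size of 4.49gb. I use the new srep64, which has options up to -m5f, btw. It is no surprise that both files are still the same size, since srep has a full-range seek: it can find repeats across the whole file, so the order of the files inside does not matter.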
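Here is a toy illustration of why a whole-file lookup does not care about ordering. This is NOT how srep works internally (it has its own matching algorithm); the fixed, aligned BLOCK size and the made-up "files" are assumptions chosen to keep the sketch short.

```python
import hashlib

BLOCK = 4096  # toy block size, an assumption for this sketch

def dedup_size(data: bytes) -> int:
    """Bytes left after dropping every block already seen anywhere earlier in the input."""
    seen = set()
    kept = 0
    for off in range(0, len(data), BLOCK):
        block = data[off:off + BLOCK]
        digest = hashlib.sha1(block).digest()
        if digest not in seen:
            seen.add(digest)
            kept += len(block)
    return kept

# Three made-up "files", each a whole number of blocks; c is a copy of a.
a = bytes([1]) * BLOCK + bytes([2]) * BLOCK
b = bytes([3]) * BLOCK
c = a

unsorted_tar = a + b + c  # duplicates far apart
sorted_tar = a + c + b    # duplicates next to each other
print(dedup_size(unsorted_tar), dedup_size(sorted_tar))
```

Because the lookup table covers the whole input, both orderings leave the same amount of data behind, which matches the identical 4.49gb results above.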
Finally I started compressing, with FA -m5(-mc:4x4/4x4:lzma:lc8) on all files. This is the same as -m5 except for the lc8 option, and the dictionary defaults to 64mb instead of 96mb. Read my previous post in this thread for more details if you like. You can see that without srep, sorting helps a tiny bit; with srep the compressed size was the same regardless of sorting. This was only out of curiosity; from now on we will focus on only one data set, the one tarred by FA.
data_srt.flat.srep took 5:15min to compress, down to 2.46gb, and in the decompression test it took only 38sec to decompress and utilized all 4 cores for it. More importantly, during compression it only needed ~1.2gb of memory, and this is with 4x4!
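To show why block-wise compression lets decompression use all cores, here is a minimal sketch. It assumes the general idea behind 4x4 is splitting the input into independently compressed blocks. It uses Python's standard lzma module, not FA's internal LZMA, and the block count and sample data are made up.

```python
import lzma
from concurrent.futures import ThreadPoolExecutor

def compress_blocks(data: bytes, block_size: int) -> list:
    """Compress each block as its own independent LZMA stream."""
    return [lzma.compress(data[i:i + block_size])
            for i in range(0, len(data), block_size)]

def decompress_blocks(blocks: list, workers: int = 4) -> bytes:
    """Decompress blocks concurrently; map() hands results back in block order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(lzma.decompress, blocks))

data = b"some repetitive game resource data " * 20000
blocks = compress_blocks(data, block_size=len(data) // 4 + 1)
restored = decompress_blocks(blocks)
print(len(blocks), restored == data)
```

The trade-off is that each block only sees its own data, so matches across block borders are lost. That is one plausible reason a block-split scheme compresses slightly worse than a single stream.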
Then I tried FA -mx. Without srep it did a tiny bit better; with srep the size is almost the same, but it took 4x more time to compress, used more memory, and used only ~2 cores during compression and during decompression.
Now compare it with the 7z result below it, first with default -ultra settings: 7z used a similar amount of memory, but took only about half the time FA -mx did (though still twice that of FA -m5, for a negligible gain!) and still compressed a tiny bit better. Decompression, despite LZMA2, is still bound to a single core. That's HORRIBLE.
I only used FA -mx as a reference for explanation and comparison; it is not worth using -mx in FA. So yes, if I had to choose between FA -mx and 7z/xz, I would also choose the latter. But FA -m5 with 4x4 is miles better and quicker. Let's continue first, though; I will come to that later.
Finally I used 7z with the settings that were suggested: 7z(a -txz -mcrc=0 -mf=off -myx=0 -mmt=4 -m0=lzma2:d=64m:fb=16:a=0). Let's focus only on data_srt.flat.srep from now on, and forget FA -mx as well. How does "this" 7z compare to FA -m5?
FA: -data_srt.flat.srep 1218mb_ram 5:15min 2.46gb | Decmp: cpu100% 38sec
7z: -data_srt.flat.srep 3199mb_ram 6:57min 2.57gb | Decmp: cpu25% 1:48min
^First, the compression times of 7z are much closer to FA -m5, so kudos to artag for finding a better balance.
But 7z uses ~2.5x more memory, is still slower, *compresses worse*, and decompresses slower! Yeah, really great benefit of having the latest 64bit LZMA2.
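If you want to check the ratios yourself, here is a small sketch that parses the two result lines above and divides them out. The regex is written only for the line format used in this post, and the field names are my own.

```python
import re

# Matches lines like: "1218mb_ram 5:15min 2.46gb | Decmp: cpu100% 38sec"
LINE = re.compile(
    r"(?P<ram>\d+)mb_ram\s+(?P<cmin>\d+):(?P<csec>\d+)min\s+(?P<size>[\d.]+)gb"
    r".*?cpu(?P<cpu>\d+)%\s+(?:(?P<dmin>\d+):)?(?P<dsec>\d+)(?:min|sec)"
)

def parse(line: str) -> dict:
    """Turn one result line into numbers (memory in MB, times in seconds, size in GB)."""
    m = LINE.search(line)
    return {
        "ram_mb": int(m["ram"]),
        "comp_s": int(m["cmin"]) * 60 + int(m["csec"]),
        "size_gb": float(m["size"]),
        "decomp_s": int(m["dmin"] or 0) * 60 + int(m["dsec"]),
    }

fa = parse("-data_srt.flat.srep 1218mb_ram 5:15min 2.46gb | Decmp: cpu100% 38sec")
sevenz = parse("-data_srt.flat.srep 3199mb_ram 6:57min 2.57gb | Decmp: cpu25% 1:48min")

for field in ("ram_mb", "comp_s", "size_gb", "decomp_s"):
    print(field, round(sevenz[field] / fa[field], 2))
```

Each printed value is the 7z figure divided by the FA -m5 figure, so anything above 1 means 7z needed more of it.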
Maybe you say, "try the LZMAs on the same settings, dummy!". Sure, at your service:
same_lzma2_settings(cpu100%, data_srt.flat.srep):
-7z(lzma2:d96m:a=1:mc=32:fb=32:mf=hc4): 3967mb_ram 16:40min 2.43gb | Decmp: cpu25% 1:50min
-FA -m5(no custom string, same as 7z^): 1542mb_ram 6:19min 2.49gb | Decmp: cpu100% 38sec
^Yes, 7z's LZMA2 is a tiny bit more efficient and thus compresses slightly better. That "better" is pretty much nothing though. Everything else is horrible. It's ~2.5x slower, uses ~2.5x more memory and still decompresses much slower because it runs on 1 thread. Although decompression could perhaps be fixed if you can chain it into 4x4 in FA, that doesn't answer for the other deficiencies. The 4x4 design and hyper-optimization of the internal LZMA are superior in every respect.
But I am not done, let's talk about "skipping":
skip_cmp_data_test(on data_srt.flat.srep.7z):
-7z(^): 8:09min
-FA -m5: 16sec
^In 7z I used the same settings as suggested before, aka 7z(a -txz -mcrc=0 -mf=off -myx=0 -mmt=4 -m0=lzma2:d=64m:fb=16:a=0). I tried to compress the already compressed 7z file. 7z doesn't even detect its own format and tries to compress it again, which took 8:09min, MORE than it took to compress the original uncompressed srep file. FA -m5, on the other hand, basically did a raw copy. So... I don't know. Where are those myths about "superior" 7z/xz coming from? It's just horrible in every respect.
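To give an idea of how an archiver could skip already compressed data, here is a minimal sketch: do a quick trial compression of a small sample and store the data raw if it barely shrinks. I do not know how FA actually detects this, so the sample size, the threshold and the use of zlib are all assumptions for illustration.

```python
import os
import zlib

def looks_incompressible(data: bytes, sample_size: int = 65536,
                         threshold: float = 0.95) -> bool:
    """Fast trial: compress a small sample at the lowest level and see if it shrinks."""
    sample = data[:sample_size]
    if not sample:
        return False
    packed = zlib.compress(sample, 1)
    return len(packed) / len(sample) > threshold

def pack(data: bytes):
    """Return ("stored", raw) for incompressible input, else ("compressed", packed)."""
    if looks_incompressible(data):
        return ("stored", data)
    return ("compressed", zlib.compress(data, 9))

random_like = os.urandom(200_000)  # stand-in for already compressed data
text_like = b"abc" * 100_000

print(pack(random_like)[0], pack(text_like)[0])
```

The cheap trial costs a fraction of a second, while running the full compressor over incompressible input burns minutes for nothing, as the 8:09min result shows.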
Finally I tried 7z's PPMd just for comparison. It is not relevant to this post, but maybe you will find it useful. I actually like 7z's PPMd quite a bit. It seems like a solid replacement for uharc and is definitely quicker than the PAQs, although not as good.
----------------------------------------------------------------------------------------------------------------------------------------
In conclusion, FA -mx is truly not a good choice because it does not take advantage of what makes FA so great. It tries to be the strongest option but gives up its 4x4 awesomeness, not only for compression but also for decompression. And I believe it also does not skip compressed data like -m5 does, which is another awesome trait of FA. In that case, yes, I would also go with modern 64bit xz/7z.
But the internal LZMA on -m5 settings, with all the mentioned benefits, puts both 7z and xz to shame. It is significantly quicker, uses significantly less memory, decompresses significantly quicker utilizing all 4 cores, and compresses only slightly worse, 2.4% in this case. 2.4% for ~2.5x better speed, ~2.5x less memory usage and full-speed 4t decompression. To me the 4x4:lzma design is so much better than LZMA2; the latter feels like a lost opportunity. Let's hope LZMA3 gets it right one day. On top of that, FA -m5 can truly skip already compressed data, how cool is that? 7z simply does NOT, I am sorry! In the end, the only benefit of the latest 7z/xz and their LZMA2 is 64bit, which is only good for a bigger dictionary. However, a dictionary beyond 64mb is IMO much less relevant, especially if you srep the data before it. Srep does the job so you don't need to overheat your LZMA. Think of it as a 2-pass filter compression. The internal LZMA is actually quicker and more efficient with memory than 7z/xz. It is clear Bulat put a lot of effort into squeezing it. It should also be clear that I am in love with 4x4 and I really hate lzma2. But if you think decompressing on a single thread is acceptable, go for it.
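For the record, here is the arithmetic behind the trade-off, using the numbers from the same-settings test. It is only a sketch; the variable names are mine and the size difference is taken relative to FA's output.

```python
def to_seconds(mmss: str) -> int:
    """Convert a "minutes:seconds" string into seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

# Figures copied from the same_lzma2_settings results.
sevenz = {"ram_mb": 3967, "time": "16:40", "size_gb": 2.43}
fa_m5 = {"ram_mb": 1542, "time": "6:19", "size_gb": 2.49}

size_penalty = (fa_m5["size_gb"] - sevenz["size_gb"]) / fa_m5["size_gb"] * 100
speedup = to_seconds(sevenz["time"]) / to_seconds(fa_m5["time"])
ram_saving = sevenz["ram_mb"] / fa_m5["ram_mb"]

print(f"size penalty: {size_penalty:.1f}%")
print(f"compression speedup: {speedup:.1f}x")
print(f"memory saving: {ram_saving:.1f}x")
```

If you take the size difference relative to the 7z output instead, it comes out a hair higher, at about 2.5%.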
I see FA as a state-of-the-art, highly efficient all-in-one package. No need to replace anything except srep with srep64. Certainly not 4x4:lzma with that horrible lzma2 of the ill-optimized 7z/xz exes. The memory argument and the need for a bigger dictionary are void IMO. Again for you, artag:
Code:
7z(a -txz -mcrc=0 -mf=off -myx=0 -mmt=4 -m0=lzma2:d=64m:fb=16:a=0):
-data_srt.flat.srep 3199mb_ram 6:57min 2.57gb | Decmp: cpu25% 1:48min
FA -m5(-mc:4x4/4x4:lzma:lc8):
-data_srt.flat.srep 1218mb_ram 5:15min 2.46gb | Decmp: cpu100% 38sec
Best regards.