#31
LZMA2 does skip incompressible blocks, as stated by the author:
https://sourceforge.net/p/sevenzip/d...read/2f6085ba/

Simple tests:
- Take an incompressible file (mp4, rar, mkv or something like that).
- Compress it with lzma:256m and with 7zip:d=256 (set mmt to two if you want to test both algorithms under the same conditions).
- Repeat the test with different incompressible files. LZMA will compress very little and the output will sometimes be bigger (because of the overhead added to blocks that do not compress). LZMA2 output will be smaller, and in the worst case only a few kilobytes larger.
- Do an extraction test of the compressed files. LZMA2 is faster (uncompressed blocks are just copied to the output).

And finally, this is a test of single compression algorithms. If you use one of the built-in methods of FA, you are combining a chain of algorithms/tools, and that is a different thing. That's not what I'm talking about.

Yes, I have a four-core CPU. If you invoke 4x4:lzma:256m on a quad core or i7, 32-bit FA will run past its memory limits (I'm aware of 64-bit fazip but I haven't tested it yet).

LZMA in FA and LZMA2 in 7zip both decompress on a single thread. CPU usage confirms that, and so does the author:
https://sourceforge.net/p/sevenzip/f...requests/1095/

What is rz?

Any comments from somebody with experience with zstd or lzham (combined with FA)?
http://www.fileforums.com/showthread...464#post465464

Quick compare (single mkv file; expect better results with something that mixes compressible and incompressible blocks, like game assets or 3D models/scenes):
Last edited by artag; 02-01-2018 at 22:32. |
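For anyone who wants to reproduce the basic effect without 7-Zip or FA, here is a minimal sketch using Python's standard `lzma` module. It is not the test described above: random bytes stand in for an mp4/mkv, the legacy .lzma container is used for LZMA1 and the .xz container for LZMA2, and the helper name `compare_on_incompressible` is made up for illustration.

```python
import lzma
import os


def compare_on_incompressible(size=1 << 20):
    """Compress random (incompressible) bytes with LZMA1 and LZMA2.

    Returns (original_size, lzma1_size, lzma2_size).
    """
    data = os.urandom(size)  # stand-in for an mp4/mkv/rar payload

    # FORMAT_ALONE is the legacy .lzma container, which uses LZMA1.
    lzma1 = lzma.compress(data, format=lzma.FORMAT_ALONE)
    # FORMAT_XZ is the .xz container, which uses LZMA2.
    lzma2 = lzma.compress(data, format=lzma.FORMAT_XZ)

    # Both must round-trip, otherwise the size comparison is meaningless.
    assert lzma.decompress(lzma1) == data
    assert lzma.decompress(lzma2) == data
    return len(data), len(lzma1), len(lzma2)


if __name__ == "__main__":
    original, l1, l2 = compare_on_incompressible()
    print(f"original: {original}")
    print(f"LZMA1:    {l1} ({l1 - original:+d} bytes)")
    print(f"LZMA2:    {l2} ({l2 - original:+d} bytes)")
```

On random input the LZMA2 output should end up only slightly larger than the input, because the LZMA2 format allows chunks that do not shrink to be stored uncompressed. Whether that counts as "skipping" is a separate question, since this sketch says nothing about how much time the encoder spends before giving up on a chunk.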
#32
Dear artag, first thank you for the pics. But I think we need to get deeper into this.
In your first link, Igor does mention that LZMA2 is better for already compressed data: "Also LZMA2 is better than LZMA, if you compress already compressed data." However, that does not imply skipping of the compressed chunks, and your results actually prove it. It may only imply that it handles such data better.

Specifically, your data show that FA LZMA is in fact almost twice as quick as 7z LZMA2 at the same thread count! I assume here that since you used only LZMA in FA, and I don't see any 4x4 there, FA used only 2 threads. That compares to the second pic with the 7z LZMA -mmt=2 output, not the last one, which takes advantage of more cores. Also, while the newer LZMA2 does give a better ratio on compressed data, implying more efficient work, it still has nothing to do with "skipping".

If you want to see what true "skipping" is, compress said file with the default FA -m5 parameter. FA -m5 skips for real, invoking a raw disk copy. You would not see 6mb/s like in your pic but almost full disk speed (and not because more cores are being used). What you are showing has nothing to do with real data skipping and everything to do with more efficient operation, and the author did not say anything like that either. You misread his comment.

I already have data I collected yesterday for my next big post in this thread, you will see the difference. Soon...

PS: Btw, RZ is this: https://encode.ru/threads/2829-RAZOR...based-archiver
Last edited by elit; 03-01-2018 at 06:24.
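To make the distinction in this post concrete, here is a rough sketch of the two behaviors being argued about. It is only an illustration: FA's real detection logic is not shown anywhere in this thread, and the function names, the 64 KiB probe size and the 0.95 threshold are all assumptions picked for the example.

```python
import lzma
import os

PROBE_SIZE = 64 * 1024   # assumed sample size
THRESHOLD = 0.95         # assumed "worth compressing" ratio


def store_if_not_smaller(block: bytes) -> bytes:
    """Compress the whole block, then keep the raw copy if that is smaller.

    The compression work is always done; only the output gets smaller.
    """
    packed = lzma.compress(block)
    if len(packed) < len(block):
        return b"C" + packed
    return b"S" + block


def probe_then_copy(block: bytes) -> bytes:
    """Compress only a small sample first and copy the rest raw if it does not shrink."""
    sample = block[:PROBE_SIZE]
    if sample and len(lzma.compress(sample)) >= THRESHOLD * len(sample):
        return b"S" + block          # raw copy, no full compression pass
    return b"C" + lzma.compress(block)


def unpack(blob: bytes) -> bytes:
    """Undo either packer: 'S' means stored, 'C' means LZMA-compressed."""
    tag, body = blob[:1], blob[1:]
    return body if tag == b"S" else lzma.decompress(body)


if __name__ == "__main__":
    noise = os.urandom(256 * 1024)
    for packer in (store_if_not_smaller, probe_then_copy):
        print(packer.__name__, "->", packer(noise)[:1])
```

Both functions produce a stored block for incompressible input, but only the second one avoids the full compression pass, which is the difference between "smaller output" and "near disk speed" in the argument above.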
#33
Alright everyone, here is my promised test. This time I used the extracted game pack "Data00.packed" from the game "Raiders of the Broken Planet - Wardog Fury". The unpacked resource is 5.33gb in size and contains many thousands of different files, including txt, lua scripts, dds, cm3, msh, fb, fsb, json, actclass, actgroup and whatnot. It does not contain any compressed ogg audio, which I wanted to avoid. First I am going to dump all the results and then go through them step by step:
Code:
data(_srt).flat       5.33gb
data.flat.srep        4.49gb
data_srt.flat.srep    4.49gb

FA -m5(-mc:4x4/4x4:lzma:lc8):
-data.flat            1144mb_ram   5:10min  2.51gb
-data.flat.srep       1143mb_ram   5:10min  2.46gb
-data_srt.flat        1226mb_ram   5:20min  2.49gb
-data_srt.flat.srep   1218mb_ram   5:15min  2.46gb | Decmp: cpu100% 38sec

FA -mx:
-data_srt.flat        1884mb_ram  23:28min  2.46gb
-data_srt.flat.srep  ^1950mb_ram  22:35min  2.45gb | Decmp: cpu25% 2:22min

7z -ultra(d:64m:word=64):
-data_srt.flat.srep   2158mb_ram  12:03min  2.41gb | Decmp: cpu25% 1:48min

7z(a -txz -mcrc=0 -mf=off -myx=0 -mmt=4 -m0=lzma2:d=64m:fb=16:a=0):
-data_srt.flat        3155mb_ram   7:32min  2.59gb
-data_srt.flat.srep   3199mb_ram   6:57min  2.57gb | Decmp: cpu25% 1:48min

same_lzma2_settings(cpu100%, data_srt.flat.srep):
-7z(lzma2:d96m:a=1:mc=32:fb=32:mf=hc4):   3967mb_ram  16:40min  2.43gb | Decmp: cpu25% 1:50min
-FA -m5(no custom string, same as 7z^):   1542mb_ram   6:19min  2.49gb | Decmp: cpu100% 38sec

skip_cmp_data_test(on data_srt.flat.srep.7z):
-7z(^):    8:09min
-FA -m5:   16sec

7z_PPMd(data_srt.flat.srep):
-d1g:word=32     46:00min  2.59gb
-d1g:word=8      35:00min  2.60gb
-d1g:word=4      32:40min  2.68gb
-d256m:word=4    24:20min  2.66gb

Then I ran srep -m3f on each; both came out at the same size, 4.49gb. I use the new srep64, which has options up to -m5f, btw. It is no surprise that both files are still the same size, since srep has a full-range seek.

Finally I started compressing with FA -m5(-mc:4x4/4x4:lzma:lc8) on all files. This is the same as -m5 except for the -lc8 option and the dictionary, which defaults to 64mb instead of 96mb. Read my previous post in this thread for more details if you like. You can see that without srep, sorting helps a tiny bit; with srep, the compressed size was the same regardless of sorting. This was only out of curiosity. From now on we will focus on only one data set, the one tarred by FA. data_srt.flat.srep took 5:15min to compress, down to 2.46gb, and in the decompression test it took only 38sec to decompress and used all 4 cores for it.
More importantly, during compression it needed only ~1.2gb of memory, and this is with 4x4!

Then I tried FA -mx. Without srep it did a tiny bit better; with srep the size is almost the same, but it took 4x more time to compress, used more memory, and used only ~2 cores during both compression and decompression. Now compare it with the 7z result under it, first with default -ultra settings: 7z used a similar amount of memory, took only about half the time FA -mx did (but still twice that of FA -m5, for negligible gain!) and still compressed a tiny bit better. Decompression, despite LZMA2, is still bound to a single core. That's HORRIBLE. I only used FA -mx as a reference for explanation and comparison; it is not worth using -mx in FA. So yes, if I had to choose between FA -mx and 7z/xz, I would also choose the latter. But FA -m5 with 4x4 is miles better and quicker. I will come to that later; let's continue first.

Finally I used 7z with the settings that were suggested: 7z(a -txz -mcrc=0 -mf=off -myx=0 -mmt=4 -m0=lzma2:d=64m:fb=16:a=0). Let's focus only on data_srt.flat.srep from now on, and forget FA -mx as well. How does "this" 7z compare to FA -m5?

FA: -data_srt.flat.srep 1218mb_ram 5:15min 2.46gb | Decmp: cpu100% 38sec
7z: -data_srt.flat.srep 3199mb_ram 6:57min 2.57gb | Decmp: cpu25% 1:48min

^First, the compression times of 7z are much closer to FA -m5, so kudos to artag for finding a better balance. But 7z uses ~2.5x more memory, is still slower, *compresses worse*, and decompresses slower! Yeah, really great benefit of having the latest 64-bit LZMA2. Maybe you say, "try the LZMAs on the same settings, dummy!". Sure, at your service:

same_lzma2_settings(cpu100%, data_srt.flat.srep):
-7z(lzma2:d96m:a=1:mc=32:fb=32:mf=hc4): 3967mb_ram 16:40min 2.43gb | Decmp: cpu25% 1:50min
-FA -m5(no custom string, same as 7z^): 1542mb_ram 6:19min 2.49gb | Decmp: cpu100% 38sec

^Yes, 7z's LZMA2 is a tiny bit more efficient and thus compresses slightly better. That "better" is pretty much nothing, though. Everything else is horrible.
It's ~2.5x slower, uses ~2.5x more memory and still decompresses much slower because of 1t. Although decompression could perhaps be fixed if you can chain it to 4x4 in FA, that doesn't answer the other deficiencies. The 4x4 design and the hyper-optimization of the internal LZMA are superior in every respect.

But I am not done, let's talk about "skipping":

skip_cmp_data_test(on data_srt.flat.srep.7z):
-7z(^): 8:09min
-FA -m5: 16sec

^In 7z I used the same settings as suggested before, aka 7z(a -txz -mcrc=0 -mf=off -myx=0 -mmt=4 -m0=lzma2:d=64m:fb=16:a=0). I tried to compress the already compressed 7z file. 7z doesn't even detect its own format and tries to compress it again; it took 8:09min, which is MORE than it took to compress the original uncompressed srep file. FA -m5, on the other hand, basically did a raw copy. So... I don't know. Where are those myths about "superior" 7z/xz coming from? It's just horrible in every respect.

Finally I tried the PPMd of 7z just for comparison. It is not relevant to this post but maybe you will find it useful. I actually like the PPMd of 7z quite a bit. It seems like a solid replacement for uharc and is definitely quicker than the PAQs, although not as good.

----------------------------------------------------------------------------------------------------------------------------------------

In conclusion, FA -mx is truly not a good choice because it does not take advantage of what makes FA so great. It tries to be the strongest option but gives up its 4x4 awesomeness, not only for compression but also for decompression. And I believe it also does not skip compressed data like -m5 does, which is another awesome trait of FA. In that case, yes, I would also go with modern 64-bit xz/7z. But the internal LZMA on -m5 settings, with all the mentioned benefits, puts both 7z and xz to shame. It is significantly quicker, uses significantly less memory, decompresses significantly quicker using all 4 cores, and compresses only slightly worse, 2.4% in this case.
2.4% for ~2.5x better speed, ~2.5x less memory usage and full-speed 4t decompression. To me the 4x4:lzma design is so much better than LZMA2; the latter feels like a lost opportunity. Let's hope LZMA3 gets it right one day. On top of that, FA -m5 can truly skip already compressed data, how cool is that? 7z simply does NOT, I am sorry!

In the end, the only benefit of the latest 7z/xz and their LZMA2 is 64-bit, which is only good for a bigger dictionary. However, a dictionary beyond 64mb is IMO much less relevant, especially if you srep the data before it. Srep does the job so you don't need to overheat your LZMA. Think of it as 2-pass filter compression. The internal LZMA is actually quicker and more memory-efficient than 7z/xz. It is clear Bulat put a lot of effort into squeezing it. It should also be clear that I am in love with 4x4 and I really hate lzma2. But if you think decompressing on a single thread is acceptable, go for it.

I see FA as a state-of-the-art, highly efficient all-in-one package. No need to replace anything except srep with srep64. Certainly not 4x4:lzma with that horrible lzma2 of the ill-optimized 7z/xz exes. The memory argument and the need for a bigger dictionary are void IMO. Again for you, artag:
Code:
7z(a -txz -mcrc=0 -mf=off -myx=0 -mmt=4 -m0=lzma2:d=64m:fb=16:a=0):
-data_srt.flat.srep 3199mb_ram 6:57min 2.57gb | Decmp: cpu25% 1:48min

FA -m5(-mc:4x4/4x4:lzma:lc8):
-data_srt.flat.srep 1218mb_ram 5:15min 2.46gb | Decmp: cpu100% 38sec
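The "~2.5x" and "2.4%" figures can be rechecked from the same-settings rows of the results. A small sketch follows; the helper names are mine, and the numbers are copied from the same_lzma2_settings rows in this post.

```python
def to_seconds(mmss: str) -> int:
    """Convert an 'M:SS' or 'MM:SS' string from the results into seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# Rows from same_lzma2_settings(cpu100%, data_srt.flat.srep)
SEVENZIP = {"ram_mb": 3967, "time": "16:40", "size_gb": 2.43}
FA_M5 = {"ram_mb": 1542, "time": "6:19", "size_gb": 2.49}


def compare(slow: dict, fast: dict) -> dict:
    """Return time ratio, memory ratio and size penalty of `fast` relative to `slow`."""
    return {
        "time_ratio": to_seconds(slow["time"]) / to_seconds(fast["time"]),
        "ram_ratio": slow["ram_mb"] / fast["ram_mb"],
        "size_penalty_pct": (fast["size_gb"] - slow["size_gb"]) / slow["size_gb"] * 100,
    }


if __name__ == "__main__":
    for name, value in compare(SEVENZIP, FA_M5).items():
        print(f"{name}: {value:.2f}")
```

This works out to roughly 2.6x for both time and memory and a size penalty of a bit under 2.5%, which is in line with the rounded figures quoted in the post.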
#34
Here we go
Code:
Original file(same as in past posts): 660mb

LOLZ(default+mt4 = base setting):
-:             242.73mb  7:18min
+tt1:          244.16mb  5:47min  (0.6% size gain, +24% speed gain)
+tt1+mtt1:     241.39mb  6:02min  !(better than mtt0 here)
+tt1+al0:      244.46mb  5:47min
+tt1+mc32:     244.17mb  5:50min
+tt1+cm0:      247.37mb  5:52min
+tt1+dto0:     244.39mb  5:24min
+tt1+fba512:   244.13mb  6:09min
+tt1+fba128:   244.15mb  6:03min
+tt1+fba32:    244.22mb  5:42min
+tt1+fba8:     244.76mb  5:52min
+tt1+fba0:     244.12mb  6:26min
+tt1+dm00:     243.63mb  5:49min
+tt1+gm11:     244.28mb  5:54min

DLZ: 265.52mb

DDS test:
file1.dds(21.33mb) | ARC -m5(lzma) -> 7.29mb | ARC($bmp) -> 6.46mb | DLZ -> 4.72mb | LOLZ -> 4.18mb
file2.dds(5.33mb)  | ARC -m5(lzma) -> 4.6mb  | ARC($bmp) -> 4.34mb | DLZ -> 3.77mb | LOLZ -> 3.58mb

BMP file(~5mb):
arc -m5 -> ~2mb
lolz -> ~1.5mb (+25% gain)

You may ask about -mtt1 vs -mtt0. As you see here, despite recommendations, -mtt1 gave a better ratio. With -mtt0 you need a dictionary at least 2x+ bigger than the block size after all, whereas in -mtt1 mode they can both be the same. But -mtt1 has another advantage: it does what made FA -m5 so great, 4x4. And it goes further; it's actually XxX, fully scalable, so it's even better. With -mtt0 you use the same number of threads for both compression and decompression, but with -mtt1 only your system is the limit (and you can also use fewer threads for decompression than the packer used). So lolz can decompress using all cores! Finally, that's what I call progress, unlike that crappy lzma2 replacement! Today you pack a game with your 4-core CPU, and in 5-10 years you come back to it, unpack it on your new 32-core CPU, and you will be able to use all the cores. And *that* is AMAZING.

As you see, lolz is also better at compressing pictures; you may consider replacing old FA's -mm $bmp with lolz ($bmp=lolz). Being specialized in textures, pictures and especially dds formats makes it a prime choice for packing games.

But not all is great. For starters, it is significantly slower than FA -m5.
Remember the tests from the previous page: lzma lc8: 266.57mb - 42s. This is the same file I keep testing on. Here it took almost 6 mins. That's like... I don't know, 8x+ slower. Compressing a 22gb game took me around 3:40h, of which pure lolz was around 3h I guess; decompression should be better though. The question is, is it worth it? 241.39mb vs 266.57mb = ~9.5%. That is the level of the zpaq I tested before. It is pretty much a 2-digit gain, and I replicated the same gain with full game compression, where I got it to ~9.9gb with FA -m5 and to ~8.9gb with lolz. 10% is not bad, and on some files like bmp 25% is even better. But it's also slower. If you really wanted a good replacement for FA's internal lzma, this may be it. Even I am going to use it, at least sometimes.

But there is one more issue: lolz doesn't (to my knowledge) support <stdio>. Call me spoiled, but having all those gigabytes of game tmp files thrash my disk is horrible. I could not get the unpack cls to work yet either, which is another problem (for me). If they can make io work (both for packer and unpacker), that would be a different story. Anyway, the potential is there; this is one of the better compressors out there.

Oh, and I also included the DLZ packer for reference. I think it's great even though it is only single-threaded. It is specialized for dds and lags only slightly behind lolz. Its only problem is that it came out (to the public) at the same time as lolz; otherwise this could be what uharc once was. I was complaining in another thread about the lack of dds compressors (msc didn't work) and now I got 2 at once, oh well.
Last edited by elit; 12-01-2018 at 18:25. |
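The reason a 4x4-style or -mtt1-style layout can decompress on every core is that each block is compressed independently of the others. A minimal sketch of that idea with Python's `lzma` and a thread pool follows. It is not FA's or lolz's actual container format, and `pack_blocks`, `unpack_blocks` and the 1 MiB block size are assumptions for illustration.

```python
import lzma
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 1 << 20  # assumed block size, 1 MiB


def pack_blocks(data: bytes, workers: int = 4) -> list:
    """Split data into fixed-size blocks and compress each one independently."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() keeps the input order, so the blocks stay in sequence.
        return list(pool.map(lzma.compress, blocks))


def unpack_blocks(packed: list, workers: int = 4) -> bytes:
    """Decompress the blocks in parallel; the worker count need not match the packer's."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(lzma.decompress, packed))


if __name__ == "__main__":
    payload = (b"texture data " * 1000 + bytes(range(256))) * 300
    packed = pack_blocks(payload, workers=4)
    # Unpack with a different thread count than was used for packing.
    assert unpack_blocks(packed, workers=8) == payload
    print(len(payload), "->", sum(len(p) for p in packed), "bytes in", len(packed), "blocks")
```

The trade-off of this design is that blocks cannot reference each other, so the ratio can be a little worse than one long stream, while the unpacker is free to use as many threads as there are blocks.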
#35
Quote:
Keep up the good work. I am sure many people benefit from your information.
__________________
Haters gonna hate
#36
Today I additionally tried compressing a whole game of 22gb (18gb after srep) with lolz to see the effect of certain parameters:
Code:
+lolz:mt4                          >> 8.98gb
+lolz:mt4:tt1:mc32:mtt1            >> 9.07gb
+lolz:mt4:tt1:mc32:mtt1:dto0:cm0   >> 9.49gb
#37
^^^^
How much time did it take, though (for compression)?
__________________
NOT AVAILABLE |
#38
It was around 3h40min on my Intel 4690K @4.2ghz. At least 40+ min of that, however, was due to disk thrashing because lolz has no <stdio>. There were multiple steps, and for each one FA decided to copy a new $tmp file. That's why I could not measure decompression speed properly; it uses the disk even during archive testing. For this lack of <stdio> alone I may consider sticking with the internal lzma until it gets implemented in lolz.
Lolz on its own runs at around ~2mb/s during compression on my rig, with 4 threads. Last edited by elit; 13-01-2018 at 14:49.
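The <stdio> complaint is about piping versus temp files. Here is a minimal sketch of what piping buys you, assuming a hypothetical external packer that reads stdin and writes stdout. A tiny Python child process stands in for that packer, since according to the post lolz itself cannot be driven this way; `PACK_CMD`, `UNPACK_CMD` and `run_filter` are names invented for the example.

```python
import subprocess
import sys

# Stand-ins for an external packer/unpacker with stdin/stdout support
# (an assumption; a real tool's command line would replace these).
PACK_CMD = [
    sys.executable, "-c",
    "import sys, lzma; sys.stdout.buffer.write(lzma.compress(sys.stdin.buffer.read()))",
]
UNPACK_CMD = [
    sys.executable, "-c",
    "import sys, lzma; sys.stdout.buffer.write(lzma.decompress(sys.stdin.buffer.read()))",
]


def run_filter(cmd: list, data: bytes) -> bytes:
    """Feed data to the child's stdin and collect its stdout; no temp file is written."""
    result = subprocess.run(cmd, input=data, stdout=subprocess.PIPE, check=True)
    return result.stdout


if __name__ == "__main__":
    sample = b"Data00.packed " * 10000
    packed = run_filter(PACK_CMD, sample)
    assert run_filter(UNPACK_CMD, packed) == sample
    print(len(sample), "->", len(packed))
```

With a tool that only accepts file names, each step in the chain has to write its full output to disk before the next step can start, which is where the extra 40+ minutes mentioned above would come from.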
#39
Quote:
__________________
NOT AVAILABLE |
#40
@78372 by any chance, do you know the setting for Brotli in 7-Zip-zstd (Brotli MT)?
https://github.com/mcmilk/7-Zip-zstd/releases
Is it possible to use zstd/Brotli within 7z.dll/exe to compress and decompress with <stdin> <stdout>...
#41
Quote:
__________________
NOT AVAILABLE Last edited by 78372; 14-01-2018 at 05:29. |
#42
But do you know the setting for arc.ini? Does this work?
[External compressor:7zip]
header = 0
packcmd = 7za a -txz -an -mcrc=0 -mf=off -myx=0 -mmt=4 -m0=lzma2{ ption} -si -so <stdin> <stdout>
unpackcmd = 7za x -txz -an -y -si -so <stdin> <stdout>
*What to change for Brotli??
#43
Quote:
__________________
NOT AVAILABLE |
#44
Today I tried something different. Rather than focusing on general compressors, I decided to compare the default multimedia compressor of FA with alternatives. Specifically, in this test I focused on .bmp graphics files and FA's default $bmp, which uses grzip for this purpose. I compared it to similar alternatives suitable for graphics as well as to more general compressors such as lolz.
The test consisted of around 1.3gb of .bmp files. First and most important thing: by default FA processes each file separately. I tarred them into a single file and compared how FA's default $bmp algo copes with separate files vs 1 file put together:
original: 1.33gb
-FA -mbmp on dir of files (internally each is processed separately and put into the archive = default behavior): 539mb
From now on, all following results are for the single tarred file:
Code:
-FA -mbmp:   469.61mb !!!
-bcm:        531.96mb
-bsc:        518.54mb
-bsc -m5:    518.86mb
-dlz:        489.85mb
-uharc -m3:  457.73mb
-lolz:       451.42mb
-FA -xppmd:  483.05mb

Now, dlz and lolz are specialized in dds textures and there they kick ass by ~10% (I already tried), but for common image formats, don't bother. FA's default grzip rules: it is internal and thus supports pipelining, so no disk thrashing; it is extremely fast, multi-threaded and just great overall. However, as you see in this test, the default behavior of processing images separately hurts *big time*. How to solve it? Simple, add rep before bmp:
$bmp=bmp >> $bmp=5rep+bmp
Then when you use something like:
FA -m5/$bmp=5rep+bmp -mc:rep/srep...
what will happen is that FA will tar and srep the images first before applying grzip. And btw, adding rep or srep helps even further; it doesn't hurt at all.
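Why tarring helps: when files share content, compressing them as one stream lets the compressor reuse what it has already seen. Here is a small sketch with synthetic data. The "files" are made up and plain `lzma` stands in for rep+grzip, so only the direction of the effect carries over, not the numbers from the test above.

```python
import lzma
import os


def make_similar_files(count: int = 8, shared_size: int = 20000) -> list:
    """Build fake 'images' that share one incompressible body plus a unique tail."""
    shared = os.urandom(shared_size)
    return [shared + bytes([i]) * 100 for i in range(count)]


def size_separate(files: list) -> int:
    """Total size when every file is compressed on its own (per-file mode)."""
    return sum(len(lzma.compress(f)) for f in files)


def size_solid(files: list) -> int:
    """Size when the files are concatenated first (tar-like, solid mode)."""
    return len(lzma.compress(b"".join(files)))


if __name__ == "__main__":
    files = make_similar_files()
    print("separate:", size_separate(files))
    print("solid:   ", size_solid(files))
```

In per-file mode the shared body is paid for once per file, while in solid mode it is paid for once in total. Real bmp sets share far less than this artificial example, which is why the gain in the post is about 13% and not several-fold.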
#45
And finally, I tested the audio (wav) part of FA's mm compressor. Here the results were unexpected. Although FA applies the tta compressor to each (any) wav file, the results are not so clear, as you will see.
But first, let me tell you that originally I tried to compare tta to flac. On the specific game I tried (IL2 Sturmovik), flac refused to recognize the file format even though the files have a RIFF header and play fine in foobar2000. These wavs must have been compressed or something. On top of that, you cannot "tar" files and apply flac to the result, but tta can handle that. Because of these issues and compatibility problems, I came to the conclusion that it is not worth replacing the internal tta with an external compressor. But that's not all: I tried tta vs lzma and... you will see. But let's start from the beginning.
Original *.wav folder 44.72mb:
FA -mwav: 43.92mb < Clearly not working
Now the wavs tarred into a single file:
Code:
FA -mwav:              42.84mb (small gain when tarred but not working properly)
FA -mxlzma:            20.52mb !!!
FA -mdelta+xlzma:      20.52mb
FA -xlzma+tta:         20.52mb
FA -m5rep+delta+lzma:  21.14mb
FA -mtta+delta+xlzma:  42.84mb
FA -mtta+xlzma:        42.84mb (trying all kind of order as you see)
FA -mdelta+tta:        42.84mb
Code:
11k16bitpcm.wav(298kb):
FA -mtta:      203kb
FA -mxlzma:    260kb
FA -m5rep+tta: 203kb

So this means:
-do continue using the internal tta, but first verify whether your wav files are "standard" ones or not; test and compare with lzma!
-this still needs more testing to get the right idea; I am particularly curious about the combination of delta+tta on "real" wavs as well as the effect of rep etc...
EDIT: Since tta doesn't work properly on tarred wavs, this argument is settled now; no need for rep or delta on top of $wav.
-(feel free to tar files together the same way as I described for $bmp above, for example $wav=5rep+wav (or better 0+tta, because unlike $bmp there can be a slight loss of compression here). tta is great in that, unlike other codecs, it works on raw data regardless of extension. This could also help with some data packs that contain wav files inside, as well as with some audio banks.)
EDIT: ^Do not tar wav files after all. Even though it worked on my test dir, during an actual game repack test it resulted in a decompression error. Keep the standard $wav=wav setting.
Last edited by elit; 18-01-2018 at 04:44.
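The "test and compare" advice can be automated. Below is a sketch in Python. It has no tta, so a delta filter plus LZMA2 stands in for the audio-aware option and plain LZMA2 for the general one; the synthetic sine wave and all function names are assumptions for illustration, not anything FA does internally.

```python
import lzma
import math
import struct


def synth_pcm16(samples: int = 11025, freq: float = 440.0, rate: int = 11025) -> bytes:
    """Generate one channel of 16-bit little-endian PCM (a plain sine wave)."""
    values = (int(12000 * math.sin(2 * math.pi * freq * n / rate)) for n in range(samples))
    return b"".join(struct.pack("<h", v) for v in values)


# Candidate filter chains for the .xz container.
CANDIDATES = {
    "lzma2": [{"id": lzma.FILTER_LZMA2, "preset": 6}],
    # dist=2 subtracts the byte two positions back, which lines up with
    # the same byte of the previous 16-bit sample.
    "delta+lzma2": [
        {"id": lzma.FILTER_DELTA, "dist": 2},
        {"id": lzma.FILTER_LZMA2, "preset": 6},
    ],
}


def pick_best(data: bytes) -> tuple:
    """Compress with every candidate and return (best_name, sizes_by_name)."""
    sizes = {}
    for name, filters in CANDIDATES.items():
        packed = lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)
        assert lzma.decompress(packed) == data  # verify before trusting the size
        sizes[name] = len(packed)
    return min(sizes, key=sizes.get), sizes


if __name__ == "__main__":
    best, sizes = pick_best(synth_pcm16())
    print(sizes, "-> best:", best)
```

The round-trip check inside the loop matters for the same reason as the second EDIT above: a method that gives a good size on a test directory is only usable if the data also decompresses correctly.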