Cost vs return - aka worthiness debate [Archive]


elit

27-09-2017, 12:39

So, I have made my own repacks a "couple" of times before. One thing I kept stumbling upon was how certain (often promoted) extreme compression methods were just not worth it. I always wondered if I was missing something, because in all my cases the extreme methods that took like half a day gained at most 1% over those that could finish the job in 1-2h.

For example, packing Project CARS 2 with (FreeArc) srep:m5f+lzma:mx took I think ~5h and produced about 25.4gb from roughly 40gb. With srep:m3f+lzma:m5 it took about 1.5h and I got 25.68gb, and a lot of that time was probably disk bottleneck. There is a huge speed difference in FreeArc between -mx and -m5, as -m5 utilizes all 4 cores and has other speedups. I understand LZMA is more effective with 1-2 cores instead of breaking data into blocks, but I just don't see much benefit from it, at least when used with srep.

Another example like the above was Ghost Recon Wildlands, with a similarly ridiculous ratio. In fact every single time I decided "ok, let's try on this one again", the -mx option simply wasn't worth it. I saw a bigger difference only when I did not use srep; then it sometimes really helped more (but not groundbreakingly more either), but with anything chained after srep it just didn't do much better. In fact for one or two cases (I don't remember which games anymore), srep did the job and lzma gained 0%; you could pretty much just srep them and be done with it.

And if that wasn't enough, I tried to replace FA's default lzma: I tried srep+7z, then srep+(horribly-slow-at-max-settings) xz, and it wasn't worth it at all. Like, say, 1.58gb with max xz vs 1.61gb with FA -m5. That's nothing, and one took at least an hour to do it while the other took only around 5min.
Oh, and sometimes FA's -mx gave me an archive a few kb bigger than -m5!
Hell, I even tried zpaq -m4. That one gave a better ratio only when used alone (vs lzma without srep), but srep+zpaq was no better than srep+lzma.

But I keep reading everywhere that people like to chain srep with xz or 7z or whatever, and I wonder... why?
With srep at least, even -m5f is fine and quick enough, but this extreme lzma BDSM fetish puzzles me.

------------------------------------------------------------------------------------------------------------------------------

So with that said, I would like to know: if you have a different experience and you tried FA's -m5 to compare, can you tell me where you got a significant difference, in which game, how much better, and with what parameters? For which game can you confirm a better ratio worth bragging about with srep+lzma:mx (xz, 7z) vs srep+lzma:m5? Remember, only combined with srep.

Thanks for discussion :).

ChronoCross

27-09-2017, 19:54

I think you've grown, and you're an adult.
Analyzing these things makes you realize how much time you've lost on things like this. LOL.
I'll tell you something: before going to work, I turn on the computer and start my backup repacks, and when I get home they are ready.
I have an AMD, and as is known its cores work independently of each other, as opposed to Intel, which reinforces its cores with each other, so I only use one core to compress. It takes a lot of time, but for me it is the best option.
Finally, the difference between m5 and mx on my PC varies from 500mb to 800mb in games, e.g. for 8gb: m5 compressed to 2500mb and mx compressed to 1800mb. I see a big difference, but it always depends on the data to be compressed and on your experience from the backups you have done, like which tech or algo is needed to compress this game.
So I choose mx instead of m5.
I do not mind the compression time,
but I do care about the extraction time.
I like slow compression and faster unpacking.
Good luck on the illumination path.

Razor12911

27-09-2017, 20:16

Project CARS 2 is a bad example because the game is encrypted. What do you think any compression method will do to high-entropy data? Absolutely nothing, which is why there isn't much difference between weak and strong compression methods. But all in all you do have a point.

Edit:
The best way to run a comparison is on data that's uncompressed, because with Ghost Recon Wildlands too, from what I can read, you compressed already compressed data. Try precompressing the game first, then run the test once more.

felice2011

27-09-2017, 23:23

If you want to save time, do a fast scan of the files to learn their entropy level before you spend hours testing methods that are sometimes ineffective on some files: http://www.fileforums.com/showthread.php?t=99070 & http://www.fileforums.com/showthread.php?t=99136. As the "INDEX - Conversion Tutorial Index" has not been updated ... For those who want something magical there is always this http://www.fileforums.com/showthread.php?t=99080, for sure they will be satisfied.;):D

Chayan Manna

27-09-2017, 23:46

For those who want something magical there is always this http://www.fileforums.com/showthread.php?t=99080, for sure they will be satisfied.;):D

Hehe, It's really a magical tool :p:D

elit

28-09-2017, 04:39

Finally the difference between m5 or mx on my PC results vary from 500mb to 800mb in games eg: 8gb> m5 = compressed in 2500mb ---- and ---- mx = compressed in 1800mb I have a big difference

Thanks a lot, but was that with srep combined?

Project CARS 2 is a bad example because the game is encrypted... Best thing you can run comparison is on data that's uncompressed...

Yes, that is good advice, but remember the original size was over 40gb, so even without decrypting you can still compress it to almost half size. In that regard, shouldn't -mx vs -m5 still show a visible difference? It's still a lot of gigs that were compressed, after all.

Hehe, It's really a magical tool :p:D

Yeah, he takes me for stupid :D

elit

28-09-2017, 04:48

compress a game with a64 and see yourself

A64 is an accelerator flag and, if anything, a higher number should give worse compression (although in the latest versions it's about the same as a1). More specifically, it doesn't have anything to do with compression ratio; it is there to speed up the process.

With that said, I don't have a problem with srep at max at all, it's quick enough; it is lzma that makes me question.

elit

28-09-2017, 04:57

One thing I was considering trying is uharc (for games). I know it has a 2gb file limit though, and it's single-core only. But I was thinking: maybe if I pipe it in FreeArc through <stdin><stdout> and bind it into 4x4, then FA would run 4 instances on separate blocks (just like it does with 4x4:lzma in -m5). Would that work?

Something like:
-mc:4x4/4x4:64mb:uharc

packcmd=uharc {cmd} - - <stdin><stdout>
unpackcmd=uharc {cmd} - - <stdin><stdout>

Or something like that. I wonder if that would give a visibly better ratio than lzma, since uharc is a multimedia compressor...?

Andu21

28-09-2017, 06:17

Interesting topic, curiously enough I find myself in the same situation. You could try the uharc cls made by Razor12911 as well (http://fileforums.com/showthread.php?t=98005); iirc it overcomes the file size limit. Regarding compression ratio, I have yet to find a game in which uharc beats lzma. Maybe a precompressor is needed for that, or maybe a test on older games.

elit

28-09-2017, 08:32

You could try Uharc's cls made by Razor12911...
Thank you! I completely missed this one, man that guy is full of good surprises :D
Interestingly he uses 0.6a in his package; I am pretty sure the latest version was 0.6b. I will try to replace the exe with the newer one, it should work.

elit

28-09-2017, 10:15

Alright, since you people may be interested in this, I tried uharc 0.6b at various settings and also FA with -m5 and -mx, plus 7zip:ultra for comparison. This was done on a random preprocessed 660mb .forge file (up from its original 448mb, using ztool). This is of course only a single small test and doesn't have enough data to prove anything, but it is consistent with my other tests and shows the qualities and flaws well enough IMO.

On uharc these additional options were used: -md32768 -mm+
Then I tested its 3 main methods: mx, mz, m3.

Results:
original: 660mb
uharc -mz: 317.22mb
uharc -mx: 290.79mb
uharc -m3: 284.27mb
7z -ultra: 271.37mb
FA -mx: 271.46mb
FA -m5: 272.59mb

FA -m5 was by far the quickest of all tests while resulting in only a slightly bigger archive, specifically 0.4145% bigger than FA -mx. Hello? Anyone? :D
Ehm, anyway, for uharc all modes except -mz were very slow. Binding them to 4x4 (if possible) would help, but my guess is they would then only match FA -mx speed. FA -mx was still quicker than uharc (except -mz), but it's true that FA uses 2 cores while uharc used only 1.

7zip was great, no doubt, but: first, it is slow as hell despite using all 4 cores, and second, it used over 3gb of memory for this single 660mb file(!!!). FA was much more reasonable with memory usage in both -m5 and -mx (1gb at peak, and it even shrank during compression). Uharc had the best memory usage; in -mx mode it used only 50mb(!). This would make it a good candidate to chain with 4x4, and srep could then pipe directly to 4xuha:ppm for only ~200mb usage without waiting for itself to finish the 1st cycle.

Anyway, between uharc -mx and FA -m5 there was still a difference of about 6.7% in favor of FA -m5!

In conclusion, this small test once again confirmed my thoughts: don't bother with anything higher than FA -m5, especially with ztool/srep. It's really good enough, and you are likely wasting your time with anything higher. Bulat has it figured out for us and gave us the best and most optimized compression tool ever made. FA -m5 rules.

EDIT: I just tried the MCM compressor with the -m9 option that I was PM'd about: 271.08mb + taking ages = not worth it
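
The percentages in this post are easy to double-check. Here is a minimal Python sketch, assuming every difference is measured relative to the FA -m5 size; the helper name pct_vs is made up for illustration, and the sizes are copied from the results list above:

```python
# Sizes in MB, copied from the results list in this post.
sizes_mb = {
    "uharc -mz": 317.22,
    "uharc -mx": 290.79,
    "uharc -m3": 284.27,
    "7z -ultra": 271.37,
    "FA -mx": 271.46,
    "FA -m5": 272.59,
}


def pct_vs(baseline_mb: float, other_mb: float) -> float:
    """Signed size difference of other_mb relative to baseline_mb, in percent.

    Negative means the other archive is smaller than the baseline.
    """
    return (other_mb - baseline_mb) / baseline_mb * 100.0


baseline = sizes_mb["FA -m5"]
# Print smallest archive first, each with its distance from FA -m5.
for name, size in sorted(sizes_mb.items(), key=lambda kv: kv[1]):
    print(f"{name:10s} {size:7.2f} MB  {pct_vs(baseline, size):+6.2f}% vs FA -m5")
```

Taking the other archive as the baseline instead would shift the figures slightly, so the choice of denominator is worth stating whenever such small percentages are compared.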

78372

28-09-2017, 10:58

Joe Forster/STA

28-09-2017, 11:28

I think you've grown, and you're an adult. Analyzing these things makes you realize how much time you've lost on things like this.

Very wisely said, I could not agree more! However, if I may add something: this time was not completely lost if you learnt something new or interesting during those experiments and detours.

elit

28-09-2017, 13:05

May I ask, why are you looking at internal compression methods when we can use srep+lzma, which is better? For me, I just use srep:m3f+lzma, which is actually good for anyone who wants to spend less time and resources while compressing. If you want a good difference between mx/m5 vs srep+lzma, then you can try compressing a big game without compressed streams, like Far Cry 4

But lzma is also an internal compression method of FreeArc. What I was questioning is people going to extra effort to replace the internal LZMA with external LZMAs which, at least in my limited tests, did not prove to be significantly better and in fact were pretty much always (significantly) slower.

Now, in case you meant why use FreeArc at all and not just piped external command-line tools (like srep+xz on the command line, for example): I thought about it in the past and really like the idea of my own clean tool chain on cmd, but FreeArc still offers other things, like groups, the universal arc format regardless of replaced tools, sorting, a UI, specific optimizations for exe, bmp and wav, skipping of compressed data and so on. Would plain srep+lzma really be better than srep+FreeArc with its many other advantages? Or am I missing something here?

78372

28-09-2017, 20:08

As you mention external tools: a major benefit of using an external compressor is removing the 32-bit limitation. srep+lzma is always better than srep+mx or so, if you are not using old srep versions such as srep 1.5. Yesterday I tested this on a 513mb sample file; "513mb" so that rep can work on it instead of srep. First I applied mx and got 454mb on that file. Then I used srep+lzma, which didn't take much time and gave me 452mb. This 513mb sample contained some executables, some texts and some other files, and clearly FA's grouping didn't benefit me anyway. srep+lzma is not always good; if you want multimedia and other detection and to use algorithms per group, you must use masked data compression. You can find it here (http://fileforums.com/showthread.php?t=97530)

elit

29-09-2017, 07:58

That was indeed the idea that appealed to me; that's why I tried 64-bit xz/7z in the past instead of the internal LZMA. I loved the 64-bit idea. But it wasn't useful: it was slower, used more memory and did not give me a much better ratio. I think FA's internal LZMA is better than people give it credit for. I think they underestimate it because it's only 32-bit, but I would not be surprised if Bulat applied his own extra tricks to it.

Btw, believe it or not, not long ago I compressed one ~360mb game with srep+fa:m5. Then I used only -m5 with internal rep, everything else the same. For some reason the one with srep came out smaller. Not by much, but still, and it wasn't even m5f, I only used m3f. Go figure...:rolleyes:

As for the masked compressor, I know about it, I saw the page before. It looks great but I don't see much point: is it replicating what FA with GUI already has? Or is there anything it can do that FA can't in terms of better compression? (Btw I don't like game installers, I like to archive into a single archive format.)

PS: (I was PM'd again regarding RAZOR compressor. I know about it and tried it before. Back then it was extremely slow with only a marginal gain, but I will update this post later to add it to that benchmark for completeness, so stay sharp ^_^.)

1234567890123

29-09-2017, 13:20

I was PM'd again regarding RAZOR Compressor. I know about it and tried it before. Back then when I tried it was extremely slow and only marginal gain but, later I will update post here to add it to that benchmark for completion, so stay sharp ^_^.
The reason to use razor is installation time: it's very fast there, with a better ratio.
Waiting for your benchmarks :D

elit

29-09-2017, 14:59

Greetings again everyone. I just finished the RAZOR compressor results, and I also threw in zpaq at -m4 and -m5 settings for comparison. I always had a certain sympathy for zpaq and its family of compressors, but I never found it worth the time wasted. There is a significant difference in ratio (and of course time) between -m4 and -m5, but more on that later. So let's start.

RAZOR Compressor: 264.55mb. That is 2.95%, or roughly 3%, better ratio (vs 272.59mb FA -m5).
3% is around ~2gb of savings on a 60gb game. Sure, 2gb less is nice, but in the grand scheme of such big data... not really. Or rather, it would be if it wasn't for... time:
FA -m5: 41s
razor: 10min (pffff)

That's almost 15x slower, so if you compress a 60gb game (and remember, with precompression it will actually be 90+gb) and FA -m5 takes ~2h, which is typical, you are going to need around 30h (more than a full day) with razor. Now, since razor is not fully multithreaded (it used about 33% of a 4-core CPU, so about 1.5 cores), maybe if you 4x4 bind it in FA, if possible, it becomes more reasonable. In that case you could probably get around ~8h: still 4x slower, but you could let it run overnight and be fine the next morning.

At this point it's up to your own judgment whether 3% is worth it. For me it's not, but if other types of data get a better ratio AND if you can 4x4 it (which I am almost sure you can't yet), then maybe. I think it's not possible to 4x4 it because razor only works on files, not stdin/stdout, and I think that's needed for 4x4. Again, if somebody knows better, please let me know.
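
The time estimate above is plain proportional scaling. A rough Python sketch of that arithmetic follows; it assumes the slowdown measured on the 660mb file carries over unchanged to a full game, and that four parallel instances would scale perfectly, which is a best case and not something measured here. The function name is hypothetical.

```python
def extrapolate_hours(fast_sample_s: float, slow_sample_s: float,
                      fast_full_h: float, parallel_instances: int = 1) -> float:
    """Scale a full-size run of the fast tool by the slowdown seen on a sample.

    Assumes the slowdown is constant across data sizes and that running
    several instances in parallel divides the time evenly (best case).
    """
    slowdown = slow_sample_s / fast_sample_s
    return fast_full_h * slowdown / parallel_instances


fa_m5_sample_s = 41        # FA -m5 on the test file
razor_sample_s = 10 * 60   # razor on the same file
fa_m5_full_game_h = 2.0    # the "typical" full-game FA -m5 time from the post

slowdown = razor_sample_s / fa_m5_sample_s
single = extrapolate_hours(fa_m5_sample_s, razor_sample_s, fa_m5_full_game_h)
four_way = extrapolate_hours(fa_m5_sample_s, razor_sample_s, fa_m5_full_game_h, 4)

print(f"slowdown: {slowdown:.1f}x")
print(f"razor, single instance: {single:.1f} h")
print(f"razor, 4 ideal parallel instances: {four_way:.1f} h")
```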

Now to Zpaq:
zpaq -m4: 284.51mb = sucks, 4x slower and worse than old FA -m5 lzma lol, but
zpaq -m5: 246.81mb

That's ~9.5% (call it 10%) better vs FA -m5! Now we are talking, not those low single-digit garbage differences! And hell, even that is not much for my spoiled brain, but at least it's close to a 2-digit difference! For this particular file, zpaq -m5 beat the razor compressor, but keep this in mind:
- zpaq already utilized all 4 cores, so no more speedup, and...
- it took 12min, and...
- it will take the same amount of time to decompress, which means that if you decide today to play a 60gb game, you will play it tomorrow at the soonest = no thanks

Conclusion from this:
Zpaq is out of the game despite being the clear winner with a visibly better ratio (for this particular file); no need to talk about the paq family in general anymore.
FreeArc -m5 is still the best overall. The only potential replacement worth bragging about could be the razor compressor, but not in its current state if you cannot 4x4 it. And if you can, let me know, but even then it's a question of whether a ~3[-5?]% gain is worth 4x more compression time. For the current single-threaded 15x s-l-o-w-n-e-s-s certainly not, not to me at least.

TL;DR:
Keep using FreeArc -m5 with srep64+ztool and stop wasting your time. You have a life to live. :D That's it.

And so I am done for now. Take this thread for what you will; I hope it was somewhat helpful to you guys despite its limitations. Do feel free to continue the discussion though, one never knows what can be discovered down the road ^_^.

elit

30-09-2017, 18:41

Today I tried the well-acclaimed zcm from encode.ru at max settings: -m8 -s -t1. It resulted in a ~100mb bigger archive than FA -m5 lol. I also tried to tune zpaq (with the x and s options that provide more possibilities) to see if I could get it to a more reasonable speed while keeping a good ratio. I was never able to even match FA -m5 with custom settings; anything other than -m5 was a no-go.

Anyway, since lzma seems like the best option, I decided to test its various settings to see if I could squeeze more from it while maintaining speed. I was inspired by one user in this forum. For this test I used only a 16mb dictionary because I wanted to avoid FA throttling when it reaches its memory limit (it won't 4x4 in such cases). But once worthy parameters are found, I test them with the rest at default against the reference results from previous posts. I will also list compression times.

LZMA has multiple parameters; I tested the most important ones: mc, lc, bt4, fb. Later also dictionary size for comparison. Fb is also known as "word size" in 7z, and it is the one that almost everyone sets to the max of 273. In FA it is just a number without a letter in the lzma parameters. Ok, let's start.

Default lzma16mb:
lzma d16mb: 274.18mb - 27s (rest default: mc32,non-bt4, 32[fb], lc3)

Testing mc:
lzma d16mb mc128: 273.89mb - 58s
lzma d16mb mc1000: 273.78mb - 4.53min

^mc has a huge (multi-fold) impact on speed while improving the ratio by almost nothing, aka 0.1%!

Next I tried fb aka "word":
lzma d16mb 64: 273.97mb - 33s
lzma d16mb 128: 273.85mb - 42s
lzma d16mb 273: 274.07mb - 1min

^Between the default 32 and 128 there is only a 0.12% difference while compression time increased by 55%. With 273 the ratio actually worsened. However, 273 is also affected by dictionary size: for comparison, lzma:273:128mb resulted in 272.03mb while lzma:32:128mb resulted in 272.39mb. Still, the gain is almost nothing, but one took 40s while the other took 2min!

Testing bt4:
lzma d16mb bt4: 273.36mb - 36s

^This parameter gained 0.3% while slowing packing down by 33%. Meh.

Dictionary 96mb(vs 16mb reference above):
lzma 96mb: 272.59mb - 42s

^0.6% gain, 55% time increase and a memory hog.

Finally, lc:
lzma d16mb lc8: 268.22mb - 28s

^yep, ~2.2% gain for F.R.E.E.

So to conclude, the only option worth changing from the default was lc. Now let's use it with default settings and compare with the results from previous posts (-mc4x4/4x4:lzma:lc8):

lzma lc8: 266.57mb - 42s

^But something funny is going on here: FA defaults to a 64mb dictionary, but in -m5 it uses 96mb. Let's enforce it:
lzma 96mb lc8: 266.77mb - 37s

^Don't ask me why a bigger dictionary resulted in a slightly bigger file and a lower compression time, I just don't...

Conclusion:
The only parameter worth bragging about was lc, which can give you a nice 2.2% for free. The rest I would recommend leaving at default. The cost/return seems not worth it, but maybe you will encounter a different scenario. With the additional 2.2% gain, FA -m5 now not only cut the scary ~10% gain of zpaq back to single digits, it almost matched the razor compressor, which gave 264.55mb or a ~3% gain! But at what cost! Razor is now only ~0.75% better while being 15x slower. FitGirl you are c-r-a-z-y :D.

And so, default FA -m5, now with the extra lc8 tuning, is even more appealing to me. Hopefully other people will find this helpful too.
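
For anyone who wants to redo the cost/return comparison on their own data, here is a small Python sketch that tabulates size gain against extra time for each setting. The numbers are the d16mb runs from this post; the mc1000 run is left out because its "4.53min" timing is ambiguous, and "1min" is taken as 60s. The labels and the function name are only for illustration.

```python
# Baseline: lzma d16mb with everything else at default.
BASE_MB, BASE_S = 274.18, 27

# (label, archive size in MB, compression time in seconds), all at d16mb
# except the 96mb dictionary run.
runs = [
    ("mc128", 273.89, 58),
    ("fb64", 273.97, 33),
    ("fb128", 273.85, 42),
    ("fb273", 274.07, 60),
    ("bt4", 273.36, 36),
    ("dict 96mb", 272.59, 42),
    ("lc8", 268.22, 28),
]


def cost_return(size_mb: float, time_s: float) -> tuple[float, float]:
    """Return (size gain in %, time increase in %) relative to the baseline."""
    gain = (BASE_MB - size_mb) / BASE_MB * 100.0
    extra_time = (time_s - BASE_S) / BASE_S * 100.0
    return gain, extra_time


table = {label: cost_return(size, t) for label, size, t in runs}

# Largest gain first.
for label, (gain, extra) in sorted(table.items(), key=lambda kv: -kv[1][0]):
    print(f"{label:10s} gain {gain:+5.2f}%   time {extra:+6.1f}%")
```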

elit

01-10-2017, 17:47

bcm -b64mb: 323.64mb - 1.30min
bcm -b512mb: 324.26mb - 1.40min

elit

02-10-2017, 14:09

RAR5:
rar -m5 -ma5 -md96m: 328.11mb - 32s
rar -m5: 315.53mb - 47s

artag

31-12-2017, 19:44

Since we are talking about decent space gains vs reasonable compression time:

Using 7zip in place of the internal compression gives me a good decompression time because of how it handles incompressible blocks. Also, it sometimes gives me better compression than the internal LZMA (because it does not add overhead to incompressible blocks; try with an mkv/mp4 file and you'll see).

To all the experienced users: what switches do you use to bring compression time down to sane levels (losing some mb but staying practical)?

I'm talking about internal LZMA and 7zip LZMA.

Also, any advice for a (stable and tested) substitute to lzma(2)?

oodle?
the new lolz?

thanks

elit

01-01-2018, 17:22

Using 7zip in replace of internal compression, gives me a good decompression time, because
how it handles uncompressable blocks. Also, it gives me some times better compression than
internal LZMA (because it does not add overhead to uncompressable blocks, try with mkv/mp4 file
and you'll see).

Unless something changed, it's the exact opposite in my experience. 7zip never looked like it was skipping incompressible data to me (but maybe that has changed now?) and was much slower than FA -m5; only FA -mx seemed not to skip. FA -m5 is the quickest LZMA I know, skipping everything that is already compressed; the jumps it makes during compression are very, very visible. That's what the arc.groups file is also for, btw.

And for decompression, 7zip cannot even compare: it uses only a single thread while FA can use 4 threads (if you compressed with -m5). Good luck beating that with 7zip.

78372

01-01-2018, 18:24

7z LZMA2 skips incompressible data and can do multithreading better.
In my case I don't use multithreading, and because of my poor PC I just can't use a bigger dictionary, so I use srep and afterwards xz (lzma2) with preconfigured options (9x). It is not that slow and the ratio is good.

artag

01-01-2018, 21:40

Ok, it seems there is a misunderstanding here.

I don't use the built-in methods of FA. Those are masked (group) compression and some other pre-built stuff. If it works for you, be happy.

I'm talking about something else.

I have my own compression chains. Usually, people use something like

pzlib+srep+lzma for game assets compression (or 3d scenes/models in my case)

lzma on FA only uses two cores and has some speed issues during decompression.

7zip LZMA2 can use more than 2 cores and has increased speed during compression, and also during decompression thanks to skipped incompressible blocks.

It gives me faster compression and decompression and is a good substitute for the internal LZMA, especially if time is a concern. Even if you use less aggressive settings (because of time), it is a better feature-vs-size trade-off compared to internal lzma.

I know about 4x4 lzma, but the memory scales too much during compression if I use a larger block; with 7zip I can use 2 or 3 threads and save memory if I want larger blocks.

My question to experienced users (or repackers?) is whether there is a better candidate than 7zip. (Even if I sacrifice some MB, better compression / decompression speed is desirable.)

you can use 7zip as external like this:

[External compressor:7zip]
header = 0
packcmd = 7za a -txz -an -mcrc=0 -mf=off -myx=0 -mmt=4 -m0=lzma2{:option} -si -so <stdin> <stdout>
unpackcmd = 7za x -txz -an -y -si -so <stdin> <stdout>

Set -mmt=4 to 3 or 2 if it's eating more memory than you have or you want larger blocks.

invoke it like this

7zip:d=512m or 7zip:d=64m:fb=16 or 7zip:d=64m:fb=16:a=0

pzlib+srep+7zip ,etc

etc
less d=, less memory, or reduce mmt.

use task manager to monitor the exe

<stdin> <stdout> works ok on my end, even with inno setup

good luck

78372

01-01-2018, 23:57

Here is the XZ I typically use instead of 7z xz. It also supports stdio, and you have some predefined levels of compression. It is only ST.

Gupta

02-01-2018, 04:40

elit

02-01-2018, 07:37

In repacking we compress once and decompress many times ~ the answer for all ur arguments
RZ takes more time for compression but decompress faster then lzma with compression benefit ofc same goes for lolz srep etc

RZ takes *way* too long and the returns are diminishing. I tried it again a few days ago, this time on a ~12gb game (with decompiled/decompressed resources). Srep even hurt it, but without it the gain was like nothing either. RZ gave me 6.4gb without srep and 6.5gb with srep, while FA -m5+srep gave me around 6.6-6.7gb. FA packed it in a few minutes while RZ took all night. Frankly I don't get all the fuss about RZ and the like; it's slow garbage with diminishing returns, and srep + non-aggressive lzma can come so close to it, for under 3% loss most of the time.

He also said "Even if i sacrifice some MB, better compression / decompression speed is desirable", so no, it does not answer his arguments. Furthermore, while I agree that decompression speed is more important, I don't agree that it is the only thing that matters. Unless you are a repacker who uploads repacks to the world, chances are you are only going to decompress on occasion. But wasting day and night compressing for a marginal gain is crazy.

Ok, it seems there is a missunderstanding here. I don't use the builting methods of FA.
lzma on FA only use two cores and has some speed issues during decompression.
7zip LZMA2 can use more than 2 cores and it has increased speed during compression,
also during decompression thanks to skipped uncompressable blocks.
I know about 4x4 lzma, but the memory scales too much during compression if i use a larger
block, with 7zip i can use 2 or 3 threads and save memory if i want larger blocks.

you can use 7zip as external like this:

[External compressor:7zip]

I see. First, thank you for the config snippets; when I have time I will definitely give it a try. With that said, when you say lzma on FA only uses 2 cores, that is again flag dependent: -mx uses only 2, -m5 uses 4. Or more precisely, 7z uses lzma2 while FA uses 4xlzma1 (or 2xlzma1, as one can use 2 cores I believe). For decompression, again, I can't see 7zip beating FA, because regardless of how you compress with 7z it only uses 1 core during decompression from what I know. Sure, it's better now if it can skip through incompressible data, but so can FA, and with 4 cores in use. Also, I can't see how FA uses "too much memory" in 4x4 mode when it is known to have a hardcoded limitation (it's only 32-bit after all). FA uses about the same amount of memory in -mx and -m5: in -mx it allocates a 176mb dictionary + 2 cores, while in -m5 it is a 96mb dict + 4x4.

I think what you are trying to say is that in the same 2-core compression mode, 7z is better than FA. Which I think is true, but it was your decision not to use -m5 (or any built-in methods, as you mentioned). I have made many tests outside this topic, and I never once saw srep+FA -mx having any significant gain over srep+FA -m5. Without srep that would be different though. I will try srep+7z again with your settings.

7z LZMA2 skips incompressible data and can do multithreading better.
I will test how well it "skips", though multithreading, as mentioned already, only helps compression. There are people here willing to waste days on marginal compression gains, saying decompression speed matters most, yet many of them have no problem with the limiting single-threaded decompression of 7z. Go figure. It bothered me a lot when I used 7z in the past, before I discovered FA.

Gupta

02-01-2018, 08:20

LoL{no offense} answer will be here when i have time In the mean time can you edit ur post with the srep flags u use

I don't waste days and on decompression... Have my own wrapper for multithreaded decompression { hint: if you don't know programming u can use an inbuilt program for multithreaded decompression}

I don't think it "skips"

elit

02-01-2018, 17:13

In the mean time can you edit ur post with the srep flags u use

Have my own wrapper for multithreaded decompression

I don't think it "skips"

- I just used normal -m3f on the latest srep (which has an m1f-m5f range)

- RZ doesn't even support stdio and it's not open source. Are you telling me you can "stream" from multiple parallel RZ's into a single archive output??

- Indeed it doesn't, I just tested; it will be part of my next big "data" comment. I don't know where these guys are getting such false info.

artag

02-01-2018, 22:10

lzma2 does skip incompressible blocks, as stated by the author

https://sourceforge.net/p/sevenzip/discussion/45797/thread/2f6085ba/

simple tests:

take an incompressible mp4 file, rar, mkv or something like that

compress with lzma:256m and 7zip:d=256 (set mmt to two if you want to test both algorithms under the same conditions)

do some tests with different incompressible files

the lzma output will compress very little and sometimes will be bigger (because of overhead added to blocks that could not be compressed)

lzma2 will be smaller and in the worst case only a few kilobytes larger

do an extraction test of the compressed files, lzma2 is faster (uncompressed blocks are just copied to output)

and finally, this is a test of single compression algorithms. If you use one of the built-in methods of FA, you are combining a chain of algorithms/tools, and that is a different thing. That's not what I'm talking about
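
The size part of that test can be tried without any game files. The sketch below uses Python's standard lzma module, which wraps liblzma (XZ Utils), not 7-Zip or FreeArc, so it only illustrates the container-level difference between LZMA1 (legacy .lzma) and LZMA2 (.xz) on random data; it says nothing about speed or about how the FreeArc and 7-Zip encoders behave. The function name and the 1 MiB sample size are arbitrary choices.

```python
import lzma
import random


def incompressible_overhead(n_bytes: int = 1 << 20, seed: int = 0) -> dict:
    """Compress pseudo-random bytes with LZMA1 and LZMA2 and report the overhead.

    Returns how many bytes each output is larger than the input.
    """
    rng = random.Random(seed)
    # Pseudo-random bytes stand in for already-compressed data (mkv, mp4, rar).
    data = rng.getrandbits(8 * n_bytes).to_bytes(n_bytes, "little")

    lzma1 = lzma.compress(data, format=lzma.FORMAT_ALONE)  # legacy .lzma, LZMA1
    lzma2 = lzma.compress(data, format=lzma.FORMAT_XZ)     # .xz container, LZMA2

    # Sanity check: both streams must round-trip to the original bytes.
    assert lzma.decompress(lzma1) == data
    assert lzma.decompress(lzma2) == data

    return {"lzma1": len(lzma1) - n_bytes, "lzma2": len(lzma2) - n_bytes}


overhead = incompressible_overhead()
print(f"LZMA1 (.lzma) overhead: {overhead['lzma1']} bytes")
print(f"LZMA2 (.xz)   overhead: {overhead['lzma2']} bytes")
```

With liblzma the LZMA2 output should stay within a small fixed overhead of the input, while LZMA1 grows with the input size; whether that counts as "skipping" is exactly what the rest of the thread argues about.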

yes, I have a four-core cpu. If you invoke 4x4:lzma:256m on a quad core or i7, well, 32-bit FA will go beyond its memory limits (I'm aware of 64-bit fazip but I haven't tested it yet)

lzma on FA and lzma2 on 7zip both decompress single-threaded; cpu usage confirms that, and so does the author

https://sourceforge.net/p/sevenzip/feature-requests/1095/

what is rz? :)

any comments from somebody with experience with zstd or lzham (combined with FA)?

http://www.fileforums.com/showthread.php?p=465464#post465464

quick compare (single mkv file, expect better results with something with mixed compressable-uncompressable blocks like game assets or 3d models-scenes):

http://i65.tinypic.com/2a95nyp.png

elit

03-01-2018, 06:19

Dear artag, first, thank you for the pics. But I think we need to get deeper into this.

In your first link, Igor does mention that LZMA2 is better for already compressed data: "Also LZMA2 is better than LZMA, if you compress already compressed data."
However, that does not imply skipping of the compressed chunks, and your results actually prove it. It may just imply better handling of such data.

Specifically, your data show that FA LZMA is in fact almost twice as quick as 7z LZMA2 at the same thread count! I assume here that, since on FA you used only LZMA and I don't see any 4x4 there, FA only used 2 threads. That would compare to the second pic with the 7z LZMA -mmt=2 output, not the last one that takes advantage of more cores.

Also, while the newer LZMA2 does give a better ratio with compressed data, implying more efficient work, it still does not have anything to do with "skipping". If you want to see what true "skipping" is, compress said file with the default FA -m5 parameter. FA -m5 skips for real, invoking a raw disk copy; you would not see 6mb/s like in your pic but almost full disk speed (and not because more cores are being used). What you are showing has nothing to do with real data skipping and everything to do with more efficient operation, and the author did not say anything like that either. You misread his comment.
I already have the data I collected yesterday for my next big post in this thread, you will see the difference. Soon...

PS (Btw RZ is this: https://encode.ru/threads/2829-RAZOR-strong-LZ-based-archiver)

elit

03-01-2018, 14:20

Alright everyone, here is my promised test. This time I used the extracted game pack "Data00.packed" from the game "Raiders of the Broken Planet - Wardog Fury". The unpacked resource is 5.33gb in size and contains many thousands of different files, including txt, lua scripts, dds, cm3, msh, fb, fsb, json, actclass, actgroup and whatnot. It does not contain any compressed ogg audio, which I wanted to avoid. First I am going to dump all the results and then go through them step by step:

data(_srt).flat 5.33gb
data.flat.srep 4.49gb
data_srt.flat.srep 4.49gb

FA -m5(-mc:4x4/4x4:lzma:lc8):
-data.flat 1144mb_ram 5:10min 2.51gb
-data.flat.srep 1143mb_ram 5:10min 2.46gb
-data_srt.flat 1226mb_ram 5:20min 2.49gb
-data_srt.flat.srep 1218mb_ram 5:15min 2.46gb | Decmp: cpu100% 38sec

FA -mx:
-data_srt.flat 1884mb_ram 23:28min 2.46gb
-data_srt.flat.srep ^1950mb_ram 22:35min 2.45gb | Decmp: cpu25% 2:22min

7z -ultra(d:64m:word=64):
-data_srt.flat.srep 2158mb_ram 12:03min 2.41gb | Decmp: cpu25% 1:48min

7z(a -txz -mcrc=0 -mf=off -myx=0 -mmt=4 -m0=lzma2:d=64m:fb=16:a=0):
-data_srt.flat 3155mb_ram 7:32min 2.59gb
-data_srt.flat.srep 3199mb_ram 6:57min 2.57gb | Decmp: cpu25% 1:48min

same_lzma2_settings(cpu100%, data_srt.flat.srep):
-7z(lzma2:d96m:a=1:mc=32:fb=32:mf=hc4): 3967mb_ram 16:40min 2.43gb | Decmp: cpu25% 1:50min
-FA -m5(no custom string, same as 7z^): 1542mb_ram 6:19min 2.49gb | Decmp: cpu100% 38sec

skip_cmp_data_test(on data_srt.flat.srep.7z):
-7z(^): 8:09min
-FA -m5: 16sec

7z_PPMd(data_srt.flat.srep):
-d1g:word=32 46:00min 2.59gb
-d1g:word=8 35:00min 2.60gb
-d1g:word=4 32:40min 2.68gb
-d256m:word=4 24:20min 2.66gb

Ok so first I tarred all files into a single file, using both 7z (data.flat) and FA (data_srt.flat). FA used special sorting in GES (group-extension-size) order. This is only to demonstrate the effectiveness of grouping and sorting in our case. Obviously both files have the same size as the extracted directory, 5.33gb.

Then I ran srep -m3f on each; both resulted in the same size of 4.49gb. I use the new srep64 that has options up to -m5f btw. It is no surprise that both files are still the same size, since srep searches for matches across the whole file.

Finally I started compressing with FA -m5(-mc:4x4/4x4:lzma:lc8) on all files. This is the same as -m5 except for the lc8 option and a dictionary that defaults to 64mb instead of 96mb. Read my previous post in this thread for more details if you like. You can see that without srep sorting helps a tiny bit, while with srep the compressed size was the same regardless of sorting. This was only out of curiosity; from now on we will focus on only one data set, the one tarred by FA.

data_srt.flat.srep took 5:15min to compress to 2.46gb, and in the decompression test it took only 38sec to decompress and utilized all 4 cores for it. More importantly, during compression it only needed ~1.2gb of memory, and this is with 4x4!

Then I tried FA -mx. Without srep it did a tiny bit better; with srep the size is almost the same, but it took 4x more time to compress, used more memory, used only ~2 cores during compression, and did not use all cores during decompression either (cpu25%).
Now compare it with 7z below it, first with default -ultra settings: 7z used a similar amount of memory, but took only about half the time FA -mx did (still twice that of FA -m5 for a negligible gain!) and compressed a tiny bit better. Decompression, despite LZMA2, is still bound to a single core. That's HORRIBLE.
I only used FA -mx as a reference for explanation and comparison; it is not worth using -mx in FA. So yes, if I had to choose between FA -mx and 7z/xz, I would also choose the latter. But FA -m5 with 4x4 is miles better and quicker. Let's continue first, I will come to that later.

Finally I used 7z with the settings that were suggested: 7z(a -txz -mcrc=0 -mf=off -myx=0 -mmt=4 -m0=lzma2:d=64m:fb=16:a=0). Let's focus only on data_srt.flat.srep from now on, and forget FA -mx as well. How does "this" 7z compare to FA -m5?
FA: -data_srt.flat.srep 1218mb_ram 5:15min 2.46gb | Decmp: cpu100% 38sec
7z: -data_srt.flat.srep 3199mb_ram 6:57min 2.57gb | Decmp: cpu25% 1:48min

^First, the compression times of 7z are much closer to FA -m5, so kudos to artag for finding a better balance. But 7z uses ~2.5x more memory, is still slower, *compresses worse*, and decompresses slower! Yeah, really great benefit of having the latest 64bit LZMA2.

Maybe you say, "try both LZMAs on the same settings, dummy!". Sure, at your service:
same_lzma2_settings(cpu100%, data_srt.flat.srep):
-7z(lzma2:d96m:a=1:mc=32:fb=32:mf=hc4): 3967mb_ram 16:40min 2.43gb | Decmp: cpu25% 1:50min
-FA -m5(no custom string, same as 7z^): 1542mb_ram 6:19min 2.49gb | Decmp: cpu100% 38sec

^Yes, 7z's LZMA2 is a tiny bit more efficient and thus compresses slightly better. That "better" is pretty much nothing though. Everything else is horrible. It's ~2.5x slower, uses ~2.5x more memory and still decompresses much slower because of 1t. Although decompression can perhaps be fixed if you chain it to 4x4 in FA, that doesn't answer for the other deficiencies. The 4x4 design and the hyper-optimization of the internal LZMA are superior in every respect.

But I am not done, lets talk about "skipping":
skip_cmp_data_test(on data_srt.flat.srep.7z):
-7z(^): 8:09min
-FA -m5: 16sec

^In 7z I used the same settings as suggested before, aka 7z(a -txz -mcrc=0 -mf=off -myx=0 -mmt=4 -m0=lzma2:d=64m:fb=16:a=0). I tried to compress an already compressed 7z file. 7z doesn't even detect its own format and tries to compress it again; it took 8:09min, which is MORE than it took to compress the original uncompressed srep file. FA -m5, on the other hand, basically did a raw copy. So... I don't know. Where are those myths about "superior" 7z/xz coming from? It's just horrible in every respect.

Finally I tried the PPMd of 7z just for comparison; it is not relevant to this post but maybe you will find it useful. I actually like the PPMd of 7z quite a bit. It seems like a solid replacement for uharc and is definitely quicker than the PAQs, although not as good.
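To make the idea of "skipping" concrete, here is a minimal Python sketch of one common way a packer could decide to store a block raw instead of compressing it. This is only an illustration and an assumption on my side: I do not know how FA -m5 detects compressed data internally, and the names `looks_incompressible`, `plan_blocks` and the 0.97 threshold are made up for this example. A cheap zlib level 1 pass is used as a probe; if it barely shrinks the block, the block is marked for a raw copy.

```python
import os
import zlib


def looks_incompressible(chunk: bytes, threshold: float = 0.97) -> bool:
    """Return True if a fast zlib pass barely shrinks the chunk.

    The threshold is an arbitrary illustrative value, not taken from FreeArc.
    """
    if not chunk:
        return False
    probe = zlib.compress(chunk, 1)  # level 1 = fastest, used only as a probe
    return len(probe) / len(chunk) >= threshold


def plan_blocks(data: bytes, block_size: int = 1 << 20):
    """Return a list of (offset, action), where action is 'store' or 'compress'."""
    plan = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        action = "store" if looks_incompressible(block) else "compress"
        plan.append((offset, action))
    return plan


if __name__ == "__main__":
    # Random bytes stand in for already compressed data, repeated text for raw data.
    sample = os.urandom(1 << 16) + b"texture " * 8192
    print(plan_blocks(sample, block_size=1 << 16))
```

Blocks marked "store" cost only a copy, which is why a packer working this way can run at near disk speed on a .7z input, while a packer that always runs the full match finder spends minutes for no gain.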

----------------------------------------------------------------------------------------------------------------------------------------

In conclusion, FA -mx is truly not a good choice because it does not take advantage of what makes FA so great. It tries to be the strongest option but gives up its 4x4 awesomeness, not only for compression but also for decompression. And I believe it also does not skip compressed data like -m5 does, which is another awesome trait of FA. In that case, yes, I would also go with modern 64bit xz/7z.

But the internal LZMA on -m5 settings, with all the mentioned benefits, puts both 7z and xz to shame. It is significantly quicker, uses significantly less memory, decompresses significantly quicker utilizing all 4 cores, and compresses only slightly worse, 2.4% in this case. 2.4% for ~2.5x better speed, ~2.5x less memory usage and full speed 4t decompression. To me the 4x4:lzma design is so much better than LZMA2; the latter feels like a lost opportunity. Let's hope LZMA3 gets it right one day. On top of that, FA -m5 can truly skip already compressed data, how cool is that? 7z simply does NOT, I am sorry! In the end, the only benefit of the latest 7z/xz and their LZMA2 is 64bit, which is only good for a bigger dictionary. However, a dictionary beyond 64mb is IMO much less relevant, especially if you srep the data before it. Srep does the job so you don't need to overheat your LZMA. Think of it as 2-pass filter compression. The internal LZMA is actually quicker and more memory efficient than 7z/xz. It is clear Bulat put a lot of effort into squeezing it. It should also be clear that I am in love with 4x4 and I really hate lzma2. But if you think decompressing on a single thread is acceptable, go for it.

I see FA as a state-of-the-art, highly efficient all-in-one package. No need to replace anything except srep with srep64. Certainly not 4x4:lzma with that horrible lzma2 of the ill-optimized 7z/xz exes. The memory argument and the need for a bigger dictionary are void IMO. Again for you artag:
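For anyone who wants to check the "~2.5x" and "2.4%" figures, here is a small Python sketch that recomputes them from the same_lzma2_settings results posted above. Only the numbers come from the test; the helper names `to_seconds` and `compare` are just for this illustration.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'minutes:seconds' string such as '16:40' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# Figures copied from the same_lzma2_settings test on data_srt.flat.srep.
runs = {
    "7z lzma2": {"ram_mb": 3967, "time": "16:40", "size_gb": 2.43},
    "FA -m5": {"ram_mb": 1542, "time": "6:19", "size_gb": 2.49},
}


def compare(slow: dict, fast: dict) -> dict:
    """Return how much more time and RAM `slow` needs, and how much bigger `fast` is."""
    return {
        "time_ratio": to_seconds(slow["time"]) / to_seconds(fast["time"]),
        "ram_ratio": slow["ram_mb"] / fast["ram_mb"],
        "size_penalty_pct": (fast["size_gb"] - slow["size_gb"]) / slow["size_gb"] * 100,
    }


if __name__ == "__main__":
    for key, value in compare(runs["7z lzma2"], runs["FA -m5"]).items():
        print(f"{key}: {value:.2f}")
```

Both ratios come out at roughly 2.6x and the size penalty at roughly 2.5%, in line with the rounded figures in the post.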
7z(a -txz -mcrc=0 -mf=off -myx=0 -mmt=4 -m0=lzma2:d=64m:fb=16:a=0):
-data_srt.flat.srep 3199mb_ram 6:57min 2.57gb | Decmp: cpu25% 1:48min
FA -m5(-mc:4x4/4x4:lzma:lc8):
-data_srt.flat.srep 1218mb_ram 5:15min 2.46gb | Decmp: cpu100% 38sec

Best regards.

elit

12-01-2018, 18:08

Here we go
Original file(same as in past posts): 660mb

LOLZ(default+mt4 = base setting):
-: 242.73mb 7:18min
+tt1: 244.16mb 5:47min (0.6% bigger, ~21% less time)

+tt1+mtt1: 241.39mb 6:02min !(better than mtt0 here)

+tt1+al0: 244.46mb 5:47min
+tt1+mc32: 244.17mb 5:50min
+tt1+cm0: 247.37mb 5:52min
+tt1+dto0: 244.39mb 5:24min
+tt1+fba512: 244.13mb 6:09min
+tt1+fba128: 244.15mb 6:03min
+tt1+fba32: 244.22mb 5:42min
+tt1+fba8: 244.76mb 5:52min
+tt1+fba0: 244.12mb 6:26min
+tt1+dm00: 243.63mb 5:49min
+tt1+gm11: 244.28mb 5:54min

DLZ: 265.52mb

DDS test:
file1.dds(21.33mb) | ARC -m5(lzma) -> 7.29mb | ARC($bmp) -> 6.46mb | DLZ -> 4.72mb | LOLZ -> 4.18mb
file2.dds(5.33mb) | ARC -m5(lzma) -> 4.6mb | ARC($bmp) -> 4.34mb | DLZ -> 3.77mb | LOLZ -> 3.58mb

BMP file(~5mb):
arc -m5 -> ~2mb
lolz -> ~1.5mb (+25% gain)

Ok so first of all, lolz is very stable in both speed and compression ratio. There doesn't seem to be anything one can do to significantly change either. The fastest compression I was able to achieve came from a combination of options, but it wasn't really worth it even then. The biggest variable was the -tt parameter. I personally recommend -mc32 -tt1 -mtt1 -mt(#) as the best set overall.

You may ask about -mtt1 vs -mtt0. As you see here, despite recommendations, -mtt1 gave a better ratio. With -mtt0 you need a dictionary at least 2x+ the block size after all, whereas in -mtt1 mode both can be the same. But -mtt1 has another advantage: it does what made FA -m5 so great, 4x4. And it goes further; it's actually XxX, fully scalable, so it's even better. With -mtt0 you will use the same number of threads for both compression and decompression, but with -mtt1 only your system is the limit (and you can also use fewer threads for decompression than the packer used). So lolz can decompress using all cores! Finally, that's what I call progress, unlike that crappy lzma2 replacement!
Today you pack a game with your 4 core CPU, but in 5-10 years you come back to it, unpack it on your new 32 core CPU, and you will be able to utilize all cores. And *that* is AMAZING.
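The reason an archive made of independent blocks can be unpacked with any number of threads is easy to show with a small Python sketch. This is not how 4x4 or lolz -mtt1 are implemented (I have not seen their sources); it only uses Python's standard lzma module and a thread pool as stand-ins, and `pack_blocks` / `unpack_blocks` are hypothetical names. Each block is compressed on its own, so the unpacker is free to pick its own worker count.

```python
import lzma
from concurrent.futures import ThreadPoolExecutor


def pack_blocks(data: bytes, block_size: int, workers: int) -> list[bytes]:
    """Split data into fixed-size blocks and compress each one independently."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lzma.compress, blocks))


def unpack_blocks(packed: list[bytes], workers: int) -> bytes:
    """Decompress the blocks with any worker count; output order is preserved."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(lzma.decompress, packed))


if __name__ == "__main__":
    payload = bytes(range(256)) * 4096          # 1 MiB of sample data
    packed = pack_blocks(payload, block_size=1 << 18, workers=4)
    # Packed with 4 workers, unpacked with 32: the block count is the only limit.
    assert unpack_blocks(packed, workers=32) == payload
```

The price is the one mentioned in this thread: matches cannot cross block boundaries, so the ratio is a bit worse than with one continuous stream.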

As you see, lolz is also better at compressing pictures; you may consider replacing FA's old -mm $bmp with lolz ($bmp=lolz). Being specialized in compressing textures, pictures and especially dds formats makes it a prime choice for packing games.

But not all is great. For starters, it is significantly slower than FA -m5. Remember the tests from the previous page:
lzma lc8: 266.57mb - 42s
This is the same file I keep testing on. Here it took about 6 mins. That's like.. I don't know, 8x+ slower. Compressing a 22gb game took me around 3:40h, of which pure lolz was around 3h I guess; decompression should be better though. The question is, is it worth it? 241.39mb vs 266.57mb = ~9.5%. That is the level of zpaq I tested before. Pretty much a 2 digit gain, and I replicated the same gain with full game compression, where I got it to ~9.9gb with FA -m5 and to ~8.9gb with lolz. 10% is not bad, and on some files like bmp 25% is even better. But it's also slower. If you really want a good replacement for FA's internal lzma, this may be it. Even I am going to use it, at least sometimes.

But there is one more issue: lolz doesn't (to my knowledge) support <stdio>. Call me spoiled, but having all those gigabytes of game tmp files thrash my disk is horrible. I could not get unpack cls to work yet, that's another problem (for me). If they can make io work (both for packer and unpacker), that would be a different story. Anyway, the potential is there; this is one of the better compressors out there.

Oh and I also included the DLZ packer for reference. I think it's great even though it is only single threaded. It is specialized for dds and lags only slightly behind lolz. Its only problem is that it came out (to the public) at the same time as lolz; otherwise this could be what uharc once was. I was complaining in another thread about the lack of dds compressors (msc didn't work) and now I got 2 at once, oh well :).

KaktoR

13-01-2018, 02:42

[...]I could not get unpack cls to work yet thats another problem(for me).[...]

Same here :(

Keep up the good work, I am sure many people benefit from your information.

elit

13-01-2018, 08:44

Today I additionally tried to compress a whole game of 22gb (18gb after srep) with lolz to see the effect of certain parameters:
+lolz:mt4 >> 8.98gb
+lolz:mt4:tt1:mc32:mtt1 >> 9.07gb
+lolz:mt4:tt1:mc32:mtt1:dto0:cm0 >> 9.49gb
Additionally I tried to decompress them; all of them showed approximately the same decompression speed, and the disk was limiting me. Clearly, do not disable context modeling: it has a good ~5% impact and doesn't slow down either compression or decompression too much. Going from default to "-tt1 -mc32 -mtt1" made only a 1% difference. In other words, it is ok to lower tt# to 1 for speed, and it is *especially* a good idea to use -mtt1, which lets you decompress on any number of cores without limit = future proof.

78372

13-01-2018, 10:14

^^^^
How much time it took though(in compression)?

elit

13-01-2018, 14:47

^^^^
How much time it took though(in compression)?
It was around 3h40min on my intel 4690k @4.2ghz. At least 40+min of that, however, was due to disk thrashing because of no <stdio> in lolz. There were multiple steps, and for each of them FA decided to copy a new $tmp file. That's why I could not measure decompression speed properly; it uses the disk even during archive testing. For this lack of <stdio> alone I may consider sticking to the internal lzma until it gets implemented in lolz.

Lolz on its own runs around ~2mb/s during compression on my rig - with 4 threads.

78372

13-01-2018, 19:45

It was around 3h40min on my intel 4690k @4.2ghz. At least 40+min of that, however, was due to disk thrashing because of no <stdio> in lolz. There were multiple steps, and for each of them FA decided to copy a new $tmp file. That's why I could not measure decompression speed properly; it uses the disk even during archive testing. For this lack of <stdio> alone I may consider sticking to the internal lzma until it gets implemented in lolz.

Lolz on its own runs around ~2mb/s during compression on my rig - with 4 threads.

Too slow for daily use, not a choice of mine; better to stick to lzma(2)/zstd.

doofoo24

13-01-2018, 20:12

@78372 by any chance do you know the setting for Brotli in 7-Zip-zstd (Brotli MT)?
https://github.com/mcmilk/7-Zip-zstd/releases
is it possible to use zstd/Brotli within 7z.dll/exe to compress and decompress with <stdin> <stdout>...

78372

13-01-2018, 20:55

doofoo24

13-01-2018, 21:05

But do you know the setting for arc.ini? Does this work?

[External compressor:7zip]
header = 0
packcmd = 7za a -txz -an -mcrc=0 -mf=off -myx=0 -mmt=4 -m0=lzma2{:option} -si -so <stdin> <stdout>
unpackcmd = 7za x -txz -an -y -si -so <stdin> <stdout>

*What should I change for Brotli?

78372

14-01-2018, 05:29

elit

15-01-2018, 13:33

Today I tried something different. Rather than focusing on general compressors, I decided to compare the default multimedia compressor of FA vs alternatives. Specifically, in this test I focused on graphics (.bmp files) and the default $bmp of FA, which uses grzip for this purpose. I compared it to similar alternatives suitable for graphics as well as to more general compressors such as lolz.

The test set consisted of around 1.3gb of .bmp files. First and most important: by default FA processes each file separately. I tarred them into a single file and compared how FA's default $bmp algo copes with separate files vs 1 file put together:

original: 1.33gb
-FA -mbmp on dir of files(internally each processed separately and put into archive = default behavior): 539mb

From now all following results are on tarred single file:
-FA -mbmp: 469.61mb !!!
-bcm: 531.96mb
-bsc: 518.54mb
-bsc -m5: 518.86mb
-dlz: 489.85mb
-uharc -m3: 457.73mb
-lolz: 451.42mb
-FA -xppmd: 483.05mb

First of all, even though uharc and lolz were (the only ones) better than FA's default $bmp on the tarred file, their speed was like ~8x worse. Especially uharc, which was the slowest of them all. From my test it wasn't worth it, not even for lolz. Forget them; for that little gain they were way too slow. What surprised me was that even bsc and bcm did not beat default FA. In fact not even dlz did, which was also very slow.

Now, dlz and lolz are specialized in dds textures and there they kick ass by ~10% (I already tried), but for common image formats, don't bother.

Default FA's grzip rules: it is internal and thus supports pipelining, so no disk thrashing; it is extremely fast, multi-threaded and just great overall.
However, as you see in this test, the default behavior of processing images separately hurts *big time*. How to solve it? Simple, add rep before bmp:

$bmp=bmp >> $bmp=5rep+bmp
Then when you use something like: FA -m5/$bmp=5rep+bmp -mc:rep/srep... what will happen is that FA will tar and srep the images first before applying grzip. And btw, adding rep or srep helps even further; it doesn't hurt at all.
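Why per-file processing hurts can be reproduced with a few lines of Python. This is only a sketch of the general "solid vs per-file" effect: the standard lzma module stands in for FA's $bmp chain, and the generated files (a shared random payload plus a small unique tail) are an artificial stand-in for images that share content. The function names are made up for the example.

```python
import lzma
import os


def per_file_size(files: list[bytes]) -> int:
    """Total size when every file is compressed on its own."""
    return sum(len(lzma.compress(f)) for f in files)


def solid_size(files: list[bytes]) -> int:
    """Size when all files are joined first and compressed as one stream."""
    return len(lzma.compress(b"".join(files)))


def make_similar_files(count: int = 8, shared: int = 50_000) -> list[bytes]:
    """Build test files that share one random payload and differ only in a short tail."""
    base = os.urandom(shared)
    return [base + bytes([i]) * 100 for i in range(count)]


if __name__ == "__main__":
    files = make_similar_files()
    print("per file:", per_file_size(files), "bytes")
    print("solid:   ", solid_size(files), "bytes")
```

When compressed separately, the shared payload is paid for in every file; in the joined stream it is paid for once and the later copies become cheap matches. That is the same kind of redundancy rep/srep removes when it sits in front of the bmp compressor.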

elit

15-01-2018, 15:45

And finally, I tested the audio (wav) part of FA's mm compressor. Here the results were unexpected. Although FA applies the tta compressor to each (any) wav file, the results are not so clear, as you will see.

But first, let me tell you that originally I tried to compare tta to flac. On the specific game I tried (IL2 Sturmovik), flac refused to recognize the file format even though the files have a RIFF header and play fine in foobar2000. These wavs must have been compressed or something. On top of that, you cannot "tar" files and apply flac to the result, but tta is able to do that. Because of these issues and compatibility problems, I came to the conclusion that it is not worth replacing the internal tta with an external compressor. But that's not all, I tried tta vs lzma and... you will see. But let's start from the beginning.

Original *.wav folder 44.72mb:
FA -mwav: 43.92mb < Clearly not working

Now tarred wavs to single file:
FA -mwav: 42.84mb (small gain when tarred but not working properly)
FA -mxlzma: 20.52mb !!!
FA -mdelta+xlzma: 20.52mb
FA -xlzma+tta: 20.52mb
FA -m5rep+delta+lzma: 21.14mb
FA -mtta+delta+xlzma: 42.84mb
FA -mtta+xlzma: 42.84mb (trying all kind of order as you see)
FA -mdelta+tta: 42.84mb

You can see something is wrong here. Neither tta nor flac seems to be able to process these wavs at all. Could they be compressed? But then how can lzma achieve half the size?! This is a problem because FA doesn't recognize wavs internally and applies tta based on the extension. I couldn't help it, so I downloaded a random wav file from the internet to confirm.

11k16bitpcm.wav(298kb):
FA -mtta: 203kb
FA -mxlzma: 260kb
FA -m5rep+tta: 203kb

Aha!
So this means:

-do continue using the internal tta, but first verify whether your wav files are "standard" ones or not; test and compare with lzma!

-this thing still needs more testing to get the right idea, I am particularly curious about the combination of delta+tta on "real" wavs as well as the effect of rep etc...
EDIT: Since tta doesn't work properly on tarred wavs, this point is moot now; no need for rep or delta on top of $wav.

-(feel free to tar files together the same way as for $bmp I described above, like: $wav=5rep+wav for example (or better 0+tta, because unlike $bmp there can be a slight loss of compression here); tta is great in that, unlike other codecs, it works on raw data regardless of extension. This could also help with some data packs that contain wav files inside, as well as some audio banks.)
EDIT: ^Do not tar wav files after all; even though it worked on my test dir, during an actual game repack test it resulted in a decompression error. Keep the standard $wav=wav setting.
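One way to check whether wav files are "standard" before trusting $wav is to read the format tag from the RIFF header. The Python sketch below does that with the standard struct module. Tag 1 means plain PCM; other values (for example 2 for MS ADPCM or 0x11 for IMA ADPCM) mean the audio is stored in some other encoding. Whether that is what happened with the IL2 files is only a guess, I have not verified it, and `wav_format_tag` / `is_plain_pcm` are names made up for this example.

```python
import struct

WAVE_FORMAT_PCM = 0x0001  # format tag used by plain, uncompressed PCM audio


def wav_format_tag(header: bytes):
    """Return the format tag from the 'fmt ' chunk of a RIFF/WAVE header, or None."""
    if len(header) < 12 or header[:4] != b"RIFF" or header[8:12] != b"WAVE":
        return None
    pos = 12
    while pos + 8 <= len(header):
        chunk_id, chunk_size = struct.unpack_from("<4sI", header, pos)
        if chunk_id == b"fmt ":
            if pos + 10 > len(header):
                return None  # header was cut off before the format tag
            return struct.unpack_from("<H", header, pos + 8)[0]
        # Skip this chunk; RIFF chunks are padded to an even length.
        pos += 8 + chunk_size + (chunk_size & 1)
    return None


def is_plain_pcm(path: str) -> bool:
    """Check the first 4 KiB of a file for a PCM format tag."""
    with open(path, "rb") as fh:
        return wav_format_tag(fh.read(4096)) == WAVE_FORMAT_PCM
```

Running `is_plain_pcm` over a game's wav directory before packing gives a quick hint about which files are worth sending to tta and which ones should be compared against lzma first.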