How to get the most out of lolz parallelism
Originally I was planning to add this as a comment to my "Cost vs return debate" thread, but I did not want to necro my old topic, and this is not completely relevant to it anyway. I am not going to compare lolz with other compressors here; I am going to share my experience of how its MT works and how to get the most out of it.
You may have noticed that even with -mt1, lolz benefits from other cores by offloading certain calculations (most likely detection) to other threads. Still, it is very common for users to set the number of threads equal to the number of CPU cores. I ran a small test and compared MT scaling on my Intel 4690k CPU. I used version 21a7, but later I also tried the latest 22c4b just to see ldmf in action and whether it is efficient speed-wise. This test is not about final file size: it was done on a small ~600mb file and size variations were negligible regardless of settings in all tests, so I will only focus on speed and time here. The basic settings were -d128 -mc8 -tt1; from there I changed only -mt, and for the last ldmf tests I changed only -ldl.
Let's start with -mt1: mt1.png
^Compression on a "single" thread finished in 3:48min at an average speed of ~2762k/s. I put "single" in quotes deliberately because CPU usage often spiked as high as 74%(!) and was at least 36-42% most of the time.
Let's try -mt2: mt2.png
^Scaling is good, though not quite linear: speed jumped to ~4448k/s (about 1.6x) and the run took 2:21min.
This is -mt3: mt3.png
^Speed is even higher at ~5100k/s, and the run took only 2:03min. Scaling is already worse: we gained only ~600k/s vs the previous ~1700k/s.
Finally, -mt4 (and you already know what to expect, don't you): mt4.png
^Yep, speed is actually slightly lower than with -mt3 and the time is 1sec longer, but this is not even the full picture. I ran this test twice and what I am showing you is the second run, where I did not even touch the computer during processing. In the first run, just having a web browser open took the speed down to ~4400k/s, which was less than -mt3! The culprit is most likely insufficient L2/L3 cache (cache misses). In other words, running 2 threads on a 4C CPU is the most efficient way, and even -mt1 is not as bad as I originally thought.
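To put the scaling above into numbers, here is a minimal sketch that computes speedup and per-thread efficiency from the reported times. The -mt4 time is assumed to be 2:04min (1sec slower than -mt3, as described); the names `times` and `scaling` are just illustrative.

```python
# Wall-clock times reported above for the ~600mb test file, in seconds.
# The -mt4 value assumes "1sec slower than -mt3", i.e. 2:04min.
times = {1: 3 * 60 + 48, 2: 2 * 60 + 21, 3: 2 * 60 + 3, 4: 2 * 60 + 4}


def scaling(times):
    """Return {threads: (speedup vs 1 thread, efficiency per thread)}."""
    base = times[1]
    return {n: (base / t, base / t / n) for n, t in sorted(times.items())}


table = scaling(times)
for n, (speedup, eff) in table.items():
    print(f"-mt{n}: {times[n]:>3d}s  speedup {speedup:.2f}x  efficiency {eff:.0%}")

# The setting with the lowest wall-clock time.
fastest = min(times, key=times.get)
print("fastest setting: -mt%d" % fastest)
```

With these times the speedup is roughly 1.6x for -mt2 and about 1.85x for both -mt3 and -mt4, so the efficiency per thread keeps dropping while -mt3 stays the fastest setting.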
However, since I strongly suspect a high dependency on L2/L3 cache, you would be better advised to run your own tests and set lolz's -mt param. to get the most out of YOUR OWN CPU. It wouldn't surprise me at all if, for example, AMD CPUs behaved differently, and by the same token Hyper-Threading CPUs are worth considering (and testing). Maybe on CPUs with more than 4C it will be necessary to use 2 cores fewer than the maximum, not just 1, especially if CPU cache is a concern.
Anyway, I thought you may want to see this... oh, and let's throw some -ldmf from the most recent lolz version into the mix, but please consider that this test may or may not be good enough for such a thing, as the file size was too small. But anyway, -ldmf with default -ldl8: ldmf8.png and -ldl5: ldmf5.png
^These tests were done with -mt1, so compared with the previous -mt1 run it is a difference of a few seconds. So yeah, that's it.
EDIT: on bigger files, ldmf time will extend to a couple of minutes. On ~16gb it added around 12-24min. That makes it slower than srep, but in the picture of the whole compression, which took more than 1h:30min (1h:34min was the quickest, with -mt3; -mt1 took 3h+), it's doable. Also, ldmf doesn't do a full pre-pass of the data like srep, which would add another 6-12min if you compressed big things like 60gb+ on a regular HDD. If the added time of ldmf concerns you, you are really better off just using lzma; FA's 4x4 is simply unbeatable when it comes to ratio vs speed/memory (srep+xlzma:lc8 compressed those 16gb in ~16min with no worse than ~8% loss == ~700% speedup vs ~8% ratio). But between external lzma's (which are slower than xlzma) and lolz, I would rather use lolz. I mean, either I want max compression or max efficiency.
EDIT2: unfortunately I have bad news: ldmf really doesn't work well with multiple cores, even with mtt=0. From that 16gb dataset (inflated "We Happy Few" game .pak's):
Quote:
- if you want to use more than 1T in lolz, srep is the only good option; ldmf is then pointless and would only add to compression time, so make sure you disable it if you use srep with -mt > 1
- srep+lolz[MT] can still achieve very similar results to lolz[1T]+ldmf, with only slightly worse compression but a significant speedup, in my case 3x
- even with srep and -mt3 though, lolz is still 5x slower than srep+xlzma, now with only about ~6.5% better cmp ratio
- between lolz[1T]+ldmf and lolz[3T]+srep there is only ~1.75% cmp ratio difference, but 300% speed difference
- of course, there will be variation with different data/games, but my past experience was that 90+% of the time there was no more than ~2% variation
Then again, this is just extra info. The main topic is lolz and its best threading settings + ldmf, and on that the conclusion is that it may be better to set -mt to MAX_CPU-1 or -2 depending on CPU type, and to use ldmf only if you use -mt1 AND -mtt0; otherwise go with srep (for now at least). If you want to keep the best ratio with lolz[1T]+ldmf, there is one way around it: process several games separately at the same time; overall that is a more efficient use of CPU cores and cuts total time.
Last edited by elit; 18-02-2019 at 12:30.
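For anyone who wants to repeat the -mt comparison on their own CPU, here is a rough timing sketch. It does not know how you invoke lolz: `placeholder_cmd` is a hypothetical stand-in that only starts Python and exits, so you need to replace it with your real command line and pass the thread count as the -mt value. It keeps the best of several runs because background activity (even an open browser) can noticeably slow a run down.

```python
import subprocess
import sys
import time


def time_command(cmd, repeats=2):
    """Run cmd `repeats` times and return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


def benchmark(make_cmd, thread_counts, repeats=2):
    """Time make_cmd(n) for every n in thread_counts; return {n: seconds}."""
    return {n: time_command(make_cmd(n), repeats) for n in thread_counts}


def placeholder_cmd(threads):
    # Hypothetical stand-in so the sketch runs anywhere: it ignores `threads`
    # and just launches Python. Swap in the command you actually use to run
    # lolz, with `threads` as the -mt value.
    return [sys.executable, "-c", "pass"]


if __name__ == "__main__":
    results = benchmark(placeholder_cmd, thread_counts=[1, 2, 3, 4])
    for n, secs in results.items():
        print(f"-mt{n}: {secs:.2f}s")
    print("fastest: -mt%d" % min(results, key=results.get))
```

Use the same input file and the same other settings for every run, and leave the machine alone while it runs, otherwise the numbers are not comparable.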