xnLZ - parallel lzma with entropy scan intended as 64bit 4x4:lzma replacement [Archive]


elit

26-03-2018, 07:07

xnLZ is a multi-threaded lzma (de)compressor heavily based on plzip. It is a fork of plzip and is NOT compatible with it anymore.
Its main purpose is to offer an alternative to the existing 4x4:lzma chain (also known as xlzma) in FreeArc (or other archivers that
can use external compressors), but with 64bit support and the same cool features that made 4x4 so great. Although xnLZ was
intended to be used in <stdio> mode with FreeArc, it can also work on its own with files, or be chained with other tools
that support external compressors.

Here are the main features:

- xnLZ uses an entropy check that scans incoming data blocks during compression and can copy them directly if they are not
compressible enough. The threshold can be defined by the user. This means data that does not benefit from typical lzma compression
can bypass the compression routine completely, speeding up the whole process. On combined data sets this could usually be anywhere
between a 50-300% speedup, and decompression speed benefits as well.

- Data blocks that are copied directly still have their CRC attached and are verified during test or decompression.

- xnLZ was designed to utilize all CPU cores during decompression through <stdio> for maximum speed. This feature actually
existed (as of the time of the fork at least) in the original plzip with the complete infrastructure; all that was needed was to increase
the buffer/packet limit, which was artificially hardcoded to a low value. Now the user can directly define how much buffer to allocate,
but it comes at the cost of using more memory during decompression. As an example, to fully utilize a 4 core CPU with data
blocks of 256MB, ~1300MB-2GB may be needed during decompression, depending on settings. In my tests I was able to load my
CPU fully with a slots value of 128 on data blocks of 256MB size, which used only around 1GB of RAM.

- xnLZ uses a hardcoded lzma:lc (literal context) parameter set to 8 instead of the default 3 for maximum compression.
- xnLZ exposes an additional lzma:mc (matchfinder cycles count) option for the user to set, which tends to have a big impact on compression speed.
- Not dependent on Cygwin libraries anymore.
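
To illustrate the entropy check and the CRC on directly copied blocks, here is a minimal Python sketch. It is not xnLZ's code: how xnLZ actually measures entropy is not described in this post, so the sketch assumes a simple order-0 (byte histogram) Shannon entropy expressed as a percentage of the 8 bits/byte maximum, and assumes the E value is compared against that percentage. The names entropy_percent, pack_block and unpack_block are made up for the example, and Python's lzma module stands in for the real compressor.

```python
import lzma
import math
import zlib
from collections import Counter


def entropy_percent(block: bytes) -> float:
    """Order-0 Shannon entropy of a block, as a percentage of 8 bits/byte."""
    if not block:
        return 0.0
    n = len(block)
    counts = Counter(block)
    bits_per_byte = -sum(c / n * math.log2(c / n) for c in counts.values())
    return 100.0 * bits_per_byte / 8.0


def pack_block(block: bytes, threshold: float = 99.0):
    """Store the block raw if it looks incompressible, else compress it.

    The CRC32 of the original data is kept in both cases (assumed layout,
    not the real xnLZ container format).
    """
    crc = zlib.crc32(block)
    if entropy_percent(block) >= threshold:
        return ("stored", crc, block)
    return ("lzma", crc, lzma.compress(block))


def unpack_block(packed) -> bytes:
    """Restore a block and verify its CRC, for stored and compressed blocks alike."""
    kind, crc, payload = packed
    data = payload if kind == "stored" else lzma.decompress(payload)
    if zlib.crc32(data) != crc:
        raise ValueError("CRC mismatch")
    return data


if __name__ == "__main__":
    for sample in (b"plain text " * 1000, bytes(range(256)) * 64):
        packed = pack_block(sample)
        print(packed[0], round(entropy_percent(sample), 2))
```

A byte histogram is cheap to compute but blind to repeated patterns, so a real implementation may well use a different estimate; the point is only that blocks above the threshold never reach the lzma encoder, which is where the speedup comes from.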

More info here:
https://encode.ru/threads/2932-xnLZ-parallel-lzma-based-on-plzip-with-entropy-scan?p=56404#post56404

Quick random benchmark on a combined data folder with a lot of zip files and a few big exes and dlls:
xlzma(4x4:lzma) & xnlz = d64m, b64m, fb32, mc32, E99% <- both use lzma1:lc8
7z 16.04 = 0=lzma2:d=64m:fb=32:lc=4:mc=32 <- uses lzma2

orig data: 3.24g

xlzma(4x4)- c 1:15m, d 5s, 2.76g
xnlz - c 1:40m, d 13s, 2.74g
xnlz:E98 - c 1:23m, d 11s, 2.76g
7z - c 8:12m, d 26s, 2.70g
xnlz:E100 - c 6:15m, d 49s, 2.71g
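
To make these numbers easier to compare, the ratio and throughput can be worked out with a few lines of Python. This is only arithmetic on the figures listed above; it assumes times like 1:15m mean minutes:seconds and that 1g is 1024 MB, and the helper names are made up.

```python
ORIG_GB = 3.24

# (name, compression time "m:ss", decompression seconds, packed size in GB)
RESULTS = [
    ("xlzma(4x4)", "1:15", 5, 2.76),
    ("xnlz", "1:40", 13, 2.74),
    ("xnlz:E98", "1:23", 11, 2.76),
    ("7z", "8:12", 26, 2.70),
    ("xnlz:E100", "6:15", 49, 2.71),
]


def to_seconds(mmss: str) -> int:
    """Convert an 'm:ss' string to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def summarize(orig_gb: float, rows) -> list:
    """Ratio (packed/original, %) and MB/s relative to the original size."""
    orig_mb = orig_gb * 1024  # assumption: 1g = 1024 MB
    out = []
    for name, ctime, dsec, size_gb in rows:
        out.append({
            "name": name,
            "ratio_pct": round(100 * size_gb / orig_gb, 1),
            "comp_mb_s": round(orig_mb / to_seconds(ctime), 1),
            "decomp_mb_s": round(orig_mb / dsec, 1),
        })
    return out


if __name__ == "__main__":
    for row in summarize(ORIG_GB, RESULTS):
        print(row)
```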

EDIT: here is the result on uncompressed data:
orig file: 660.3mb
7z: c 1:15m, d 12s, 270.19mb
xnlz: c 1:17m, d 7s, 270.85mb

The reason why xnlz was slower than 7z during decompression in the first test when entropy skipping was disabled (the xnlz:E100 run), despite using all cores, is that the archive contained already compressed data (zip files). Lzma2 was quicker there because it does a better job handling compressed data than lzma1. However, first, it is no match for whole-block entropy skipping in terms of speed when compared to xlzma or xnlz, and second, in the test on uncompressed data it was finally slower at decompression.

elit

02-04-2018, 10:57

This is xnlz with lzma:pb4 (and lzma:lc8 like the previous release). This tends to give slightly better compression in games (0.5-2%) from what I have seen. However, this release and the last one are not compatible, keep that in mind. I plan to stick with this setup, so it is better to release it now than later, after people have used the old one for too long.
The sources are the same; the only change was the "pos_state_bits" param located in the LZ dir in the xnlz.h file. You can also test it and let me know if it improved the ratio for you.
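
If you want a rough feel for how pb affects ratio on your own data without rebuilding xnLZ, something like the following Python sketch can be used. It relies on Python's standard lzma module (liblzma), not on xnLZ or plzip, so the numbers will not match xnLZ. In particular, liblzma only accepts lc+lp up to 4, so lc8 cannot be reproduced and lc3 is used here; nice_len and depth are liblzma's names for what is called fb and mc above, and the dictionary size is kept small just for the example. The function names are made up.

```python
import lzma


def lzma1_filters(pb: int, lc: int = 3, lp: int = 0) -> list:
    """Raw LZMA1 filter chain for Python's lzma module (not xnLZ's settings)."""
    return [{
        "id": lzma.FILTER_LZMA1,
        "dict_size": 1 << 20,  # small dictionary, enough for a quick test
        "lc": lc,              # liblzma requires lc + lp <= 4
        "lp": lp,
        "pb": pb,
        "nice_len": 32,        # corresponds to fb32
        "depth": 32,           # corresponds to mc32
    }]


def compressed_size(data: bytes, pb: int) -> int:
    """Size of the raw LZMA1 stream produced with the given pb."""
    packed = lzma.compress(data, format=lzma.FORMAT_RAW, filters=lzma1_filters(pb))
    return len(packed)


def compare_pb(data: bytes, pbs=(0, 2, 4)) -> dict:
    """Map each pb value to the compressed size it gives on this data."""
    return {pb: compressed_size(data, pb) for pb in pbs}


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as f:
        sample = f.read()
    for pb, size in compare_pb(sample).items():
        print(f"pb{pb}: {size} bytes")
```

Run it with a file path as the only argument to print one size per pb value.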


Simorq

02-04-2018, 11:55

xnlz.exe has stopped working

elit

02-04-2018, 17:43

xnlz.exe has stopped working
Oh, I forgot to change the gcc flags. I compiled it with -march=native, which on my machine means Haswell. I re-uploaded a more general build (-mtune=native -msse2), hopefully it works now, my apologies.

elit

03-04-2018, 05:59

Pls let me know how it fares against the original one, especially in games. Compress a few and post the results. If it shows a consistent gain and/or only a minimal loss occasionally, this will be the permanent standard setting for the future.

elit

06-04-2018, 08:37

So I made more tests on multiple data sets. My conclusion is that it's better to stick to the original one; it is very good already, and lzma:pb4 hurt a bit too often, even if only negligibly. The difference was tiny anyway.

Simorq

15-04-2018, 12:18

x86 compiled?

elit

16-04-2018, 15:11

x86 compiled?

Simorq, you mean 32bit? You can compile it under msys2 if you need it; use -m32 in the makefile instead of -m64. But what would be the point? It's IMO better to use FA's internal lzma then.

kj911

03-01-2025, 06:15

Bumping the old thread....

Is the lp parameter fixed to lp0? Or can it be parameterized? In many cases the lp0:pb0/pb2 pair is the best, including with lc8, and pb4 or lp1-4 values are rarely seen in compressed archives. With DLZ, the lp/pb switches are chained together, and lp0 works best there. Does the mc switch support a max. of 1023/1024 or more, similar to how the maximum value of fb is only 273? It's strange, it's not often that you see a method like xnLZ used among the finished conversions instead of the classic 4x4:lzma.

The x86 edition would make sense so that decompression is not limited to Vista/Win7 x64 or newer OSes. It could also be used on 32bit XP, although there is also ARC's built-in xlzma method or DLZ.

elit

20-04-2025, 13:07

Bumping the old thread....
xnLZ is a modded plzip with support for skipping incompressible blocks, just like the xlzma method of FreeArc does. Maybe that's why it is used by some despite not having updates. For LZMA parameters refer to the original plzip documentation, which you can find on the internet. If I remember correctly, the original author made pb be set automatically based on input data.