elit
26-03-2018, 07:07
xnLZ is a multi-threaded lzma (de)compressor heavily based on plzip. It is a fork of plzip and is NOT compatible with it anymore.
Its main purpose is to offer an alternative to the existing 4x4:lzma chain (also known as xlzma) in FreeArc (or other archivers that
can utilize external compressors), but with 64-bit support and the same cool features that made 4x4 so great. Although xnLZ was
intended to be used in <stdio> mode with FreeArc, it can also work on its own with files, or be chained with other tools
that support external compressors.
Here are the main features:
- xnLZ uses an entropy check that scans incoming data blocks during compression and can copy them directly if they are not
compressible enough. All of this can be configured by the user. This means data that does not benefit from typical lzma compression
can bypass the compression routine completely, speeding up the whole process. On combined data sets the speedup is usually anywhere
between 50-300%, and decompression speed benefits as well.
- Data blocks that are copied directly still have their CRC attached and are verified during test or decompression.
- xnLZ was designed to utilize all CPU cores during decompression through <stdio> for maximum speed. This feature actually
existed (at least as of the time of the fork) in the original plzip with complete infrastructure; all that was needed was to increase
the buffer/packet limit, which was artificially hardcoded to a low value. Now the user can directly define how much buffer to allocate,
but this comes at the cost of using more memory during decompression. As an example, to fully utilize a 4-core CPU with data
blocks of 256MB, ~1300MB-2GB may be needed during decompression, depending on settings. In my tests I was able to load my
CPU fully with a slots value of 128 on data blocks of 256MB, which used only around 1GB of RAM.
- xnLZ uses a hardcoded lzma:lc (literal context) parameter set to 8 instead of the default 3 for maximum compression.
- xnLZ exposes an additional lzma:mc (matchfinder cycles count) option for the user to set, which tends to have a big impact on compression speed.
- It is not dependent on Cygwin libraries anymore.
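To make the entropy-skip and CRC points above more concrete, here is a minimal Python sketch of the general idea. It is not xnLZ's code or container format: the entropy measure (Shannon entropy of the byte histogram, expressed as a percentage of 8 bits/byte), the `threshold` parameter, the one-byte `S`/`C` flag and the frame layout are all assumptions made up for illustration.

```python
import lzma
import math
import os
import zlib
from collections import Counter


def entropy_percent(block: bytes) -> float:
    """Shannon entropy of the byte histogram, as a percentage of the 8 bits/byte maximum."""
    if not block:
        return 0.0
    n = len(block)
    bits = -sum(c / n * math.log2(c / n) for c in Counter(block).values())
    return 100.0 * bits / 8.0


def pack_block(block: bytes, threshold: float = 99.0) -> bytes:
    """Copy the block as-is if its entropy reaches the threshold, else compress it.

    Hypothetical frame: 1 flag byte + 4-byte CRC32 of the original data + payload.
    The CRC is attached in both cases, so copied blocks are still verifiable.
    """
    stored = entropy_percent(block) >= threshold
    payload = block if stored else lzma.compress(block)
    flag = b"S" if stored else b"C"
    return flag + zlib.crc32(block).to_bytes(4, "little") + payload


def unpack_block(packed: bytes) -> bytes:
    """Restore the original block and verify its CRC32."""
    flag = packed[:1]
    crc = int.from_bytes(packed[1:5], "little")
    payload = packed[5:]
    block = payload if flag == b"S" else lzma.decompress(payload)
    if zlib.crc32(block) != crc:
        raise ValueError("CRC mismatch")
    return block


# Usage: random bytes look incompressible and get copied, repetitive text gets compressed.
random_block = os.urandom(1 << 16)
text_block = b"abc" * 1000
for name, data in (("random", random_block), ("text", text_block)):
    packed = pack_block(data)
    print(name, packed[:1], len(data), "->", len(packed))
    assert unpack_block(packed) == data
```

The speed gain comes from the fact that computing a byte histogram is far cheaper than running the lzma match finder over a block that will not shrink anyway, and on decompression a copied block only needs the CRC check.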
More info here:
https://encode.ru/threads/2932-xnLZ-parallel-lzma-based-on-plzip-with-entropy-scan?p=56404#post56404
Quick random benchmark on a combined data folder with a lot of zip files and a few big exes and dlls:
xlzma(4x4:lzma) & xnlz = d64m, b64m, fb32, mc32, E99% <- both use lzma1:lc8
7z 16.04 = 0=lzma2:d=64m:fb=32:lc=4:mc=32 <- uses lzma2
orig data:    3.24g
xlzma(4x4)  - c 1:15m, d  5s, 2.76g
xnlz        - c 1:40m, d 13s, 2.74g
xnlz:E98    - c 1:23m, d 11s, 2.76g
7z          - c 8:12m, d 26s, 2.70g
xnlz:E100   - c 6:15m, d 49s, 2.71g
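For easier comparison, the numbers above can be turned into seconds and compressed-size percentages with a few lines of Python. This is only a helper sketch over the figures already listed; it assumes that a time like "1:15m" means 1 minute 15 seconds, and the function names are made up.

```python
def to_seconds(t: str) -> int:
    """Convert 'M:SS' (minutes:seconds) or plain seconds into seconds."""
    if ":" in t:
        minutes, seconds = t.split(":")
        return int(minutes) * 60 + int(seconds)
    return int(t)


ORIG_GB = 3.24

# name: (compression time, decompression time, compressed size in GB), copied from the list above
RESULTS = {
    "xlzma(4x4)": ("1:15", "5", 2.76),
    "xnlz": ("1:40", "13", 2.74),
    "xnlz:E98": ("1:23", "11", 2.76),
    "7z": ("8:12", "26", 2.70),
    "xnlz:E100": ("6:15", "49", 2.71),
}


def summarize(results, orig_gb):
    """Return name -> (compress seconds, decompress seconds, size as % of original)."""
    return {
        name: (to_seconds(c), to_seconds(d), round(100 * size / orig_gb, 1))
        for name, (c, d, size) in results.items()
    }


for name, (c_s, d_s, pct) in summarize(RESULTS, ORIG_GB).items():
    print(f"{name:<11} c {c_s:>3}s  d {d_s:>2}s  {pct}% of original")
```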
EDIT: here is the result on uncompressed data:
orig file: 660.3mb
7z: c 1:15m, d 12s, 270.19mb
xnlz: c 1:17m, d 7s, 270.85mb
In the first test, xnlz with entropy skipping disabled was slower at decompression despite using all cores, because the archive contained already compressed data (zip files). Lzma2 was quicker because it does a better job of handling compressed data than lzma1. However, first, it is no match for the whole-block entropy skipping of xlzma or xnlz in terms of speed, and second, in the test on uncompressed data it ended up slower at decompression.