Could someone write a "seemingly simple" CLS filter: a streaming compressor in the WIM format* (where identical files are stored only once, i.e. transparently deduplicated)? As far as writing the ARC file is concerned, it would work exactly like packing with the REP algorithm, without any temporary TEMP files. Many old games contain the same file in multiple copies, so this would let us avoid the classic "duped files removed before compression" approach and the "Duping files..." step during unpacking.
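A minimal sketch of the idea, in Python rather than the actual CLS filter API. The record format (tags, field widths) and the names `dedup_encode`/`dedup_decode` are hypothetical, not WIM's or FreeArc's real layout. The encoder writes each distinct file once and replaces later copies with a small reference; the decoder re-creates the duplicates as it goes, so no separate "Duping files..." step is needed. A real streaming filter would have to hash each file before emitting it (or read it twice), since it must choose between literal and reference up front.

```python
import hashlib
import struct

# Hypothetical record format for this sketch only:
#   tag 0 (literal):   u64 length, followed by the file bytes
#   tag 1 (reference): u32 index of an earlier literal
LITERAL, REFERENCE = 0, 1


def dedup_encode(files, out):
    """Write each distinct file once; later identical files become references."""
    seen = {}  # SHA-1 digest -> index of the first (literal) copy
    for data in files:
        digest = hashlib.sha1(data).digest()  # collisions treated as impossible here
        if digest in seen:
            out.write(struct.pack("<BI", REFERENCE, seen[digest]))
        else:
            seen[digest] = len(seen)  # literals are numbered 0, 1, 2, ... in order
            out.write(struct.pack("<BQ", LITERAL, len(data)))
            out.write(data)


def dedup_decode(inp):
    """Rebuild every file in order, duplicating referenced literals."""
    unique, result = [], []
    while True:
        tag = inp.read(1)
        if not tag:
            break  # end of stream
        if tag[0] == LITERAL:
            (length,) = struct.unpack("<Q", inp.read(8))
            data = inp.read(length)
            unique.append(data)
        else:
            (index,) = struct.unpack("<I", inp.read(4))
            data = unique[index]
        result.append(data)
    return result
```

For example, encoding `[b"intro.wav", b"music", b"intro.wav"]` into an `io.BytesIO` stores the first two as literals and the third as a 5-byte reference; decoding returns the original three-element list. This sketch keeps the unique files in memory while decoding; a real filter would instead copy them from already-written output.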
*WIM-compression level: Store
This is already available in 7-Zip, but a CLS filter would be a better solution. When preparing a set of WAV files totalling nearly 5GB this way, it uses only 10-12MB of memory and shrinks the data to 3.2GB, so that much less has to be copied/written before the next algorithm runs. In other words, the starting point for the other algorithms is as small as if we had manually deleted every duplicate file. I know this is not typical of today's games. In this case, REP/SREP is not a viable option if the next algorithm is MSC or, for example, XTOOL. (The deflated data file would be smaller.)
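Why the memory footprint can stay this small: a file-level deduplicator never has to hold file contents, only one content hash per file plus a read buffer. The sketch below is Python, not 7-Zip's code; it assumes SHA-1 as the content hash (WIM identifies its streams by SHA-1) and an arbitrarily chosen 1 MiB read buffer, and `find_duplicates` and `bytes_saved` are hypothetical helper names. Files are first grouped by size, so only same-size candidates are hashed at all.

```python
import hashlib
import os
from collections import defaultdict

BLOCK_SIZE = 1 << 20  # 1 MiB read buffer; an arbitrary choice for this sketch


def file_digest(path, block_size=BLOCK_SIZE):
    """SHA-1 of a file, read incrementally so memory stays at one block."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)
    return h.digest()


def find_duplicates(paths):
    """Group paths with identical content; only (size, digest) keys are kept."""
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)
    groups = defaultdict(list)
    for size, same_size in by_size.items():
        if len(same_size) < 2:
            continue  # a file with a unique size cannot have a duplicate
        for p in same_size:
            groups[(size, file_digest(p))].append(p)
    return [g for g in groups.values() if len(g) > 1]


def bytes_saved(groups):
    """Bytes that need not be written when every extra copy is dropped."""
    return sum(os.path.getsize(g[0]) * (len(g) - 1) for g in groups)
```

Applied to the WAV set above, `bytes_saved` would correspond roughly to the difference between the ~5GB input and the 3.2GB result, while memory use depends on the number of files rather than their total size.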
We could also use ZPAQ with the -m0 or -m01 switches, but it deduplicates in content-defined fragments rather than whole files. In the example above, the prepared dataset is about 3GB, but the final compression is worse (1.24GB vs. 1.35GB). During packing, ZPAQ uses ~1.05GB of memory with -m0 and ~150-170MB with -m01, compared to the 10-12MB used by 7-Zip's WIM compression.