Pseudorandom Number Sequence Test + LZ Bench
Posted by felice2011, 29-03-2017, 14:43

(Overview of the Entropy of a File)

Simply put, entropy is disorder; more precisely, in computing it is the density of information that a data stream contains.
So the more predictable a file's content, the lower its entropy.

The ENT program (Pseudorandom Number Sequence Test Program, http://www.fourmilab.ch/random)
performs various statistical tests and reports the following:

1) Entropy

This is the density of information in a file, expressed in bits per character (byte). The maximum entropy is 8: a file with an entropy of 8, or very close to it, is either perfectly random or already compressed.
For example, a bitmap has an entropy of 4.724502 bits per byte; converted to JPEG it reaches 7.938038 bits per byte, and compressed with WinRAR it reaches 7.996259 bits per byte.
A text file containing the same word written a thousand times has an entropy of 2.545246 bits per byte. Compressed with WinRAR it reaches 6.747827, and 6.756800 with WinRAR at maximum compression.
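ENT's entropy figure is the order-0 Shannon entropy of the byte histogram: if a byte value occurs with probability p, it contributes p·log2(1/p), and the contributions are summed over all 256 values. Below is a minimal Python sketch of that calculation, assuming the whole file fits in memory; `byte_entropy` is a hypothetical helper name, not part of ENT.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Order-0 Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    n = len(data)
    if n == 0:
        return 0.0
    counts = Counter(data)  # iterating over bytes yields ints 0..255
    # A byte value seen c times has probability p = c/n and contributes p * log2(1/p).
    return sum((c / n) * math.log2(n / c) for c in counts.values())


print(byte_entropy(bytes(range(256))))  # 8.0: every byte value exactly once
print(byte_entropy(b"aaaa"))            # 0.0: fully predictable
```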

2) Chi-Square Test

This test is used to study random data streams; it can also be applied to image files to check whether their data behaves like random data.
In practice, it expresses as a percentage how often a truly random sequence would deviate from the expected distribution more than this data stream does.
If the percentage is above 99% or below 1%, the data stream is not random. If it is between 95% and 99% or between 1% and 5%, the stream is suspect; in between, the data can be considered random.
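A minimal Python sketch of the chi-square calculation over the 256 byte values, with the 1% / 5% / 95% / 99% thresholds above. One assumption to note: ENT computes the percentage with its own routine, while here I swap in the Wilson-Hilferty normal approximation so the sketch needs only the standard library, so values far in the tails may differ slightly from ENT's report. `chi_square_bytes` and `verdict` are hypothetical names.

```python
import math
from collections import Counter


def chi_square_bytes(data: bytes) -> tuple[float, float]:
    """Return (chi-square statistic, approx. % of random inputs that would exceed it)."""
    n = len(data)
    if n == 0:
        raise ValueError("empty input")
    expected = n / 256  # each byte value equally likely in random data
    counts = Counter(data)
    chi2 = sum((counts[v] - expected) ** 2 / expected for v in range(256))
    # 256 byte values -> 255 degrees of freedom.
    k = 255
    # Wilson-Hilferty: (chi2 / k) ** (1/3) is roughly normal with
    # mean 1 - 2/(9k) and variance 2/(9k).
    a = 2 / (9 * k)
    z = ((chi2 / k) ** (1 / 3) - (1 - a)) / math.sqrt(a)
    exceed = 0.5 * math.erfc(z / math.sqrt(2))  # P(random chi2 > observed)
    return chi2, 100 * exceed


def verdict(percent: float) -> str:
    """Apply the 1% / 5% / 95% / 99% thresholds described above."""
    if percent > 99 or percent < 1:
        return "not random"
    if percent > 95 or percent < 5:
        return "suspect"
    return "random"


stat, pct = chi_square_bytes(b"\x00" * 1000)
print(verdict(pct))  # not random
```

A perfectly even byte histogram gives a statistic of 0 and a percentage near 100%, which the same rule also flags as non-random: it is too uniform to be chance.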

3) Arithmetic Mean

The sum of all bytes divided by the file length. The closer the result is to 127.5, the more random the data.
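For completeness, the mean is a one-liner; 127.5 is simply the midpoint of the byte range 0 to 255, which is what uniformly random bytes average out to. `byte_mean` is a hypothetical helper name.

```python
def byte_mean(data: bytes) -> float:
    """Arithmetic mean of the byte values (127.5 expected for random data)."""
    if not data:
        raise ValueError("empty input")
    return sum(data) / len(data)


print(byte_mean(bytes(range(256))))  # 127.5
```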

4) Monte Carlo Value for Pi

The closer the value is to pi (3.14...), the more random (or compressed) the data stream.
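As I understand ENT's documentation, each successive group of six bytes is read as a 24-bit X and a 24-bit Y coordinate inside a square, and the point counts as a hit if it falls within the circle of radius 2^24 - 1 around the origin; since that quarter circle covers pi/4 of the square, 4 * hits / points estimates pi. A minimal Python sketch under that reading (an incomplete trailing group is ignored; `monte_carlo_pi` is a hypothetical name):

```python
def monte_carlo_pi(data: bytes) -> float:
    """Estimate pi from successive 6-byte groups read as 24-bit (x, y) points."""
    radius = 256 ** 3 - 1       # largest 24-bit value
    radius_sq = radius * radius
    hits = points = 0
    # Step through complete 6-byte groups; a trailing partial group is ignored.
    for i in range(0, len(data) - 5, 6):
        x = int.from_bytes(data[i:i + 3], "big")
        y = int.from_bytes(data[i + 3:i + 6], "big")
        points += 1
        if x * x + y * y <= radius_sq:
            hits += 1
    if points == 0:
        raise ValueError("need at least 6 bytes")
    # The quarter circle covers pi/4 of the square, so scale the hit rate by 4.
    return 4 * hits / points


print(monte_carlo_pi(bytes(6)))  # 4.0: the point (0, 0) is inside the circle
```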

5) Serial Correlation Coefficient

This measures how predictable each byte is from the one before it. The closer the value is to 1, the more predictable the data; the closer to 0, the more random.
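A sketch of the serial correlation coefficient as ENT appears to compute it: the Pearson correlation between each byte and the next, with the last byte paired with the first. Assumptions: the wrap-around pairing follows my reading of the ENT source, and constant data (where the coefficient is undefined) returns NaN here. `serial_correlation` is a hypothetical name.

```python
import math


def serial_correlation(data: bytes) -> float:
    """Correlation between each byte and the next, pairing the last byte with the first."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least 2 bytes")
    s_xy = sum(data[i] * data[(i + 1) % n] for i in range(n))  # sum of x_i * x_(i+1)
    s_x = sum(data)                                             # sum of x_i
    s_xx = sum(b * b for b in data)                             # sum of x_i^2
    denom = n * s_xx - s_x * s_x
    if denom == 0:
        return math.nan  # constant data: the coefficient is undefined
    return (n * s_xy - s_x * s_x) / denom


print(serial_correlation(bytes([0, 255]) * 8))  # -1.0
```

The alternating 0/255 pattern gives -1.0: it is just as predictable as a coefficient of 1, so it is the distance from 0 that indicates structure.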

SOME EXAMPLES WITH IMAGES:

Let's analyze a bitmap file...

Quote:
Entropy = 4.724502 bits per byte.
Optimum compression would reduce the size
of this 5443254 byte file by 40 percent.

Chi square distribution for 5443254 samples is 373809870.53, and randomly
would exceed this value less than 0.01 percent of the times.

So definitely not random ...

Arithmetic mean value of data bytes is 160.4207 (127.5 = random).
Monte Carlo value for Pi is 1.912926349 (error 39.11 percent).

So it is far from pi.

Serial correlation coefficient is 0.818560 (totally uncorrelated = 0.0).
Note that the file is not dense with information: its entropy is well below the maximum of 8, so the data is compressible.
The chi-square percentage is below 0.01%, which says the stream is not random, although for an image this test is not very reliable.
The mean is 160, far from 127.5, so again not random. The Monte Carlo value is also far from 3.14.
The serial correlation is 0.82: bitmaps tend to have a correlation close to 1, whereas random data would be close to 0.
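The "Optimum compression would reduce the size" line in the report follows directly from the entropy: an ideal coder that encodes each byte independently needs H bits instead of 8, so the saving is (8 - H) / 8. A small Python check, assuming ENT truncates to a whole percent (which matches the 40 percent shown for 4.724502); `optimum_reduction_percent` is a hypothetical name.

```python
def optimum_reduction_percent(entropy: float) -> int:
    """Size reduction (%) an ideal order-0 coder could reach, truncated like ENT's report."""
    # Each byte needs `entropy` bits instead of 8, so the saving is (8 - H) / 8.
    return int(100 * (8 - entropy) / 8)


print(optimum_reduction_percent(4.724502))  # 40 (bitmap above)
print(optimum_reduction_percent(7.938038))  # 0 (JPEG version)
```

This is an order-0 estimate: compressors that exploit repeated strings or context between bytes can go well beyond it on data such as the repeated-word text file above.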

Now let's look at the same bitmap converted to jpg ...

Quote:
Entropy = 7.938038 bits per byte.

Chi square distribution for 231584 samples is 27771.22, and randomly
would exceed this value less than 0.01 percent of the times.
(Non-random?)

Arithmetic mean value of data bytes is 122.7712 (127.5 = random).
Monte Carlo value for Pi is 3.215483069 (error 2.35 percent).
Serial correlation coefficient is 0.039713 (totally uncorrelated = 0.0).
Also very close to 0 here, which identifies it as random.

Here the entropy is very high (7.94): the file is already heavily compressed.
The Monte Carlo value of 3.21 is fairly close to pi, so close to random.
The correlation coefficient is close to 0, which tells us the file is compressed.

After this short overview: this GUI natively uses ENT to calculate the entropy of a single file or an entire data folder, and reports the size reduction and total compression ratio of that file or folder, so you can quickly tell whether a file type or folder will compress well or poorly.

Classification of files or groups of files:

The scan of a file or data folder is divided into 5 blocks: the 4 masks "Void", "Deflate", "Msrsolid" and "Text", plus the basic method. During the scan the arc.groups file is read directly, files are matched to the masks by extension, and an entropy range is applied: 1.0 to 7.0 for Deflate and Text, and 1.0 to 7.5 for Void and Msrsolid.

Files with entropy higher than 7.0 or 7.5 (depending on the mask), or with an extension not listed in arc.groups, are classified under the basic method.

I chose the 7.0 and 7.5 levels based on various tests on individual files of various formats: for a file with entropy between 7.0 and 7.5, strong compression with a range of different compressors gives a size reduction of 20-25%, i.e. a compression ratio of 75-80%.

Each block (the basic method or one of the masks) also reports the number of files scanned and assigned to it, the percentage size reduction, the percentage compression ratio, and the total size of the files assigned to that block.
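A minimal Python sketch of the classification rule just described, assuming the scan has already produced (file name, size, entropy) tuples. The extension sets in `MASK_EXTENSIONS` are illustrative placeholders only, not the real contents of arc.groups, and all names are hypothetical; the 7.0 / 7.5 limits come from the text. The reduction and ratio columns are not modeled here because they need actual compressed sizes.

```python
import os
from dataclasses import dataclass

# Illustrative placeholders only: the real extension lists come from arc.groups.
MASK_EXTENSIONS = {
    "Void": {".wav", ".bmp"},
    "Deflate": {".png", ".zip"},
    "Msrsolid": {".dds", ".tga"},
    "Text": {".txt", ".xml"},
}
# Upper entropy limits from the post: 7.0 for Deflate/Text, 7.5 for Void/Msrsolid.
ENTROPY_LIMIT = {"Void": 7.5, "Deflate": 7.0, "Msrsolid": 7.5, "Text": 7.0}


@dataclass
class Block:
    files: int = 0       # number of files assigned to this block
    total_size: int = 0  # total size in bytes of those files


def classify(name: str, entropy: float) -> str:
    """Return the mask for a file, or "Basic" for the basic method."""
    ext = os.path.splitext(name)[1].lower()
    for mask, extensions in MASK_EXTENSIONS.items():
        if ext in extensions:
            # Above the mask's entropy limit: fall back to the basic method.
            return mask if entropy <= ENTROPY_LIMIT[mask] else "Basic"
    return "Basic"  # extension not listed in any mask


def summarize(scan: list[tuple[str, int, float]]) -> dict[str, Block]:
    """Group scanned files into the 4 mask blocks plus the basic-method block."""
    blocks = {mask: Block() for mask in [*MASK_EXTENSIONS, "Basic"]}
    for name, size, entropy in scan:
        block = blocks[classify(name, entropy)]
        block.files += 1
        block.total_size += size
    return blocks


scan = [("readme.txt", 1200, 4.6), ("intro.png", 50000, 7.9), ("level.dat", 900, 3.1)]
for mask, block in summarize(scan).items():
    print(f"{mask:9} files={block.files} size={block.total_size}")
```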



Creating an estimated masked method based on entropy and file scanning:

With a choice of 44 compressors across the 4 masks, each mask is enabled or disabled based on the scan and entropy of the previously evaluated files, to speed up and simplify the creation of the final masked method, with an estimated compression accuracy of about 90% on the compressed files.



LZbench v1.7.1 by Inikep: "Benchmark Compressors LZ77/LZSS/LZMA" (https://github.com/inikep/lzbench)

Thanks to Inikep from encode.ru, the application incorporates a modified version of LZBench, adapted to run a complete benchmark of 63 compressors of the LZ77/LZSS/LZMA family on a single file or an entire directory, with a full report for each file (compression speed in MB/s, decompression speed in MB/s, original size, compressed size, ratio and file name).
As with the entropy scan, a final summary reports the size reduction and compression ratio, so you can compare the compressors and choose one for the scanned file types based on compression speed, decompression speed and ratio.

LZBench has no file size limit: to keep memory use low, files are divided into parts, and the overall compression ratio is calculated as the average of the ratios over the number of parts.
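A sketch of the chunked averaging described above, with Python's built-in zlib standing in for the LZBench compressors (an assumption for illustration, not LZBench's own code). The ratio follows the convention used in this post: compressed size as a percentage of the original, so a 78% ratio means a 22% reduction. `chunk_ratios` and `average_ratio` are hypothetical names.

```python
import zlib


def chunk_ratios(data: bytes, chunk_size: int, level: int = 9) -> list[float]:
    """Compress each part separately; ratio = compressed size as % of the original."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    ratios = []
    for start in range(0, len(data), chunk_size):
        part = data[start:start + chunk_size]
        ratios.append(100 * len(zlib.compress(part, level)) / len(part))
    return ratios


def average_ratio(data: bytes, chunk_size: int) -> float:
    """Overall ratio as the plain average of the per-part ratios."""
    ratios = chunk_ratios(data, chunk_size)
    if not ratios:
        raise ValueError("empty input")
    return sum(ratios) / len(ratios)


ratio = average_ratio(b"example game data " * 5000, 64 * 1024)
print(f"ratio {ratio:.2f}%  reduction {100 - ratio:.2f}%")
```

A plain average gives a short final part the same weight as a full one; weighting each part by its size would instead give total compressed size divided by total original size.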

Quote:
blosclz 2015-11-10 [1-9]
brieflz 1.1.0
brotli 2017-03-10 [0-11]
brotli22 2017-03-10 [0-11]
brotli24 2017-03-10 [0-11]
crush 1.0 [0-2]
csc 2016-10-13 [1-5]
density 0.12.5 beta [1-3]
fastlz 0.1 [1-2]
gipfeli 2016-07-13
glza 0.8
libdeflate 0.7 [1-12]
lz4 1.7.5
lz4fast 1.7.5 [1-99]
lz4hc 1.7.5 [1-12]
lizard 1.0 [10-49]
lzf 3.6 [0-1]
lzfse 2017-03-08
lzg 1.0.8 [1-9]
lzham 1.0 -d26 [0-4]
lzham22 1.0 [0-4]
lzham24 1.0 [0-4]
lzjb 2010
lzlib 1.8 [0-9]
lzma 16.04 [0-9]
lzmat 1.01
lzo1 2.09
lzo1a 2.09
lzo1b 2.09
lzo1c 2.09
lzo1f 2.09
lzo1x 2.09
lzo1y 2.09
lzo1z 2.09
lzo2a 2.09
lzrw 15-Jul-1991 [1-5]
lzsse2 2016-05-14 [0-17]
lzsse4 2016-05-14 [0-17]
lzsse4fast 2016-05-14
lzsse8 2016-05-14 [0-17]
lzsse8fast 2016-05-14
lzvn 2017-03-08
pithy 2011-12-24 [0-9]
quicklz 1.5.0 [1-3]
shrinker 0.1
slz_deflate 1.0.0 [1-3]
slz_gzip 1.0.0 [1-3]
slz_zlib 1.0.0 [1-3]
snappy 1.1.4
tornado 0.6a [1-16]
ucl_nrv2b 1.03 [1-9]
ucl_nrv2d 1.03 [1-9]
ucl_nrv2e 1.03 [1-9]
wflz 2015-09-16
xpack 2016-06-02 [1-9]
xz 5.2.3 [0-9]
yalz77 2015-09-19 [1-12]
yappy 2014-03-22 [0-99]
zlib 1.2.11 [1-9]
zling 2016-01-10 [0-4]
zstd 1.1.4 [1-22]
zstd22 1.1.4 [1-22]
zstd24 1.1.4 [1-22]
NOTE: Entropy calculation, file classification and masked-method estimation are not currently available when using LZ-BENCH.



"arc.groups" has been updated to version 3.0, based on Panker1992's version 2.5, with over 200 popular gaming formats added.

Let's keep the vultures away: if you don't give credit and thanks for all this work, please don't use or download the application.

UPDATED: BE_Parent_Dir

The parent directory of each file is now displayed in the masks box.
Other minor fixes.

Downloads below.
Attached Files
File Type: 7z Bench_Entropy_v1.0.7z (1.69 MB, 260 views)
File Type: 7z Bench_Entropy_Parent_Dir.7z (1.69 MB, 83 views)

Last edited by felice2011; 25-04-2017 at 00:59. Reason: Added BE_Parent_Dir
FileForums @ https://fileforums.com