Old 19-09-2020, 03:27
elit elit is offline
Registered User
 
Join Date: Jun 2017
Location: sun
Posts: 265
Thanks: 190
Thanked 325 Times in 119 Posts
elit is on a distinguished road
Quote:
Originally Posted by FitGirl View Post
I would never use CRC32. In my repacking experience I've seen three cases where the CRC32 of entire files matched even though the files had completely different content and sizes. Please use a better/newer algorithm, otherwise collisions are guaranteed, which means corrupted data.
Very different content and the same hash... that's quite something. It does depend on the number of files, their size, and especially the polynomial of that CRC, though. There are multiple versions of CRC32, and the choice of polynomial is extremely important. I think there were some poor variants of low quality and only two that were really solid, one of which Intel uses in their CPUs (CRC-32C, the Castagnoli polynomial). With those, CRC32 should be perfectly fine and reliable, but still only up to certain counts and data sizes. Hence my worry: this tool is used on huge data, and games are known to have both large sizes and a really big number of files inside their packs, which means tons of chunks.
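
To put rough numbers on "only up to certain counts", here is a small sketch using the standard birthday approximation. It assumes an ideal, uniformly distributed hash, which a real CRC only approximates, and the function name `collision_probability` is made up for this post.

```python
import math

def collision_probability(num_chunks: int, hash_bits: int) -> float:
    """Birthday-bound estimate of the chance that at least two of
    num_chunks distinct chunks share the same hash value.

    Assumes an ideal hash with 2**hash_bits equally likely outputs.
    """
    space = 2.0 ** hash_bits
    exponent = num_chunks * (num_chunks - 1) / (2.0 * space)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
    return -math.expm1(-exponent)

if __name__ == "__main__":
    for bits in (32, 64, 128):
        for n in (10_000, 100_000, 1_000_000, 10_000_000):
            p = collision_probability(n, bits)
            print(f"{bits:3d}-bit hash, {n:>10,} chunks: p = {p:.3e}")
```

Under these assumptions a 32-bit hash reaches about a 50% chance of at least one collision at roughly 77,000 chunks, while a 64-bit hash stays far below that even at millions of chunks.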

As for collisions, he still cannot rely 100% on *any* hash, so he has to verify by content before applying dedup regardless; otherwise it's too risky. That means an occasional rare collision should not be a big deal for the overall size, but only if chunks are small. If you collide on multiple chunks of 10+ MB or even 100+ MB, you may end up with compression that is a few hundred MB worse.
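
A minimal sketch of what "verify by content before applying dedup" could look like. This is not how the tool in question works; `dedup_chunks` and its parameters are hypothetical names, and `zlib.crc32` is just a stand-in for whatever hash gets chosen.

```python
import zlib

def dedup_chunks(chunks, hash_fn=zlib.crc32):
    """Deduplicate a list of byte chunks.

    Returns (unique_chunks, refs), where refs[i] is the index in
    unique_chunks of the data for chunks[i]. The hash is only used to
    find candidates; a chunk counts as a duplicate only after a
    byte-for-byte comparison, so a hash collision costs some dedup
    opportunity at worst and never corrupts data.
    """
    index = {}           # hash value -> indices into unique_chunks
    unique_chunks = []
    refs = []
    for chunk in chunks:
        candidates = index.setdefault(hash_fn(chunk), [])
        for u in candidates:
            if unique_chunks[u] == chunk:    # verify by content
                refs.append(u)
                break
        else:
            # No candidate matched: either a new hash or a collision.
            unique_chunks.append(chunk)
            candidates.append(len(unique_chunks) - 1)
            refs.append(len(unique_chunks) - 1)
    return unique_chunks, refs

if __name__ == "__main__":
    data = [b"aaaa", b"bbbb", b"aaaa", b"cccc", b"bbbb"]
    unique, refs = dedup_chunks(data)
    print(len(unique), refs)
```

Keeping a list of candidates per hash value lets colliding chunks coexist, so even a deliberately bad hash (for example one that always returns 0) still gives correct output, just more slowly.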

If the minimum chunk size is at least a couple of KB, a few extra bytes of hash should be negligible. I would suggest CRC64 or a 128-bit hash such as a truncated SHA digest (or, even better, VMAC, which srep uses).
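
For a sense of scale, a quick back-of-the-envelope calculation of the per-chunk hash overhead. The chunk sizes are just example values, not anything the tool actually uses, and `hash_overhead_percent` is a made-up helper.

```python
def hash_overhead_percent(hash_bytes: int, chunk_bytes: int) -> float:
    """Size of the stored hash relative to the chunk it covers, in percent."""
    return 100.0 * hash_bytes / chunk_bytes

if __name__ == "__main__":
    # 4, 8 and 16 bytes correspond to 32-, 64- and 128-bit hashes.
    for hash_bytes in (4, 8, 16):
        for chunk_bytes in (2048, 4096, 65536):
            pct = hash_overhead_percent(hash_bytes, chunk_bytes)
            print(f"{hash_bytes * 8:3d}-bit hash per {chunk_bytes:>6} B chunk: {pct:.4f}%")
```

Even a 128-bit hash on a 4 KiB chunk comes to under 0.4% overhead, compared with about 0.1% for a 32-bit one.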