FileForums

Weird DTA compressed file in Mafia (https://fileforums.com/showthread.php?t=106733)

teusma 08-05-2025 16:04

Weird DTA compressed file in Mafia
 
In Mafia there is a file called "A0.dta" containing 9,439 files, all of them in WAV (PCM) format. The file is compressed and encrypted... but what's interesting about that? I'll explain as best I can, and I hope what I found is clear.

There is a tool that extracts these WAV (PCM) files, and the extracted files are compatible with msc_tak. The extracted files take 989 MB, and compressing them with msc_tak brings that down to 576 MB. If I then run 7z LZMA2 over the result to compress it further, I save only about 700 KB. If I compress the folder of WAV files with 7z LZMA2 without running msc_tak first, it comes to 654 MB.

Where am I going with this? Here's the trick: the intact A0.dta file, still encrypted/compressed (with a proprietary compression that I can't identify... maybe an LZ77 variant, from what I've read), takes 495 MB on disk, and compressing it further with 7z LZMA2 brings it down to 343 MB. Now explain to me how the hell this happens??? Recap:
The intact file is 495 MB.

When extracted (undoing the proprietary compression), it grows to 989 MB.

When I compress the extracted WAV files with msc_tak, they drop to 576 MB, and compressing further with LZMA2 only saves ~700 KB.

If I compress the extracted WAV files with 7z LZMA2 alone, they come to 654 MB.

And if I simply compress A0.dta as it is (without decrypting it or undoing the proprietary compression), it ends up at 343 MB - much smaller!!!
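To put the recap on one scale, here is a minimal sketch that expresses each size as a percentage of the extracted WAV data. The sizes are the ones quoted in this post; the dictionary labels and the `percent_of` helper are names made up for the illustration.

```python
# Sizes in MB, copied from the recap in this post.
sizes_mb = {
    "A0.dta (intact)": 495,
    "extracted WAVs": 989,
    "WAVs + msc_tak": 576,
    "WAVs + 7z lzma2": 654,
    "A0.dta + 7z lzma2": 343,
}


def percent_of(size_mb, baseline_mb):
    """Return size_mb as a percentage of baseline_mb, rounded to 0.1."""
    return round(100.0 * size_mb / baseline_mb, 1)


baseline = sizes_mb["extracted WAVs"]
for name, size in sizes_mb.items():
    print(f"{name:<18} {size:>4} MB  {percent_of(size, baseline):>5}% of extracted")
```

Seen this way, the intact archive is already about half the size of the extracted data, and 7z on the intact archive lands at roughly a third of it.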

Now I wonder if there is a way to "usurp" this proprietary compression to create a WAV pre-processor and use it on WAV files from any other game...

I'm still going to test whether the other files (A1.dta, A2.dta, A3.dta...) behave the same way, or whether this only happens with A0.dta.

What do you think of all this? Am I making a mistake in some way???

I'll leave the files for download so you can see for yourself:

https://www.mediafire.com/file/zev2u...ractor.7z/file

teusma 08-05-2025 16:31

I tried with A1.dta, which contains other types of data (not WAV PCM), and it didn't work: compressing the file while it is still in the "unknown proprietary compression" gives a bigger result - I mean, if you extract the files from A1.dta and then compress them, the result is smaller... It seems to be behavior specific to WAV (PCM) sounds. I hope some brilliant mind can unravel this mystery...

FitGirl 08-05-2025 17:28

LZMA is a solid compressor. MSC_tak operates at chunk level, so you end up with lots of smaller PCM chunks that are compressed individually. LZMA can't do much with that output afterwards: the entropy is too high.
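The "entropy" point can be made concrete with a byte-level Shannon entropy measurement. This is only a sketch under loud assumptions: the input is a synthetic sine wave standing in for PCM audio, and `zlib` stands in for msc_tak (which is not available from Python). The `byte_entropy` helper is a name invented for the example.

```python
import math
import zlib
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Assumption: a synthetic 16-bit sine wave, not real game audio.
samples = (int(12000 * math.sin(i / 40.0)) for i in range(100_000))
pcm_like = b"".join(s.to_bytes(2, "little", signed=True) for s in samples)

# Assumption: zlib is only a stand-in for a first-stage compressor.
packed = zlib.compress(pcm_like, 9)

print(f"raw:    {byte_entropy(pcm_like):.2f} bits/byte")
print(f"packed: {byte_entropy(packed):.2f} bits/byte")
```

The closer the packed figure gets to 8 bits per byte, the less a second general-purpose compressor has left to work with. Byte entropy ignores longer-range structure, so treat it as a quick indicator rather than a bound.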

As for "why lzma does better on encrypted data" - it may just happen that the default context settings (the lc/lp/pb options of the LZMA compressor) suit this particular set of data. AFAIR it's 3-0-2 by default. You could try brute-forcing those too and see if it improves compression.
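A minimal sketch of that brute force, using Python's standard `lzma` module rather than 7z itself. The sample buffer is synthetic and the helper names (`lzma_size`, `brute_force`) are made up for the example; on real data you would read a slice of the file instead. LZMA2 requires lc + lp to be at most 4, which leaves 75 valid combinations.

```python
import itertools
import lzma


def lzma_size(data: bytes, lc: int, lp: int, pb: int, preset: int = 6) -> int:
    """Size of data compressed as an .xz stream with the given LZMA2 settings."""
    filters = [{"id": lzma.FILTER_LZMA2, "preset": preset,
                "lc": lc, "lp": lp, "pb": pb}]
    return len(lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters))


def brute_force(data: bytes, preset: int = 6):
    """Try every valid (lc, lp, pb) and return (size, lc, lp, pb), smallest first."""
    results = []
    for lc, lp, pb in itertools.product(range(5), repeat=3):
        if lc + lp > 4:  # LZMA2 constraint: lc + lp <= 4
            continue
        results.append((lzma_size(data, lc, lp, pb, preset), lc, lp, pb))
    return sorted(results)


# Assumption: synthetic sample data, not taken from A0.dta.
sample = bytes((i * i) % 251 for i in range(20_000))
ranking = brute_force(sample)
print("best (size, lc, lp, pb):", ranking[0])
print("default 3-0-2 size:     ", lzma_size(sample, 3, 0, 2))
```

For 16-bit audio-like data, settings such as lp=1 or pb=1 are the usual candidates to watch, but the ranking depends entirely on the input.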

Try cls-razor on the original and on the decrypted/unpacked data; it's better suited to raw-like data. lolz may also help on some raw-like stuff, but not on all.

teusma 08-05-2025 18:34

Quote:

Originally Posted by FitGirl (Post 507622)
LZMA is a solid compressor. MSC_tak operates at chunk level, so you end up with lots of smaller PCM chunks that are compressed individually. LZMA can't do much with that output afterwards: the entropy is too high.

As for "why lzma does better on encrypted data" - it may just happen that the default context settings (the lc/lp/pb options of the LZMA compressor) suit this particular set of data. AFAIR it's 3-0-2 by default. You could try brute-forcing those too and see if it improves compression.

Try cls-razor on the original and on the decrypted/unpacked data; it's better suited to raw-like data. lolz may also help on some raw-like stuff, but not on all.

The tests I did applied srep first to the files (from the sounds folder) extracted from A0.dta. 7z seems to "not like this", and it then ends up with an even "crazier" ratio of 70%, vs 45% when compressing the sounds folder (from the A0.dta file) without going through srep...

But even without going through srep, compressing the original A0.dta file vs the extracted ones saves 87 MB!

By the way, if you're going to make a new repack of GTA IV, try using bzip3 (on the sound files) instead of bcm - it achieves a higher ratio than bcm does. On the file "mx_gf_01.mus" from The Godfather the Game (classic) it also beats all the others (with a block size of 134 MB), except nanozip -cc. In the game GUN (audio files) it also compresses better (applying srepm3f+bzip3 to the .wad files separately).

bzip3 has proven to be a good compressor, standing out more than lolz, razor, lzma and rzm on certain types of data, such as unknown generic sound files.
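On the block size remark: block-sorting compressors can only exploit repetition that falls inside one block, which is why a large block helps on big audio containers. The sketch below illustrates this with the standard library's `bz2` (classic bzip2, used here purely as a stand-in, not bzip3), where `compresslevel` selects block sizes from 100 KB to 900 KB. The data is synthetic: 300 KB of pseudo-random bytes repeated once.

```python
import bz2
import random

# Assumption: synthetic data, 300 KB of pseudo-random bytes followed by an
# exact copy, so the only redundancy sits 300 KB apart.
rng = random.Random(0)
chunk = bytes(rng.getrandbits(8) for _ in range(300_000))
data = chunk + chunk

# compresslevel=1 uses 100 KB blocks: the copy is never in the same block.
small_blocks = len(bz2.compress(data, compresslevel=1))
# compresslevel=9 uses 900 KB blocks: both copies fit in a single block.
large_blocks = len(bz2.compress(data, compresslevel=9))

print(f"input:        {len(data)} bytes")
print(f"100 KB block: {small_blocks} bytes")
print(f"900 KB block: {large_blocks} bytes")
```

The same reasoning applies to any block-based tool: if related sounds sit further apart than the block size, the compressor never sees them together.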

teusma 08-05-2025 19:50

When doing more compression tests with A0.dta...
A0.dta >>> 7z(lzma2)  343 MB
A0.dta >>> lolz_tt16  335 MB
A0.dta >>> razor      334 MB
A0.dta >>> bzip3      324 MB

On the extracted files:
A0.dta(sounds) >>> 7z(store)+bzip3  582 MB
A0.dta(sounds) >>> 7z(store)+razor  574 MB

teusma 08-05-2025 20:20

Could it be that this is specific to the WAV sound data from Mafia? Or, if you applied this same encryption and compression to PCM WAV audio from other games and then compressed it with a good compressor like lzma, lolz, bzip3 or razor... maybe it could work there too. Now the question is who will be the brilliant mind able to pull off such a feat??? Waiting for anyone who wants to volunteer hahahaha.

A good site for anyone who wants to get started is this one: https://progamercity.net/ghack-tut/4...algorithm.html :)

kj911 09-05-2025 03:58

1 Attachment(s)
If this DTA format is really a special precompressor that is specialized mostly for WAV audio, I would appreciate it if someone could write a packer for it.

Have you tested it on the "aa.dta" file? If it's true that it contains OGG files, what kind of results do you get with and without OGGRE?

I found a "DTAs Packer" (attached).

Russian description of the tool: https://lhm.fandom.com/ru/wiki/DTAs_packer

Unfortunately, it cannot be used for the "a0.dta" file. :/

Joe Forster/STA 09-05-2025 07:57

(Thread renamed. Please, give your threads meaningful titles so that searches for keywords can find them.)

teusma 09-05-2025 22:36

Quote:

Originally Posted by kj911 (Post 507635)
If this DTA format is really a special precompressor that is specialized mostly for WAV audio, I would appreciate it if someone could write a packer for it.

Have you tested it on the "aa.dta" file? If it's true that it contains OGG files, what kind of results do you get with and without OGGRE?

I found a "DTAs Packer" (attached).

Russian description of the tool: https://lhm.fandom.com/ru/wiki/DTAs_packer

Unfortunately, it cannot be used for the "a0.dta" file. :/

The OGG files are inside AB.dta, not AA.dta, and yes, I tested this file: it makes no difference whether it is compressed as-is or extracted first.

FitGirl 10-05-2025 09:33

There is a good chance that the extractor you use for DTA is not simply extracting WAV files, but converting them from something like ADPCM-encoded WAVs to plain PCM WAVs, hence the inflated size after the final compressor. ADPCM was pretty popular in the Mafia 1 era.
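One cheap check related to this is to read the format tag in the `fmt ` chunk of a WAV file: 0x0001 is PCM, 0x0002 is Microsoft ADPCM and 0x0011 is IMA ADPCM. The sketch below is an assumption-laden illustration (the `read_wav_format` helper is invented for the example, and the demo file is written on the spot with the standard `wave` module). Note that it only describes the file it is given: if the extractor converts on the fly, the extracted files will report PCM regardless of what is stored inside the archive.

```python
import os
import struct
import tempfile
import wave

FORMAT_NAMES = {0x0001: "PCM", 0x0002: "MS ADPCM", 0x0011: "IMA ADPCM"}


def read_wav_format(path):
    """Return the key fields of the 'fmt ' chunk of a RIFF/WAVE file."""
    with open(path, "rb") as f:
        head = f.read(12)
        if len(head) < 12 or head[:4] != b"RIFF" or head[8:12] != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        while True:
            chunk_header = f.read(8)
            if len(chunk_header) < 8:
                raise ValueError("no 'fmt ' chunk found")
            chunk_id, chunk_size = struct.unpack("<4sI", chunk_header)
            if chunk_id == b"fmt ":
                fields = f.read(16)
                if len(fields) < 16:
                    raise ValueError("truncated 'fmt ' chunk")
                tag, channels, rate, _, _, bits = struct.unpack("<HHIIHH", fields)
                return {
                    "format": FORMAT_NAMES.get(tag, hex(tag)),
                    "channels": channels,
                    "sample_rate": rate,
                    "bits_per_sample": bits,
                }
            # Skip any other chunk; RIFF chunks are padded to an even length.
            f.seek(chunk_size + (chunk_size & 1), os.SEEK_CUR)


# Demo on a tiny 16-bit mono PCM file, not on a file from the game.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.wav")
    with wave.open(demo_path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit
        w.setframerate(22050)
        w.writeframes(b"\x00\x00" * 100)
    info = read_wav_format(demo_path)

print(info)
```

Running it over the extracted sounds folder would also show whether all files share the same sample rate and bit depth.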

And thanks for pointing to bzip3, will include it into my bruteforcing queue.

KaktoR 10-05-2025 09:57

Yep. Thanks for the bzip3 mention. Was testing around with this since I read it here two days ago and it's good enough for my use cases.

teusma 10-05-2025 11:19

Quote:

Originally Posted by FitGirl (Post 507654)
There is a good chance that the extractor you use for DTA is not simply extracting WAV files, but converting them from something like ADPCM-encoded WAVs to plain PCM WAVs, hence the inflated size after the final compressor. ADPCM was pretty popular in the Mafia 1 era.

And thanks for pointing to bzip3, will include it into my bruteforcing queue.

Maybe, who knows... It would be strange, not to say bizarre, but the inflated size can also be explained by the proprietary compression itself, since the other files use this same compression and also bloat when extracted.

teusma 10-05-2025 11:41

Quote:

Originally Posted by FitGirl (Post 507654)
There is a good chance that the extractor you use for DTA is not simply extracting WAV files, but converting them from something like ADPCM-encoded WAVs to plain PCM WAVs, hence the inflated size after the final compressor. ADPCM was pretty popular in the Mafia 1 era.

And thanks for pointing to bzip3, will include it into my bruteforcing queue.

There is a good chance that this is not the case: when analyzing the acoustic spectrum (with the Spek program), the audio extracted from A0.dta called "07b_SarasTheme.wav" (the largest of them) has 22 kHz - typical PCM audio quality...

teusma 10-05-2025 20:40

On second thought... maybe you're right, FitGirl, because the other audio files are in bad quality. It's as if only "07b_SarasTheme.wav" was selected to be left uncompressed.

Are you using bzip3's <stdin> <stdout> function?

FitGirl 12-05-2025 16:15

Quote:

Originally Posted by teusma (Post 507665)
On second thought... maybe you're right, FitGirl, because the other audio files are in bad quality. It's as if only "07b_SarasTheme.wav" was selected to be left uncompressed.

Are you using bzip3's <stdin> <stdout> function?

I've tried the original and recompiled versions. Unfortunately, while it is a little faster than BCM, it's not stable in either compression or decompression. Random errors appeared on many test files (incl. audio from GTA5) on decompression, so I would not use it in this state. BCM and BSC gave either better or very close final sizes while being a little slower or faster, depending on the case. But they are stable, so...
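A stability check of this kind boils down to a round trip: compress, decompress, and compare hashes. The sketch below is generic and takes the compressor and decompressor as callables; `bz2` from the standard library is used only as a stand-in, and the `round_trip_ok` helper is a hypothetical name, not part of any of the tools discussed in this thread.

```python
import bz2
import hashlib


def round_trip_ok(data, compress, decompress):
    """Return True if decompress(compress(data)) reproduces data exactly."""
    packed = compress(data)
    try:
        restored = decompress(packed)
    except Exception:
        # A crash or a reported error on decompression counts as a failure.
        return False
    return hashlib.sha256(restored).digest() == hashlib.sha256(data).digest()


# Assumption: synthetic test buffers instead of real game files.
test_buffers = [b"", b"abc" * 1000, bytes(range(256)) * 64]
failures = [i for i, buf in enumerate(test_buffers)
            if not round_trip_ok(buf, bz2.compress, bz2.decompress)]
print("failed buffers:", failures)
```

For an external tool, the two callables would wrap whatever command line it provides; running the loop several times over the same files helps catch the random, non-reproducible errors described above.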

