If we had the computing power to bruteforce hashing algorithms like MD5 where the original string/bin data was ~ 10MB wouldn't we essentially have created the greatest form of compression?
Why am I wrong?
If we had the computing power to bruteforce hashing algorithms like MD5 where the original string/bin data was ~ 10MB wouldn't we essentially have created the greatest form of compression?
Why am I wrong?
I guess technically it would, but it would not be time efficient, since you would have to run through all the previous combinations before finding yours.
I don't fully understand your question but I think you're trying to say that if you can reverse a md5 hash via bruteforce you will have figured out the content of file or string therefore creating "compression"
/sqt/ is that way
Just step through pi until you find your exact 10mb of data and share the position/length of that.
Due to collisions there's no guarantee that the bruteforced data is the original uncompressed data even if it matches the checksum.
No. There are 2^(10*1024*1024 - 16) = 2^10485744 different 10MB files with the same MD5 hash.
Are you sure about that
MD5 is weak to collision but that doesn't mean it's reversible.
is this some kind of troll thread?
this.
There are theoretically an infinite number of n's for MD5(n) that make a given hash Z. A hash is not lossless, ie you lose information that is unrecoverable.
No, thanks for pointing. It's actually more, 2^(8*10485744)
Collisions.
>more possibilites than the amount of atoms in the observable universe
Are you sure about that
Only off by a factor of 8.
>2^10485744 / 2^83885952 = 8
Are you sure about that
An MD5 sum is 16 bytes, or 128 bits. So, there are 2^128 different MD5 sums possible.
10MB = 10000KB = 10000000B = 80000000 bits. So, there are 2^80000000 possible 10MB files.
Therefore, on average, each MD5 sum can "decompress" to any of 2^(80000000-128) 10MB files.
are you all retarded?
this is a serious question
...
Not too sure, too tired to think it through properly
Did I get the math wrong?
There are 16^32 possible md5 hashes and 2^10485760 possible 10MB files.
I will give you a hint that there are many times the number of possible 10MB files than there are MD5 hashes.
Woops, I didn't that bytes to bits. Yeah,
He seems to be off a little bit.
number of 10MB files
2^(8*10*2^20)
number of md5sums
2^(8*16)
average number of 10MB files with a specific md5sum:
2^(8*10*2^20)/2^(8*16) = 2^(8*(10*2^20 - 16))
= 256^(10*2^20-16)
I don't think you can say they're perfectly uniform. One md5sum value might have more than another by a percent or two. The odds of even one md5sum having a unique 10MB file seem astronomical. For lossless compression to work you need for a given compressed version to map to only one uncompressed version. Reverse md5sum maps to a gorillion.
Hashing and compression are dramatically different things. Any similarities you see in them are aesthetic; the Mathematics behind a good hashing algorithm and a good compression algorithm are share almost nothing in common.
It would be more practical to divide the uncompressed data into smaller blocks (much smaller than 10MB, e.g. 256 bytes or so) and perform such a conversion on each individual block instead.
In addition, for every block, as a part of the final compressed data you would want to store key points of information; Say the original first, middle and last bytes, as well as a length value (for odd blocks) would be stored alongside the resulting hash to better confirm a match to the original data when reversed.
Sure, you wouldn't achieve 10485760:32 compression ratios like OP's scenario, but ~256:40 would still be pretty high compared to many cases of what modern compression algorithms can achieve.
>10485760:16 / 256:24*
I forgot to factor in storing the hash as byte data instead of text.
What you fail to realize is that there is no point at which it becomes viable.
No you forgot to factor in the fact that if you map m items to n items and n < m you lose information.
The vast majority of possible 10MB files map to ~10MB files when compressed using lossless compression algorithms.