Procedural Compression

Is there such a thing as procedural file compression?
Like a file being compressed, then the compressed file being compressed again, and so on, until you have a compressed archive taking up very little space while remaining lossless.

Sorry if the question is stupid; I don't have any knowledge of the subject, it's just my curiosity talking.

You should read up on how file compression works; 30 seconds of reading will explain why this simply wouldn't work.
File compression typically works by eliminating redundant data. For instance, say there are 8 identical bytes in a row in a file. Compression can represent those bytes as 8*[byte] instead of taking up the space to store each one individually.
So once that type of compression has been done, there is (ideally) no more redundant data in the file to be compressed.
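
A toy sketch of that idea in Python (run-length encoding; names like rle_encode are just made up for illustration, and real formats such as DEFLATE are far more involved):

# Toy run-length encoder: collapse runs of identical bytes into (count, byte) pairs.
def rle_encode(data: bytes):
    out, i = [], 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1
        out.append((j - i, data[i]))   # (run length, byte value)
        i = j
    return out
def rle_decode(pairs):
    return b"".join(bytes([b]) * n for n, b in pairs)
assert rle_decode(rle_encode(b"aaaaaaaabc")) == b"aaaaaaaabc"

Once the runs are collapsed there are no runs left for a second pass to exploit, which is exactly the point being made above.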

Ok thanks for the answer.
Curious if this will still be the same with quantum computing, though.

Nice trips Satan.

Yes, but only if each stage uses a different compression algorithm that compresses the information in a way the others don't. Is it used today? Sure. See: .tar.gz, .tar.bz2

Yes there is, see >56609799. Sometimes two compression algorithms work well together.

Although this is not completely true; look up ZopfliPNG, which lets you specify the number of optimization runs (the PNG gets marginally smaller with each run, up to a certain point).

>implying tar does any compression

That's not what Huffman compression does.

how about having an algorithm that maps from an input number to all possible sequences of bytes and then bruteforcing that number?

in some cases combining compression algorithms can give a reduced file size, OP, but in TYOOL 2016 this is often not relevant because the CPU tradeoff is not worth the bandwidth/memory/disk-space savings, and any long-term/infrequent-access storage will use well-optimized compression algorithms tailored to the data types being stored.

he's just giving an example to illustrate a point, not a document specifying how to implement a compression algorithm.

>you will never be a compressed file

Well, you could just compress everything by searching through the digits of pi until you find the sequence that represents it and pointing to that location. I don't think it would do any good to compress it further.
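
For what it's worth, the searching part is easy to sketch (assuming the mpmath package is available; pi_index and N are made-up names). The catch is that only very short sequences ever show up within a computable number of digits:

# Toy "pi lookup": find a decimal digit string in the first N digits of pi.
from mpmath import mp, nstr
N = 100_000                                   # digits of pi to generate (tiny by file standards)
mp.dps = N + 10                               # working precision in decimal places
digits = nstr(+mp.pi, N).replace(".", "")     # "31415926535..."
def pi_index(sequence: str) -> int:
    # offset of the digit string within the first N digits, or -1 if absent
    return digits.find(sequence)
print(pi_index("14159"))     # 1
print(pi_index("999999"))    # around 762, the famous run of six 9s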

>hash your movie
>randomly generate information until a video file with the same hash is created
Imagine a world with unlimited computing power.

...

If it were possible to compute hashes that quickly, then you would end up with multiple possible "movies" as a result of hash collisions.

>Is it used today? Sure. See: .tar.gz, tar.bz2
Do you even know what tar is?

>get CP instead

Dumbass.

There are infinitely many files that hash to the same value, and even if you had the computing capacity to generate candidates that fast, you'd still need human intervention to evaluate the results.
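
Even restricted to one fixed file size the counting is hopeless. A rough back-of-the-envelope (SHA-256 and a 1 GiB file assumed just for concreteness):

file_bits = 8 * 2**30          # a 1 GiB "movie"
hash_bits = 256                # e.g. SHA-256
# Average number of distinct 1 GiB files sharing any given hash value,
# expressed as a power of two:
print(file_bits - hash_bits)   # 8589934336, i.e. about 2**8589934336 colliding files per hash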

It actually hasn't been proven that you can find arbitrary sequences in the decimal expansion of pi; it's just a conjecture (that pi is a normal number). Either way, it wouldn't do you any good, as the position would, on average, require at least as much information as the sequence itself.

You could not. In the best case, the average size of the index number will equal the size of the data you're trying to compress.
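
A quick back-of-the-envelope, assuming pi's digits behave like a random digit stream (which is the best case for this scheme):

import math
k = 1_000_000                              # file size in bytes
data_bits = 8 * k                          # information content of the file
seq_digits = data_bits / math.log2(10)     # the same file written as ~2.4 million decimal digits
# In a random digit stream, a specific d-digit sequence first appears around
# position 10**d, so just writing the offset down takes about d digits:
offset_bits = seq_digits * math.log2(10)
print(round(offset_bits), data_bits)       # 8000000 8000000: the pointer is as big as the file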

If you're trying to compress arbitrary data, this is actually true of all compression algorithms. See "pigeonhole principle". The only reason compression works at all is because we don't compress arbitrary data. Compression algorithms exploit redundancy in compressible data, and if there's no redundancy we don't bother compressing it because it would unavoidably make it bigger. Indexing pi doesn't have this redundancy detection step, so it's completely useless.
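
The counting argument is short enough to just write out (n = 3 bytes here, but it works for any n):

n = 3                                               # file length in bytes
inputs = 256 ** n                                   # 16,777,216 possible length-n files
shorter_outputs = sum(256 ** m for m in range(n))   # all files of length 0..n-1: 65,793
# Lossless compression must be reversible (injective), so it cannot map every
# length-n file to a strictly shorter one: there aren't enough short files.
print(inputs, ">", shorter_outputs)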

Actually, that's the problem with his idea. Pi's decimal expansion is an infinite, effectively random-looking string of digits.

It isn't binary, so each digit carries more information than a single bit but less than the full range of a byte. You'd have to perform some kind of conversion to build your bytes out of the digits of pi.

Pi is a number. You can represent it in whatever base you like.

once the data gets close to maximum entropy, compression algorithms can't reduce the amount of space taken by the file any further without cheating.

en.wikipedia.org/wiki/Entropy_(information_theory)

Check this out
en.wikipedia.org/wiki/KGB_Archiver

You mean something extremely convoluted like converting from decimal to binary?

That's why it's called Huffman coding and not Huffman compression.

Sup Forums has too long been infested with Linux ricer kiddies. It's time to purge them once and for all.

Any of the following regularly occurring threads on Sup Forums is either directly or indirectly about Linux ricing or autism:

* Intel vs. AMD threads
* GPU threads
* Various types of rig/battlestation/guts/Speccy threads ("What parts should I buy, /g?" "Is this part good, Sup Forums?" "Check out my specs, Sup Forums.")
* All Linux discussion and support threads (obviously)(which distro to use, how to rice, which anime child porn to use)

These threads only serve to discuss products for doing pointless stuff. They have nothing to do with programming, networking, security, privacy, freedom, design, usability, or anything else Sup Forums related.

These threads and the people who post within them belong in either or as they are the appropriate boards in which to discuss their hobby, along with "distro war" and "gentoo vs. arch" threads. This should be Rule #4 for Sup Forums and it should be aggressively enforced until the manchildren finally get it and move on.

This is the thread to petition Hiroshimoot to cure the /mlp/ cancer in Sup Forums. If you really care about technology and not just vapid consumerism, help by adding your voice to this discussion.

We can do it, Sup Forums.


----


lincucks don't care about technology in general.

All they care about is the rice, not the method. Pony pictures is and the method is .

Products are GPU shitstorms, speccy threads, guts threads, casing threads etc.

A typical lincuck's perception of judging a GPU is nothing because lincuck has no games or programs.

None of the ricer lincuckolds are interested in the method or process of making games. Game is not technology, programming games is.

Ricing is not technology and neither is discussing which build will look the best with anime theme.

Lincuckolds do not care about privacy and cryptology. They are more invested in anime child porn. fbfb

Jesus Christ, stop posting you fucking retard

This is pretty much how content-addressable P2P networks work. You throw away the file and store only its hash, to be used later for retrieving the file's contents. Imagine P2P distributed storage were ubiquitous. Then it would be possible to encode any file as a sequence of hashes of the (sufficiently large) blocks that make up the file, and decompression would consist of retrieving the corresponding data blocks.
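
A toy sketch of that scheme in Python (BlockStore and the 4 KiB block size are invented for illustration; real systems add chunking strategies, Merkle trees, replication and actual networking):

import hashlib
class BlockStore:
    # Toy content-addressable store: blocks are keyed by their SHA-256 hash.
    def __init__(self):
        self.blocks = {}           # stands in for the P2P network
    def put(self, block: bytes) -> str:
        h = hashlib.sha256(block).hexdigest()
        self.blocks[h] = block
        return h
    def get(self, h: str) -> bytes:
        return self.blocks[h]
def encode(data: bytes, store, block_size=4096):
    # "Compress" the file down to a list of block hashes.
    return [store.put(data[i:i + block_size]) for i in range(0, len(data), block_size)]
def decode(hashes, store):
    # "Decompress" by fetching every block back by its hash.
    return b"".join(store.get(h) for h in hashes)
store = BlockStore()
data = b"some file contents " * 1000
assert decode(encode(data, store), store) == data

Of course the information hasn't gone anywhere: it's just sitting on someone else's disk, which is why this is addressing rather than compression.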

What if there were an offsite server which had a precompiled list of pi out to an irrationally large number of digits?

And the way you would search this information is by going to the decimal position where said numbers are located in the precompiled database and then counting upwards from said location.

something something cloud computing and bruteforcing 1,000,000 decimal places.

Binary -> Octal -> Hex

Is probably the closest representation to what you are thinking of.

You didn't read his post, or at least you didn't understand it. He even told you what to look up in order to discover why this wouldn't work. Here's the Wikipedia article about the principle he was talking about:
en.wikipedia.org/wiki/Pigeonhole_principle

I want you to read that article, and if there is something you don't understand, read the Wikipedia article about that thing as well, until you finally understand it. If you aren't willing to commit to that mental labor, then I suggest you keep your uneducated opinion on this subject to yourself.

Assuming pi is a normal number (which is widely believed but, as pointed out above, unproven), every single piece of data that can exist appears in pi somewhere.

However! No reasonably useful file has actually been read out of it yet.
The search space for common files like pictures and video is a trillion times further out than anything we have calculated so far.
People really underestimate how fucking much brute force it takes to recover a reasonably complex password, never mind a full file.

Until we can brute-force passwords in seconds, PiFS won't be useful for common file storage or even archiving.
By the time you read out your "archived" file, your quadrillionth descendant will have just been born.
>Happy birthday kid, your dads tranny porn was found in Pi!

Technically, a 0 byte file is a compressed version of any file if you allow it infinite time to decompress.

one big problem, which he mentioned, is that storing just the /offset/ into pi may well require more space than the data you intend to extract
so even if you had a galaxy-sized computer that could find your offset within a single human lifetime, it /still/ might not actually save you anything

That would be super rare. All you need is the start/end point and numbers are SUPER compressible. Since you can do like, 100 billion digits in x 5 + 500k x 2 etc.

The real limitation is calculating pi and finding arbitrary sequences at will.

>numbers are SUPER compressible.
all files can be represented as a number, your argument is invalid

Check this shit out: mattmahoney.net/dc/barf.html
Hope you don't mind waiting a year to compress a GB.

All practical compression algorithms already use one or more functions to transform the incoming data into some form that's easier to process, and then use an entropy coding technique (such as Huffman, range/arithmetic coding, ANS/Finite State Entropy coding[1] or - for some distributions - things like Golomb-Rice coding) to encode the residuals from that.
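
As a taste of what the entropy-coding stage does, here's a toy Huffman coder in Python (code construction and bit counting only; a real coder also has to pack the bits and transmit the table):

import heapq
from collections import Counter
def huffman_code(data: bytes):
    # Build a Huffman code: byte value -> bit string.
    freq = Counter(data)
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, tie, (a, b)))   # merge the two rarest nodes
        tie += 1
    codes = {}
    def walk(node, prefix=""):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"                # single-symbol edge case
    walk(heap[0][2])
    return codes
data = b"abracadabra"
codes = huffman_code(data)
encoded = "".join(codes[b] for b in data)
print(len(encoded), "bits instead of", 8 * len(data))  # 23 bits instead of 88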

Some of these functions look for repetitions, like LZ77 (deflate) and LZ78 (LZW).
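
For example, a deliberately naive LZ77 match finder (real implementations use hash chains or suffix structures instead of this slow windowed scan; the token format here is invented):

def lz77_tokens(data: bytes, window=4096, min_len=3):
    # Toy LZ77: emit ("match", offset, length) for repeats, ("lit", byte) otherwise.
    i, out = 0, []
    while i < len(data):
        best_len, best_off = 0, 0
        for j in range(max(0, i - window), i):
            l = 0
            while i + l < len(data) and data[j + l] == data[i + l]:
                l += 1
            if l > best_len:
                best_len, best_off = l, i - j
        if best_len >= min_len:
            out.append(("match", best_off, best_len))   # copy best_len bytes from best_off back
            i += best_len
        else:
            out.append(("lit", data[i]))
            i += 1
    return out
print(lz77_tokens(b"abcabcabcabc"))
# [('lit', 97), ('lit', 98), ('lit', 99), ('match', 3, 9)]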

Some transpose the data, like the Burrows-Wheeler Transform used by bzip2.
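
The forward transform is tiny if you don't care about speed (real implementations build a suffix array instead of literally sorting every rotation):

def bwt(s: bytes):
    # Naive Burrows-Wheeler transform: sort all rotations, keep the last column
    # plus the index of the original rotation (needed to invert it).
    n = len(s)
    order = sorted(range(n), key=lambda i: s[i:] + s[:i])
    last_column = bytes(s[(i - 1) % n] for i in order)
    return last_column, order.index(0)
print(bwt(b"banana"))   # (b'nnbaaa', 3) - identical bytes end up clustered together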

Some perform other simple reversible transforms - the commonly-used BCJ/BCJ2 transforms on executable code essentially do de-relocation, looking for branches and jmps, so that branches to the same location produce the same byte sequence (making them easier to pack with a subsequent stage).
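
The core of the x86 case fits in a few lines; this is only a sketch of the idea (forward direction only, and real BCJ/BCJ2 filters are much more careful about false positives and also handle jumps):

import struct
def x86_call_filter(code: bytes) -> bytes:
    # Rewrite the rel32 operand of every 0xE8 CALL as an absolute target address,
    # so calls to the same function become identical byte patterns.
    out = bytearray(code)
    i = 0
    while i + 5 <= len(out):
        if out[i] == 0xE8:                                  # CALL rel32
            rel = struct.unpack_from("<i", out, i + 1)[0]   # signed offset, little-endian
            target = (i + 5 + rel) & 0xFFFFFFFF             # absolute, relative to buffer start
            struct.pack_into("<I", out, i + 1, target)
            i += 5
        else:
            i += 1
    return bytes(out)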

Some use linear prediction, such as good old delta compression for audio samples (FLAC uses a linear predictor plus Golomb-Rice coding). PNG, too, has several simple predictive filters based on neighbouring pixels - depending on the data, choosing the right one can really improve the compression.
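
The order-1 version of that idea ("predict the previous value") is about as small as a transform gets:

def delta_encode(samples):
    # Replace each sample with its difference from the previous one; smooth
    # signals turn into small residuals near zero, which the entropy coder
    # can then store in very few bits.
    prev, out = 0, []
    for s in samples:
        out.append(s - prev)
        prev = s
    return out
def delta_decode(residuals):
    prev, out = 0, []
    for r in residuals:
        prev += r
        out.append(prev)
    return out
samples = [100, 102, 104, 103, 105, 107]
print(delta_encode(samples))                      # [100, 2, 2, -1, 2, 2]
assert delta_decode(delta_encode(samples)) == samples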

Some use more advanced prediction techniques - context modelling is the overarching name for the strongest general-purpose compressors, although they are very slow (see the PAQ family, or PPMd for text and source code).
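
A toy order-1 context model shows the basic idea (a real context-mixing compressor blends many models of different orders and feeds the predictions into an arithmetic coder):

from collections import Counter, defaultdict
def order1_model(data: bytes):
    # Count how often each byte follows each other byte: P(next | previous).
    counts = defaultdict(Counter)
    prev = 0
    for b in data:
        counts[prev][b] += 1
        prev = b
    return counts
model = order1_model(b"the theory of the thing")
ctx = ord("t")
total = sum(model[ctx].values())
for sym, c in model[ctx].most_common():
    print(chr(sym), c / total)    # in this sample, all the probability after 't' lands on 'h'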

Depends what you want.

[1] github.com/Cyan4973/FiniteStateEntropy, arxiv.org/abs/1311.2540