Procedural Compression

Is there such a thing as procedural file compression?
Like a file being compressed, then the compressed file being compressed again, and so on, until you have a compressed archive taking up very little space while remaining lossless.

Sorry if the question is stupid; I don't have any knowledge of the subject, it's just my curiosity talking.

You should read up on how file compression works; 30 seconds of reading will explain why this simply wouldn't work.
File compression typically works by eliminating redundant data. For instance, say there are 8 identical bytes in a row in a file. Compression can represent those bytes as 8*[byte] instead of taking up the space to store each one individually.
So once that type of compression has been done, there is (ideally) no more redundant data in the file to be compressed.
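
A toy sketch of that idea in Python (run-length encoding; names like rle_encode are just made up for illustration, and real formats such as DEFLATE are far more involved):

# Toy run-length encoder: collapse runs of identical bytes into (count, byte) pairs.
def rle_encode(data: bytes):
    out, i = [], 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1
        out.append((j - i, data[i]))   # (run length, byte value)
        i = j
    return out
def rle_decode(pairs):
    return b"".join(bytes([b]) * n for n, b in pairs)
assert rle_decode(rle_encode(b"aaaaaaaabc")) == b"aaaaaaaabc"

Once the runs are collapsed there are no runs left for a second pass to exploit, which is exactly the point being made above.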

Ok thanks for the answer.
Curious if this will still be the same with quantum computing, though.

Nice trips Satan.

Yes, but only if each stage uses a different compression algorithm that compresses the information in a way the others don't. Is it used today? Sure. See: .tar.gz, .tar.bz2

Yes there is, see >56609799. Sometimes two compression algorithms work well together.

Although this is not completely true; look up ZopfliPNG, which lets you specify the number of optimization runs (the PNG gets marginally smaller with each run, up to a certain point).

>implying tar does any compression

That's not what Huffman compression does.

how about having an algorithm that maps from an input number to all possible sequences of bytes and then bruteforcing that number?

in some cases combining compression algorithms can give a reduced file size, OP, but in TYOOL 2016 this is often not relevant because the CPU tradeoff is not worth the bandwidth/memory/disk-space savings, and any long-term/infrequent-access storage will use well-optimized compression algorithms tailored to the data types being stored.

he's just giving an example to illustrate a point, not a document specifying how to implement a compression algorithm.

>you will never be a compressed file

Well, you could just compress everything by searching through the digits of pi until you find the sequence that represents it and pointing to that location. I don't think it would do any good to compress it further.
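
For what it's worth, the searching part is easy to sketch (assuming the mpmath package is available; pi_index and N are made-up names). The catch is that only very short sequences ever show up within a computable number of digits:

# Toy "pi lookup": find a decimal digit string in the first N digits of pi.
from mpmath import mp, nstr
N = 100_000                                   # digits of pi to generate (tiny by file standards)
mp.dps = N + 10                               # working precision in decimal places
digits = nstr(+mp.pi, N).replace(".", "")     # "31415926535..."
def pi_index(sequence: str) -> int:
    # offset of the digit string within the first N digits, or -1 if absent
    return digits.find(sequence)
print(pi_index("14159"))     # 1
print(pi_index("999999"))    # around 762, the famous run of six 9s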

>hash your movie
>randomly generate information until a video file with the same hash is created
Imagine a world with unlimited computing power.

...

If it were possible to compute hashes that quickly, then you would end up with multiple possible "movies" as a result of hash collisions.

>Is it used today? Sure. See: .tar.gz, tar.bz2
Do you even know what tar is?

>get CP instead

Dumbass.

There are infinitely many files that hash to the same value, and even if you had the computing capacity to generate candidates that fast, you'd still need human intervention to evaluate the results.
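
Even restricted to one fixed file size the counting is hopeless. A rough back-of-the-envelope (SHA-256 and a 1 GiB file assumed just for concreteness):

file_bits = 8 * 2**30          # a 1 GiB "movie"
hash_bits = 256                # e.g. SHA-256
# Average number of distinct 1 GiB files sharing any given hash value,
# expressed as a power of two:
print(file_bits - hash_bits)   # 8589934336, i.e. about 2**8589934336 colliding files per hash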

It actually hasn't been proven that you can find arbitrary sequences in the decimal expansion of pi; it's just a conjecture (that pi is a normal number). Either way, it wouldn't do you any good, as the position would, on average, require at least as much information as the sequence itself.

You could not. In the best case, the average size of the index number will equal the size of the data you're trying to compress.
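
A quick back-of-the-envelope, assuming pi's digits behave like a random digit stream (which is the best case for this scheme):

import math
k = 1_000_000                              # file size in bytes
data_bits = 8 * k                          # information content of the file
seq_digits = data_bits / math.log2(10)     # the same file written as ~2.4 million decimal digits
# In a random digit stream, a specific d-digit sequence first appears around
# position 10**d, so just writing the offset down takes about d digits:
offset_bits = seq_digits * math.log2(10)
print(round(offset_bits), data_bits)       # 8000000 8000000: the pointer is as big as the file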

If you're trying to compress arbitrary data, this is actually true of all compression algorithms. See "pigeonhole principle". The only reason compression works at all is because we don't compress arbitrary data. Compression algorithms exploit redundancy in compressible data, and if there's no redundancy we don't bother compressing it because it would unavoidably make it bigger. Indexing pi doesn't have this redundancy detection step, so it's completely useless.
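
The counting argument is short enough to just write out (n = 3 bytes here, but it works for any n):

n = 3                                               # file length in bytes
inputs = 256 ** n                                   # 16,777,216 possible length-n files
shorter_outputs = sum(256 ** m for m in range(n))   # all files of length 0..n-1: 65,793
# Lossless compression must be reversible (injective), so it cannot map every
# length-n file to a strictly shorter one: there aren't enough short files.
print(inputs, ">", shorter_outputs)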

Actually, that's the problem with his idea. Pi's decimal expansion is an infinite, effectively random-looking string of digits.

It isn't binary, so each digit carries more information than a single bit but less than the full range of a byte. You'd have to perform some kind of conversion to build your bytes out of the digits of pi.

Pi is a number. You can represent it in whatever base you like.

once the data gets close to maximum entropy, compression algorithms can't reduce the amount of space taken by the file any further without cheating.

en.wikipedia.org/wiki/Entropy_(information_theory)

Check this out
en.wikipedia.org/wiki/KGB_Archiver

You mean something extremely convoluted like converting from decimal to binary?

That's why it's called Huffman coding and not Huffman compression.

Sup Forums has too long been infested with Linux ricer kiddies. It's time to purge them once and for all.

Any of the following regularly occurring threads on Sup Forums is either directly or indirectly about Linux ricing or autism:

* Intel vs. AMD threads
* GPU threads
* Various types of rig/battlestation/guts/Speccy threads ("What parts should I buy, /g?" "Is this part good, Sup Forums?" "Check out my specs, Sup Forums.")
* All Linux discussion and support threads (obviously)(which distro to use, how to rice, which anime child porn to use)

These threads only serve to discuss products for doing pointless stuff. They have nothing to do with programming, networking, security, privacy, freedom, design, usability, or anything else Sup Forums related.

These threads and the people who post within them belong in either or as they are the appropriate boards in which to discuss their hobby, along with "distro war" and "gentoo vs. arch" threads. This should be Rule #4 for Sup Forums and it should be aggressively enforced until the manchildren finally get it and move on.

This is the thread to petition Hiroshimoot to cure the /mlp/ cancer in Sup Forums. If you really care about technology and not just vapid consumerism, help by adding your voice to this discussion.

We can do it, Sup Forums.


----


lincucks don't care about technology in general.

All they care about is the rice, not the method. Pony pictures is and the method is .

Products are GPU shitstorms, speccy threads, guts threads, casing threads etc.

A typical lincuck's perception of judging a GPU is nothing because lincuck has no games or programs.

None of the ricer lincuckolds are interested in the method or process of making games. Game is not technology, programming games is.

Ricing is not technology and neither is discussing which build will look the best with anime theme.

Lincuckolds do not care about privacy and cryptology. They are more invested in anime child porn. fbfb

Jesus Christ, stop posting you fucking retard

This is pretty much how content-addressable P2P networks work. You throw away the file and store only its hash, to be used later for retrieving the file's contents. Imagine P2P distributed storage were ubiquitous. Then it would be possible to encode any file as a sequence of hashes of the (sufficiently large) blocks that make up the file, and decompression would consist of retrieving the corresponding data blocks.
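
A toy sketch of that scheme in Python (BlockStore and the 4 KiB block size are invented for illustration; real systems add chunking strategies, Merkle trees, replication and actual networking):

import hashlib
class BlockStore:
    # Toy content-addressable store: blocks are keyed by their SHA-256 hash.
    def __init__(self):
        self.blocks = {}           # stands in for the P2P network
    def put(self, block: bytes) -> str:
        h = hashlib.sha256(block).hexdigest()
        self.blocks[h] = block
        return h
    def get(self, h: str) -> bytes:
        return self.blocks[h]
def encode(data: bytes, store, block_size=4096):
    # "Compress" the file down to a list of block hashes.
    return [store.put(data[i:i + block_size]) for i in range(0, len(data), block_size)]
def decode(hashes, store):
    # "Decompress" by fetching every block back by its hash.
    return b"".join(store.get(h) for h in hashes)
store = BlockStore()
data = b"some file contents " * 1000
assert decode(encode(data, store), store) == data

Of course the information hasn't gone anywhere: it's just sitting on someone else's disk, which is why this is addressing rather than compression.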

What if there were an offsite server which had a precompiled list of pi out to an irrationally large number of digits?

And the way you would search this information is by going to the decimal position where said numbers are located in the precompiled database and then counting upwards from said location.

something something cloud computing and bruteforcing 1,000,000 decimal places.

Binary -> Octal -> Hex

Is probably the closest representation to what you are thinking of.

You didn't read his post, or at least you didn't understand it. He even told you what to look up in order to discover why this wouldn't work. Here's the Wikipedia article about the principle he was talking about:
en.wikipedia.org/wiki/Pigeonhole_principle

I want you to read that article, and if there is something you don't understand, read the Wikipedia article about that thing as well, until you finally understand it. If you aren't willing to commit to that mental labor, then I suggest you keep your uneducated opinion on this subject to yourself.

Assuming pi is a normal number (which is widely believed but, as pointed out above, unproven), every single piece of data that can exist appears in pi somewhere.

However! No reasonably useful file has actually been read out of it yet.
The search space for common files like pictures and video is a trillion times further out than anything we have calculated so far.
People really underestimate how fucking much brute force it takes to recover a reasonably complex password, never mind a full file.

Until we can brute-force passwords in seconds, PiFS won't be useful for common file storage or even archiving.
By the time you read out your "archived" file, your quadrillionth descendant will have just been born.
>Happy birthday kid, your dads tranny porn was found in Pi!

Technically, a 0 byte file is a compressed version of any file if you allow it infinite time to decompress.

one big problem, which he mentioned, is that storing just the /offset/ into pi may well require more space than the data you intend to extract
so even if you had a galaxy-sized computer that could find your offset within a single human lifetime, it /still/ might not actually save you anything

That would be super rare. All you need is the start/end point and numbers are SUPER compressible. Since you can do like, 100 billion digits in x 5 + 500k x 2 etc.

The real limitation is calculating pi and finding arbitrary sequences at will.

>numbers are SUPER compressible.
all files can be represented as a number, your argument is invalid

Check this shit out: mattmahoney.net/dc/barf.html
Hope you don't mind waiting a year to compress a GB.

All practical compression algorithms already use one or more functions to transform the incoming data into some form that's easier to process, and then use an entropy coding technique (such as Huffman, range/arithmetic coding, ANS/Finite State Entropy coding[1] or - for some distributions - things like Golomb-Rice coding) to encode the residuals from that.
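
As a taste of what the entropy-coding stage does, here's a toy Huffman coder in Python (code construction and bit counting only; a real coder also has to pack the bits and transmit the table):

import heapq
from collections import Counter
def huffman_code(data: bytes):
    # Build a Huffman code: byte value -> bit string.
    freq = Counter(data)
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, tie, (a, b)))   # merge the two rarest nodes
        tie += 1
    codes = {}
    def walk(node, prefix=""):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"                # single-symbol edge case
    walk(heap[0][2])
    return codes
data = b"abracadabra"
codes = huffman_code(data)
encoded = "".join(codes[b] for b in data)
print(len(encoded), "bits instead of", 8 * len(data))  # 23 bits instead of 88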

Some of these functions look for repetitions, like LZ77 (deflate) and LZ78 (LZW).
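
For example, a deliberately naive LZ77 match finder (real implementations use hash chains or suffix structures instead of this slow windowed scan; the token format here is invented):

def lz77_tokens(data: bytes, window=4096, min_len=3):
    # Toy LZ77: emit ("match", offset, length) for repeats, ("lit", byte) otherwise.
    i, out = 0, []
    while i < len(data):
        best_len, best_off = 0, 0
        for j in range(max(0, i - window), i):
            l = 0
            while i + l < len(data) and data[j + l] == data[i + l]:
                l += 1
            if l > best_len:
                best_len, best_off = l, i - j
        if best_len >= min_len:
            out.append(("match", best_off, best_len))   # copy best_len bytes from best_off back
            i += best_len
        else:
            out.append(("lit", data[i]))
            i += 1
    return out
print(lz77_tokens(b"abcabcabcabc"))
# [('lit', 97), ('lit', 98), ('lit', 99), ('match', 3, 9)]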

Some transpose the data, like the Burrows-Wheeler Transform used by bzip2.
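
The forward transform is tiny if you don't care about speed (real implementations build a suffix array instead of literally sorting every rotation):

def bwt(s: bytes):
    # Naive Burrows-Wheeler transform: sort all rotations, keep the last column
    # plus the index of the original rotation (needed to invert it).
    n = len(s)
    order = sorted(range(n), key=lambda i: s[i:] + s[:i])
    last_column = bytes(s[(i - 1) % n] for i in order)
    return last_column, order.index(0)
print(bwt(b"banana"))   # (b'nnbaaa', 3) - identical bytes end up clustered together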

Some perform other simple reversible transforms - the commonly-used BCJ/BCJ2 transforms on executable code essentially do de-relocation, looking for branches and jmps, so that branches to the same location produce the same byte sequence (making them easier to pack with a subsequent stage).
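
The core of the x86 case fits in a few lines; this is only a sketch of the idea (forward direction only, and real BCJ/BCJ2 filters are much more careful about false positives and also handle jumps):

import struct
def x86_call_filter(code: bytes) -> bytes:
    # Rewrite the rel32 operand of every 0xE8 CALL as an absolute target address,
    # so calls to the same function become identical byte patterns.
    out = bytearray(code)
    i = 0
    while i + 5 <= len(out):
        if out[i] == 0xE8:                                  # CALL rel32
            rel = struct.unpack_from("<i", out, i + 1)[0]   # signed offset, little-endian
            target = (i + 5 + rel) & 0xFFFFFFFF             # absolute, relative to buffer start
            struct.pack_into("<I", out, i + 1, target)
            i += 5
        else:
            i += 1
    return bytes(out)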

Some use linear prediction, such as good old delta compression for audio samples (FLAC uses a linear predictor plus Golomb-Rice coding). PNG, too, has several simple predictive filters based on neighbouring pixels - depending on the data, choosing the right one can really improve the compression.
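
The order-1 version of that idea ("predict the previous value") is about as small as a transform gets:

def delta_encode(samples):
    # Replace each sample with its difference from the previous one; smooth
    # signals turn into small residuals near zero, which the entropy coder
    # can then store in very few bits.
    prev, out = 0, []
    for s in samples:
        out.append(s - prev)
        prev = s
    return out
def delta_decode(residuals):
    prev, out = 0, []
    for r in residuals:
        prev += r
        out.append(prev)
    return out
samples = [100, 102, 104, 103, 105, 107]
print(delta_encode(samples))                      # [100, 2, 2, -1, 2, 2]
assert delta_decode(delta_encode(samples)) == samples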

Some use more advanced prediction techniques - context modelling is the overarching name for the strongest general-purpose compressors, although they are very slow (see the PAQ family, or PPMd for text and source code).
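
A toy order-1 context model shows the basic idea (a real context-mixing compressor blends many models of different orders and feeds the predictions into an arithmetic coder):

from collections import Counter, defaultdict
def order1_model(data: bytes):
    # Count how often each byte follows each other byte: P(next | previous).
    counts = defaultdict(Counter)
    prev = 0
    for b in data:
        counts[prev][b] += 1
        prev = b
    return counts
model = order1_model(b"the theory of the thing")
ctx = ord("t")
total = sum(model[ctx].values())
for sym, c in model[ctx].most_common():
    print(chr(sym), c / total)    # in this sample, all the probability after 't' lands on 'h'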

Depends what you want.

[1] github.com/Cyan4973/FiniteStateEntropy, arxiv.org/abs/1311.2540