If all files can be expressed in binary...

If all files can be expressed in binary, couldn't you find the base [arbitrarily large number] equivalent for compression purposes? Like find whatever the numerical value of the file is in binary and then convert that to base 400 or something. As long as the characters used for the base are of equal size and there's enough of them, you could compress large files down a lot. Idk if this becomes mathematically infeasible once you get up to the point where you're doing 2^1000000, it was just a thought.

Most retarded thing I've read all day

>phoneposter is a retard
sasuga

...

en.wikipedia.org/wiki/Pigeonhole_principle

Thanks for actually showing me why I'm retarded instead of just calling me retarded

>using the 2^1000000 with the caret nose

>all words are just made up of the same 26 characters
>why not give each word a unique character
>Print a book the size of a pamphlet
>Go the opposite direction for speech by talking quickly and reducing all phonemes into ching chong ping pong

>typical cs major

If humans could "memorize" things as fast as computers, I wouldn't see a problem with this

what you just described is a hash (though not a very good one). you can convert a file to a hash, but you can't convert a hash back to a file because information is lost. they have plenty of uses, but compression they are not.

Troll science thread?

I asked this same thing and Sup Forums was a faggot about it (no surprise there). My understanding was that it couldn't be used due to overflow error (you would need a 10,000-bit architecture to uncompress a file with 10k bits), and for x^y, both x and y would need to be ints, not floats

There is a theory that the most efficient base for a number system, computationally, would be base e (as in Euler's number e, the base of the natural logarithm).

Why don't you try to figure out what a base e number system would even look like?

I've only been actively pursuing information about CS for a few months now, so this may be completely off base, but doesn't each number have a unique binary equivalent? For example, (going from decimal to binary) two is 10, ten is 1010, etc. So my thought process was basically "kilobyte-size files would have decimal equivalents on the order of 2^1000, megabyte on the order of 2^1000000, etc. That'll produce a huge number in decimal, and since there's a ton of characters that take up the same space as the numerals 0-9, you could pretty easily express that number in a higher base. Like how 255 in decimal is ff in hexadecimal. A character was 'saved' in that conversion, but no data was lost because it's understood that ff is in hexadecimal." I'm guessing that there is something that makes binary numbers non-unique when converting to other bases (maybe that's what the Pigeonhole article was saying, a lot of that went over my head), I just can't find what it is. Can someone please explain this to me like I'm retarded, as I very well may be.

Wait, why doesn't this work?

The file contents are 0 bytes, but each character used in the file name is a byte

What do you mean it doesn't work?

What category does converting between bases fall under? For example, 255 in decimal is 11111111 in binary and ff in hexadecimal. No data is lost if one is used over another, but wouldn't the letters "ff" with some signifier that you're using hexadecimal (the relative size of which would become insignificant as the file size grew overall) take up less space than the other two? My understanding of hashes is that it's theoretically possible for two different files to produce the same hash, just mathematically unlikely given the new standards (SHA256 and SHA512). I'm still not grasping why the binary conversions aren't unique

...

An image is an array of pixels which are just numbers. So you can convert any image to a number and vice-versa and generate any image imaginable. Why not store videos as sequences of numbers of their individual frames? And you could use number theory to come up with a formula that generates all those numbers. A youtube video could be stored as a formula written on the back of a napkin.

Does this make it clearer?

Maybe I'm making this too complicated for myself so other people can't even see all the levels of retardation present. Basically, why can't you get the base 1000 output of a file, store that in another file with a small header that says "hey, this is a compressed file that's in base 1000, the original file was of type [whatever], and here's any other information you'll need to know about it," so a program like 7zip could convert back to binary when you wanted to view the original file?
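
For what it's worth, here's a rough Python sketch of exactly that idea (the function name is made up, no real tool works this way), just to make it concrete where the savings disappear:

# rough sketch of the "store the file as base-1000 digits" idea (not a real tool)
def to_base_1000(data: bytes) -> list[int]:
    # treat the whole file as one huge integer, then peel off base-1000 digits
    n = int.from_bytes(data, "big")
    digits = []
    while n > 0:
        digits.append(n % 1000)
        n //= 1000
    return digits or [0]

data = b"pretend this is the contents of a big file"
digits = to_base_1000(data)

# each digit can be anything from 0 to 999, so honestly storing one digit
# costs 10 bits (2^10 = 1024 is the smallest power of two that fits 1000 symbols)
print(len(data) * 8, "bits in,", len(digits) * 10, "bits out")  # out is never smaller

The header doesn't buy anything back either; it just adds a few more bytes on top.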

I'll make the logo!

Holy shit! Why aren't we funding this??!?!?

t. Ajit Pai

How can you convert it back? The system can't perform arithmetic on a number that large, and so can't decompress it.

Binary is binary, we use it as it is the most 'noise tolerant'; on or off, as opposed to something with n levels in between, which is less reliable as it's difficult to maintain some partial voltage. Also, when charging a component to its nth voltage level, you will pass through all the other voltage levels, which is not desirable and also unreliable.

When Windows reports that a file is 0 bytes, it is not counting the size of internal data structures that are stored on disk, which are there regardless of the file size.

I gotta learn this shit somehow

congrats, you just invented moonrunes

So I'm gonna guess that you're shitposting™, but someone please explain why this is wrong in a way that a brain dead four year old can understand. Colorful pictures help

Yeah, you're just ignoring all the other helpful posts in this thread that have explained this

So the problem is a limit on how good our technology is, not a conceptual flaw (disregarding the technology thing as a conceptual flaw)?

Pretty much this.
Even the joke posts have some useful information in them.

>numbers: 0 1 2 5 17 18
>formulas: f = x (frames 1 and 2), f = 5 (frame 3), f = 17+x (frame 4 and 5)
guess what's the problem with this

Underrated

I understand the computer architecture part, but what do you mean by "x^y, both x and y need to be ints not floats". I'm currently taking a C programming class, so I know that floats allow decimal points and ints can be signed or unsigned, but can you connect the dots for me on how that relates to this?

this has nothing to do with C language types you idiots

Ternary is the most compact integer base possible. The actual optimum is e, about 2.72, so binary isn't far off. You're just thinking about it in one dimension

2^2 = 4
2^3 = 8
2^4 = 16

base 2 can only represent 4 different values in 2 digits. Base 4 can represent 16 in the same 2 digits.

But you have to have 4 distinct symbol values to do it. Whether it's 2^4 or 4^2, you get 16 values either way, and the grid (symbols times digit positions) is 8.

3^3 = 27 distinct numbers in 3 digits, but the grid is only 9. One more than 8, but 9 more values represented.

Get it? If you have 1 digit, but it's base 10000, you have to have 10000 symbols in a lookup table somewhere.

3 is the optimal integer base for compact storage.
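
The "grid" idea from this post, written out as plain arithmetic (nothing more than what the post is already doing):

# values you can represent vs. the "grid" cost (symbols * digit positions)
for base, positions in [(2, 4), (4, 2), (3, 3)]:
    values = base ** positions   # distinct numbers representable
    grid = base * positions      # symbols you must distinguish, times places
    print(base, positions, values, grid)
# (2,4) and (4,2) both give 16 values for a grid of 8; (3,3) gives 27 values for a grid of 9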

I've read through all of them, and I get generally why it's wrong (you can't convert back from a hash, not enough computational power, computer architecture isn't adequate), I'm just failing to connect that back to what I see as basically a math problem. There was a user who mentioned something about describing arrays of pixels used in images as numbers, and even though he was joking I still can't see what the problem is. I know how hashes work, this just seems fundamentally different to me

>9 more

derp
I meant 11

17 + 5 evaluates to 22, not 18?

yes, except take the idea to its limit: try base 2^(# of bits in your file), so the whole file becomes a single digit
and then you still have to write that one digit down using fewer bits than the file had in the first place, otherwise you've compressed nothing

So is this a speed problem or a storage problem? I get what you're saying, but pretend you have a file where you don't care how long it takes to extract it so long as the information exists. Would base 10000 be better then? The lookup table is a constant, so its relative size will decrease as you add more and more files to the archive.

it's neither a speed problem nor a storage problem, it's an information theory problem
your premise is wrong
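
The pigeonhole link earlier is the whole answer, and you can count it out in a couple of lines (a toy example, using 16-bit "files"):

# count every possible 16-bit file vs. every possible output shorter than 16 bits
n = 16
files = 2 ** n                            # 65536 distinct 16-bit files
shorter = sum(2 ** k for k in range(n))   # 2^0 + 2^1 + ... + 2^15 = 65535
print(files, shorter)  # 65536 vs 65535: no scheme can shrink every file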

22x3 = 66
1x10000 = 10000

3 is optimal

What does 9843 even look like in base 10000?

Can you throw out some terms for me to look up so I can stop being a brainlet

I was picturing random ass symbols. Isn't there a standard for what constitutes a byte size character? I'm gonna guess UTF-8 based off the name, so just use all of those for base 10000.

> what is arbitrary-precision arithmetic
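
That's the right term to look up. Languages with built-in bignums (Python here, or a library like GMP in C) will happily do arithmetic on numbers this size, so the CPU's word size isn't the real obstacle:

# Python ints are arbitrary precision, so 2^1000000 is a perfectly ordinary value
big = 2 ** 1_000_000
print(big.bit_length())   # 1000001 bits
print(len(str(big)))      # 301030 decimal digits
print(big % 1000)         # arithmetic on it still works, just not instantly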

Whatever formula you could generate to do this would be larger than just storing the video frames directly, or would take an inordinate amount of time to generate

Has it been proven that this will always be the case? Is that like an "upper boundary" on computational power?

Natural log is optimum
2.71 something
3 is the closest integer
en.wikipedia.org/wiki/Natural_logarithm

>people actually taking this retarded idea seriously
the absolute state of Sup Forums

...

/sci/ is probably smarter than Sup Forums and wouldn't fall for this
The thread should have ended here

Assuming OP is just trolling, then yeah, of course.
But if they genuinely don't get it, /sci/ is probably a better place to ask this.

cleavebooks.co.uk/scol/calnumba.htm

writing the number 10000 in each base (base x digits = cost):

2x14=28
3x9=27
4x7=28
5x6=30
6x6=36
7x5=35
8x5=40
9x5=45
10x5=50
11x4=44

>3 IS OPTIMAL
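
That table is easy to re-derive yourself; here's a small sketch of the same "alphabet size times digits needed" cost for writing 10000 in each base (this is the radix economy idea behind the natural log post):

# cost of writing the number 10000 in base b, counted as b * (digits needed)
def digits_needed(n, base):
    count = 0
    while n > 0:
        n //= base
        count += 1
    return count

for base in range(2, 12):
    d = digits_needed(10000, base)
    print(base, "x", d, "=", base * d)   # base 3 gives the smallest cost, 27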

Thank you, I genuinely appreciate that. Once in a while I'll have some retarded thought like this and I only bother asking Sup Forums cause I know a few people will help me learn something

inodes or MFTs

hey go for it. if you succeed idk what will happen exactly but it'll probably make you rich

You just led me to the birthday paradox. Ty. Interesting stuff.

In a room of 23 people there's a 50-50 chance that two people share a birthday.
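
If you want to check that 23-people figure, the usual way is to compute the chance that everyone's birthday is different and take the complement:

# P(some pair shares a birthday) = 1 - (365/365) * (364/365) * ... for 23 people
p_all_different = 1.0
for i in range(23):
    p_all_different *= (365 - i) / 365
print(1 - p_all_different)   # about 0.507, just over 50-50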

Sweet, I'm gonna do this with unicode symbols..

This post is pretty retarded but I'll try and explain it to you.
Computers only work in binary. There's no such thing as a "higher base" in storage, because computers only work in base 2. When you see hex or octal numbers, they're still stored internally as binary.
Whether you display 15 as hex F or binary 1111, it still takes up 4 bits of space.
The only reason higher bases exist is that they're quicker for humans to read and write.
There is a way to store big numbers without having to store a massive exact binary number though: you can store a significand and an exponent instead, at the cost of precision. Look up floating point numbers.
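
A quick illustration of the "the base only exists on screen" point (Python here, but any language behaves the same):

# 0xff, 0b11111111 and 255 are one and the same value; the base is just notation
print(0xff == 0b11111111 == 255)            # True
print((255).bit_length())                   # 8 bits, however you write it
print(format(255, "x"), format(255, "b"))   # "ff" vs "11111111", same stored byte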

>convert n-bit binary file to base 1000
>file now needs log_1000(2^n) base 1000 digits to be represented
>each digit takes up log_2(1000) bits of space
>file takes up log_2(1000)*log_1000(2^n)=log_2(1000)*(log_2(2^n)/log_2(1000))=log_2(2^n)=n bits
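
Plugging an actual file size into that, say a 1 MB file (n = 8,000,000 bits), just to see the numbers cancel:

import math

n = 8_000_000                      # bits in a ~1 MB file
digits = n / math.log2(1000)       # base-1000 digits needed, log_1000(2^n)
bits_per_digit = math.log2(1000)   # about 9.97 bits of information per digit
print(round(digits * bits_per_digit))   # 8000000 -- exactly where we started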

Your intuition is tricking you. When someone says "number", you think of a number with like 20 digits, when the numbers corresponding to the individual frames would have hundreds of thousands if not millions of digits, ultimately taking up as much space as the original images.

Care to explain how you would make a base system based on a transcendental number?

Jesus this thread