If all files can be expressed in binary...

If all files can be expressed in binary, couldn't you find the base [arbitrarily large number] equivalent for compression purposes? Like find whatever the numerical value of the file is in binary and then convert that to base 400 or something. As long as the characters used for the base are of equal size and there's enough of them, you could compress large files down a lot. Idk if this becomes mathematically infeasible once you get up to the point where you're doing 2^1000000, it was just a thought.

Most retarded thing I've read all day

>phoneposter is a retard
sasuga

...

en.wikipedia.org/wiki/Pigeonhole_principle

Thanks for actually showing me why I'm retarded instead of just calling me retarded

>using the 2^1000000 with the caret nose

>all words are just made up of the same 26 characters
>why not give each word a unique character
>Print a book the size of a pamphlet
>Go the opposite direction for speech by talking quickly and reducing all phonemes into ching chong ping pong

>typical cs major

If humans could "memorize" things as fast as computers, I wouldn't see a problem with this

what you just described is a hash (though not a very good one). you can convert a file to a hash, but you can't convert a hash back to a file because information is lost. they have plenty of uses, but compression they are not.

Troll science thread?

I asked this same thing and Sup Forums was a faggot about it (no surprise there). My understanding was that it couldn't be used due to overflow error (you would need a 10,000-bit architecture to uncompress a file with 10k bits), and for x^y, both x and y would need to be ints, not floats

There is a theory that the most efficient base for a number system, computationally, would be base e (as in Euler's number e, the base of the natural logarithm).

Why don't you try to figure out what a base e number system would even look like?

I've only been actively pursuing information about CS for a few months now, so this may be completely off base, but doesn't each number have a unique binary equivalent? For example, (going from decimal to binary) two is 10, ten is 1010, etc. So my thought process was basically "kilobyte-size files would have decimal equivalents on the order of 2^1000, megabyte on the order of 2^1000000, etc. That'll produce a huge number in decimal, and since there's a ton of characters that take up the same space as the numerals 0-9, you could pretty easily express that number in a higher base. Like how 255 in decimal is ff in hexadecimal. A character was 'saved' in that conversion, but no data was lost because it's understood that ff is in hexadecimal." I'm guessing that there is something that makes binary numbers non-unique when converting to other bases (maybe that's what the Pigeonhole article was saying, a lot of that went over my head), I just can't find what it is. Can someone please explain this to me like I'm retarded, as I very well may be.

Wait, why doesn't this work?

The file contents are 0 bytes, but each character used in the file name is a byte

What do you mean it doesn't work?

What category does converting between bases fall under? For example, 255 in decimal is 11111111 in binary and ff in hexadecimal. No data is lost if one is used over another, but wouldn't the letters "ff" with some signifier that you're using hexadecimal (the relative size of which would become insignificant as the file size grew overall) take up less space than the other two? My understanding of hashes is that it's theoretically possible for two different files to produce the same hash, just mathematically unlikely given the new standards (SHA256 and SHA512). I'm still not grasping why the binary conversions aren't unique

...

An image is an array of pixels which are just numbers. So you can convert any image to a number and vice-versa and generate any image imaginable. Why not store videos as sequences of numbers of their individual frames? And you could use number theory to come up with a formula that generates all those numbers. A youtube video could be stored as a formula written on the back of a napkin.

Does this make it clearer?

Maybe I'm making this too complicated for myself so other people can't even see all the levels of retardation present. Basically, why can't you get the base 1000 output of a file, store that in another file with a small header that says "hey, this is a compressed file that's in base 1000, the original file was of type [whatever], and here's any other information you'll need to know about it," so a program like 7zip could convert back to binary when you wanted to view the original file?
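
For what it's worth, here's a rough Python sketch of exactly that idea (the function name is made up, no real tool works this way), just to make it concrete where the savings disappear:

# rough sketch of the "store the file as base-1000 digits" idea (not a real tool)
def to_base_1000(data: bytes) -> list[int]:
    # treat the whole file as one huge integer, then peel off base-1000 digits
    n = int.from_bytes(data, "big")
    digits = []
    while n > 0:
        digits.append(n % 1000)
        n //= 1000
    return digits or [0]

data = b"pretend this is the contents of a big file"
digits = to_base_1000(data)

# each digit can be anything from 0 to 999, so honestly storing one digit
# costs 10 bits (2^10 = 1024 is the smallest power of two that fits 1000 symbols)
print(len(data) * 8, "bits in,", len(digits) * 10, "bits out")  # out is never smaller

The header doesn't buy anything back either; it just adds a few more bytes on top.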

I'll make the logo!

Holy shit! Why aren't we funding this??!?!?

t. Ajit Pai

How can you convert it back? The system can't perform arithmetic on a number that large, and so can't decompress it.

Binary is binary, we use it as it is the most 'noise tolerant'; on or off, as opposed to something with n levels in between, which is less reliable as it's difficult to maintain some partial voltage. Also, when charging a component to its nth voltage level, you will pass through all the other voltage levels, which is not desirable and also unreliable.

When Windows reports that a file is 0 bytes, it is not counting the size of internal data structures that are stored on disk, which are there regardless of the file size.

I gotta learn this shit somehow

congrats, you just invented moonrunes

So I'm gonna guess that you're shitposting™, but someone please explain why this is wrong in a way that a brain dead four year old can understand. Colorful pictures help

Yeah, you're just ignoring all the other helpful posts in this thread that have explained this

So the problem is a limit on how good our technology is, not a conceptual flaw (disregarding the technology thing as a conceptual flaw)?

Pretty much this.
Even the joke posts have some useful information in them.

>numbers: 0 1 2 5 17 18
>formulas: f = x (frames 1 and 2), f = 5 (frame 3), f = 17+x (frame 4 and 5)
guess what's the problem with this

Underrated

I understand the computer architecture part, but what do you mean by "x^y, both x and y need to be ints not floats". I'm currently taking a C programming class, so I know that floats allow decimal points and ints can be signed or unsigned, but can you connect the dots for me on how that relates to this?

this has nothing to do with C language types you idiots

Ternary is the most compact integer base possible. The actual optimum is e, about 2.72, so binary isn't far off. You're just thinking about it in one dimension

2^2 = 4
2^3 = 8
2^4 = 16

base 2 can only represent 4 different values in 2 digits. Base 4 can represent 16 in the same 2 digits.

But you have to have 4 distinct symbol values to do it. Whether it's 2^4 or 4^2, you get 16 values either way, and the grid (symbols times digit positions) is 8.

3^3 = 27 distinct numbers in 3 digits, but the grid is only 9. One more than 8, but 9 more values represented.

Get it? If you have 1 digit, but it's base 10000, you have to have 10000 symbols in a lookup table somewhere.

3 is the optimal integer base for compact storage.
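
The "grid" idea from this post, written out as plain arithmetic (nothing more than what the post is already doing):

# values you can represent vs. the "grid" cost (symbols * digit positions)
for base, positions in [(2, 4), (4, 2), (3, 3)]:
    values = base ** positions   # distinct numbers representable
    grid = base * positions      # symbols you must distinguish, times places
    print(base, positions, values, grid)
# (2,4) and (4,2) both give 16 values for a grid of 8; (3,3) gives 27 values for a grid of 9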

I've read through all of them, and I get generally why it's wrong (you can't convert back from a hash, not enough computational power, computer architecture isn't adequate), I'm just failing to connect that back to what I see as basically a math problem. There was a user who mentioned something about describing arrays of pixels used in images as numbers, and even though he was joking I still can't see what the problem is. I know how hashes work, this just seems fundamentally different to me

>9 more

derp
I meant 11

17 + 5 evaluates to 22, not 18?

yes, except take the idea to its limit: try base 2^(# of bits in your file), so the whole file becomes a single digit
and then you still have to write that one digit down using fewer bits than the file had in the first place, otherwise you've compressed nothing

So is this a speed problem or a storage problem? I get what you're saying, but pretend you have a file where you don't care how long it takes to extract it so long as the information exists. Would base 10000 be better then? The lookup table is a constant, so its relative size will decrease as you add more and more files to the archive.

it's neither a speed problem nor a storage problem, it's an information theory problem
your premise is wrong
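
The pigeonhole link earlier is the whole answer, and you can count it out in a couple of lines (a toy example, using 16-bit "files"):

# count every possible 16-bit file vs. every possible output shorter than 16 bits
n = 16
files = 2 ** n                            # 65536 distinct 16-bit files
shorter = sum(2 ** k for k in range(n))   # 2^0 + 2^1 + ... + 2^15 = 65535
print(files, shorter)  # 65536 vs 65535: no scheme can shrink every file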

22x3 = 66
1x10000 = 10000

3 is optimal

What does 9843 even look like in base 10000?

Can you throw out some terms for me to look up so I can stop being a brainlet

I was picturing random ass symbols. Isn't there a standard for what constitutes a byte size character? I'm gonna guess UTF-8 based off the name, so just use all of those for base 10000.

> what is arbitrary-precision arithmetic
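
That's the right term to look up. Languages with built-in bignums (Python here, or a library like GMP in C) will happily do arithmetic on numbers this size, so the CPU's word size isn't the real obstacle:

# Python ints are arbitrary precision, so 2^1000000 is a perfectly ordinary value
big = 2 ** 1_000_000
print(big.bit_length())   # 1000001 bits
print(len(str(big)))      # 301030 decimal digits
print(big % 1000)         # arithmetic on it still works, just not instantly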

Whatever formula you could generate to do this would be larger than just storing the video frames directly, or would take an inordinate amount of time to generate

Has it been proven that this will always be the case? Is that like an "upper boundary" on computational power?

Natural log is optimum
2.71 something
3 is the closest integer
en.wikipedia.org/wiki/Natural_logarithm

>people actually taking this retarded idea seriously
the absolute state of Sup Forums

...

/sci/ is probably smarter than Sup Forums and wouldn't fall for this
The thread should have ended here

Assuming OP is just trolling, then yeah, of course.
But if they genuinely don't get it, /sci/ is probably a better place to ask this.

cleavebooks.co.uk/scol/calnumba.htm

writing the number 10000 in each base (base x digits = cost):

2x14=28
3x9=27
4x7=28
5x6=30
6x6=36
7x5=35
8x5=40
9x5=45
10x5=50
11x4=44

>3 IS OPTIMAL
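
That table is easy to re-derive yourself; here's a small sketch of the same "alphabet size times digits needed" cost for writing 10000 in each base (this is the radix economy idea behind the natural log post):

# cost of writing the number 10000 in base b, counted as b * (digits needed)
def digits_needed(n, base):
    count = 0
    while n > 0:
        n //= base
        count += 1
    return count

for base in range(2, 12):
    d = digits_needed(10000, base)
    print(base, "x", d, "=", base * d)   # base 3 gives the smallest cost, 27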

Thank you, I genuinely appreciate that. Once in a while I'll have some retarded thought like this and I only bother asking Sup Forums cause I know a few people will help me learn something

inodes or MFTs

hey go for it. if you succeed idk what will happen exactly but it'll probably make you rich

You just led me to the birthday paradox. Ty. Interesting stuff.

In a room of 23 people there's a 50-50 chance that two people share a birthday.
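
If you want to check that 23-people figure, the usual way is to compute the chance that everyone's birthday is different and take the complement:

# P(some pair shares a birthday) = 1 - (365/365) * (364/365) * ... for 23 people
p_all_different = 1.0
for i in range(23):
    p_all_different *= (365 - i) / 365
print(1 - p_all_different)   # about 0.507, just over 50-50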

Sweet, I'm gonna do this with unicode symbols..

This post is pretty retarded but I'll try and explain it to you.
Computers only work in binary. There's no such thing as a "higher base" in storage, because computers only work in base 2. When you see hex or octal numbers, they're still stored internally as binary.
Whether you display 15 as hex F or binary 1111, it still takes up 4 bits of space.
The only reason higher bases exist is that they're quicker for humans to read and write.
There is a way to store big numbers without having to store a massive exact binary number though: you can store a significand and an exponent instead, at the cost of precision. Look up floating point numbers.
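
A quick illustration of the "the base only exists on screen" point (Python here, but any language behaves the same):

# 0xff, 0b11111111 and 255 are one and the same value; the base is just notation
print(0xff == 0b11111111 == 255)            # True
print((255).bit_length())                   # 8 bits, however you write it
print(format(255, "x"), format(255, "b"))   # "ff" vs "11111111", same stored byte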

>convert n-bit binary file to base 1000
>file now needs log_1000(2^n) base 1000 digits to be represented
>each digit takes up log_2(1000) bits of space
>file takes up log_2(1000)*log_1000(2^n)=log_2(1000)*(log_2(2^n)/log_2(1000))=log_2(2^n)=n bits
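
Plugging an actual file size into that, say a 1 MB file (n = 8,000,000 bits), just to see the numbers cancel:

import math

n = 8_000_000                      # bits in a ~1 MB file
digits = n / math.log2(1000)       # base-1000 digits needed, log_1000(2^n)
bits_per_digit = math.log2(1000)   # about 9.97 bits of information per digit
print(round(digits * bits_per_digit))   # 8000000 -- exactly where we started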

Your intuition is tricking you. When someone says "number", you think of a number with like 20 digits, when the numbers corresponding to the individual frames would have hundreds of thousands if not millions of digits, ultimately taking up as much space as the original images.

Care to explain how you would make a base system based on a transcendental number?

Jesus this thread