Argument about Lossless Compression

I'm currently arguing with a user over my claim that true "lossless" compression isn't actually feasible, whereas his argument is that it is and legitimately occurs.

imgur.com/a/uCfSs

Here is an imgur album of a slideshow deconstructing his argument and going through my own logic.

Discuss.

Stop being such a faggot.

retard

back to Sup Forums with you

ooooooooooo -> [my magic compression algorithm] -> 11o

11o -> [my magic decompression algorithm] -> ooooooooooo

Holy shit it fucking worked! Lossless compression and decompression!
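
For what it's worth, that joke algorithm is basically run-length encoding, and it genuinely round-trips. A toy Python sketch (made-up helper names, not any real library) that gives back the exact input:

# Toy run-length encoding: "ooooooooooo" -> "11o", and back again.
# Kept dead simple on purpose: only works for inputs whose characters aren't digits.

def rle_encode(s):
    out = []
    i = 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1
        out.append(f"{j - i}{s[i]}")   # e.g. eleven o's become "11o"
        i = j
    return "".join(out)

def rle_decode(s):
    out = []
    count = ""
    for ch in s:
        if ch.isdigit():
            count += ch
        else:
            out.append(ch * int(count))
            count = ""
    return "".join(out)

original = "ooooooooooo"
packed = rle_encode(original)          # "11o" -- shorter than the input
assert rle_decode(packed) == original  # identical after decoding, nothing lost
print(packed, rle_decode(packed))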

what if you turn the sound wave into a vector? then the formula would easily be smaller than the actual sound wave expressed as a "raster" image of the wave.

OP you don't even know how compression works. STFU

That has nothing to do with "compression" but rather analog-to-digital transformation, which, as you described, is lossy in principle because it transforms from seemingly infinite precision to finite digital precision.

OP don't be such a mong, compression is integer-based, not FP-based, which is reserved for high-precision math.

01010101 will always be 01010101 when encoded losslessly

lossless vs lossy is about the kind of compression. lossy removes unnecessary parts to compress better, lossless keeps that stuff, but it still can't be a complete copy because there is still stuff that can never be truly copied, like when you try to copy a cassette tape, or the degradation of a CD, or a cable transmission. just because your fucking what.cd application lies to you doesn't mean it's true. for once in your life, think about it

>what is gzip?
>what is md5sum?

Analog to digital is always lossy, but digital to digital conversion is not.
And if you think otherwise you don't know shit. And this is coming from a guy with a PhD.

...

This. If OP is referring to analog compression, he'd be right. That is not the case with digital.

What I am trying to say is that digital is the same exact thing, albeit represented entirely as binary instructions.
When any kind of data transfer occurs, it's the same process as sampling music into a digital stream. Except now, instead of sampling analog data, you're sampling digital data.

What if we're thinking about it the wrong way? What if we reverse the problem and ask: is there a waveform that could perfectly match the waveform that was created by the compression algorithm?

Except that's only how it works in concept.
In actual execution in a computer system, those o's are represented in binary code. This binary code is then fed to an algorithm that will inherently produce an imprecise result at some point, simply because it cannot totally precisely measure the 11o's. The 11o isn't stored as 11o; it's stored as a bitstream, an integer, that can never be truly replicated.

Fucking KYS already retard.

YES
EXACTLY
A TRULY EXACT SIGNAL CAN NEVER BE CONSTRUCTED
IT WILL ALWAYS BE AN APPROXIMATION
YOU NEED TO THINK MORE LOW-LEVEL

Is there a single slide in this that doesn't blatantly show OP's lack of understanding of the difference between compression and sampling?

The point is that compression IS sampling. You can't get away from sampling because it is fundamentally how it always works.

Fuck me your bait hurts.

True, though it's interesting to think about "analog to digital is always lossy" - perhaps it won't always be in the distant future, if we can measure down to fundamental quantum limits.

The algorithm itself can be lossless (i.e. what goes in comes back out the other end).

Cameras/audio equipment are not part of the compression algorithm. That is sample aliasing.

Compression isn't sampling retard. Sampling is translating analog audio to digital and compression is reducing the size of digital audio.

The act of compression itself requires sampling data.
You are literally doing the same exact thing, you are reading and interpreting a stream of data and then generating a new stream that is smaller but derived from the larger, original stream.

It's really not. At the level of abstraction where the transformation is taking place, there exists no sampling - only bits. If you want to sperg out like a 12 year old discovering abstraction, you can imagine the fluctuations in the voltages representing the bits, but trying to undermine the definition of things by throwing away the entire digital paradigm is really stupid.

I would say that there could exist a waveform that can be perfectly compressed without losing any information. This waveform would be constructed from examining the system (think system analysis here not actual hardware) that is compressing the signal and creating a waveform that isn't limited by any aspect of the system.

Can you do that in the real world? Probably not.

I know nothing about compression but even I know that you're retarded.

Right? You can tell because he made a fucking powerpoint and uploaded it to imgur. Even if he was right about what he's saying, he's still retarded.

You're retarded. Stop posting.

You're conflating two totally different ideas, do you not see why? Information theory is one topic. Transmitting codes in a noisy medium is another topic. Of course, in the real world, you don't have perfectly zero noise, but it's essentially zero for all intents and purposes for a computer's logic gates.

That's my point.
It is entirely feasible in concept, on paper.
But the actual execution of the concept is not feasible in real-world logic.
Sure, you can write it all out on paper, but that's not how the computer does things.

But you admit that it's never perfectly zero, and it's always an approximation; however, we choose to cut off the data at some point because the precision gets to such a point as to have no noticeable effects.

it sounds like you missed 5th grade OP

let's say you've got an image with 10x10 pixels and it's just the color #000000 (black)

you could label each one of the individual pixels black if you want

pixel 1x1 is #000000
pixel 1x2 is #000000
pixel 1x3 is #000000

and so on, which would end up being more lines and thus more data

or you can write

pixels 1x1 to 10x10 are all #000000

which is a little less straightforward but is almost a hundred times shorter and results in the exact same fucking thing

that's the difference between .bmp and .png
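
That bmp-vs-png point is easy to sanity-check with a real compressor. A rough Python sketch (zlib standing in for PNG's DEFLATE stage; the 300-byte buffer is just a stand-in for a raw 10x10 RGB bitmap):

import zlib

# 10x10 image, 3 bytes per pixel, all #000000 -- basically a tiny raw "bmp"
raw = bytes(10 * 10 * 3)

packed = zlib.compress(raw)            # roughly what PNG's DEFLATE stage does
restored = zlib.decompress(packed)

print(len(raw), len(packed))           # 300 bytes shrinks to around a dozen
assert restored == raw                 # exact same thing, bit for bit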

you sound like someone being difficult because they just read the wikipedia article for quantum physics and are trying to explain to your mom that you can't take the trash out because when you take the trash out to the curb it'll never TRULY be on the curb because all the atoms making up the trash bin are in superposition or some shit, like holy shit op

>always an approximation

The letters I struck on my keyboard were turned into bits, sent through the internet, and then appear again on your screen as the exact same letters that I typed in. Where is the loss?

Dude, what? I think that you think you sound really deep right now, but in reality, you're not.

No, your computer would not function right now if you could not have exact integer arithmetic. End of story. Learn how logic gates work.
Floating-point calculations are closer to what you're talking about, however.

how the fuck does a zip work

I can't tell if you're baiting or legitimately serious.
First off, digital is *not* the exact same thing. The binary instruction is the representation of data. You like to make the argument that because the electrons aren't present in the exact same number in one bit or another, the data somehow differs, but that's like saying "this 1101 is not the same as that 1101". It's ludicrous. We're not measuring the number of electrons here, we're measuring what they represent - the binary data. Each bit is either a 1 or 0 and that is all that matters - that's the advantage of having digital.

You've been treating digital the same way you treat analog and are having trouble grasping just what a digital representation of data really means. Here's a hint - it is the paper in your OP.

First off
Ad hominem galore
Second off

But the thing about my atoms never touching the atoms of the curb is still technically true. It doesn't stop me from taking the trash out because I can't sense the difference, but it's still there.
This occurs with any measurement that exists in our world.

But you admit that those electrons, on the most fundamental levels, are still technically different?
So how does this not count as loss? Sure, it's incredibly small, but it's still technically there, and eventually will cause noticeable degradation.

>But the thing about my atoms never touching the atoms of the curb is still technically true
and theres the fucking problem

are you the person who unironically writes "technically correct is the best kind of correct"? because you're not correct, you're just a smartass

But I'm still right, aren't I? I'm still technically correct?
I don't care about being a smartass. I'd rather be a smartass than wrong.

>what is FLAC?

It's not a loss of the data it's a loss of the fucking electrons that aren't part of the data.

The atoms of your garbage can aren't going to actually touch the curb but that isn't the important part, it's that the garbage is on the curb exactly as it is supposed to be.

The electrons make up the binary signal don't they? If those are affected, how is the binary stream not also affected?

the analog to digital converter gives you data loss. So in a sense it is lossy

Sure, the signal. But the data is not lost. Lossless.

im done

yes im posting this anime reaction image because you beat me in this argument and im in damage control mode, congrats op

The analog to digital converter is not part of your compression algorithm

FLAC is only the compression algorithm (and file format, but that's irrelevant).

>You are literally doing the same exact thing
Digital to digital compression is not the same thing as sampling analog data to digital. It's a conversion. Compression is a certain type of conversion.
>you are reading and interpreting a stream of data and then generating a new stream that is smaller but derived from the larger, original stream

In digital compression (gzip, bzip2, xz, etc.) you can generate a representation of the original data that maps directly back to the original by simply restructuring the bits. The LZ77 algorithm and Huffman coding are direct examples of this.
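
Since LZ77 and Huffman coding got name-dropped: here's a rough Huffman-coding sketch in Python (toy code, not how gzip literally lays out its tables), showing that a prefix code maps straight back to the original bytes with zero loss:

import heapq
from collections import Counter

def build_codes(data):
    """Build a Huffman prefix code: byte value -> bit string.
    Assumes at least two distinct byte values in the input."""
    heap = [[freq, [sym, ""]] for sym, freq in Counter(data).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        lo, hi = heapq.heappop(heap), heapq.heappop(heap)
        for pair in lo[1:]:
            pair[1] = "0" + pair[1]    # left branch gets a 0 prefix
        for pair in hi[1:]:
            pair[1] = "1" + pair[1]    # right branch gets a 1 prefix
        heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
    return dict(heapq.heappop(heap)[1:])

def encode(data, codes):
    return "".join(codes[b] for b in data)

def decode(bits, codes):
    rev = {code: sym for sym, code in codes.items()}
    out, cur = bytearray(), ""
    for bit in bits:
        cur += bit
        if cur in rev:                 # prefix property: no code is a prefix of another
            out.append(rev[cur])
            cur = ""
    return bytes(out)

data = b"abracadabra abracadabra"
codes = build_codes(data)
bits = encode(data, codes)
assert decode(bits, codes) == data     # exact reconstruction, nothing approximated
print(len(data) * 8, len(bits))        # fewer bits than the plain 8-bits-per-byte form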

buy two cds
rip both lossless
they have different file sizes
answer that dummies

It sounds like you just learned what floating point numbers are and thought it applies to everything.

the question of whether lossless compression is feasible, I assume, means that you have to include it as a system acting upon the signal, especially because the imgur slides suggest that rounding error and instrument inaccuracy discount the use of typical compression algorithms.

No error correction on audio CDs.

YOUR ORIGINAL QUESTION WAS ABOUT DATA FORMAT COMPRESSION, LOSSY VS LOSSLESS. Not about analog-digital conversion. Not your bullshit 5-year-old philosophical thoughts of "well technically it's a non-zero chance of the computer exploding or the memory getting erased" or something.

It has been clearly shown that there are lossless compression schemes. No, it's very clear what the "11" in binary means. There's no tricks here.

I'll give an even simpler example. You have one almost-black pixel, of value 1.
In 32 bits that would be represented as 0b 0000 0000 0000 0000 0000 0000 0000 0001.
Clearly, we don't need that many bits to represent equivalent data, so we could reduce it to 8 bits: 0000 0001. Same end result; fewer bits to represent it.
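
As a quick sanity check of that, a tiny Python sketch (just formatting the same value at two widths; note the trick only stays lossless when the value actually fits in the smaller width):

value = 1                                        # the almost-black pixel
as_32_bits = format(value, "032b")               # 00000000000000000000000000000001
as_8_bits  = format(value, "08b")                # 00000001
assert int(as_32_bits, 2) == int(as_8_bits, 2)   # same value, fewer bits
print(len(as_32_bits), len(as_8_bits))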

Are you trolling? Is everyone here a troll? If your CD was literal silence, or a constant frequency, that would be super easy to compress because there is no entropy. But if your CD was just playing noise, then it is very hard to compress it because there's no predictability or pattern in bits.
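
That's easy to demonstrate with any general-purpose compressor. A quick Python sketch (zlib as a stand-in; a megabyte of zeros plays the silent CD, os.urandom plays the noise):

import os
import zlib

silence = bytes(1_000_000)             # a megabyte of zeros: no entropy
noise = os.urandom(1_000_000)          # a megabyte of random bytes: maximum entropy

for name, data in (("silence", silence), ("noise", noise)):
    packed = zlib.compress(data, 9)
    assert zlib.decompress(packed) == data   # lossless either way
    print(name, len(data), "->", len(packed))
# silence shrinks by orders of magnitude; noise comes out roughly the same size
# (or a touch bigger), because there's no pattern left to exploit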

Actually, ignore everything I said.
Just watch Tom Scott. He's a cool dude.
youtube.com/watch?v=r6Rp-uo6HmI

>I don't understand information theory or the concepts of error correcting code and then go on to conflate my lack of understanding to make a tirade about compression.

That video is so wrong it hurts.

>But you admit that those electrons, on the most fundamental levels, are still technically different?

*EVERY* electron is technically different, if you want to play that game.

>So how does this not count as loss? Sure, it's incredibly small, but it's still technically there, and eventually will cause noticeable degradation.

No. A thousand times no. I could copy the same data to billions of computers and, unless there is an issue with the actual algorithm at play or the process of copying (i.e. not using the tcp protocol, messing with virtual memory via a third party program or hardware, etc), there will never be any issue, because a certain threshold *must* be met for those electrons to represent a 0 or a 1.
If what you thought was actually a thing in the real world, your computer would not function right now because it wouldn't have exact integer arithmetic, which is necessary for something as complex as an operating system to function properly, let alone file transfers between multiple processors.

strawpoll.me/15027157

Sure. I completely agree that you can't have a lossless reproduction of an analog audio/video source.

But lossless compression algorithms exist, as defined by 2 requirements:
1. The output data after decompression is exactly the same as the input data before compression
2. The intermediary data (after compression) is smaller than the input data

Mostly looking at point 1
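
Both requirements are easy to check mechanically for any given input. A minimal Python sketch (bz2 here, but gzip/lzma behave the same; the helper name is made up):

import bz2

def is_lossless_and_smaller(data: bytes) -> bool:
    """Check the two requirements above for one concrete input."""
    packed = bz2.compress(data)
    restored = bz2.decompress(packed)
    requirement_1 = restored == data          # output == input, bit for bit
    requirement_2 = len(packed) < len(data)   # intermediary data is smaller
    return requirement_1 and requirement_2

# Requirement 1 holds for *any* input; requirement 2 only holds for
# compressible inputs (a counting argument says no scheme can shrink everything).
print(is_lossless_and_smaller(b"hello hello hello hello " * 1000))  # True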

>lossless compression isn't actually feasible
If that were true, then your computer wouldn't be working right now.

big breained nibba right here!

...

more like 0 +- .00000000001 int

The waveform is just a visual representation of the recording, it is not THE recording.

kek

fortunately computers handle discrete digital data unlike your gay fantasy scenario

You've got no idea how a computer works, do you? This is why we have branch predictors and error correction. No fucking way can data be intact at the level computers are at now.

There are no approximations in digital data. Sure, in the computer, there may be give or take 100 electrons, but the *digital data* that they represent is all the same. That is why "010" is equal to "010" despite the different electrons. If you copy that "010" to another processor, it may be the same or different. Heck, maybe on that different processor the absence of electrons represents a 1 and the reverse represents a 0. But that doesn't matter with digital. All that matters is that the threshold is met, and that what is being represented is "010". And it's this representation of data that is being measured. A computer does not perform arithmetic on the electrons directly. And, unless there's an issue with the hardware, it will never fail.

>The waveform is just a visual representation of the recording
>visual representation
>waveform
>visual

Nigger what? It's audio

>A waveform is the shape and form of a signal such as a wave moving in a physical medium or an abstract representation.
It's not the actual audio, it's just an image representation of the audio. Basically just an oscilloscope readout.

Niggers arguing with niggers.

Welcome to Sup Forums

Also you don't know what the fuck you're talking about. Lossless doesn't produce files as small as lossy. Lossless merely finds patterns in data and represents these patterns in a more concise manner.

This guy was actually right. You're just too much of a faggot to realize it.

Your friend is an idiot too. We can't have infinite precision.

I'll take the Planck length for $100, Alex.

>This is why we have branch predictors and error correction.

Stop trying to sound like you know shit. Do you even know what a branch predictor is? It's just a way for a computer to optimize its speed and has absolutely nothing to do with compression save for optimizing its runtime.
As for correction algorithms, shit like SHA1 and md5sum exist, but that's not because digital media somehow becomes "lossy". It's because, more often than not, a message can be truncated or some packet corrupted during the transfer. And unless you have archaic magnetic-based storage and a huge fucking magnet lying around, it won't get corrupted by just sitting there. As I said before, an operating system needs exact precision to operate properly. Your computer will most likely fail to boot before you'll notice any changes in your wave file.

>it's just an image representation of the audio
It's a *digital* representation of the audio, consisting of 1s and 0s. It's not a painting.

You are attempting to prove a negative. That's retarded. If he makes a fantastic claim, tell him to demonstrate it. The onus would be on him instead of you.

Are you the same user arguing that you could trace the waveform into a vector outline to compress it? If I posted an image of a big waveform from Audacity or what have you, you couldn't use that to infer much about the actual timbre and content of the audio. Such a thing probably exists with spectral analysis, but that's not what this is.

>This is why we have branch predictors
PLEASE, enlighten me as to what a branch predictor has to do with compression and data loss.

>error correction
There are these things called parity bits, or parity data. Basically, they're data that accompanies the data that you want, in order to strengthen it against any type of corruption. The thing about parity is that it's designed to ALWAYS REPRODUCE THE EXACT DATA THAT GOT LOST up to a certain threshold. Once you go beyond that threshold, then the parity data and your data become useless.

A simple bitflip is enough to devastate an entire system. It's enough to turn your carefully written document into a garbled mess. That ECC is there so that, no matter what happens physically that would cause a bitflip, it will always reproduce the same exact data that you started out with.
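
To make that concrete, here's a rough Python sketch of the textbook Hamming(7,4) code (a classroom example, not any particular hardware ECC): 3 parity bits protect 4 data bits, and any single flipped bit gets corrected back to the exact original.

def hamming74_encode(d):
    """d is a list of 4 data bits; returns 7 bits with 3 parity bits added."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4          # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4          # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]   # codeword positions 1..7

def hamming74_decode(c):
    """Correct up to one flipped bit, then return the 4 original data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]        # recheck p1's group
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]        # recheck p2's group
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]        # recheck p3's group
    syndrome = s1 + 2 * s2 + 4 * s3       # position of the bad bit, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1              # flip it back
    return [c[2], c[4], c[5], c[6]]

data = [1, 0, 1, 1]
codeword = hamming74_encode(data)
codeword[4] ^= 1                           # simulate a single bit flip in transit
assert hamming74_decode(codeword) == data  # the exact original comes back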

You can losslessly compress data that's already digital, i.e. camera sensors, or digital waveforms that already went through an a2d conversion.

We don't have the infinite precision to represent real-life phenomena though. But for the things we're trying to convert or compress, the limitations are much more obvious elsewhere, i.e. recording hardware (resolution, framerate, color reproduction, grain, etc of a sensor and its A2D conversion hardware).

It definitely is, as far as digital data goes. You can verify this yourself just by compressing then decompressing a file.

If you are doing any conversion between some analog data to digital, there will always be some level of loss. Analog to digital conversion is inherently lossy in modern technology.

Digital to digital compression however, is not.
I could easily compress some digital WAVE file to FLAC and back again, and get the exact same data back.

The same is true for gzip, bzip2, xz, fuck, even RAR. Stop thinking in terms of analog in digital land. Once that data gets converted to 1s and 0s, they're reliably always 1s and 0s, not 1.0000...425 or 0.000...324 or what have you.

He's trying to argue that it'll never be an exact copy because the number of electrons aren't the same or some shit. I think he's an autist.

I think you're confusing two things.

Reading analog input from the real world.
(IE, recording via a mic.)
Yes, even with say flac, there is some "loss" from the real world. (Although vinyl is still far more lossy!!)

However, compressing, say, Moby Dick is different.

There is no loss.

Any noise on the chipset only matters if it flips a 1 or a 0.

And then CRC codes and such are used to detect and correct such errors.

It is possible for a bit to become flipped via the noise and cause a losslessly compressed file to break.
But until that happens, it is a successful lossless compression.

Come on guys. This is bait, and you're all taking it.

you need to look up what digital is and how it works
digital is a logical concept, applied using analog/physical means
for example, in a 5v digital circuit, anything over 2.5v is a logical 1. small disturbances, like if a '1' was sent as 4v, won't affect the end result. the 4v is still over 2.5v and so is interpreted as a 1. this cannot be compounded either, as once the receiver interprets it as a '1', it is stored as a '1', not 4v.
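
Here's that same idea as a tiny Python sketch (made-up voltage numbers, assuming a 5v family with a 2.5v threshold): noisy levels get snapped back to clean 0s and 1s at every hop, so the analog wobble never accumulates.

# threshold decoding: anything over 2.5v reads as 1, anything under reads as 0
# (made-up numbers for a 5v logic family; real families define proper noise margins)

THRESHOLD = 2.5

def read_bits(voltages):
    return [1 if v > THRESHOLD else 0 for v in voltages]

sent     = [1, 0, 1, 1, 0]
on_wire  = [4.1, 0.3, 2.9, 4.9, 0.8]     # noisy analog levels actually on the wire
received = read_bits(on_wire)

assert received == sent                   # the wobble is gone after one hop;
# whatever gets re-transmitted next is a clean 0/1 again, so nothing compounds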

when sampling analog information, such as recording a microphone, what you said does apply, a digital recording of an analog signal is band-limited, and can only be so accurate
but sampling digital information from an analog source is not the same thing
that's not to say it always works, the real world is analog, and things like weak/interrupted signals can corrupt digital data too, though there are ways to mitigate that (ECC), you are able to detect and even correct for errors in digital data. so outside of hash collisions, you can be certain that what was sent/stored is 100% identical to what was received/retrieved

even if not op, i wouldn't doubt someone has considered this

You are all not focusing on the process of data moving from point a to point b itself. You just say "we put the original data stream into a compression algorithm".
Think about HOW you put it through the compression algorithm, think about HOW data is moved through other data.

imgur.com/a/Uq2ui

Notice how, when you come back up from the fragmented memory, the individual chunks are the same size as each other but smaller than before the compression.
This is because on the electron level and beyond there will always be a difference. If there is always a difference on the electron level, then there will always eventually be data loss. It might take a hundred billion compressions, but it'll eventually occur. Eventually a zero will become a one.

>This is because on the electron level and beyond there will always be a difference.
yes
>then there will always eventually be data loss. It might take a hundred billion compressions, but it'll eventually occur.
no. analog inaccuracy in a digital system does not compound, as the information is 'rounded' to a 0 or 1 every time it's interacted with
in a normally-functioning system, you can transfer a file from one machine to another over and over and the file will remain perfectly intact until something physically breaks down
there is no such thing as generational loss in a digital system

The whole point of data compression is to reduce the data but to keep the information intact.
Learn the difference between data and information first, not from wikipedia.

You're not thinking about HOW the file moves around in a system
Sampling IS movement
Sampling is HOW DATA MOVES

BECAUSE IT'S NOT MOVEMENT
IT'S JUST A SERIES OF SWITCHES

>the analog to digital converter give you dataloss
t. never read the sampling theorem

Are you the timecube guy btw?

See, and small disturbances, like if a '1' was sent as 1.0000121, won't affect the end result, and it cannot be compounded, because the receiver interprets it as a '1', not 1.0000121.

In addition, any noise in the chipset only matters if it flips a 1 or a 0. Everything else is negligible in digital world.

Plus, your image is the wrong analogy. You're still thinking in terms of analog data.

I FUCKING KNEW SOMEONE WAS GONNA SAY SOMETHING
NO
I'M NOT THE FUCKING TIME CUBE GUY
I AM NOT ATTEMPTING TO FUNDAMENTALLY RESTRUCTURE LOGIC ITSELF
I AM OPERATING ON ALREADY ACCEPTED LOGIC THAT IS COMMONLY ESTABLISHED AND UNDERSTOOD
I JUST WENT A LITTLE FURTHER AND FIGURED OUT THAT INFINITE RECURSION IS HOW WE EXIST IN THE FIRST PLACE
THERE WILL ALWAYS BE A SMALLER FISH

You sound like an autist to me. Everyone here pretty much butchered your "logic".

You're ignoring a lot of good points here too.

yea, i understand that moving digital information through a wire is functionally similar to analog information, you put a voltage on the wire, and sample it on the other side
the difference is that a purely analog system sends a 'value' direct as the voltage level, and the receiver uses or stores that voltage level as-is
while a digital system sends 0's and 1's as voltages under or above a certain threshold, and the receiver interprets the 0's and 1's based on what side of the threshold the signal is on, so there is room for error, and as long as the signal is strong enough that 0's are low enough and 1's are high enough, the transfer is entirely lossless

But let's say the variable stored in a C++ program will always store it as 1.0000121, no matter if it's digital or analog. We just choose to return only what comes before the decimal point to everything else.

If you're that worried about losing a single bit, compare the copy to a hash of the original. I mean XD what an idiot
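
Which is how people actually verify copies in practice. A quick Python sketch using hashlib's SHA-256 (the two file names are placeholders):

import hashlib

def sha256_of(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# "original.flac" and "copy.flac" are placeholder names
if sha256_of("original.flac") == sha256_of("copy.flac"):
    print("bit-for-bit identical")
else:
    print("something actually got corrupted in transit")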

Eventually, however, that voltage will change enough to cause binary switches to flip and create a noticeably different binary stream!
It's already happening on incredibly tiny levels, but eventually those tiny details build up.

Have you guys noticed that it should be impossible to run software? Bitrot!

that C++ program isn't sending "1.0000121" over a wire as 1.0000121 volts, it's encoded in some way, such as a floating point number
floating point numbers can have a compounding error, but they're not relevant to this conversation, they're higher-level
you don't put a floating point number over a wire as a voltage level, you put the bits that make up that floating point number over the wire
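
You can see that directly with Python's struct module: the float leaves as 8 exact bytes and comes back as the exact same bit pattern (a rough sketch; the "wire" here is just a bytes object):

import struct

value = 1.0000121
wire = struct.pack("<d", value)          # the 8 bytes that actually get sent
received = struct.unpack("<d", wire)[0]  # reassembled on the other side

print(wire.hex())
assert received == value                 # same bit pattern, not "roughly 1.0000121"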