Is GPU programming the future?

it's nothing new - supercomputing involves a combination of both processor types, optimizing use toward their strengths.

GPGPU is.

muh corez

It's not the future, it's right now.

...

whoa

...

>my GPU
thousands of cores
>my cpu
16 cores

why do i even need a cpu lol, serious question

Yes, as well as programming on specialized circuits (e.g. processors designed for neural networks or Poisson solving or whatever)

A CPU is like the managers at a company: they can make important decisions, but have limited capacity to do the work themselves.
A GPU is like the assembly line at the company: none of the workers have any say in what they do, but each one does their assigned task very quickly and efficiently.

Quite simply, the GPU only takes an input and produces an output; something else still has to send it the work and collect the results.

cute analogy

ma nigger

I hate how people make these abstract bs things up.

Explain the science or stfu about it

cpu processes instructions and works on doing sequential tasks quickly

gpu does quick calculations with highly parallelized data by having thousands of cores each work on their own thread

CPU ON SUICIDE WATCH

I want fucking stupid people to leave.

That's better.


GPUs will never replace CPUs; they have a very important role in the operation of a computer and that will not change any time soon.

Aren't GPUs good at doing only certain kinds of tasks, like linear algebra?

No

Only things that can be highly parallelized, which isn't as many things as most people seem to think.

A CPU core can do calculations that are extremely long and complex (think a massive chain of +es, -es, *es and /es that's millions of operations long), but a GPU core can only do one that's a couple operations long.
So if you want, say, 1+1 calculated six million times, the GPU is for you. If you want 1+1+1+1+1...(repeat six million times), then the CPU is for you.

This is the dumbest fucking and most wrong analogy I have seen in my life

You have literally no clue about what you're saying

Jokes on you, it was an analogy for XOR, NAND, etc operations

Why dont you step to the plate and enlighten us fagtron?

GPU + AI is the trend. I like CUDA, but OpenCL with SPIR is coming. Everything is already FU'd with crappy libraries. You will always need support from AMD or Nvidia. They learned from Oracle.

>So if you want, say, 1+1 calculated six million times, the GPU is for you. If you want 1+1+1+1+1...(repeat six million times), then the CPU is for you

this is actually the exact opposite of true

source: i've actually written a cuda kernel before

>the GPU only gets an input and produces an output
Are you fucking retarded?
Do you think the CPU is different? How the hell do you think a CPU operates? Protip: it only fetches input and produces an output.

>this is neo-Sup Forums

Fine.

There are two differences: the first is computational and the other is “architectural” (for lack of a better word).

First the computational differences: GPUs are mainly good at SIMD. The cores are slow and feature a very limited instruction set, but they make up for it by being able to run the same program on thousands of threads simultaneously. The cores aren't really independent though; if you find them branching and desynchronizing, your performance will go to shit. The GPU is also good at stuff it has dedicated hardware for, like linear interpolation and rasterization. Even ignoring the clock difference, CPU cores are still *significantly* more optimized towards programs that branch heavily, because they have all sorts of instruction prefetching, branch prediction, caching and deep pipelining that GPU cores simply don't have.

Now the architectural difference: The CPU has a special role in that it pretty much controls the other hardware, including the GPU. You can't just take the CPU out of a system and expect it to work for multiple reasons. A hypothetical “GPU-only” system would require a significant redesign of the current core computer architecture.
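
To make the "CPU controls everything, including the GPU" point concrete, here's roughly what the host side of a CUDA program looks like. Minimal sketch, untested; kernel name and sizes are made up for illustration, not anyone's actual code:

#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Trivial kernel: each GPU thread scales one element.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *host = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) host[i] = 1.0f;

    // Everything below is the CPU telling the GPU what to do:
    float *dev;
    cudaMalloc(&dev, bytes);                              // CPU asks the driver for GPU memory
    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice); // CPU kicks off the transfer
    scale<<<(n + 255) / 256, 256>>>(dev, 2.0f, n);        // CPU launches the kernel
    cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost); // CPU pulls the results back
    cudaFree(dev);

    printf("%f\n", host[0]); // 2.0
    free(host);
    return 0;
}

The GPU never decides any of this on its own; it just executes whatever kernels the host queues up.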

I think what he meant to imply is that the GPU relies on the CPU to feed it instructions and process the results. Your GPU doesn't access the NIC, or the USB hub, or the SATA controller, etc.; it doesn't even access main memory except via DMA.

You still need a CPU in there to manage your PCI devices etc., including the GPU itself.

If you have a big array of the same data structure and want to process every element, then yes.

Every core is basically a huge SIMD unit. Each compute unit can execute, let's say, 32 threads at the same time, but all of them execute the same instruction on a different piece of data. If you have a conditional, then all threads must take the same branch. The problem with GPGPU programming is that if the individual threads take different branches then you have divergence. The compute unit now has to execute every different branch separately, slowing it down to the performance of a single thread. Optimizing for GPGPU simply means reducing divergence.
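
A rough CUDA sketch of what divergence looks like (hypothetical kernel, purely illustrative):

// Threads in the same warp (32 of them) share one instruction stream.
__global__ void divergent(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // If this condition differs between threads of the same warp, the hardware
    // runs the "then" path and the "else" path one after the other, masking off
    // whichever threads didn't take the current path.
    if (in[i] > 0.0f)
        out[i] = in[i] * in[i];
    else
        out[i] = 0.0f;
}

If all 32 threads of a warp agree on the condition, there's no penalty; the cost only shows up when they disagree.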

>Runs Mathf.pow(0.4,1.8) 1920x1080 times on the cpu

>5fps

>puts the exact same instruction into frag shader

>2000fps

sounds like the 1+1 analogy is spot on bub
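
For reference, what the video demonstrates boils down to something like this in GPGPU terms (a rough CUDA sketch, not the actual Unity/shader code; the win comes from the ~2 million evaluations being independent and identical):

// One thread per "pixel": 1920*1080 independent powf() calls.
__global__ void pow_per_pixel(float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = powf(0.4f, 1.8f);
}

// Launch (assuming dev_out was allocated with cudaMalloc):
// int n = 1920 * 1080;
// pow_per_pixel<<<(n + 255) / 256, 256>>>(dev_out, n);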

Can you please just stop ironically shitposting / pretending to be retarded? Somebody might come along and confuse this for a place where it's genuinely accepted.

cow2beef.exe is only funny the first time

What's wrong with what he said?

youtu.be/C0uJ4sZelio?t=20m35s

Alright, let me just take my practical, tried-and-true methods elsewhere. I'm just guessing that you're trying to mislead people for fun

absolutely nothing

Pretty much everything? What is *right* about what he said?

See

>YouTube “game dev” tutorials
>by a fucking UNITY developer no less
Remind me why I should even bother clicking on this?

Unity devs can't even create a GLX context correctly, I have to LD_PRELOAD fixed versions of their GL calls to make them work.

Everything about their code base is just so fucking wrong. Unity is pretty much the shitty “JavaScript+PHP web monkey who thinks base64 is an encryption algorithm” equivalent of the game developer world

literally time stamped to show that doing something really simple ~2000000 times on the CPU is slower than sending it to the frag shader.

>Unity devs can't even create a GLX context correctly, I have to LD_PRELOAD fixed versions of their GL calls to make them work.

>self-awareness.exe

Not everyone is fiddling with shit like you are, so advice that helps 99% of people, like "send it to the frag shader" or "use a compute shader to do it", is perfectly reasonable advice and no amount of elitist bs will make you any more correct.

You're calling one of the most widely used game engines shit and giving people advice that, when implemented, will actually HURT them because they're on Unity.

>This damage control.

Do you know if there is any research for a GPU-only system?

>literally time stamped to show that doing something really simple ~2000000 times on the CPU is slower than sending it to the frag shader.
nice moving the goalposts retard

Yeah no shit SIMD is faster than SISD. Just fuck off please

This

>So if you want, say, 1+1 calculated six million times, the GPU is for you. If you want 1+1+1+1+1...(repeat six million times), then the CPU is for you.

how is that different from this?

>literally time stamped to show that doing something really simple ~2000000 times on the CPU is slower than sending it to the frag shader.

please kill yourself

What is it with Sup Forums and not posting relevant counter points?

And you talked about changing the goal posts

You literally try to trainwreck the discussion because you dislike Unity

Not for general purpose computing (e.g. home computers/laptops/whatever)

But weird architectures like that are usually a thing for stuff like embedded devices and specialized platforms, so that's where I'd go looking.

That said, I personally can't find any examples of real-world architectures that *don't* still have some weak CPU to perform device management and stuff, even if they offload most if not all of their actual processing to the GPU.

Not the original guy, but you seriously don't know jack shit if you don't know what's wrong with Unity.

But him bringing up his dislike of Unity isn't relevant to the discussion

it was clearly damage control because he didn't have a legitimate counterpoint

Well for starters, the two things you described are pretty much the same thing. Six million additions are six million additions.

Addition is a constant-time operation in any ALU I know of.

Here's a better analogy to explain what SIMD is: Imagine you have two arrays, each with 1000 elements, and you want to multiply them together elementwise.

On a CPU, you would do that like this:

c[0] = a[0] + b[0]
c[1] = a[1] + b[1]
...
c[999] = a[999] + b[999]

or
for (int i = 0; i < 1000; i++)
c[i] = a[i] + b[i]


This is SISD because you are running a single-integer “addition” operation a thousand times in sequence.

The SIMD equivalent would be issuing a single “wide add” instruction that adds all 1000 element pairs at the same time

i.e.
c[0:999] = a[0:999] + b[0:999]


On a CPU, an operation like this would require a 1000-int wide ALU (i.e. 1000 ALUs), which is just not realistic (a CPU's SIMD units are only 4 or 8 ints wide, with AVX/AVX2)

But GPUs are pretty much just that: 1000 ALUs fed the same instructions, but with different data

This is in essence what a fragment shader does: You run the same shader instruction on millions of pixels effectively at the same time. It's very wide SIMD
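
In actual CUDA terms (a rough sketch using the same array names as above, not a complete program), that "one thread per element" idea looks like this:

// GPU version of the loop above: thread i handles element i.
__global__ void vec_add(const int *a, const int *b, int *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

// CPU version, for comparison:
// for (int i = 0; i < 1000; i++)
//     c[i] = a[i] + b[i];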

>multiply
add*

A baseless assertion can be baselessly dismissed.

If anything, the burden of proof is on you for proving why adding six million integers together is somehow going to be faster on the GPU than the CPU

(I'll give you a hint: You could start by exploiting the fact that addition is a monoidal operation and get a logarithmic speedup)
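
(If you're curious what that looks like, here's a rough CUDA sketch of the standard tree reduction; a block size of 256 is assumed, untested:)

// Each block sums 256 inputs in log2(256) = 8 steps instead of 255 serial adds.
__global__ void block_sum(const int *in, int *out, int n) {
    __shared__ int tmp[256];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tmp[threadIdx.x] = (i < n) ? in[i] : 0;
    __syncthreads();

    // Halve the number of active threads every step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            tmp[threadIdx.x] += tmp[threadIdx.x + stride];
        __syncthreads();
    }

    if (threadIdx.x == 0)
        out[blockIdx.x] = tmp[0]; // one partial sum per block; repeat or finish on the CPU
}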

But the way you phrased it implies that you have pretty much no fucking clue about what you're talking about, and you're just parroting shit you heard badly explained by some youtuber and promptly misunderstood entirely.

>proof is on you for proving why adding six million integers together is somehow going to be faster on the GPU than the CPU

That video he linked literally shows that?

Are you being ignorant on purpose?

He already explained why you're wrong before, and shat on Unity because you brought it up.

HPC is dead.
Run, run away.

>So if you want, say, 1+1 calculated six million times, the GPU is for you. If you want 1+1+1+1+1...(repeat six million times), then the CPU is for you

I'm saying the above statement is correct in general, in that if you have a simple instruction / set of instructions you should send it to the GPU.

>This is the dumbest fucking and most wrong analogy I have seen in my life

>this is actually the exact opposite of true

Obviously people don't think it's correct.

What you just explained seems to support the 1+1 statement also.

How is the statement wrong? or rather, the exact opposite of true?

I can tell you without opening the video that it doesn't, and you're misunderstanding what is being demonstrated.

What happened to AMD and Nvidia's plans to put an ARM CPU on the GPU?

So on purpose

>I'm saying the above statement is correct in general, in that if you have a simple instruction/ set of instructions you should sent it to the GPU.
No, just fuck off with your baseless unhelpful “advice”. All you're doing is potentially misleading others into having a stupidly warped idea of what SIMD is and how GPUs actually work.

I'll give you a hint: Maybe you should do OpenGL development and optimization for a few years before coming back to join in on this discussion again. (Assuming the thread hasn't 404'd by then)

what part of that is incorrect?

If you have to perform the operation 1+1 6 million times

you could have a for loop that loops 6 million times on the CPU, or you could send the instruction to the GPU and have it done much faster.

Is that incorrect? because that's what you just said:

>This is in essence what a fragment shader does: You run the same shader instruction on millions of pixels at the same time, in a single clock. It's very wide SIMD

why is the advice shitty? rather than just saying it is without any explanation?

Yeah, thousands of """"""Cores""""""
The number of truly independent cores a GPU has is in the same ballpark as a CPU's for its price range.
What they do have is something like hundreds of logical cores (SIMD lanes) on each of them.

These aren't actual cores. They're more like SIMD. They can't execute different instructions at the same time, but they can execute the same instructions on multiple data.
Which is great if you want to execute the same shader on thousands of vertices or pixels.

Ever heard that 'if' statements are bad for performance on GPUs? This is why: if logical core 1 wants to execute branch A and the other cores want to execute branch B, it has to do so sequentially.
First do branch A and make the others wait, then execute branch B. (There is no performance hit these days if they all take the same branch.)
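
A common way to dodge that when both branches are cheap (sketch, hypothetical kernel): compute both and pick one, which the compiler can usually turn into a predicated select instead of two serialized paths:

__global__ void no_divergence(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float squared = in[i] * in[i];
        // Usually compiles to a select/predicated move, so no warp divergence.
        out[i] = (in[i] > 0.0f) ? squared : 0.0f;
    }
}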


The amount of things that actually benefit from multithreading is small. And the amount of those things that would benefit from being done on GPUs is even smaller.

...

>you could have a for loop that loops 6 million times on the CPU, or you could send the instruction to the GPU and have it done much faster.
(Or you could just not do either because the result is constant. SIMD is only a thing because of the _multiple data_ aspect, which the array example exemplifies. Your stupid 1+1 does not.)

The key principle behind SIMD, which you seem to fail to understand, is that you can run the same string of operations (no matter how long and complex!!) on multiple data at the same time. Basically parallel vs serial

Your assertion that the CPU is somehow going to be better at a “long and complex” string of operations like incrementing a counter a billion times (lol) is just nonsensical - the GPU is just as capable of it, as long as you can split your work into identical slices. Also, ALU operations are pretty much what the GPU does best, so your example makes even less sense. (Try branches if you want examples of what makes a GPU cry)

Also, a GPU can still compute a gigantic string of monoidal operations quickly (it's called a parallel reduction, same idea as mapreduce - fucking google it)

This image is perfectly true, and it also doesn't support your flawed example at all. Can you please stop quoting sources that you fail to understand?

>Your assertion that the CPU is somehow going to be better at a “long and complex” string of operations like incrementing a counter a billion times

I said the opposite actually, incrementing a counter isn't complex at all. You've obviously misunderstood and/or tried to argue semantics.

My assertion about ONE CPU core performing something like pow and sin functions being faster than ONE GPU core is completely unfounded?

>So if you want, say, 1+1 calculated six million times, the GPU is for you. If you want 1+1+1+1+1...(repeat six million times), then the CPU is for you

is the logic which everyone uses and is how the differences in strength between the CPU and GPU are explained to everyone. GPU is great at doing lots of little things, CPU is great at doing a few big things; that's literally what the analogy boils down to. I'm sorry if you have autism and can't take an analogy figuratively, but there's really no other way of saying it.

>is the logic which everyone uses and is how the differences in strength between the CPU and GPU are explained to everyone
No it isn't. Go take a GPU programming class at university or something for crying out loud

What makes you think that programmers can do something useful with thousands of cores when they can't even handle the few cores on your average cpu properly?

What do you think an analogy is?

Can you come up with a better analogy explaining in which situation you would use a GPU and CPU for computing in one sentence?

>I said the opposite actually
Uh huh, you're literally contradicting yourself in the same post here. (Tip: “1+1+1+1+1...” is incrementing a counter)

Anyway, this is my last reply to your shitty bait. Either learn how GPUs actually work (try writing a CUDA or OpenCL kernel or an OpenGL compute shader or something some time, and then try optimizing it) or just fuck off and stop commenting about subjects you don't know anything about.

This thread has enough good advice in it ( ) that any innocent bystanders will hopefully be capable of filtering the signal out of the irrelevant noise, no need to shit it up further by replying to clueless idiots.

>itt anons who have never taken a computer architecture course trying to explain why CPUs are necessary

Try doing conditional branch prediction on a GPU and you'll have your answer.

>(Tip: “1+1+1+1+1...” is incrementing a counter)

>Analogy

Wow, you people are literal. You understand the point of an analogy is to get someone thinking "hey, I'm doing this simple instruction 6 million times and it's identical except for one variable, if only there were some simple analogy that could let me know if I should stick this on the GPU or not"

all of that "good advice" just goes in depth about what that handy one-sentence analogy already does, go figure.

Hopefully someone won't read "1+1 6 million times is for GPU", think "I'm doing something simple 6 million times, maybe-", and then read "that's the exact opposite of true". Because that's exactly what's going to happen, you fucking autist.

>GPUs will never replace CPUs; they have a very important role in the operation of a computer and that will not change any time soon.
great explanation.
gpus are important (i won't say why) and that will not change.
themoreyouknow.jpg

Lol, try handling the I/O of a PC with a GPU.

Or even worse, anons ITT who have never taken a CE course try to assert that the CPU is unnecessary.

If a GPU 'replaced' the CPU, it would by definition become a CPU.

Would it be wrong to say that each GPU core is tied to one pixel (at least in graphics rendering)?
Or can multiple cores work on one pixel?

Not everything benefits from parallel operations. I can give you circuit simulations as an example. In these simulations you need to know the previous state of the circuit to calculate the next step. While it is possible to distribute the calculations done within a single step, distributing also has its own overhead and isn't feasible for smaller simulations. So there you go, a significant portion of engineering programs need to do similar things. I think we'll need CPUs with instruction sets that can be better optimized for general applications for a while.
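
A toy example of that kind of serial dependence (made up, not from any real simulator): a forward-Euler step for a simple RC circuit, where every iteration needs the previous one's result, so the time loop itself can't be spread across threads.

#include <stdio.h>

int main(void) {
    const float dt = 1e-4f, tau = 1e-2f; // time step and RC time constant
    float v = 1.0f;                      // initial capacitor voltage
    for (int t = 0; t < 100000; t++)
        v = v - dt * (v / tau);          // v at step t+1 depends on v at step t
    printf("%f\n", v);
    return 0;
}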

A question would be: is an ARM core significantly larger or smaller than a single core in a GPU? Because if it is about the same size (and similar power consumption), replacing GPU cores with ARM cores might be nice. I remember there were papers with 1k ARM cores and stuff but my memory is blurry.

Completely wrong.

Then how does it work?

The quality of your first assumption was so low that it's not worth the effort to explain it to you.

CPU
Central
Processing
Unit
same reason why it takes an extremely long time for a cpu to mine bitcoins but an ASIC miner can do it relatively easily.
instruction sets

i also have 0 knowledge in this field.
If you want the GPU to replace the cpu then the GPU will have to learn a lot of new things.

>Runs Mathf.pow(0.4,1.8) 1920x1080 times on the cpu
calculate LUT for 256 elements
swap bytes
>500 fps on cpu

>2016
>not running your operating system in a VRAM disk

This, a THOUSAND times THIS
CPU = Central Processing Unit
GPU = Graphics Processing Unit
If you replace the CPU with a GPU, it literally becomes a CPU.

As always first post best post.

Fuck off

it was the future in 2010 or so.

A gpu breaks up primitives (triangles, lines, points) into fragments (kinda like pixels) and sends them to be processed by a shader core (a simplified CPU capable of SIMD or something similar). A gpu core can process multiple fragments at a time.

GPUs are the future because of muh machine learning

>is ARM core significantly larger or smaller than a single core in a GPU
A better comparison would be a Xeon Phi

Ov vey dont forget the six million!

No, FPGAs are, if we can scale them.

Not today, maybe tomorrow.

Most programmers can't even write good multithreaded code for CPUs, forget writing massively parallel software.

Most multithreaded shit for CPUs has all sorts of data dependencies between threads, and the hard part is coordinating them properly.
With GPGPU, you would only use it for massively parallel shit, and you don't have to worry about coordination nearly as much.

go back to pol, faggot. You Alt-right children need a fucking containment website.

GPUs are beyond the point of just SIMD operations, since you can do branching and loops on modern ones, though it's inefficient.

Thats homophobic

Really makes you think.

I think the Xeon Phi has a mode that runs the OS off the coprocessor