Can they put a GPU next to 2 CPUs?

Other urls found in this thread:

hsafoundation.com
youtu.be/NoelgG8JoyQ?t=7m29s
techpowerup.com/235092/intel-says-amd-epyc-processors-glued-together-in-official-slide-deck

Yep, HSA comes to life: hsafoundation.com

dunno, but I'm more interested in whether they can put some HBM2 stacks next to it

Yes. The custom processor for CERN is on track.

you can glue anything together with infinity fabric, user.

DELET or else.

It would be cool if you only needed "one" chip to run the most important parts of the computer.

That's what Vega is.

next to 2 CPUs to act as a giant L4 cache, not next to a GPU

Apparently AMD can glue them but Intel can't.

AMD puts the glue *between* dies to connect them together.
Intel puts the glue *on* the die to save 4 cents on TIM.

Fuck indonesia

What?

Kek

>ryzen will enable CERN to rule the world

yes

What's that?

that's a 32-core APU with memory on the package

Looks like a Final Solution to the "Intel in HPC" problem.

if they manage to make it comparable to a real gpu in compute, yes

at least for organizations that don't want to buy separate cpu and gpus this could work

Beats anything Intel has to offer, in any case.

fair enough

oi vey, anudda shoa!

it's called an igpu

so yeah

It's designed for exascale, i.e. a billion billion (10^18) calculations per second. Imagine entire farms running cabinets stuffed full of dual-socket nodes built around chips like this. It would be near the top of the Top500, if not first.

There are like three places in the world where something like that would even be stressed.

so like an APU?

Nigga they can do what ever the fuck they want.

That is not the same thing at all. An integrated GPU with its own on-package RAM would be better than one sharing system RAM.

implementing a gpu on-die with the cpu is called an igpu

it's integrated
how are you missing the point

TR4-socket APUs would probably cannibalise their discrete GPU market and need a lot of cooling

what are some examples of organizations that'd need/want this?

Governments that want VERY powerful supercomputers.

DARPA, CERN, the Department of Energy, any national lab, whether it's for aeronautics, nuclear energy, or space exploration: anything that needs the computational power of millions of CPU/GPU cores.

sorry for all the questions, but why are those APUs better than that server AMD showed with 100 TFLOPS of compute in a 2U form factor?

Because an exaflop is 1,000 petaflops, and each petaflop is 1,000 teraflops. A 100 TFLOPS box is four orders of magnitude short of exascale.
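Back-of-envelope, taking that 100 TFLOPS-per-2U figure at face value and ignoring scaling losses (the rack math assumes fully populated 42U racks):

    # How many of those 2U boxes equal one exaflop?
    EXAFLOP = 1e18        # FLOPS, 10^18, i.e. a billion billion per second
    PER_2U  = 100e12      # the 100 TFLOPS 2U server mentioned above

    nodes = EXAFLOP / PER_2U
    print(nodes)          # 10000.0 -> ten thousand 2U boxes before any scaling losses

    racks = nodes / (42 // 2)
    print(racks)          # ~476 full 42U racks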

I understand that; what I don't understand is why someone would take these over dedicated GPUs and CPUs, which are arguably better at their own jobs

I see the future! CPU + GPU in one chip! Nvidia BTFO. SSD so fast that it competes with RAM! I need more RAM! Just create a larger partition!

Less latency getting shit to and from the GPU portion. Right now they're punished by PCIe latency (and limited bandwidth) when getting data from the CPU and main memory over to the GPU.
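Rough numbers, assuming PCIe 3.0 x16 (~15.75 GB/s theoretical per direction) against a single HBM2 stack sitting right next to the GPU; this ignores the fixed per-transfer latency, which only makes the discrete case look worse:

    # Time to move a 1 GiB working set, bandwidth only
    GIB = 2**30
    pcie3_x16  = 15.75e9   # bytes/s, theoretical PCIe 3.0 x16, one direction
    hbm2_stack = 256e9     # bytes/s, one HBM2 stack

    print(GIB / pcie3_x16 * 1e3)   # ~68 ms over the bus
    print(GIB / hbm2_stack * 1e3)  # ~4 ms if the GPU already sits next to the data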

>SSD so fast that it competes with RAM!
You mean SCM?

Read up on HSA.

With your solution you have either serial (CPU) OR parallel (GPU) compute.

With HSA you have serial AND parallel at the same time, on the same data. The A in APU stands for "accelerated", so it can complete mixed tasks MUCH faster than either one alone.
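A toy sketch of that point, with made-up costs and hypothetical function names (nothing here is a real HSA runtime call):

    # Assumed, made-up timings just to show where the win comes from
    BUS_COPY = 0.010   # s, assumed cost to shove the working set over PCIe (one way)
    KERNEL   = 0.005   # s, assumed GPU time for the parallel part
    SERIAL   = 0.002   # s, assumed CPU time for the serial part

    def discrete_gpu():
        # copy in, run the kernel, copy out, then the CPU does its serial pass
        return BUS_COPY + KERNEL + BUS_COPY + SERIAL

    def hsa_apu():
        # CPU and GPU share one address space: no copies, the CPU picks up
        # the buffer the moment the kernel finishes (or works alongside it)
        return KERNEL + SERIAL

    print(discrete_gpu(), hsa_apu())   # 0.027 vs 0.007 for this toy case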

so literally squeezing every last drop of performance out of their equipment?

makes sense actually, every microsecond saved adds up to a lot of time when you consider the amount of nodes they're using

Exactly. Also, not sure if it's true, but if such a "super-APU" with HBM is possible, there's a rumor going around that with HSA the CPU cores would be able to directly access the HBM and use it as an L4 cache.

Fuck no they wouldn't.

Noice.

They can test this with EPYC 2.
Slap some HBM2 into the SP3 package, like an 8-Hi stack per die.

>CPU +GPU in one chip
>wat is APU

nice read, now I understand, thanks

fast L4 cache would be a nice thing to have

Better than that: if HBM is acting as an L4$, that means the data from the GPU is dropping into the L4 and it becomes truly heterogeneous. There is zero delay between parallel and serial compute.

They'd have to install the GPU though because of the requirement of an interposer to mount the HBM to in the first place.

>There is zero delay between parallel and serial compute.
Well, significantly less vs bouncing it from GPU memory to main memory and back for processing. Still non-zero due to the unavoidable latency from sending the data out across the IF link then through the GPU to the HBM, which IIRC has a latency penalty of its own simply due to how it works.

what size would that L4 cache need to be to do this properly?

>They'd have to install the GPU though because of the requirement of an interposer to mount the HBM to in the first place.
You can use a silicon interposer without a GPU, dummy.

So how does HBM compare in speed to RAM?

Yes, that's what an APU is. It's hard to buy an Intel CPU that isn't one.

There are two problems with what you're specifically thinking:
- Power and cooling. Performance GPUs are expected to nominally cap out at 300 W a chip, while even the most housefire of CPUs tops out around 150 W. GPUs specifically live on riser cards to move that heat away to somewhere more manageable while providing room for a lot of power circuitry; without wildly reengineering everything from sockets to PSU designs, the best you can integrate is a budget gaming GPU plus a truly garbage CPU, or a good CPU plus an >intel integrated tier GPU.

- Workload scalability: if your work is being done on GPUs, by definition it scales amazingly; you're using a GPU for its parallelization. Which means you don't just want a budget GPU, you want the beefiest one you can possibly fit. And by "one", I mean "four dual-socket parts per machine for whiteboxes, with a serious look at engineering your own backplanes to fit more".
Meanwhile, all of this can be controlled by a single CPU. So if you integrate the whole mess on a single die, you're sacrificing money and performance for seven (or more!) useless CPUs, even before we get into the problem of cramming several kilowatts of TDP into a physical standard designed for around fifty watts and later "supplemented" to 150.
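Putting rough numbers on the power half of that, using the figures from the post above (300 W per performance GPU, ~150 W for a top-end CPU, sockets and coolers built for roughly the same):

    # Why "several GPUs + a CPU on one package" breaks the socket, in watts
    GPU_TDP    = 300    # W, nominal cap for a performance GPU
    CPU_TDP    = 150    # W, top-end CPU
    SOCKET_CAP = 150    # W, roughly what current sockets and coolers are built for

    integrated = 4 * GPU_TDP + CPU_TDP
    print(integrated)               # 1350 W on one package
    print(integrated / SOCKET_CAP)  # 9x what the socket ecosystem is designed to feed and cool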

HBM2 caps at 256GB/s per stack.
That's VERY fast.
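If you want to see where that number comes from, and how it stacks up against ordinary desktop RAM (standard HBM2 and DDR4 spec figures, nothing AMD-specific):

    # HBM2: 1024-bit interface per stack, up to 2 Gb/s per pin
    hbm2 = 1024 * 2e9 / 8          # bits * bit rate / 8 = bytes per second
    print(hbm2 / 1e9)              # 256.0 GB/s per stack

    # Dual-channel DDR4-2666 for comparison: 2 channels x 64 bits x 2666 MT/s
    ddr4 = 2 * 64 * 2666e6 / 8
    print(ddr4 / 1e9)              # ~42.7 GB/s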

True, but it would require spending space on the CPU dies for the HBM interface that would otherwise not be used in most cases (Do you see AMD making Ryzen2 with a fat lump of still expensive as fuck HBM2?). IMO it would be more efficient to use the GPU for the HBM interface on its own interposer, and just link the CPUs and GPU via IF.

The HBM PHY is relatively small, and a full node shrink like 7nm LP makes it feasible.
>expensive as fuck HBM2
Meme. The volume just isn't there yet; HBM itself is relatively cheap to make, since the dies are peanut-sized.

why are we still using normal ram when this exists then? just make some hbm modules that can be popped into the motherboard and cooled with a heatsink

Capacity.
Upgradability.

damn, that's almost L3 cache levels of fast

So... A SoC?

Yes.
Did I tell you it was made by AMD?
ATi/AMD are historically good at inventing fucking memory and I don't really know why.
They don't even fab it.

>capacity
aren't there 8GB hbm2 stacks? just slap a bunch of those under a heatsink and done

There are, but that's still nowhere near enough memory.

For what?

Facebook on chrome?

No, some database on a server.

There's also an old idea of stacking SRAM under the chip itself.
Intel did that in Polaris.

it could still be used as "L3.5" cache, even 1GB of that would help a lot in some workloads

You mean L4 cache?
Also AMD needs to make IF faster and even lower latency to leverage advantages of on-package HBM.
We'll see.
It's their tech, i'm sure they'll find a good use for it.

yes, but with speeds that close to L3, it's not that far off really

IF can hold its own up to 512 GB/s; the problem is that it runs at too low a frequency on Ryzen

>the problem is that it runs at too low a frequency on Ryzen
IF is a protocol.
The physical layer speed depends on the implementation.
GMI caps at 42.6 GB/s bidirectional.
IF going through the PCIe root complex caps at 37.9 GB/s bidirectional.
If Navi truly is MCMed GPUs, we'll see what kind of PHY they will engineer for it.
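For reference, the 42.6 GB/s GMI figure falls out of the fabric clock if you assume IF runs at MEMCLK (half the DDR4 transfer rate) and a GMI link moves 32 bytes per fabric clock; the 32 B/clk width is my assumption, not an AMD-published spec:

    # Assumed derivation of the die-to-die GMI number quoted above
    memclk        = 2666e6 / 2     # DDR4-2666 -> 1333 MHz fabric/memory clock
    bytes_per_clk = 32             # assumption: 32 B per fabric clock per GMI link
    print(memclk * bytes_per_clk / 1e9)   # ~42.7 GB/s, matching the quoted 42.6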

I stopped giving a shit about amd and intel and nvidia two years ago.

Quick rundown?

Is it athlon 64 all over again?

Is my 2500k still good?

>Quick rundown?
Intel is panicking and screaming
>EPYC IS ANNUDA SHOAH
in official SKL-SP slides.
Vega may or may not be R300 2.0: electric boogaloo.

yes
also yes

Jokes aside, what has amd done?

Did they kill bulldozer?

Did they release their fucking arm+x86 soc?

Is intel TRULLY JOKES ASIDE NO HOMO NOT A PRANK NOT RUSING NOT BAMBOOZLING doomed?

>Did they kill bulldozer?
Yes.
>Did they release their fucking arm+x86 soc?
There was never one.
>Is intel TRULLY JOKES ASIDE NO HOMO NOT A PRANK NOT RUSING NOT BAMBOOZLING doomed?
If their new x86 uarch is shit, they will die like DEC did.

AMD has roughly caught up to Intel (Broadwell-E/Skylake IPC, but 20% better virtualized thread performance versus HT), but has undercut Intel in price while not resorting to Jew tactics to artificially segment their product line.
Intel is doing really poor damage control as a result.

AMD has almost caught up to Nvidia, but Vega is still not good enough. Blame poor drivers (again) rather than a shitty architecture. Nvidia is laughing at Vega's unoptimized state and not giving a single fuck.

>Vega is still not good enough
Looks like Vega is doing mighty fine where it works.

AMD released a scalable architecture with which they can just slap four 8-core dies together and get ~90% scaling. Intel's monolithic dies, with their shit yields, low clock speeds and high price tags, can't compete against this. So Intel resorted to screeching like a little kid that AMD's arch is "4 desktop dies glued together"; the result was massive hilarity and laughter all around.
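Taking those scaling numbers at face value:

    # 90% scaling across four 8-core dies, per the claim above
    dies, cores_per_die, scaling = 4, 8, 0.90
    print(dies * cores_per_die * scaling)   # 28.8 "effective" cores out of 32 physical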

youtu.be/NoelgG8JoyQ?t=7m29s

It has a larger die size than Fiji, but only 1.15% of the performance at similar clock speeds. The FE card can only beat a GTX 1070. Something has gone horribly wrong with Vega, since Nvidia's pushing 12 TFLOPs on a similar-sized die

tl;dr?

>It has a larger die size than Fiji
What are you smoking?
>1.15% of the performance at similar clock speeds

What kind of jewish tricks has intel done?

The only one i actually fell for was a "binned" 2500k

...

>What kind of jewish tricks has intel done?
Spreading FUD riiiiight in the official SKL-SP launch slides.

they released 56 cpus that are basically 15 different models with certain features on/off and they cost a fuckton of money

they're also resorting to FUDding on AMD products because they're desperate

techpowerup.com/235092/intel-says-amd-epyc-processors-glued-together-in-official-slide-deck

Now about GPUs: is AMD still powerful but also hot and an energy hog?

We don't know anything substantial about Vega.
Also GPUs are inherently housefires.

now this is some good marketing, not that ""marketing"" from intel

Yes.
There was a video about IF but they removed it.

he said "tomorrow" a lot of times, what is actually happening today?

Nothing. Looks like it was filmed a day before EPYC launch.

>8c/16thread + vega with hbm2
muh dick

Samsung needs to hurry with low-cost HBM already.
GDDR really-really needs to die already.

but he mentioned "glued together", "FUD" and "ecosystem" a number of times; wasn't he referring to those Intel slides?

These slides are from June.
AMD knew about them.
You know that Intel has no friends left anymore?
Price gouging is bad. Bad!

7:29

oh well, now Intel doesn't look like that much of a retard anymore. I wonder what they did when they saw EPYC's presentation

They look even more retarded, user.
It was a closed-door presentation for a chosen few. And Intel was showing THAT.