AMD reveals an Exascale MEGA APU in a new academic paper

>According to this paper, AMD wants to get around this "large die issue" by making their Exascale APUs using a large number of smaller dies, which are connected via a silicon interposer. This is similar to how AMD GPUs connect to HBM memory and can, in theory, be used to connect two or more GPUs, or in this case CPU and GPU dies, to create what is effectively a larger final chip using several smaller parts.

>In the image below you can see that this APU uses eight different CPU dies/chiplets and eight different GPU dies/chiplets to create an exascale APU that can effectively act like a single unit. If these CPU chiplets use AMD's Ryzen CPU architecture they will have a minimum of 4 CPU cores each, giving this hypothetical APU a total of 32 CPU cores and 64 threads.

>This new APU type will also use onboard memory, using a next-generation memory type that can be stacked directly onto a GPU die, rather than being stacked beside the GPU like HBM. Combine this with an external bank of memory (perhaps DDR4) and AMD's new GPU memory architecture and you will have a single APU that can work with a seemingly endless amount of memory and easily compute with both CPU and GPU resources using HSA (Heterogeneous System Architecture).

>In this chip both the CPU and GPU portions can use the package's onboard memory as well as external memory, opening up a lot of interesting possibilities for the HPC market, possibilities that neither Intel nor Nvidia can provide on their own.

overclock3d.net/news/cpu_mainboard/amd_reveals_a_exascale_mega_apu_in_a_new_academic_paper/1

OMG, OMG, OMG! Intel is fucked again!
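The article's core math checks out, at least. A minimal sketch assuming the 4-core, SMT-2 Ryzen chiplets it describes (the actual chiplet configuration isn't confirmed anywhere):

# Core/thread math for the hypothetical APU described above.
cpu_chiplets = 8
cores_per_chiplet = 4      # the article's "minimum of 4 CPU cores" assumption
threads_per_core = 2       # assumes SMT, as on Ryzen

total_cores = cpu_chiplets * cores_per_chiplet      # 32
total_threads = total_cores * threads_per_core      # 64
print(total_cores, total_threads)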

>tfw dreams come true

Finally, I was wondering why GPUs were first to receive on-chip memory, considering there are far fewer products which use them.

My only question is, what will board partners do with all the space created by removing RAM?

That thing is gonna be fuckhuge. I wanna see one fully assembled.

Wouldn't clustering all those components together increase the thermal load by a fuckton, though?

Interesting concept, nonetheless.

>Combine this with an external bank of memory (perhaps DDR4)

HPC needs a lot of RAM

just slap a big ass heatsink on it

>Right now this new "Mega APU" is in its early design stages, with no planned release date. It is clear that this design uses a GPU architecture that is beyond Vega, paired with a next-generation memory standard which offers advantages over both GDDR and HBM.

Put 3 RX 480 GPUs together via interposer
system uses it as one GPU with 100% scaling

wew.

>no grid
>horsefly gets in the fan

>chiplets
When will they learn?

Nvidiots on suicide watch

>MEGA APU

I want one now. No, yesterday. GIVE ME THE FUCKING APU.

not everyone lives in a stable.

>>Combine this with an external bank of memory (perhaps DDR4)
Yeah, NOW. There's no reason it can't simply be migrated onto the chip.

This is a stop-gap to something which could be amazing.

>what will board partners do with all the space created by removing RAM?
batteries
no more laptops

>horsefly
>not housecat
it's like you live in a barn
sonypony detected

>80 chiplets combined into one mega chip
gattai!

>My only question is, what will board partners do with all the space created by removing RAM?

More space for on-bus flash memory.

don't forget the six million barns

>tfw too intelligent too kek

> what is a cache

>This new APU type will also use onboard memory, using a next-generation memory type that can be stacked directly onto a GPU die

Oh, so that's the "next gen memory" they mentioned they'd use for Navi, their successor to Vega. In 2019.

INTEL IS FINISHED

>tfw I was right about AMD moving towards an APU that acts like a desktop grade SoC
>tfw there will come a time when all you need is a very barebones motherboard -acting as a glorified mounting place for VRM, I/O, audio, and optionally additional PCIe lanes and memory DIMMs- and a single chip.
>tfw by nature OEMs will love this for laptops and AiOs and buy them in droves
>tfw we'll eventually be able to pack console level performance into a mac mini sized enclosure

This shit is so fucking cool. I know it'll be quite a while before consumers will be able to get their hands on these, but at least it's something genuinely new.
I also have to wonder how this will affect CPU design and production in general. Using this strategy on a bigger scale could lead to massive yields per wafer, letting them salvage even dies with only a single functional core and pair them together to make up a super budget offering. Depending on how streamlined they can make the interposer mounting process, this could very well revolutionize the way CPUs are designed and produced.

To me, this is even more exciting than Zen and Vega combined, even if I'll likely continue using high end separates for the foreseeable future.

ADORED TV WAS RIGHT NAVI CONFIRMED TO BTFO NVIDIA

There is a very good reason AMD chose to refer to the HBM on Vega as a "high bandwidth cache" and why they've invested in designing a memory controller that can access many parts of the system independent of the CPU's memory controller. Vega, like Fiji, is a test bed, proof of concept, and flashy advertisement for new, forward-thinking technology they plan to use in future designs.

This is what I like about AMD. With each new product, they introduce new stuff that rethinks the way computing, architecture design, and production are currently done. Tessellation, async, HBM, GCN, Mantle, FreeSync, etc. They're working to alter the entire industry in their favor one feature and one architecture at a time. They're the literal antithesis of Intel.

More battery space my man.

Ronnie will go for days

>what if we put the chips.... ON ANOTHER CHIP

fucking AMD

This isn't going to happen anytime soon, probably 2030.

AMD should focus on how to crossfire an APU and a GPU. This could happen in a few years if they try hard.

Should I buy AMD when the markets open tomorrow?

I recommend not fucking with the stock market at all.

holy fuc

why not shortsell intel?

They already did that. It's called dual graphics. And it sucked.

Holy shit couldn't this tech improve consumer CPU yields by a lot?

>Take a Ryzen "core unit" of 4 cores
>Just make a shit ton of those as their own dies
>Bin defective ones, glue some together and glue an iGPU to it
>Arbitrarily large chips made from small, high yield dies
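Rough sketch of why the small dies win, using a toy Poisson defect model (the defect density, die areas, and the no-salvage assumption for the big die are all made-up illustration numbers, not figures from AMD's paper):

import math

# Toy yield comparison: one big monolithic die vs. four small chiplets.
defect_density = 0.2    # killer defects per cm^2 (assumed)
big_area       = 4.0    # cm^2, hypothetical monolithic 16-core die
chiplet_area   = 1.0    # cm^2, hypothetical 4-core chiplet

def yield_rate(area_cm2):
    # Probability that a die of this area has zero killer defects.
    return math.exp(-defect_density * area_cm2)

# Expected usable cores per 4 cm^2 of wafer (ignoring core-level salvage
# on the monolithic die, which real products do use):
monolithic_cores = yield_rate(big_area) * 16          # ~7.2 cores
chiplet_cores    = 4 * yield_rate(chiplet_area) * 4   # ~13.1 cores
print(monolithic_cores, chiplet_cores)

Same silicon area, but a defect only kills one 4-core chiplet instead of the whole chip, and the good chiplets from anywhere on the wafer can be glued together on the interposer.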

>crossfire APU+GPU
That's already a thing. Can crossfire an APU with an R7 250.
Also, multi-GPU in DX12/Vulkan will accomplish this sort of task in software.

navi is rumored to be two gpus working on the same interposer

but it's all speculation

get dem heatsinks ready boys

Just suck my dick and I'll give you 20 bucks, which is probably more than you'd make from AMD after tax.

>chiplets

c-cute!!!!

yes, intel is also experimenting with it.
though it's not just about binning yields; since the dies are physically smaller, they can use each wafer more efficiently.

>That's already a thing.

But it probably won't suck this time.

that's very nearly what AMD's new server platform, Naples, will be.

> each Zeppelin die has 8c/16t, 16 MB L3, 2 DDR4 channels, and 32 PCIe lanes
> sell 2x MCMs with 16c/32t, 32 MB L3, 4 DDR channels, 64 PCIe lanes
> sell 4x MCMs with 32c/64t, 64 MB L3, 8 DDR channels, 128 PCIe lanes
> support 2 sockets of either of the above

they won't have great AVX capacity for HPC/simulation stuff, but they will be beastly NVMe file and web servers
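A minimal sketch of how those package-level numbers fall out of the per-die Zeppelin specs quoted above (the per-die figures are just what the post claims, nothing official):

# Naples-style MCM: every per-die resource multiplies by the die count.
zeppelin = {"cores": 8, "threads": 16, "l3_mb": 16,
            "ddr4_channels": 2, "pcie_lanes": 32}

def mcm(die_count):
    return {resource: amount * die_count for resource, amount in zeppelin.items()}

print(mcm(2))   # 16c/32t, 32 MB L3, 4 DDR channels, 64 PCIe lanes
print(mcm(4))   # 32c/64t, 64 MB L3, 8 DDR channels, 128 PCIe lanes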

Once saw a horsefly an inch long

Not in my home though, I don't take my noctua systems to rural sectors

fucking chiplets, when will they learn?

>> sell 2x MCMs with 16c/32t, 32 MB L3, 4 DDR channels, 64 PCIe lanes
consumer naples when

that barn looks really happy/angry

I'm a huge nvidiot, but if AMD could actually demonstrate that working well, with real 100% scaling efficiency, I think nvidia would be btfo.

it's not going to work like that, GPUs have a lot of wasted transistors during "crossfire". they need to redesign everything

As someone who tried it, can confirm.

high inter-die bandwidth is necessary but not sufficient for proper GPU scaling.

you need, at the absolute least, geometry setup engines that can feed a rasterizer sitting not just on another compute block but potentially on a different die.

control issues like this mean that successful designs will take several years to design and validate. navi might still try something like this, but Vega will almost certainly not.

so, the madmen are actually going to make navi scalable?
I'd be really surprised if nvidia doesn't rebrand volta this year. they've wanted to make the same thing for a very long time, but I can't imagine them being ahead on this.

well, technically that's how it would work
how the dies would talk with each other is another story, no idea. if they pull it off it's going to change GPUs as we know them

AMD hasn't had Navi described as "scalable" in the last few roadmap slides, so who the fuck knows.

Nvidia does great work with their fixed function units (color compression, tiling rasterizers, etc.) but lagged so much on async compute/graphics shaders that I can't imagine them being first with MCM GPUs.

Volta was just slated to have Hybrid Memory Cube memory

Maxwell was going to have an ARM processor integrated

Of course neither one of those materialized, and they pulled Pascal out of their ass later on.

So it's just stacked-die 3D ICs that everyone has already been looking at and testing? Yeah, Intel sure is fucked.

do you think they will use what they developed for zen with "infinity fabric"?

2 years back it was "scalable" in the slides, then they changed it. so was navi. navi being pushed back to 2019 is what makes me think they're planning something

Reminder: Intel is GPU subsidiary of AMD.

SLI/CF are fundamentally fucked nowadays since basically every modern engine pipeline uses previous frame data for effects, and AFR (alternate frame rendering) was invented with the assumption that frames could be rendered independently.

Every new game with SLI/CF support is basically a gigantic one-off hack, which is not the way to make MCM GPUs succeed.
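To make the dependency concrete, here's a toy mock-up of an AFR loop where a temporal effect needs last frame's buffer, which happens to live on the other GPU. None of these classes or calls are real engine or driver APIs, it's purely illustrative:

class MockGpu:
    # Stand-in for a GPU and its local memory.
    def __init__(self, name):
        self.name = name

    def rasterize(self, frame_id):
        return {"frame": frame_id, "owner": self}

    def resolve_temporal(self, current, history):
        # e.g. TAA / temporal reprojection blends last frame into this one.
        current["owner"] = self
        return current

    def copy_from_peer(self, buf):
        # The transfer AFR never planned for: a cross-GPU copy every frame.
        print(f"copy frame {buf['frame']} history from {buf['owner'].name} to {self.name}")
        buf["owner"] = self
        return buf

def afr_loop(gpus, num_frames):
    history = None
    for frame_id in range(num_frames):
        gpu = gpus[frame_id % len(gpus)]              # alternate frame rendering
        if history is not None and history["owner"] is not gpu:
            history = gpu.copy_from_peer(history)     # stalls on the interconnect
        history = gpu.resolve_temporal(gpu.rasterize(frame_id), history)

afr_loop([MockGpu("GPU0"), MockGpu("GPU1")], 4)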

I want everyone to drop CF/SLI support altogether; it wastes development time on a damn 1.5% of users while skimping on optimization for single GPUs.
Problem is, those users are the most obnoxious vocal minority.

They did spend twice as much on GPUs, I'd be pretty pissed being that stupid too.

DirectX 12 explicit multi-adapter sounds interesting though.

>that chip
THICC

I really love technology. Can't wait to see how much progress is made in 10 years.

>explicit multi-adapter sounds interesting though.
it's hard to program for, which again leads to wasted dev time

HMC was always questionable for GPUs. It was more about capacity and design flexibility than bandwidth or power.
Pascal (the real GP100 one, not the Maxwell shrink GTX 10x0 series) is most of what Volta was supposed to be, if you substitute HBM for HMC.

Infinity Fabric will be used for Zeppelin-Vega MCM APUs late this year or early next year. But technically IF is more a set of communication design libraries that are even used internally in their newer chips, so it's not clear what capabilities or operational characteristics/semantics it has. A HyperTransport successor doesn't have the same needs as internal pipeline structuring or control paths, etc.

how possible do you think it is to integrate a neural net inside a chip to handle all the complicated intercommunication? will the latency be unbearable, or the other way around?

If only I could set my APU to do only shadows/effects and leave the rest to the main GPU.

>tfw I bought AMD stocks at $12
hopefully they go up enough to fund my next upgrade

dude, they will either jump to 20 or drop to 5 after the 28th
be careful

But is that natty?

Eh, I only dropped $300 on them. Not a huge deal if the price drops a heap. If they go below $5 I'll just sit on them for the long run.

That looks like a cool office park, where is it at?

"neural nets" if you're generous enough to call them that, are only good in a CPU for speculative decisions, which boil down to branch prediction and prefetching at most. something like cache eviction could be done like this in principle too but wouldn't be worth it.

intra- and inter-chip communications are just protocols for buffer management and state transitions, where heuristics don't really have much of a place.

THEY AREN'T MAKING THIS.

>32 CPU cores
*"mOar corez!!1!!" off in the distance*

Wouldn't it be possible to simply build a scheduler into the interposer (say at 28nm to make the traces small but the scheduler not totally suck) and have that run the inter-die communication? I suspect that you could even break it down further, with smaller schedulers to tie together multiple GPU dies, and that scheduler+infinity fabric then talks to the CPU scheduler+infinity fabric.

I'm clearly not a CE or EE, but would such a design be workable?

>chiplets

are these like the manlets of the CPU world? i'm sceptical in that case!

Is it bad, gaymer?

>chiplets
when will they ever learn

>next-generation memory type that can be stacked directly onto a GPU die
doesn't this cause heat transfer issues?

Should've been called subchip

The term chiplet has a specific meaning: it implies die stacking, not to be confused with 3D stacking. Die stacking means you fab small parts and connect them via interposer to reduce cost on a low-yielding, extremely dense node.
Instead of one CPU die with 16 cores they're proposing 4 smaller dies, each with 4 cores.
A couple years ago AMD proposed even breaking up the CPU itself into individual components to fab them all on separate processes tailored specifically for each part.

>chiplets

When will they learn?

Neat

2022 it says.

2030 is when Raja says discrete GPUs won't even exist anymore.

>we're moving towards fully modular CPU fabrication

This shit is so cool. This is how technology markets are supposed to work. Everyone's so concerned with reaching the limit in shrinking transistors they've forgotten how many other ways there are to innovate products from the initial design phase up. This is what it looks like when there's real competition: the potential for methods that turn entire pre-established norms on their fucking heads.

would massively increase yield/wafer

cash money

It will require an expensive MB and come with a WC unit like that 220W CPU ;^)

If they could reliably package it then it would be an enormous cost savings. Individual parts costing perhaps pennies instead of monolithic dies costing upwards of $20-$50 each.
2.5D integration would be pretty big.
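Back-of-the-envelope version of that cost argument. Wafer price, defect density, and die sizes below are all round-number assumptions, not real foundry data:

import math

wafer_cost  = 6000.0    # $ per 300 mm wafer (assumed)
wafer_diam  = 300.0     # mm
defects_cm2 = 0.2       # assumed killer-defect density

def dies_per_wafer(die_area_mm2):
    # Standard gross-die approximation: usable area minus edge losses.
    radius = wafer_diam / 2
    return int(math.pi * radius ** 2 / die_area_mm2
               - math.pi * wafer_diam / math.sqrt(2 * die_area_mm2))

def cost_per_good_die(die_area_mm2):
    good = dies_per_wafer(die_area_mm2) * math.exp(-defects_cm2 * die_area_mm2 / 100)
    return wafer_cost / good

print(cost_per_good_die(400))   # big monolithic die: ~$90 with these numbers
print(cost_per_good_die(100))   # small chiplet: ~$11 with these numbers

The exact dollar figures don't matter; the point is that cost per good die grows faster than die area once yield is factored in, which is what makes the 2.5D packaging overhead worth paying.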

Raja is wrong. GPUs will last longer. power limit/VRM capacity would be a huge factor, unless you're willing to pay a premium for the MB

When you can do 2 TFLOPs of full precision, which is a bit more powerful than a 1080, on a single APU iGPU die (there are 8 of these dies in the server APU), and that is only 5 years from now, then 13 years from now he's probably right.

If everyone loses money playing the stock market then who gains money? Checkmate atheists.

...

>then who gains money

I wonder how long the PS3 and 360 could've lasted commercially if MS and Sony had waited for something like this instead of going with the Bobcat family, since it was obvious they wanted an APU solution for the PS4 and XB1, but both opted for the only low-power moar coar option they had at the time

Consoles could ACTUALLY be competitive if they had a Ryzen-based HBM APU

i remember jewlander, good movie

Consoles went with the lowest bidder for a credible platform, which ended up being mostly AMD.

Bulldozer was too high-power for the platform, AMD had nothing else to sell except for Bobcat/Jaguar, Intel didn't have credible GPU performance, and Jen-Hsun was still crying salty tears over never being able to get an x86 license.

that's what they said about Fusion back then, just think, 300 watt apu lol, hope. discrete GPUs will stay, like how DDR is still used even with HBM

kek

Um. They're projecting 200 watts for a 16 TFLOP full precision APU, which sounds reasonable. A 4 TFLOP one won't be 300 watts. Especially in 2030 when it's on a 4nm fab or smaller instead of 7nm.
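Taking the numbers quoted in this thread at face value (2 FP64 TFLOPS per GPU chiplet, 8 chiplets, a 200 W package; none of this is verified against the actual paper), the efficiency target works out like this:

per_die_fp64_tflops = 2      # per GPU chiplet, as quoted above
gpu_dies = 8
package_watts = 200

apu_fp64_tflops = per_die_fp64_tflops * gpu_dies     # 16 TFLOPS
watts_per_tflop = package_watts / apu_fp64_tflops    # 12.5 W per FP64 TFLOPS
print(apu_fp64_tflops, watts_per_tflop)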