AMD reveals an Exascale MEGA APU in a new academic paper

>According to this paper, AMD wants to get around this "large die issue" by making their Exascale APUs using a large number of smaller dies, which are connected via a silicon interposer. This is similar to how AMD GPUs connect to HBM memory and can, in theory, be used to connect two or more GPUs, or in this case CPU and GPU dies, to create what is effectively a larger final chip using several smaller parts.

>In the image below you can see that this APU uses eight different CPU dies/chiplets and eight different GPU dies/chiplets to create an exascale APU that can effectively act like a single unit. If these CPU chiplets use AMD's Ryzen CPU architecture they will have a minimum of 4 CPU cores each, giving this hypothetical APU a total of 32 CPU cores and 64 threads.

>This new APU type will also use onboard memory, using a next-generation memory type that can be stacked directly onto a GPU die, rather than being stacked beside the GPU like HBM. Combine this with an external bank of memory (perhaps DDR4) and AMD's new GPU memory architecture and you will have a single APU that can work with a seemingly endless amount of memory and easily compute with both CPU and GPU resources using HSA (Heterogeneous System Architecture).

>In this chip both the CPU and GPU portions can use the package's onboard memory as well as external memory, opening up a lot of interesting possibilities for the HPC market, possibilities that neither Intel nor Nvidia can provide on their own.

overclock3d.net/news/cpu_mainboard/amd_reveals_a_exascale_mega_apu_in_a_new_academic_paper/1

OMG, OMG, OMG! Intel is fucked again!
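The article's core math checks out, at least. A minimal sketch assuming the 4-core, SMT-2 Ryzen chiplets it describes (the actual chiplet configuration isn't confirmed anywhere):

# Core/thread math for the hypothetical APU described above.
cpu_chiplets = 8
cores_per_chiplet = 4      # the article's "minimum of 4 CPU cores" assumption
threads_per_core = 2       # assumes SMT, as on Ryzen

total_cores = cpu_chiplets * cores_per_chiplet      # 32
total_threads = total_cores * threads_per_core      # 64
print(total_cores, total_threads)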

>tfw dreams come true

Finally, I was wondering why GPUs were first to receive on-chip memory, considering there are far fewer products which use them.

My only question is, what will board partners do with all the space created by removing RAM?

That thing is gonna be fuckhuge. I wanna see one fully assembled.

Wouldn't clustering all those components together increase the thermal load by a fuckton, though?

Interesting concept, nonetheless.

>Combine this with an external bank of memory (perhaps DDR4)

HPC needs a lot of RAM

just slap a big ass heatsink on it

>Right now this new "Mega APU" is in its early design stages, with no planned release date. It is clear that this design uses a GPU architecture that is beyond Vega, paired with a next-generation memory standard which offers advantages over both GDDR and HBM.

Put 3 RX 480 GPUs together via interposer
system uses it as one GPU with 100% scaling

wew.

>no grid
>horsefly gets in the fan

>chiplets
When will they learn?

Nvidiots on suicide watch

>MEGA APU

I want one now. No, yesterday. GIVE ME THE FUCKING APU.

not everyone lives in a stable.

>>Combine this with an external bank of memory (perhaps DDR4)
Yeah, NOW. There's no reason it can't simply be migrated onto the chip.

This is a stop-gap to something which could be amazing.

>what will board partners do with all the space created by removing RAM?
batteries
no more laptops

>horsefly
>not housecat
it's like you live in a barn
sonypony detected

>80 chiplets combined into one mega chip
gattai!

>My only question is, what will board partners do with all the space created by removing RAM?

More space for on-bus flash memory.

don't forget the six million barns

>tfw too intelligent too kek

> what is a cache

>This new APU type will also use onboard memory, using a next-generation memory type that can be stacked directly onto a GPU die

Oh, so that's the "next gen memory" they mentioned they'd use for Navi, their successor to Vega. In 2019.

INTEL IS FINISHED

>tfw I was right about AMD moving towards an APU that acts like a desktop grade SoC
>tfw there will come a time when all you need is a very barebones motherboard -acting as a glorified mounting place for VRM, I/O, audio, and optionally additional PCIe lanes and memory DIMMs- and a single chip.
>tfw by nature OEMs will love this for laptops and AiOs and buy them in droves
>tfw we'll eventually be able to pack console level performance into a mac mini sized enclosure

This shit is so fucking cool. I know it'll be quite a while before consumers will be able to get their hands on these, but at least it's something genuinely new.
I also have to wonder how this will affect CPU design and production in general. Using this strategy on a bigger scale could lead to massive yields per wafer, letting them salvage even dies with only a single functional core and pair them together to make up a super budget offering. Depending on how streamlined they can make the interposer mounting process, this could very well revolutionize the way CPUs are designed and produced.

To me, this is even more exciting than Zen and Vega combined, even if I'll likely continue using high end separates for the foreseeable future.

ADORED TV WAS RIGHT NAVI CONFIRMED TO BTFO NVIDIA

There is a very good reason AMD chose to refer to the HBM on Vega as a "high bandwidth cache" and why they've invested in designing a memory controller that can access many parts of the system independent of the CPU's memory controller. Vega, like Fiji, is a test bed, proof of concept, and flashy advertisement for new, forward-thinking technology they plan to use in future designs.

This is what I like about AMD. With each new product, they introduce new stuff that rethinks the way computing, architecture design, and production are currently done. Tessellation, async, HBM, GCN, Mantle, FreeSync, etc. They're working to alter the entire industry in their favor one feature and one architecture at a time. They're the literal antithesis of Intel.

More battery space my man.

Ronnie will go for days

>what if we put the chips.... ON ANOTHER CHIP

fucking AMD

This isn't going to happen anytime soon, probably 2030.

AMD should focus on how to crossfire an APU and a GPU. This could happen in a few years if they try hard.

Should I buy AMD when the markets open tomorrow?

I recommend not fucking with the stock market at all.

holy fuc

why not shortsell intel?

They already did that. It's called dual graphics. And it sucked.

Holy shit couldn't this tech improve consumer CPU yields by a lot?

>Take a Ryzen "core unit" of 4 cores
>Just make a shit ton of those as their own dies
>Bin defective ones, glue some together and glue an iGPU to it
>Arbitrarily large chips made from small, high yield dies
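Rough sketch of why the small dies win, using a toy Poisson defect model (the defect density, die areas, and the no-salvage assumption for the big die are all made-up illustration numbers, not figures from AMD's paper):

import math

# Toy yield comparison: one big monolithic die vs. four small chiplets.
defect_density = 0.2    # killer defects per cm^2 (assumed)
big_area       = 4.0    # cm^2, hypothetical monolithic 16-core die
chiplet_area   = 1.0    # cm^2, hypothetical 4-core chiplet

def yield_rate(area_cm2):
    # Probability that a die of this area has zero killer defects.
    return math.exp(-defect_density * area_cm2)

# Expected usable cores per 4 cm^2 of wafer (ignoring core-level salvage
# on the monolithic die, which real products do use):
monolithic_cores = yield_rate(big_area) * 16          # ~7.2 cores
chiplet_cores    = 4 * yield_rate(chiplet_area) * 4   # ~13.1 cores
print(monolithic_cores, chiplet_cores)

Same silicon area, but a defect only kills one 4-core chiplet instead of the whole chip, and the good chiplets from anywhere on the wafer can be glued together on the interposer.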

>crossfire APU+GPU
That's already a thing. Can crossfire an APU with an R7 250.
Also, multi-GPU in DX12/Vulkan will accomplish this sort of task in software.

navi is rumored to be two gpus working on the same interposer

but it's all speculation

get dem heatsinks ready boys

Just suck my dick and I'll give you 20 bucks, which is probably more than you'd make from AMD after tax.

>chiplets

c-cute!!!!

yes, intel is also experimenting with it.
though it's not just about binning yields; since the dies are physically smaller, they can use each wafer more efficiently.

>That's already a thing.

But it probably won't suck this time.

that's very nearly what AMD's new server platform, Naples, will be.

> each Zeppelin die has 8c/16t, 16 MB L3, 2 DDR4 channels, and 32 PCIe lanes
> sell 2x MCMs with 16c/32t, 32 MB L3, 4 DDR channels, 64 PCIe lanes
> sell 4x MCMs with 32c/64t, 64 MB L3, 8 DDR channels, 128 PCIe lanes
> support 2 sockets of either of the above

they won't have great AVX capacity for HPC/simulation stuff, but they will be beastly NVMe file and web servers
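A minimal sketch of how those package-level numbers fall out of the per-die Zeppelin specs quoted above (the per-die figures are just what the post claims, nothing official):

# Naples-style MCM: every per-die resource multiplies by the die count.
zeppelin = {"cores": 8, "threads": 16, "l3_mb": 16,
            "ddr4_channels": 2, "pcie_lanes": 32}

def mcm(die_count):
    return {resource: amount * die_count for resource, amount in zeppelin.items()}

print(mcm(2))   # 16c/32t, 32 MB L3, 4 DDR channels, 64 PCIe lanes
print(mcm(4))   # 32c/64t, 64 MB L3, 8 DDR channels, 128 PCIe lanes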

Once saw a horsefly an inch long

Not in my home though, I don't take my noctua systems to rural sectors

fucking chiplets, when will they learn?

>> sell 2x MCMs with 16c/32t, 32 MB L3, 4 DDR channels, 64 PCIe lanes
consumer naples when

that barn looks really happy/angry

I'm a huge nvidiot, but if AMD could actually demonstrate that working well, with real 100% scaling efficiency, I think nvidia would be btfo.

it's not going to work like that, GPUs have a lot of wasted transistors during "crossfire". they need to redesign everything

As someone who tried it, can confirm.

high inter-die bandwidth is necessary but not sufficient for proper GPU scaling.

you need, at the absolute least, geometry setup engines that can feed a rasterizer sitting not just on another compute block but potentially on a different die.

control issues like this mean that successful designs will take several years to design and validate. navi might still try something like this, but Vega will almost certainly not.

so, the madmen are actually going to make navi scalable?
I'd be really surprised if nvidia doesn't rebrand volta this year. they've wanted to make the same thing for a very long time, but I can't imagine them being ahead on this.

well, technically that's how it would work
how the dies would talk with each other is another story, no idea. if they pull it off it's going to change GPUs as we know them

AMD hasn't had Navi described as "scalable" in the last few roadmap slides, so who the fuck knows.

Nvidia does great work with their fixed function units (color compression, tiling rasterizers, etc.) but lagged so much on async compute/graphics shaders that I can't imagine them being first with MCM GPUs.

Volta was just slated to have Hybrid Memory Cube memory

Maxwell was going to have an ARM processor integrated

Of course neither one of those materialized, and they pulled Pascal out of their ass later on.

So it's just stacked-die 3D ICs that everyone has already been looking at and testing? Yeah, Intel sure is fucked.

do you think they will use what they developed for zen with "infinity fabric"?

2 years back it was "scalable" in the slides, then they changed it. so was navi. navi being pushed back to 2019 is what makes me think they're planning something

Reminder: Intel is GPU subsidiary of AMD.

SLI/CF are fundamentally fucked nowadays since basically every modern engine pipeline uses previous frame data for effects, and AFR (alternate frame rendering) was invented with the assumption that frames could be rendered independently.

Every new game with SLI/CF support is basically a gigantic one-off hack, which is not the way to make MCM GPUs succeed.
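To make the dependency concrete, here's a toy mock-up of an AFR loop where a temporal effect needs last frame's buffer, which happens to live on the other GPU. None of these classes or calls are real engine or driver APIs, it's purely illustrative:

class MockGpu:
    # Stand-in for a GPU and its local memory.
    def __init__(self, name):
        self.name = name

    def rasterize(self, frame_id):
        return {"frame": frame_id, "owner": self}

    def resolve_temporal(self, current, history):
        # e.g. TAA / temporal reprojection blends last frame into this one.
        current["owner"] = self
        return current

    def copy_from_peer(self, buf):
        # The transfer AFR never planned for: a cross-GPU copy every frame.
        print(f"copy frame {buf['frame']} history from {buf['owner'].name} to {self.name}")
        buf["owner"] = self
        return buf

def afr_loop(gpus, num_frames):
    history = None
    for frame_id in range(num_frames):
        gpu = gpus[frame_id % len(gpus)]              # alternate frame rendering
        if history is not None and history["owner"] is not gpu:
            history = gpu.copy_from_peer(history)     # stalls on the interconnect
        history = gpu.resolve_temporal(gpu.rasterize(frame_id), history)

afr_loop([MockGpu("GPU0"), MockGpu("GPU1")], 4)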

I want everyone to drop CF/SLI support altogether; it wastes development time on a damn 1.5% of users while skimping on optimization for single GPUs.
Problem is, those users are the most obnoxious vocal minority.

They did spend twice as much on GPUs, I'd be pretty pissed being that stupid too.

DirectX 12 explicit multi-adapter sounds interesting though.

>that chip
THICC

I really love technology. Can't wait to see how much progress is made in 10 years.

>explicit multi-adapter sounds interesting though.
it's hard to program for, which again leads to wasted dev time

HMC was always questionable for GPUs. It was more about capacity and design flexibility than bandwidth or power.
Pascal (the real GP100 one, not the Maxwell shrink GTX 10x0 series) is most of what Volta was supposed to be, if you substitute HBM for HMC.

Infinity Fabric will be used for Zeppelin-Vega MCM APUs late this year or early next year. But technically IF is more a set of communication design libraries that are even used internally in their newer chips, so it's not clear what capabilities or operational characteristics/semantics it has. A HyperTransport successor doesn't have the same needs as internal pipeline structuring or control paths, etc.

how possible do you think it is to integrate a neural net inside a chip to handle all the complicated intercommunication? will the latency be unbearable, or the other way around?

If only I could set my APU to do only shadows/effects and leave the rest to the main GPU.

>tfw I bought AMD stocks at $12
hopefully they go up enough to fund my next upgrade

dude, they will either jump to 20 or drop to 5 after the 28th
be careful

But is that natty?

Eh, I only dropped $300 on them. Not a huge deal if the price drops a heap. If they go below $5 I'll just sit on them for the long run.

That looks like a cool office park, where is it at?

"neural nets" if you're generous enough to call them that, are only good in a CPU for speculative decisions, which boil down to branch prediction and prefetching at most. something like cache eviction could be done like this in principle too but wouldn't be worth it.

intra- and inter-chip communications are just protocols for buffer management and state transitions, where heuristics don't really have much of a place.

THEY AREN'T MAKING THIS.

>32 CPU cores
*"mOar corez!!1!!" off in the distance*

Wouldn't it be possible to simply build a scheduler into the interposer (say at 28nm to make the traces small but the scheduler not totally suck) and have that run the inter-die communication? I suspect that you could even break it down further, with smaller schedulers to tie together multiple GPU dies, and that scheduler+infinity fabric then talks to the CPU scheduler+infinity fabric.

I'm clearly not a CE or EE, but would such a design be workable?

>chiplets

are these like the manlets of the CPU world? i'm sceptical in that case!

Is it bad, gaymer?

>chiplets
when will they ever learn

>next-generation memory type that can be stacked directly onto a GPU die
doesn't this cause heat transfer issues?

Should've been called subchip

The term chiplet has a specific meaning: it implies die stacking, not to be confused with 3D stacking. Die stacking means you fab small parts and connect them via interposer to reduce cost on a low-yielding, extremely dense node.
Instead of one CPU die with 16 cores they're proposing 4 smaller dies, each with 4 cores.
A couple years ago AMD proposed even breaking up the CPU itself into individual components to fab them all on separate processes tailored specifically for each part.

>chiplets

When will they learn?

Neat

2022 it says.

2030 is when Raja says discrete GPUs won't even exist anymore.

>we're moving towards fully modular CPU fabrication

This shit is so cool. This is how technology markets are supposed to work. Everyone's so concerned with reaching the limit in shrinking transistors they've forgotten how many other ways there are to innovate products from the initial design phase up. This is what it looks like when there's real competition: the potential for methods that turn entire pre-established norms on their fucking heads.

would massively increase yield/wafer

cash money

It will require an expensive MB and come with a WC unit like that 220W CPU ;^)

If they could reliably package it then it would be an enormous cost savings. Individual parts costing perhaps pennies instead of monolithic dies costing upwards of $20-$50 each.
2.5D integration would be pretty big.
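Back-of-the-envelope version of that cost argument. Wafer price, defect density, and die sizes below are all round-number assumptions, not real foundry data:

import math

wafer_cost  = 6000.0    # $ per 300 mm wafer (assumed)
wafer_diam  = 300.0     # mm
defects_cm2 = 0.2       # assumed killer-defect density

def dies_per_wafer(die_area_mm2):
    # Standard gross-die approximation: usable area minus edge losses.
    radius = wafer_diam / 2
    return int(math.pi * radius ** 2 / die_area_mm2
               - math.pi * wafer_diam / math.sqrt(2 * die_area_mm2))

def cost_per_good_die(die_area_mm2):
    good = dies_per_wafer(die_area_mm2) * math.exp(-defects_cm2 * die_area_mm2 / 100)
    return wafer_cost / good

print(cost_per_good_die(400))   # big monolithic die: ~$90 with these numbers
print(cost_per_good_die(100))   # small chiplet: ~$11 with these numbers

The exact dollar figures don't matter; the point is that cost per good die grows faster than die area once yield is factored in, which is what makes the 2.5D packaging overhead worth paying.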

Raja is wrong. GPUs will last longer. power limit/VRM capacity would be a huge factor, unless you're willing to pay a premium for the MB

When you can do 2 TFLOPs of full precision, which is a bit more powerful than a 1080, on a single APU iGPU die (there are 8 of these dies in the server APU), and that is only 5 years from now, then 13 years from now he's probably right.

If everyone loses money playing the stock market then who gains money? Checkmate atheists.

...

>then who gains money

I wonder how long the PS3 and 360 could've lasted commercially if MS and Sony had waited for something like this instead of going with the Bobcat family, since it was obvious they wanted an APU solution for the PS4 and XB1, but both opted for the only low-power moar coar option they had at the time

Consoles could ACTUALLY be competitive if they had a Ryzen-based HBM APU

i remember jewlander, good movie

Consoles went with the lowest bidder for a credible platform, which ended up being mostly AMD.

Bulldozer was too high-power for the platform, AMD had nothing else to sell except for Bobcat/Jaguar, Intel didn't have credible GPU performance, and Jen-Hsun was still crying salty tears over never being able to get an x86 license.

that's what they said about Fusion back then, just think, 300 watt apu lol, hope. discrete GPUs will stay, like how DDR is still used even with HBM

kek

Um. They're projecting 200 watts for a 16 TFLOP full precision APU, which sounds reasonable. A 4 TFLOP one won't be 300 watts. Especially in 2030 when it's on a 4nm fab or smaller instead of 7nm.
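Taking the numbers quoted in this thread at face value (2 FP64 TFLOPS per GPU chiplet, 8 chiplets, a 200 W package; none of this is verified against the actual paper), the efficiency target works out like this:

per_die_fp64_tflops = 2      # per GPU chiplet, as quoted above
gpu_dies = 8
package_watts = 200

apu_fp64_tflops = per_die_fp64_tflops * gpu_dies     # 16 TFLOPS
watts_per_tflop = package_watts / apu_fp64_tflops    # 12.5 W per FP64 TFLOPS
print(apu_fp64_tflops, watts_per_tflop)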