Mesh vs Ringbus vs Infinity Fabric

Mesh (Skylake X) vs bingbus (Kaby Lake, Coffee Lake) vs Infinity Fabric (Ryzen)

Discuss what the benefits of each architecture are and why, and what would be best suited for each purpose (gaymen, productivity etc).

So far it seems like bingbus based CPUs stomp on mesh and IF based CPUs in gaymen, but is this because gaymen software is written for bingbus based CPUs, or is bingbus inherently better at single core/thread and thus better at gaymen?

idk

>So far it seems like bingbus based CPUs stomp on mesh and IF based CPUs in gaymen

That's entirely unrelated.
Ringbus is shit but does work on low core count CPUs. Low core count CPUs can afford to pump more power/heat through each individual core because there's less of them.
Most shit games are written by the retarded monkeys and only run on one or two cores anyways.

It IS related, but it may not be what causes it. Read the rest of what i said, it exactly addresses what you're talking about. The thing here seems to be that not only is it a matter of power delivery and MC vs SC, but that each individual ringbus core outperforms each individual mesh core in gaymen, even at the same clockspeeds (see 8700k vs 7800x). Im wondering if this is caused by code being optimized to run on ringbus CPUs, or if RB is inherently better at SC than mesh is. The former would mean a software difference, the latter would mean a hardware difference.

Bingbus for LCC, IF for anything else.
Though intra-CCX performance is so good you might as well choose IF as the winner.
Mesh a shit.

Mesh was designed for massive throughput over latency, SKL-SP is a server first uarch for fucks sake.
Of course mesh will be worse at low core counts.

Software developers don't operate on such a low level to care about bus type.

bingbus is simply inferior to crossbar bus but it happens to be on the CPUs that are most fitting for current games.

is there some youtube bideo explaining all this?

So you're saying Mesh is basically useless for desktop? If Mesh really is worse at SC on a hardware level, there'd be no point for a desktop user to buy it over ringbus. Same goes for IF as its basically the same thing as Mesh, just executed differently.

>Same goes for IF as its basically the same thing as Mesh, just executed differently.
H-what?
Zeppelin and LCC/HCC/XCC dies are fundamentally different, Zeppelin has no logically unified L3, instead it's partitioned into 8MB chunks.
Everything inside CCX is blazing fast.

>bingbus
Gotta admit OP, I didn't expect to get promoted for laughing

>bingbus is simply inferior to crossbar bus but it happens to be on the CPUs that are most fitting for current games.

So what does cause differing results on different platforms then? Im seeing CPUs with the same core count, roughly the same IPC, same clockspeed and same RAM get vastly differing results, where ringbus CPUs clearly outperform Mesh and IF cpus in SC workloads. If its not the bus, then what is it that causes this?

I mean its the same general concept, isnt it. Sure there are differences but they're fairly similar

>vs IF
optimization and clock speed
>vs Mesh
L3 cache capacity and optimization

>Sure there are differences but they're fairly similar
FUNDAMENTALLY
FUCKING
DIFFERENT
Heck we don't even know what internal interconnect Zeppelin uses.
Infinity Fabric is merely a protocol.

I disagree. They even perform similarly and are made for similar purposes. Both are made with MC performance in mind primarily and aimed towards workstation and server use first and foremost. Furthermore they both underperform in SC compared to their much better MC performance. Though there are differences, i will readily agree with you on that, i think you can definitely bunch them together as similar archs.

Infinity Fabric allows me to make a cheaper cpu farm to compile my Nim programming language files faster than I can say Nim programming language.

>So what does cause differing results on different platforms then?
> differing results on different platforms then?

The different platforms FFS.
Skylake X sucks balls in single thread performance because it has fuckloads of slow cores that cant go fast because of heat.
Skylake has only 4 cores that can run @ 5Ghz on all cores. Skylake X cant do that.

Also the more cores you have the longer is ICC latency.

Long story short THE LESS CORES THE BETTER SINGLE CORE PERFORMANCE

>optimization

Could optimization also cause lower SC performance in benchmarks? Sounds a bit unlikely to me.

>clock speed

Tests were done with Ryzen and Kaby lake at both 4ghz, still got drastically different results, so i doubt its this.

>L3 cache capacity and optimization

Elaborate

>Elaborate
Skylel-X changes cache hierarchy from 256KB L2$ (inclusive of L1D) and 2MB of L3$ (inclusive of L2$) per core to 1MB L2$ (inclusive of L1D) and 1.375MB L3$ per core (mostly exclusive of L2$, basically like L3 in Zen).
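
To put the per-core figures above into whole-chip numbers, here is a minimal sketch. The cache sizes are the ones quoted in the post and the 6-core counts are the 8700k and 7800x mentioned elsewhere in the thread; the class and field names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class CacheConfig:
    """Per-core cache sizes for one chip (hypothetical helper, not a vendor API)."""
    name: str
    cores: int
    l2_mb_per_core: float
    l3_mb_per_core: float

    def total_l2_mb(self) -> float:
        # L2 is private, so the chip total is just cores * per-core size.
        return self.cores * self.l2_mb_per_core

    def total_l3_mb(self) -> float:
        # L3 is built from per-core slices, so the total scales the same way.
        return self.cores * self.l3_mb_per_core


# Figures as quoted in the thread: 256KB L2 + 2MB L3 vs 1MB L2 + 1.375MB L3.
ringbus_6c = CacheConfig("8700k", cores=6, l2_mb_per_core=0.25, l3_mb_per_core=2.0)
mesh_6c = CacheConfig("7800x", cores=6, l2_mb_per_core=1.0, l3_mb_per_core=1.375)

if __name__ == "__main__":
    for chip in (ringbus_6c, mesh_6c):
        print(chip.name, "L2 total:", chip.total_l2_mb(), "MB,",
              "L3 total:", chip.total_l3_mb(), "MB")
```

At the same core count the mesh part ends up with less L3 and more L2. Whether that helps or hurts depends on how big the working set is, which this sketch does not model.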

>Could optimization also cause lower SC performance in benchmarks
lack of optimization? yes.
>Tests were done with Ryzen and Kaby lake at both 4ghz, still got drastically different results
different archs need different optimizations.
>Elaborate
mesh arch has smaller l3 cache than bingbus but bigger L2 cache than bingbus.
all of the optimizations thus far were made for bingbus, so mesh has problems if there is no optimizations.

so everything comes to optimizations. period.

Its not this either. I am talking about the same core count and same clock speed but wildly differing SC results. As an example i gave 8700k vs 7800x, where the 8700k outperformed the 7800x in SC at roughly the same clock speed. Its something with the arch's themselves that causes it

you really don't understand anything. if the software isn't there, hardware cannot work.

You dont even understand what im talking about. You talk about variables that i already told you have been ruled out by tests. Its not a matter of MC vs SC in general or power consumption/heat (assuming you're the poster i replied to since this was his argument).

7800x is L3 starved.

8700K has way more cache and only 6 cores at which bingbus is still manageable but it's not a good thing.

If anything bingbus takes less die space allowing them to put more cache in the same size.

That's the only remotely positive quality of bingbus.

If it really is an optimization issue it would mean that the r5 1600 would be the best CPU to buy for le future proofing meme as it seems to be the most popular CPU lately, but this would mean lower perf in already existing and recent games.

Seems like whatever way you look at it you have to make a major compromise if you buy a CPU in 2017.

>You talk about variables that i already told you have been ruled out by tests
and who programmed those tests? what kind of a compiler did they use? which arch did they have in mind while programming?
read some Agner Fog. you cannot isolate software from hardware.
it would be definitely. I don't know if you have seen the Rise of the Tomb Raider tests.
at first ryzen was shit performing on the game. around may-june an update from the game developers fixed the problems.
Nvidia's drivers still cause problems for ryzen, because nvidia refuses to fix its driver according to zeppelin arch.
it is all about programmes themselves and how the developers code it for the existing hardware.

8700k is just better than 1600
BUT 1600 is the best bang for your buck ever and offers you future upgrades to pinnacle ridge and Ryzen 2

So what you're saying is the 7800x is just a bad CPU in general and its better to get the 8700k at this core count?

YES

>and who programmed those tests? what kind of a compiler did they use? which arch did they have in mind while programming?

Good points but you need to understand i am not making a point here, im asking a question. So far it does not seem to me to be a software only problem. I'd say this is the case if the benchmark results came in high but the real world scores came in lower, but we're seeing lower SC performance in IF and Mesh based CPUs in ALL applications. Im willing to believe its purely a software issue if you can back that up with evidence, but for now that doesnt seem to be the case.

>scalability, cost, yields, power, compartmentalization(less due to IF more due to EPYC design)
IF is also not only a core interconnect, but a DRAM and GDDR/HBM interconnect as well, it's basically an all purpose bus for most parts of an IC needing high bandwidth

>consistency
Mesh

>low core count
Ringbus, though IF also works just as well inside one CCX

Define low core count. Up to 4? 6? 8? 10?

first you say this,
>So far it does not seem to me to be a software only problem.
then this,
>I'd say this is the case if the benchmark results came in high but the real world scores came in lower.
what do you mean by real world? games?
you can bet your ass the reason is developers.
look at far cry primal (single core optimized), then crysis 3(multicore optimized). digital foundry have lots of videos about this. look it up. they did one with 8700k recently.
>Im willing to believe its purely a software issue if you can back that up with evidence,
gave one above, and add one more. Agner Fog found out many years ago that intel compiler cripples amd processors. many developers were using intel compiler at that time. intel was de-optimizing the software in case it runs on a non-intel cpu.
and again, recently Agner Fog tested ryzen, found out that clock by clock ryzen has higher IPC than Skylake. check his blog.

everything comes down to software.
but for now that doesnt seem to be the case.

>define low
2
crossbar would work better than bingbus even on 4 cores too. but the margin of improvement is small and does not justify the costs.

*but for now that's doesn't seem to be the case is your words, I forgot to delete it.

Alright but do we have any way of confirming this? I am just trying to work with the little information the public gets, and where im looking from this does not seem to be a software issue only.

Furthermore, if everyone uses an intel compiler and this wont change anytime soon then it wont matter if a given CPU is better or not since it will be fucked by the compiler anyways. These are things to consider as well.

To clarify i actually hope its just a software problem and will be fixed soon as AMD catching up to Intels peak SC performance would be great for the market, i just dont think its the case.

>Alright but do we have any way of confirming this? I am just trying to work with the little information the public gets, and where im looking from this does not seem to be a software issue only.

To clarify with this i mean if this is the case for the specific arch's we're talking about now (Zen, KL and SL-X)

ok, you can check one of the things I mentioned here: community.amd.com/community/gaming/blog/2017/06/23/even-more-performance-updates-for-ryzen-customers
30% performance uplift with an update says there is something wrong with the code.

Alright then, but can we expect to see this happen to most software anytime soon? Seems like it will be better to get a Kaby/Coffee Lake CPU now and get Ryzen 2 or even 3 later if it will take a year or two for developers to catch.

Catch up*

It depends on AMD's marketshare. it is going up, that means the code needs adjusting for the market.
Bingbus is a arch. intel still tries to squeeze it but this is the end, IPC is the same for 2 years, only clock speeds go up and it is marginal.
Ryzen is new and rough. it is open to get polished. Zen 2 targets 5ghz for base clock. for Zen 1, it was 3ghz.
AMD is coming like a freight train, developers will catch up eventually. But Nvidia needs to fucking update its shit drivers.

*bingbus is dead arch.

Actually I suspect software will get worse or stay the same because stronger hardware allows to run code that is badly written, that means "diverse" code needs less unfucking and is thusly cheaper... I hope I'm wrong, though

Low core count is the number where a ring has a shorter longest path between cores than the other architecture.
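
That definition can be turned into a quick hop count. A minimal sketch, assuming a bidirectional ring, a plain 2D mesh laid out as the most square grid that fits the core count, and the same cost per hop on both (none of that is measured data, and the function names are made up):

```python
import math


def ring_longest_path(cores: int) -> int:
    """Worst-case hops between two stops on a bidirectional ring."""
    return cores // 2


def mesh_longest_path(rows: int, cols: int) -> int:
    """Worst-case hops, corner to corner, on a rows x cols 2D mesh."""
    return (rows - 1) + (cols - 1)


def squarest_grid(cores: int) -> tuple[int, int]:
    """Most square rows x cols grid with exactly `cores` tiles (assumed layout)."""
    rows = max(r for r in range(1, math.isqrt(cores) + 1) if cores % r == 0)
    return rows, cores // rows


if __name__ == "__main__":
    for cores in (4, 6, 8, 12, 16, 28):
        rows, cols = squarest_grid(cores)
        print(f"{cores:2d} cores: ring {ring_longest_path(cores)} hops, "
              f"{rows}x{cols} mesh {mesh_longest_path(rows, cols)} hops")
```

With equal hop cost the two tie at 4, 6 and 8 cores and the mesh pulls ahead from 12 on. So in this toy model the ring never strictly wins on hop count, and any real advantage at low core counts would have to come from cheaper hops, which is not modelled here.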

>"diverse" code
nice

Infinity Fabric is in every single way superior to ring bus and ring bus mesh edition

more software supports bing bus more than mesh, skylake x was an utter failure and gave the entire HEDT market to AMD

where is the picture for infinity fabric
that name is cringy shit btw

>more software supports bing bus more than mesh
because bing bus has been around for nearly a decade. its time has come and there is nowhere to go with it.
it is a great name. you are just a retard.

Because it scales into infinity

that is physically impossible

Don't tell the marketing department that

They just mean it can scale a lot

i know that doesn't change that the name is bullshit

IF is really interesting because it brings ringbus latencies to super high core counts, ringbus can't scale.

It has a catch, cross CCX hopping, can be alleviated by improving the IF, faster RAM or making bigger CCXs.
This is perfect for VMs though, lets say Skylake-X makes a 4 core VM, it will still have 80ns core to core, but make a 4 core VM on EPYC and it will have around 40ns, just like a ringbus design.
Thankfully AMD did their research, most vendors make use of multiple smaller VMs instead of big core ones.
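
The VM argument can be sketched as a worst-case core-to-core lookup. The 40ns and 80ns figures are the ones from the post; the cross-CCX figure is only a placeholder taken from the "10ns higher than mesh" guess asked later in the thread, and the 4-core CCX grouping by core index plus all function names are assumptions for illustration.

```python
from itertools import combinations

INTRA_CCX_NS = 40  # quoted in the thread for cores inside one CCX
MESH_NS = 80       # quoted in the thread for any two Skylake-X cores
CROSS_CCX_NS = 90  # placeholder guess ("10ns higher than mesh"), not a measurement
CCX_SIZE = 4       # assumes cores 0-3 are CCX 0, cores 4-7 are CCX 1, and so on


def ccx_pair_latency(a: int, b: int) -> int:
    """Latency between two cores on the CCX-based chip under the assumptions above."""
    same_ccx = a // CCX_SIZE == b // CCX_SIZE
    return INTRA_CCX_NS if same_ccx else CROSS_CCX_NS


def worst_case_ccx(vm_cores: list[int]) -> int:
    """Slowest core pair inside a VM pinned to `vm_cores` on the CCX-based chip."""
    return max((ccx_pair_latency(a, b) for a, b in combinations(vm_cores, 2)),
               default=0)


def worst_case_mesh(vm_cores: list[int]) -> int:
    """Slowest core pair on the mesh chip, treated as flat like the post does."""
    return MESH_NS if len(vm_cores) > 1 else 0


if __name__ == "__main__":
    for vm in ([0, 1, 2, 3], [2, 3, 4, 5]):
        print(vm, "ccx:", worst_case_ccx(vm), "ns, mesh:", worst_case_mesh(vm), "ns")
```

A 4-core VM that stays inside one CCX gets the low figure, while one that straddles two CCXs pays the cross-CCX price on some pairs, so the benefit depends on the hypervisor pinning VMs to CCX boundaries.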

The latencies are not that crippling for most workloads, even.
Both mesh and IF are throughput over latency (but Zen designs also have low latency inside the CCX so it's the best of both worlds).
Keller, Clark and the rest of the Zen team should get a fucking medal and their own religion for that.

Relational databases, though I guess that's legacy shit at this point.

Ye, most scale-up workloads might as well be legacy shit in the age of ebin Cloud.

8 core ccx's when

IF is a general purpose interconnect that happens to be great for high core counts, it can serve cores, VRAM, DRAM, GPUs, I/O, dies and sockets and god knows what else.

Only thing it can't be used for in its current iteration is private caches

>It has a catch, cross CCX hopping
wasn't cross-ccx latency like 10ns higher than mesh is between any two cores with 3200mhz memory?

the inter-ccx latency was lower than ringbus

There's no 3200MHz quad rank ECC RAM

sure but the desktop platform gives you a peek into IF's capabilities. we're a long way off 3200MHz ECC, we barely just got 2667MHz ECC and 99% of DDR4 on the desktop side is overclocked ICs with the original 2133MHz JEDEC spec, I believe they only just started releasing sticks that ship with proper 2400MHz 'stock' or fallback speeds outside the XMP profiles.

Something tells me they wont ever make a +4 core ccx. I think the architecture isnt made to handle that

If they can get cross CCX latencies down by then, staying with 4 core CCX is optimal.

I think they'll just ride it out until DDR5 or something, when that hits latency will no longer be a problem due to the sheer DDR5 bandwidth
Also PCIe4/5, AMD gains the most from them due to their architecture.

DDR5 is a meme, user. the frequencies are increased, which will help some workloads, but CL will double too, which will fuck with others

CL is not real latency, a highend kit DDR2 or 3 is not lower latency than a highend DDR4 kit.

Case in point.

>have 1066mhz 4-4-4-12 DDR2
>7.5ns latency
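
The CL-to-nanoseconds conversion behind that figure, as a minimal sketch. It only covers CAS latency (CL cycles at the real clock, which is half the data rate), not full memory access latency. The two DDR2 kits are the ones named in the thread; the DDR4 kits are illustrative picks, not from the thread.

```python
def cas_latency_ns(data_rate_mt_s: float, cl: int) -> float:
    """CAS latency in nanoseconds for a kit sold as `data_rate_mt_s` at CL `cl`."""
    clock_mhz = data_rate_mt_s / 2  # DDR moves data twice per clock cycle
    return cl / clock_mhz * 1000    # cycles / MHz = microseconds, so scale to ns


# (label, advertised data rate, CAS latency in cycles)
kits = [
    ("DDR2-1066 CL4", 1066, 4),   # from the thread
    ("DDR2-800 CL3", 800, 3),     # from the thread
    ("DDR4-2133 CL15", 2133, 15), # illustrative
    ("DDR4-3200 CL14", 3200, 14), # illustrative
]

if __name__ == "__main__":
    for label, rate, cl in kits:
        print(f"{label}: {cas_latency_ns(rate, cl):.2f} ns")
```

Doubling CL while doubling the data rate leaves the nanosecond figure unchanged, which is why comparing CL numbers across DDR generations says little on its own.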

Does anyone know how cores within a ccx communicate? It will depend on this whether we'll ever see +4 core ccx's.

stfu nigga
even a ddr2 800mhz cl3 has lower absolute latency than your ddr4 shit. and no rowhammer vulnerabilities either

Through some internal bus, unknown which one.
AMD and Intel have never given info about going that deep, they mostly limited it to shared chip wide interconnects

see

HEY HEY
HO HO
THIS FUCKING BINGBUS HAS GOT TO GO

Ringbus is literally "make useless loopty loops" tier, mesh makes more sense but still why didn't they fucking do it from the beginning
But infinity fabric is fairly kino and I don't see either mesh or ring reaching its speeds

In low core count, it's faster to ask "Hey is this the shit you want" than "I need to deliver this package to #3, when do I turn left."

Hence why I'm confused as to why they continued it with higher core counts as well.
I mean, I'm an engineer but obviously I can't even come close to touching the qualifications these Intel guys have, and obviously they might have a good reason but I still question why?

Cost. Its easier and cheaper to make one single die and just disable features for lower end models than to make several different dies. Plus they had no competition at that market segment so there really was no incentive for them to improve it.

Mesh is garbage though, its barely an improvement