SMT vs CMT

which is better, and why?

Discuss.

They serve different purposes.

Which is better than the average user.

Discuss.

>wips out dick
Ugh, here we go again

Wut

I assume you mean better for*

CMT architectures provide consistent workload uplifts across the board because you're executing the additional thread on physical hardware. That's the entire basis of the concept: achieve many threads by using small, dense cores that share some resources to maximize throughput per mm².

SMT provides a less consistent performance uplift because it's dependent upon what's in the pipeline. It relies on tagging ops in registers and utilizing load/stores to simulate a second thread inside one physical core. By design you need a somewhat wide core to do this, as you could never see much of a performance gain in a core with 2 ALUs that are already ~90% utilized.
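A rough way to see the wide-core point is a toy upper bound in Python. This is only a sketch under a big assumption that isn't from the post above: the second thread can fill at most the fraction of execution slots the first thread leaves idle, with no cache or front end contention. The function name and the model are made up for illustration.

```python
def smt_uplift_bound(utilization: float) -> float:
    """Toy upper bound on the relative throughput gain from a second SMT thread.

    Assumption (hypothetical model): the second thread can only use the
    execution slots the first thread leaves idle, and nothing else limits it.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    idle_fraction = 1.0 - utilization
    # Gain is expressed relative to the throughput of the first thread alone.
    return idle_fraction / utilization


if __name__ == "__main__":
    for u in (0.9, 0.7, 0.5):
        print(f"{u:.0%} utilized -> at most {smt_uplift_bound(u):.0%} uplift")
```

Under this model a core that is already 90% busy can gain roughly 11% at best from a second thread, while one sitting at 50% could in principle double its throughput. Real chips land well below these bounds.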

Most consumer workloads aren't going to make use of tons of threads, so CMT in the most practical sense is less useful. Larger high IPC cores utilizing SMT on the other hand line up perfectly with their needs.

Post more of her. I need it for academic purposes

So long as people base the purchase of a CPU on single threaded performance alone, CMT will never thrive. As a technology it relies on its acceptance within the marketspace, and while CMT is certainly dominant within the server market, outside of that it's not a profitable venture.

Big cores won the consumer market, and until the technology or the attitudes surrounding it change, that will not change.

If only AMD's CMT implementation had debuted with Excavator. It would not be known as the worst uarch design since NetBurst. I have no idea how AMD managed to push power consumption down that low at the same clock speed and number of cores.

I prefer Persona.

How about for servers?

Excavator and Steamroller cores in their respective product lines both enjoy the benefit of being fabbed on fairly refined 28nm nodes: Kaveri on a custom process variant called SHP, and Carrizo on a cheaper generic HPP.

On the IP side of things AMD was forced to develop an ungodly amount of power saving tech to deal with GloFo's 32nm PD-SOI process showing horrendous leakage current: a resonant clock mesh first seen in Piledriver based chips, new P-states with finer control over switching into boost states, fine grain power gating, and the latest AVFS shown in Carrizo and Bristol Ridge. The same IP is used in their latest GPUs. It's how they managed to get the Fury X down to 220w average draw in gaming workloads.

It translates into Excavator modules needing about 16w of power to reach 3.5GHz.
The Steamroller module needs right about 20w, and the differential between the two opens up in Excavator's favor at lower clocks. That's pretty damn impressive considering their transistor count and being on generic 28nm HPP. Apparently it's slightly better still in Bristol Ridge, and these are nominal figures. It's what an average yielding chip can do, not what golden silicon is capable of.
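For a quick sanity check on those two figures, here is the arithmetic as a small Python sketch. The 16w and 20w values are the nominal per-module numbers quoted above. The two-module chip is just an assumed example configuration, and the helper names are made up for illustration. It only counts the CPU modules, not the GPU or the rest of the chip.

```python
# Nominal per-module power at 3.5GHz, as quoted in the post above.
EXCAVATOR_MODULE_W = 16.0
STEAMROLLER_MODULE_W = 20.0


def relative_saving(new_w: float, old_w: float) -> float:
    """Fraction of power saved when going from old_w to new_w."""
    return (old_w - new_w) / old_w


def cpu_modules_power(per_module_w: float, modules: int) -> float:
    """Power for the CPU modules only (ignores GPU, uncore, memory controller)."""
    return per_module_w * modules


if __name__ == "__main__":
    saving = relative_saving(EXCAVATOR_MODULE_W, STEAMROLLER_MODULE_W)
    print(f"Per-module saving: {saving:.0%}")
    # Assumed example: a two-module part.
    print("Excavator, 2 modules:", cpu_modules_power(EXCAVATOR_MODULE_W, 2), "W")
    print("Steamroller, 2 modules:", cpu_modules_power(STEAMROLLER_MODULE_W, 2), "W")
```

That works out to a 20% cut per module at the same clock, before counting the larger gap at lower clocks.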

On the arch itself, if AMD had had the time, talent, and money they could have produced something fairly competitive out of the Bulldozer family. The decision was made to shitcan Steamroller on the desktop, and they started seriously cutting costs to slow the financial hemorrhaging. Steamroller based Kaveri APUs were a full year late to market, and they showed up with a cheap 3rd party memory controller and a tweaked cache which favored bandwidth for CPU-GPU transfers but seriously hurt CPU performance. Excavator has the same 3rd party memory controller, and the L2 was cut in half to save a bit of power, but primarily to reduce the increased latency seen in Steamroller. In a few ops Excavator is a tiny bit slower than Steamroller from hitting the small L2 so heavily.

For a server one of the most important metrics is perf/watt. Power consumption is a major consideration because of the magnitude you're dealing with, and doubly so once you add the cost of keeping everything cooled, since power consumption directly translates to waste heat.

A 4 module/8 thread CMT chip could outperform a 4 core/8 thread SMT chip, and it would stay in CMT's favor the more cores were added. That being the case, a room with 20 racks of 4 socket blades would have a solid advantage in throughput.
Ultimately it comes down to how each concept is implemented. Neither approach inherently uses more power, so it's entirely up to each specific architecture's implementation and the process it's fabbed on.

Wouldn't a more powerful chip also be more efficient, since it'll do the same task with a lower load and energy consumption? FX 6300 and i5 2500 were both 95w?

When it comes to efficiency you start looking at power drawn vs time to complete op.
For example a chip drawing 100w for 60 seconds is less efficient than a chip which needs 105w but finishes the work in 50 seconds. Drawing more power isn't necessarily worse if the performance can justify it. This is basically how IBM gets away with POWER8 chips pulling 300w + the draw from off die cache.
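The example works out like this in Python. It is a minimal sketch that assumes each chip draws constant power for the whole run and ignores whatever it draws at idle afterwards. The function name is made up for illustration.

```python
def energy_joules(power_w: float, seconds: float) -> float:
    """Energy used for a task at constant power draw: E = P * t."""
    return power_w * seconds


slower_chip = energy_joules(100, 60)  # 100w for 60 seconds
faster_chip = energy_joules(105, 50)  # 105w for 50 seconds

# Fraction of energy saved by the chip that draws more but finishes sooner.
saving = 1 - faster_chip / slower_chip

if __name__ == "__main__":
    print(slower_chip, "J vs", faster_chip, "J")
    print(f"Energy saved by the 105w chip: {saving:.1%}")
```

So the 105w chip uses 5250 J against 6000 J for the 100w chip, 12.5% less energy for the same work despite the higher draw.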

In the specific case of the FX 6300 vs the i5 2500k, the i5 is far more efficient. Though this isn't about SMT and CMT as concepts, this is just AMD's implementation of one idea vs Intel's implementation of another. The nuances of the core architecture used by each extend far beyond the base approach to multithreading.

You're very thorough with your explanations, thanks

I'm honestly interested in another successor to Excavator. AMD has finally reached Ivy Bridge performance (within the same power envelope) despite being fabbed on a larger lithography. Seems like AMD's CMT uarch hasn't reached its peak possible performance yet.

The fetch and branch prediction still need tons of work. It would probably be best to gut them and replace them with a new design entirely. The arch still hits its L2 way too hard, and the large caches were a bandaid to begin with for the entire family. The front end of a core really is everything. Just look at how ARM had a wider core with the A72, but still managed to have an IPC uplift in the narrower A73 thanks to front end changes and a shorter pipeline.

A reworked front end, a halfway decent cache system, and a decent internally designed memory controller would give the Construction core family new life. Though of course that is all far more work than it sounds. I've always wondered what an Excavator+ core could do on 14nm. 3.5GHz at ~5w is a real possibility. Shame we'll never know.

So AMDs previous arch had a lot more untapped potential?

Absolutely. Performance of the Bulldozer family was hindered by bugs which were never addressed, and by process woes. The process issues were resolved, but a handful of outstanding kinks in the arch are still present. Addressing those would net a big performance uplift, but it's still thousands upon thousands of man hours worth of work.

For AMD it's a matter of whether fixing Excavator was worth the money vs creating a new arch. Seeing as how they went through with developing Zen, it's clear they thought that was the better route in the long run.

Perhaps because of the bad reputation CMT built up?

It's possible. It has definitely left a bad taste in the mouth of their enterprise clients. They have around 1% market share in that segment right now, and I could audibly hear eyes roll if AMD gave a presentation aimed at the data center with yet another Bulldozer based chip.

Zen being a purported 40% IPC uplift over Excavator gives it a very solid basis to work on. It's likely that the successors to Zen will end up with superior performance to anything you could have pulled from refining a Construction core. Though Excavator and any theoretical derivatives may still hold value in niche places, similar to how Intel has their big Core i architectures and the smaller Atoms. AMD did replace the Cat core family with a single Excavator module in the new Stoney Ridge. That's why I wonder how it would perform if scaled down further. Zen being a bigger core might not shine at the same super low power targets, similar to how Intel's Core M chips throttle like mad and draw way more power than their stated TDP.
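To put the purported 40% figure in perspective, single thread performance scales roughly as IPC times clock. The Python sketch below is a simplification that assumes IPC stays constant across clocks and workloads, which it doesn't in practice. The 3.5GHz baseline is the Excavator clock mentioned earlier in the thread, and the function names are made up for illustration.

```python
def relative_performance(ipc_ratio: float, clock_ratio: float) -> float:
    """Single-thread performance relative to a baseline: IPC ratio * clock ratio."""
    return ipc_ratio * clock_ratio


def clock_to_match(baseline_clock_ghz: float, ipc_ratio: float) -> float:
    """Clock the higher-IPC core needs to match the baseline core's performance."""
    return baseline_clock_ghz / ipc_ratio


ZEN_IPC_RATIO = 1.4  # purported uplift over Excavator, not a measured value

if __name__ == "__main__":
    print("Relative performance at equal clocks:",
          relative_performance(ZEN_IPC_RATIO, 1.0))
    print("Clock needed to match Excavator at 3.5GHz:",
          round(clock_to_match(3.5, ZEN_IPC_RATIO), 2), "GHz")
```

If the 40% claim holds, a Zen core at around 2.5GHz would match an Excavator core at 3.5GHz under this model.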

AMD cut down their desktop chips for mobile? So all their laptops are CMT?

Also should I just grab Zen, or will an FX 8 core or 6 core still offer competitive enough performance for DX12 and Vulkan titles? Because once Zen hits, Vishera should start selling cheap on the used market.

Their mobile chips are simply 2 modules and a GPU unit on the same chip. As such, all of their mobile chips that aren't based on the Cat cores design are CMT, as they're based on the Construction Cores architecture, which is CMT.

As for Zen, it fucking well better. It's going to be 8 true cores with SMT (the generic name for hyperthreading), with the cores themselves hitting at absolute minimum Haswell level performance.

Vishera does sell cheap. You can find FX-8350 chips under $150 on eBay, and FX-6300 for under $80. Pretty damn cheap if you ask me.

bump

Kek

...

As previously said - with the right workload CMT can offer a huge advantage over SMT.