Can someone explain why a multi threaded core outperforms a single threaded one?

>Can someone explain why a multi threaded core outperforms a single threaded one?

If the core is maxed out, how will shoving more information into it help the situation?

It's not maxed out.

Assume you have 2 tasks to do

If you put 2 of them on 1 single-threaded core, the core will have to switch back and forth between the two tasks until they're done, or do one first and then the other, depending on whether you have preemption or not.

If you put 2 of them on a multi-threaded core, one thread will do the first task and the other thread will do the second, finishing faster because they run in parallel and because there's no context switching between the tasks.

It's like, imagine you're ambidextrous and you want to draw a square and a triangle. If you draw both at once with both arms, you'll finish faster than if you draw the triangle first and then the square.

without hyperthreading the core "pauses" basically every other hertz
Hyperthreading squeezes another signal in on that down time

that's bullshit

it still pauses every other hertz; there's just two signals instead of one

If you're referring to simultaneous multithreading (SMT, a.k.a. Hyper-Threading):

Modern CPUs are immensely complex machines. Inside them, there's a lot of trickery that goes on to enable instructions to run faster, but without breaking the intended behavior of the program. For example, nearly every modern CPU executes some instructions out-of-order. This ends up being faster overall for reasons discussed below.

CPUs, like anything in a computer, are themselves systems made up of smaller interacting parts. CPUs use what are called "pipeline stages" to process instructions.

For example, the MIPS family of processors use five pipeline stages: Instruction Fetch, Instruction Decode, Execute, Memory read/write, and Register Writeback. These are often abbreviated [IF], [ID], [X], [M], and [W].

Imagine that your program needs to execute both an ADD instruction and a memory store (SW) instruction. It turns out that since these instructions use different parts of the pipeline, they can actually execute at the exact same time!
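As a toy sketch (a hypothetical model, not a real MIPS simulator), here's how two instructions can occupy different pipeline stages in the same cycle:

```python
# Toy model of the classic 5-stage in-order pipeline described above.
# Instruction i enters [IF] at cycle i, then advances one stage per
# cycle, so different instructions occupy different stages at once.

STAGES = ["IF", "ID", "X", "M", "W"]

def timeline(instructions):
    """Map each instruction to its (cycle, stage) schedule, assuming
    an ideal pipeline with no stalls or hazards."""
    return {
        instr: [(i + s, STAGES[s]) for s in range(len(STAGES))]
        for i, instr in enumerate(instructions)
    }

tl = timeline(["ADD", "SW"])
# In cycle 1, ADD is in [ID] while SW is in [IF]: both instructions
# are in flight at the exact same time.
```

This is the idealized case; a real pipeline also has to stall or forward when one instruction depends on another's result.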

Finally, you ask, why is SMT faster? Well, it's only faster in certain circumstances, but the main idea is this. SMT presents two hardware threads to the operating system, which means that the CPU has twice as many instructions to choose from when it's doing its out-of-order reordering. So it has more chances to simultaneously execute instructions that don't use the same part of the pipeline. Thus, theoretically, if thread one and thread two always fed the CPU such instructions -- ones that had no pipeline overlap -- the CPU's overall rate of instruction execution would be 2X as fast.

If the core were truly maxed out, hyperthreading would lose performance. But the core is almost never maxed out.

My rather limited understanding of SMT is that it takes advantage of the many nanosecond-scale delays or idle moments between executing instructions. Normally these delays would essentially be written off, but SMT allows the core(s) to execute another instruction during these idle moments instead. It improves efficiency by utilizing resources that would otherwise be "wasted," at the cost of extra power usage and heat. The gain in performance varies considerably, but it typically brings about a 30% increase IIRC.

Because of a concept called "pipelining."
Suppose you have 3 things to do to a bunch of baskets of laundry: washing, drying, and ironing.

A single-threaded core would take each basket of laundry, wash the clothes, put them in the dryer (without putting another basket in the washer), then iron them (without putting another basket in the washer, or doing anything with the dryer).

On the other hand, a core with more than one thread would be similar, except you put a new load of clothes in the washer immediately after putting clothes into the dryer, and immediately move clothes from the washer to the dryer after moving clothes from the dryer to the ironing board.

In this example, assuming washing, drying, and ironing a load of laundry all take the same amount of time, the second method would be 3 times faster, as it spits out a finished load per clock cycle, whereas the first method spits out a finished load every 3 clock cycles. CPUs are different, but the benefits of hyperthreading are still there.
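Using the laundry numbers above (a hypothetical 3-stage pipeline where every stage takes one cycle), the speedup works out like this:

```python
def sequential_cycles(loads, stages=3):
    # Finish each load completely (wash, dry, iron) before starting
    # the next one.
    return loads * stages

def pipelined_cycles(loads, stages=3):
    # Start a new load every cycle; once the first load fills the
    # pipeline, one finished load comes out per cycle.
    return stages + (loads - 1)

# With 10 loads: 30 cycles sequential vs 12 pipelined. As the number
# of loads grows, the speedup approaches the number of stages (3x).
```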

Depends on how the software utilizes the cores.

I have a /sqt/ question

Could programming languages perform multithreaded programming before dual core cpus?

What would happen if you plugged a power outlet into itself? Would your house literally explode?

This is also true. Consider the following scenario:

Two programs are running on an SMT CPU. One of them has just generated a LW (load word; memory read) instruction, and it can't proceed any further until that LW has completed. The other is a simple arithmetic program that contains millions of ADD instructions.

Going out to memory to load data can be many thousands of times slower than ADDing two registers. So, even while Program One is blocked on the LW instruction, Program Two can continue executing its ADD instructions.

(because ADD and LW use different stages of the pipeline)
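The same idea can be sketched with OS threads (a loose analogy: a sleep stands in for the memory stall, and the timings are made up):

```python
import threading
import time

def blocked_on_memory():
    # Stand-in for Program One, stalled on a slow LW: while it waits,
    # it needs no execution resources.
    time.sleep(0.2)

def arithmetic(out):
    # Stand-in for Program Two's stream of ADD instructions.
    total = 0
    for i in range(200_000):
        total += i
    out.append(total)

out = []
t1 = threading.Thread(target=blocked_on_memory)
t2 = threading.Thread(target=arithmetic, args=(out,))
start = time.monotonic()
t1.start(); t2.start()
t1.join(); t2.join()
elapsed = time.monotonic() - start
# elapsed is close to the 0.2 s wait alone, not the sum of both jobs:
# the blocked thread doesn't stop the computing one from running.
```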

Hot wire would connect to hot wire, neutral would connect to neutral, ground would connect to ground. All wires are at the same potential as themselves so nothing would happen. If you crossed the wires up then it would probably throw off a lot of sparks and trip the breaker or blow a fuse. If there's no fuse or breaker it would just burn everything until the wire melts.

Yes. Even an old single-core Pentium 4 on Windows XP is typically running tens or hundreds of threads "at a time."

"At a time" in quotes because single-core CPUs really just switch between threads very quickly, which gives the illusion of running them at the same time. This is called context switching.

Whereas a dual-core, etc. CPU can actually execute two threads concurrently, at exactly the same time, one on each core. Of course, since there are tens or hundreds of threads running on a typical OS, they still need to context switch rapidly too. They can just execute twice as many threads at the exact same moment.
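A small sketch of that (nothing beyond the Python standard library): far more threads than hardware cores can all be alive at once, because the scheduler time-slices them.

```python
import os
import threading

n = (os.cpu_count() or 1) * 8  # deliberately oversubscribe the cores

barrier = threading.Barrier(n)
done = [False] * n

def worker(i):
    # Every thread must reach the barrier before any can proceed, so
    # all n threads are alive simultaneously, even on one core.
    barrier.wait()
    done[i] = True

threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# All n threads complete: the OS context-switched them across however
# many cores actually exist.
```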

Although please do not take this as saying you should be messing with mains voltages. Don't be a retard basically.

so, quad core does exactly the same context switching but 4x more than single cpu?

Yeah, I guess you could say that. Each of the four independent cores can chew through as many threads as the single core, so the entire quad-core processor ends up context-switching 4X as often (and getting 4X the work done).

Really, there are two types of threads: hardware threads (which are either physical cores or hyper-threaded "cores"; these are presented to the OS by the CPU) and software threads (these are created by programs; e.g. Firefox needs separate threads to download a file and render the webpage HTML).

The job of the OS scheduler is to assign software threads to hardware threads as efficiently as possible.
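In Python terms, a trivial sketch of the two kinds:

```python
import os
import threading

# Hardware threads: logical CPUs the OS sees (cores x SMT ways).
hardware_threads = os.cpu_count()

# Software threads: created by programs. This counts only the current
# process's threads; a whole system typically runs hundreds more, which
# is why the scheduler has to multiplex them onto the hardware.
software_threads = threading.active_count()
```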

I find it funny that someone reused my shitty shoop pic.

>Can someone explain why a multi threaded core outperforms a single threaded one?

A multi-threaded core actually has more transistors than a single-threaded one. When you've got multiple threads per core, some of the components of the core are duplicated for each thread -- that's why more transistors are needed. Those duplicated components can work in parallel.

But there are limits to how much benefit you can get from multi-threading, because the threads still share some of their parent core's resources -- which can create a bottleneck when both threads are 100% busy.

(That's why the OS is supposed to minimize the amount of time that threads inside the same core are executing code simultaneously -- that is, it should always prefer to use two different cores instead of overloading one core.)

Imagine you're asked to do a simple job of moving boxes from one room to the next. The boxes are large enough so that you can only carry one of them at a time - you have no dolly or cart to load up with multiple boxes so, it's effectively a single box job aka a single core.

Now, say there were 4 of you, you and 3 clones or whatever, and you had the same task: move the boxes from one room to another. Obviously you and your compadres/clones/whatever could do 4 times the amount of work that just one of you by your lonesome could accomplish in the same span of time.

Does this not make sense?

>so the entire quad-core processor ends up context-switching 4X as often (and getting 4X the work done).
>The job of the OS scheduler is to assign software threads to hardware threads as efficiently as possible.
Yeah. Only thing is many programs aren't optimized to use multiple cores effectively. For this reason, you'd want a smaller number of cores with a higher clock speed (e.g. 4 cores at 4.0GHz versus 6 cores at 3.5GHz) for some tasks, such as most gaming. But for other tasks, such as compressing or encrypting files, I'm fairly certain the CPU can use all cores, or as many as you tell it to.
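A sketch of an embarrassingly parallel task like that (hashing stands in here for compression/encryption; in CPython, hashlib releases the GIL while digesting large buffers, so worker threads can genuinely run on different cores at once):

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor

def hash_chunk(chunk):
    # CPython's hashlib releases the GIL for large buffers, so these
    # threads can actually execute on separate cores in parallel.
    return hashlib.sha256(chunk).hexdigest()

chunks = [bytes([i]) * (1 << 20) for i in range(8)]  # 8 x 1 MiB

# One worker per logical CPU: the OS spreads them across all cores.
with ThreadPoolExecutor(max_workers=os.cpu_count()) as pool:
    digests = list(pool.map(hash_chunk, chunks))
```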

This.

Hyperthreading isn't a separate CPU or core. Hyperthreading provides a *logical* core, such that the operating system believes it's a separate core and queues up operations as if it were.

Meanwhile, the CPU is able to provide some statistical improvement in throughput by optimizing usage of the various parts of the core, as the parent describes.

So, you don't get a 2x improvement in speed. You get something like, say, 25% - 80% improvement in speed, averaging perhaps 70%.

The only valid answers here. Everything else seems to be describing what multiple cores are, or what multithreading is.

Threading is actually example 1; to do example 2 you would need multiple cores.