So I just benched my 6850K's memory and processor cache and I'm baffled by how much faster L cache is compared to memory...

So I just benched my 6850K's memory and processor cache, and I'm baffled by how much faster L cache is compared to memory, even with quad channel.

What makes L cache so much faster? How can we make RAM as fast?

it is physically very close to the processor

mostly it's the physical distance from the CPU logic. The closer it is physically located the easier it is to have high bandwidth access. And the whole speed of light thing contributes to the latency.

3 things:
1. Clock it up
2. Widen the bus
3. Reduce the distance
That's it!

Might sound extremely stupid, but could we develop something similar to HBM for system memory?

Already being done with 3D XPoint

why can't we make l cache bigger? use it as regular ram.

Defeats the purpose of fast memory.

If you want fast stuff, you have to keep it small. The time it takes to search the whole index grows with size, which reduces its effectiveness. L1 is 32KB, L2 is 256KB, L3 is 2.5MB.

You can already see directly in OP's image how increasing the index/size reduces the speed.

Because it's expensive in terms of die space.

Generally not worth the cost of the die space required when you can get by with L2 and L3 that is a bit slower but much larger.

Or RAM which is many times slower but much MUCH larger.

Your system memory is DRAM.
The caches on your processor are SRAM cells.

SRAM is inherently a much faster switching cell, and there is far less abstraction in addressing it vs rows in a DRAM chip.

Physical proximity isn't the largest contributing factor.
Even if the contacts on a DIMM were physically touching the pinout on the CPU package, you'd only cut latency by a couple of nanoseconds.
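
A rough back-of-the-envelope check of the distance argument, as a sketch only. The 10 cm trace length and the signal speed of about half the speed of light in a PCB trace are assumptions picked for illustration, not measurements from this thread.

```python
# Estimate how much latency comes from wire length alone.
# Assumptions (not from the thread): signals in a PCB trace travel at
# roughly half the speed of light, and the CPU-to-DIMM trace is ~10 cm.

SPEED_OF_LIGHT_M_PER_S = 299_792_458
VELOCITY_FACTOR = 0.5   # assumed fraction of c for a PCB trace
TRACE_LENGTH_M = 0.10   # assumed CPU-to-DIMM distance


def round_trip_delay_ns(length_m: float, velocity_factor: float = VELOCITY_FACTOR) -> float:
    """Time for a signal to travel to the DIMM and back, in nanoseconds."""
    speed = SPEED_OF_LIGHT_M_PER_S * velocity_factor
    return 2 * length_m / speed * 1e9


delay = round_trip_delay_ns(TRACE_LENGTH_M)
print(f"round trip over {TRACE_LENGTH_M * 100:.0f} cm: {delay:.2f} ns")
```

Under these assumptions the wire accounts for a little over 1 ns of a measured 60 to 70 ns, which fits the claim that distance is not the dominant factor.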

The memory controller that sits between CPU logic and RAM slows down their maximum theoretical speed.
It's a necessary limitation given the large size of DRAM addresses.

A larger cache will almost always have higher latency.
Caches are balanced for hit rate vs latency and throughput.
More is not always better.
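
The hit rate vs latency trade-off can be sketched with the textbook average memory access time formula (hit time plus miss rate times miss penalty). Every number below is invented for illustration; none of them describe a real CPU.

```python
def average_access_time_ns(hit_time_ns: float, hit_rate: float, miss_penalty_ns: float) -> float:
    """Average memory access time: hit time plus the expected cost of misses."""
    return hit_time_ns + (1.0 - hit_rate) * miss_penalty_ns


# Two hypothetical caches in front of RAM with a 60 ns miss penalty.
# The larger cache hits more often but takes longer on every access.
small = average_access_time_ns(hit_time_ns=1.0, hit_rate=0.95, miss_penalty_ns=60.0)
large = average_access_time_ns(hit_time_ns=5.0, hit_rate=0.98, miss_penalty_ns=60.0)

print(f"small, fast cache:   {small:.1f} ns average")
print(f"large, slower cache: {large:.1f} ns average")
```

With these made-up figures the larger cache loses on average access time despite its better hit rate, because every hit pays the extra latency.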

XPoint is not comparable to DRAM in speed.
It is even further away from SRAM.

The only thing comparable to HBM for general purpose memory is HMC.

ddr4 has such shit latency
ddr3 does too, but the added bandwidth makes up for it
aside from the larger ram pool ddr4 is worse for everything else tbqh senpai.

imagine stacked processor cores, and stacked cache memory, along with stacked ram.
the future should be nice, if they'd stop milking us with these worthless fucking rebrands

>ddr4 has such shit latency
68.6ns isn't great, but it's not "shit" latency

My 5820k with DDR4 3200MHz gets ~60ns latency.

What does your magical DDR3 get?

>CPU cache is way faster than RAM

Wow, what a magnificent discovery. Why do you think the very first Celerons sucked so much ass? Hint: they had no L2 cache at all.

Imagine that your fridge is your CPU cache, and RAM is a nearby grocery store. If you have no fridge, you have to walk to the grocery store each and every time you wish to have a meal. Obviously it's much more efficient to retrieve more food from the store at once and keep it in the fridge for easy and fast local access.

38.3ns

DDR4 has higher bandwidth than DDR3.
DDR4 DIMMs available now have lower latency than any prior mass-produced DDR3.

lol pulling numbers out of your ass, good job.

Extending the fridge analogy, your idea is that of having a fridge at home the size of a whole grocery store. While it's still closer than the store, it takes much longer to find anything in there due to its size, and its maintenance costs are far higher.

My 4790k with some of the lowest latency DDR3 RAM you can find barely manages under 45ns, so I'm gonna call bullshit on your number.

>only believing pictures which can easily be fabricated
why even bother asking then? just wanted to be a sarcastic ass?

You could at least make up a believable number, 38ns wouldn't be world record territory for DDR3, but it'd be getting damn close.

Face it, latency isn't a huge factor with DDR4; the only reason people think that is because the CAS timings on DDR4 are almost double that of DDR3. However, CAS timings on their own don't tell you the latency: CAS timings AND memory frequency together determine it. And since DDR4 runs at a MUCH higher frequency, the actual latency you experience is basically identical to DDR3 in most cases, and even lower in others.
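
A minimal sketch of that relationship, assuming the advertised "MHz" figure is really the data rate in MT/s (two transfers per clock). The two kits are hypothetical examples chosen so that the CAS number and the data rate both double.

```python
def cas_latency_ns(data_rate_mts: float, cas_cycles: int) -> float:
    """Convert a CAS latency given in clock cycles to nanoseconds.

    DDR transfers twice per clock, so the clock in MHz is half the data
    rate in MT/s and one clock cycle lasts 2000 / data_rate nanoseconds.
    """
    return cas_cycles * 2000.0 / data_rate_mts


ddr3 = cas_latency_ns(1600, 9)    # hypothetical DDR3-1600 CL9 kit
ddr4 = cas_latency_ns(3200, 18)   # hypothetical DDR4-3200 CL18 kit

print(f"DDR3-1600 CL9:  {ddr3:.2f} ns")
print(f"DDR4-3200 CL18: {ddr4:.2f} ns")
```

Both come out at 11.25 ns: doubling the CAS number at double the data rate leaves the latency in nanoseconds unchanged.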

3600mhz DDR4 kits now have solid 15-15-15-35-2N timings now.
Compared to 2400mhz DDR3 with CL11-10 timings.

Okay?

Let's look up the actual latency figures on those

>DDR3 2400MHz CL10
>First word
>7.5 ns
>Fourth word
>8.75 ns
>Eighth word
>10.42 ns

>DDR4 3600MHz CL15
>First word
>8.33 ns
>Fourth word
>9.17 ns
>Eighth word
>10.28 ns


oh wow, barely any difference.
And further, the DDR4 RAM while basically just as fast in latency has MUCH more bandwidth available.
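
For a sense of scale, theoretical peak bandwidth is the data rate times the channel width (64 bits, i.e. 8 bytes, per DDR3/DDR4 channel). This is a sketch of the theoretical ceiling only; real measured bandwidth is lower, and the quad-channel line is an assumption based on OP's platform.

```python
BYTES_PER_TRANSFER = 8  # one DDR3/DDR4 channel is 64 bits wide


def peak_bandwidth_gbs(data_rate_mts: float, channels: int = 1) -> float:
    """Theoretical peak bandwidth in GB/s (10^9 bytes per second)."""
    return data_rate_mts * 1e6 * BYTES_PER_TRANSFER * channels / 1e9


ddr3 = peak_bandwidth_gbs(2400)               # DDR3-2400, single channel
ddr4 = peak_bandwidth_gbs(3600)               # DDR4-3600, single channel
ddr4_quad = peak_bandwidth_gbs(3600, channels=4)

print(f"DDR3-2400: {ddr3:.1f} GB/s per channel")
print(f"DDR4-3600: {ddr4:.1f} GB/s per channel ({ddr4 / ddr3 - 1:.0%} more)")
print(f"DDR4-3600 quad channel: {ddr4_quad:.1f} GB/s")
```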

I'll take an extra ~5ns for 30% more bandwidth.

>>DDR3 2400MHz CL10
>>First word
>>7.5 ns
>>Fourth word
>>8.75 ns
>>Eighth word
>>10.42 ns

I actually messed up, those are for CL9, CL10 is slightly worse

>first word
>8.33 ns
>fourth word
>9.59 ns
>eighth word
>11.26 ns


So what do you know, it's even slower than the CAS 15 DDR4 3600MHz
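
The figures quoted above can be reproduced with a few lines, as a sketch assuming the usual model: the first word arrives after the CAS latency and each later word of the burst follows one transfer (half a clock) later. The function name is made up for this example.

```python
def nth_word_latency_ns(data_rate_mts: float, cas_cycles: int, word: int) -> float:
    """Time until the given word (1-based) of a burst arrives, in nanoseconds."""
    clock_ns = 2000.0 / data_rate_mts          # one clock = two transfers
    return cas_cycles * clock_ns + (word - 1) * clock_ns / 2


kits = [
    ("DDR3-2400 CL9", 2400, 9),
    ("DDR3-2400 CL10", 2400, 10),
    ("DDR4-3600 CL15", 3600, 15),
]

for name, rate, cl in kits:
    times = [nth_word_latency_ns(rate, cl, w) for w in (1, 4, 8)]
    print(f"{name}: " + ", ".join(f"{t:.2f} ns" for t in times))
```

The CL9 and DDR4 rows match the quoted numbers. The CL10 row comes out at 9.58 and 11.25 ns, a hair under the quoted 9.59 and 11.26, presumably because the source table rounded the cycle time before adding.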

The difference between cache and main memory is that L1 cache is like a book on your desk, L2 is a stack of books on the other side of the room, L3 is a bookcase on the other side of your house, and main memory is a library on the other side of town. Not only is each one farther away and thus takes more time to get to than the previous, but you are also looking in a much larger bunch each time and it takes awhile to find what you're looking for.

Also, accessing from main memory would be like waiting for the library to get its latest shipment of books in.

I was pointing out that current DDR4 kits are better than high end DDR3 kits of yesteryear.

You did nothing but reinforce my point.
Good job?

To continue with this, 3D XPoint would be like same-day Amazon delivery. The book is far away, but Amazon has special contracts to get things shipped quickly.

I was already saying that and you replied
>3600mhz DDR4 kits now have solid 15-15-15-35-2N timings now.
>Compared to 2400mhz DDR3 with CL11-10 timings.
With nothing else posted it looks like you're implying the lower CL# is faster and you're disagreeing with me.

If you were agreeing with me the whole time you should have said something.

Anyone with half a brain can work out the middle school math and calculate the latency of each word value from the frequency and CL timings.

I responded with that because of this stupid line:
> the only reason people think that is because the CAS timings on DDR4 are almost double that of DDR3

15 is not a double of 10

Go take your autism pills, chump.

Wew, so you're autistic.

CAS timings when DDR4 first came out were much higher, with DDR4 3200MHz having a CAS 18 or similarly high number.

DDR3 can easily have a CAS of 7, though 9-11 are fairly common on consumer kits.

9*2 is 18, 7*2 is 14.


Your comment didn't even address that specifically; you just tossed it out there as if people could read your mind and follow your train of thought.

>this retard is still making an ass of himself
>trying to call anyone else autistic

You made a stupid comment with no context.
I posted two clear examples of high end kits, showing frequency and CAS timings.
Now you're trying to argue, because your malformed inferior autistic brain doesn't know how to handle interacting with neurotypical people.

Take your pills.

>You made a stupid comment with no context
That was you

>I posted two clear examples of high end kits, showing frequency and CAS timings.
That's literally ALL you posted, no context as to what you wanted to say, you literally just posted two high end kits and said COMPARE!
Wew lad, add some context to your posts.

>literal autism

I'm just a third party here, but following the comment chain, you DID just jump in randomly with the two kits and nothing else.
You didn't say anything about the CAS not being double and that being your issue; you only explained that later.

so as some random user passing by, you just look like a fucking idiot and an asshole.

>A larger cache will almost always have higher latency.
>Caches are balanced for hit rate vs latency and throughput.
>More is not always better.
What? If so, how come server processors have that ginormous amount of L cache?

How about the i7s, desktop processors with up to 25MB of cache? By your line of thought those would be slower than those with less cache.

intel.com/content/www/us/en/processors/core/core-i7ee-processor.html

Look at those Extreme Editions of the i7. The only reason the top models have a lower clock is that they have more cores and are limited by TDP.

Your argument does not compute sempai

L1 and L2 caches on those CPUs are fixed caches per core.

L1 cache is 64KB per core, so a 10-core CPU will have 640KB of L1 cache. L2 cache is the same at 256KB per core. L3 cache can differ and be anywhere between 2 and 6MB per core.
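
The per-core arithmetic can be sketched as follows. The L1 and L2 sizes are the ones given above; the 2.5MB-per-core L3 figure is an assumption taken from the number quoted earlier in the thread, and the 10-core count is just the example used here.

```python
CORES = 10

# Cache per core in KB. The L3 value assumes 2.5MB per core (2560KB).
PER_CORE_KB = {"L1": 64, "L2": 256, "L3": 2560}

totals_kb = {level: size * CORES for level, size in PER_CORE_KB.items()}

for level, total in totals_kb.items():
    print(f"{level}: {total}KB total ({total / 1024:.2f}MB)")
```

Under that assumption the L3 total works out to 25MB, in line with the "up to 25MB" figure mentioned for the top i7.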

L3 and L4 (if there is an L4) are basically the only places you will see larger caches.

Server chips have larger L3 shared between all cores. They have a bigger L3 because the hit rate across all cores is high enough to warrant it.
That larger L3 is in fact slower.

It computes perfectly, the problem is your failed understanding.

>the problem is your failed understanding

You weren't specifying L1, L2 and L3 caches. Yep, I know the differences among caches, but the way you structured your explanation, you weren't making those distinctions.

L1 and L2 are one thing and are, as you said per core, L3 is shared among all cores, so yeah, more cache in the case on L3 is always better

I'm not the OP, but he did specify cacheS, not a single cache such as L3

>more cache in the case on L3 is always better
Again, no it isn't.
You wouldn't put 32MB of L3 for a single core. It would be slow and you'd never have enough data hitting it to use even 1/4 its capacity. You would literally create a scenario where touching the L3 at all would cause a performance regression since swapping data from it is slower than reissuing.

Caches are balanced for hit rate vs latency and throughput.
More is not always better.

Also, I think the big L3 caches on the Intel chips are set up in a certain manner. A single module of Haswell/Broadwell is something like a pair of cores with a shared 5MB L3 cache. The module sits on an interconnect ring, which allows each module tertiary access to the other modules' caches on die. The interconnect is a big part of multicore performance scaling when the core counts get so high.