Holy fuck, guys. How does this work? I want to load 3 arrays into memory, multiply 2 of them with 3 nested loops, and save the result in the third.
How does my data fill the L1, L2 and L3 caches on Intel Haswell?
When I declare an array, is the entire array loaded into the cache, or only the one memory line (cache line) that is currently needed?
My understanding: at the start, the cache holds only three 64-byte memory lines, one from each of the 3 arrays, for the single operation in the inner loop, and the rest of the cache is empty at that point. Is this interpretation correct?
Is a memory line that is copied into L1 automatically copied into L2 and L3 as well? Or is it stored only in L1 and moved to L2 when it's evicted (and likewise from L2 to L3)?
What about the TLB? After a TLB miss, does the CPU look up the address in the page table in RAM, or does it ask the L1 cache for the data, which might already be there? What is the order?