Game optimization

Let's talk about PC game optimization. I'm seriously wondering what the screenshot I took here means. The CPU is an old Phenom II 965 @ 3.8 GHz, the GPU is an R9 280X, and the memory is a single 8 GB stick of DDR3-1600 CL10. Why the crappy framerate when nothing is maxed out on utilization?

This happens to me too in all games and I still don't know the reason.
Bump

I'm curious too, bump

Limited by your single-core performance

Well, something's choking the pipeline then. Could be I/O, could be RAM.

Do you have max performance set in your power plan and driver settings? If you do then it's probably bottlenecked by something else.

It's Skyrim. No hardware is ever going to help the poor optimization of Bethesda's engine.

deprecated x87 is at fault here, too.

Wouldn't a core be 100% if that was the reason? I'm really curious

Bethesda's engine is so bad that it ignores half the feature set of any post-2005 CPU and leaves the entire SIMD pipeline empty; see below.

Micro-stalls during dependencies in the rendering pipeline

This sounds interesting, can you elaborate?

It's proprietary and there hasn't been much research into analyzing it (what would be the point, they're not gonna fix it anyway), but this might be interesting: forums.guru3d.com/showthread.php?t=356110

Slow connection between CPU, GPU, and memory reads.
If you looked at your buses you'd see that those were the things that were maxed out.

how does one look at one's buses?

Really no answer yet? Okay, well I can answer it.

The reason you can't get 100% utilization (aside from the fact that the number is averaged over half a second or so rather than updated every frame) is uneven use of the CPU's execution units and instruction sets.

Your CPU cores are made up of various parts: units for 16-bit, 32-bit, and 64-bit floating point, integer units, decoders, and so on.
If a game leans heavily on one part, that part becomes the bottleneck and the rest of the core can't be fully utilized. It can also get stuck in "wait" states: the core is nominally doing work, but it's waiting on an operation that takes a number of cycles to finish (which is where SMT becomes useful, since it can switch to different work while it would otherwise be waiting). There's a rough example of this at the end of this post.

Bulldozer was notorious for this. Many games lean heavily on 128-bit floating-point/SIMD work, and on Bulldozer two neighboring cores in a module share a single FPU, so heavy floating-point work effectively ties the pair together. This is why you often only see 55-65% usage max on a core with Bulldozer in something like GTA V.

While 100% usage doesn't mean you're maxing out every arithmetic unit, there is a maximum throughput (which I think comes down to the decoder feeding the core instructions) that isn't being reached. In the Phenom's case that's more down to poor optimization of the games than to the specific problem Bulldozer has. It can also come from being bottlenecked by some other part of the system (i.e. RAM or GPU speed).
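To picture those wait states, here's a minimal C sketch (sizes and names are mine, not from this thread): a pointer-chasing loop where every load depends on the previous one, so the core mostly stalls on memory, next to a plain arithmetic loop that keeps the execution units fed. A task monitor reports both as roughly 100% busy, even though the first one leaves most of the core idle internally.
/* Sketch only: build a random single cycle with Sattolo's shuffle, then chase it.
 * Each load depends on the previous one and usually misses cache, so the core
 * mostly waits. The second loop is dense arithmetic for contrast. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define N (1 << 24)   /* 16M entries, far bigger than any cache */
int main(void) {
    size_t *next = malloc(N * sizeof *next);
    if (!next) return 1;
    for (size_t i = 0; i < N; i++) next[i] = i;
    for (size_t i = N - 1; i > 0; i--) {   /* Sattolo's shuffle: one big cycle */
        size_t j = rand() % i;
        size_t tmp = next[i]; next[i] = next[j]; next[j] = tmp;
    }
    clock_t t0 = clock();
    size_t p = 0;
    for (size_t i = 0; i < N; i++)
        p = next[p];                       /* latency-bound: core stalls on RAM */
    double chase = (double)(clock() - t0) / CLOCKS_PER_SEC;
    t0 = clock();
    volatile double acc = 0.0;
    for (size_t i = 0; i < N; i++)
        acc += (double)i * 1.0001;         /* throughput-bound: units stay busy */
    double arith = (double)(clock() - t0) / CLOCKS_PER_SEC;
    printf("pointer chase %.3fs vs arithmetic %.3fs (p=%zu acc=%f)\n",
           chase, arith, p, acc);
    free(next);
    return 0;
}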

Which game would you guys say had the best optimization?

Best answer in ages, thanks!

Are you sure it doesn't have to do with mutexes + averaging?

You have to understand that a single "load" percentage cannot accurately describe the state of a highly complex device like a CPU or GPU. Just because the magic number doesn't say 100%, that doesn't mean there isn't a bottleneck in some part or subsystem of the CPU, GPU, or the computer in general. For instance, you could be bottlenecked by RAM or VRAM bandwidth and that load wouldn't show up anywhere in the magic number on your OSD, and that's just a basic example. Then software comes into play, which could legitimately be shitty: on the CPU side, if threads spend a long time waiting on each other for synchronization, you won't get max CPU usage across the board, since they're spending a lot of their time waiting.
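To make the synchronization point concrete, here's a rough pthreads sketch (Linux assumed, thread and variable names invented for the example): two threads that both need the same lock spend most of their time blocked on each other, so a monitor shows nowhere near 2x100% even though the program is badly bottlenecked.
/* Build with -pthread. The only work is serialized behind one lock, so the two
 * threads mostly sleep waiting for each other instead of showing full usage. */
#include <pthread.h>
#include <stdio.h>
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long shared_counter = 0;
static void *worker(void *arg) {
    (void)arg;
    for (long i = 0; i < 50 * 1000 * 1000; i++) {
        pthread_mutex_lock(&lock);     /* a blocked thread is asleep, not "busy" */
        shared_counter++;              /* serialized work */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}
int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, worker, NULL);
    pthread_create(&b, NULL, worker, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("counter = %ld\n", shared_counter);
    return 0;
}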

Not necessarily. A single software thread isn't always tied to a single CPU core (or hardware thread); the OS scheduler may move it around between cores. So a thread that's 100% busy might show up as a single core at 100% load, but it could just as well show up as 50% on core 1 and 50% on core 2 if the OS keeps bouncing it back and forth between them. It's entirely possible to be bottlenecked on single-core performance while never seeing 100% load on any single core, and I'm pretty sure it's quite common, not just theoretically possible.
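One way to check this, as a sketch (Linux-specific, GNU extensions, the helper name is mine): pin the busy thread to one core so its load stops getting smeared across the per-core graphs.
/* Build with -pthread. After pinning, the busy loop's load shows up on core 0
 * only, instead of being spread over whatever cores the scheduler picks. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
static void pin_current_thread_to(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    pthread_setaffinity_np(pthread_self(), sizeof set, &set);  /* restrict to one core */
}
int main(void) {
    pin_current_thread_to(0);
    volatile unsigned long x = 0;
    for (unsigned long i = 0; i < 4UL * 1000 * 1000 * 1000; i++)
        x += i;                        /* pure busy loop: core 0 should sit at 100% */
    printf("%lu\n", x);
    return 0;
}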

Thanks! This is really informative. In this case I know the CPU has to be the culprit, because the same GPU paired with my 4670K showed much higher framerates. I was interested in understanding the correlation between the CPU usage shown and the actual bottleneck. So in the end it all comes down to how games are optimized for specific architectures, do I get that right?

This is very informative too

No, CPU usage is only time spent running. Don't listen to this bullshit. If a thread has to wait on a synchronization object it cannot run, which will lower the overall utilization.

The reality is that if you had the CPU usage counter update at a higher rate, you would see spikes of 100% on single cores.
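For anyone who wants to try that, here's a rough Linux-only sketch (minimal error handling; it assumes the usual /proc/stat layout of user/nice/system/idle/... tick counts per cpuN line): sample per-core busy time every 50 ms instead of every half second and the short single-core spikes become visible.
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#define MAX_CPUS 64
/* Read total and idle tick counts for each cpuN line in /proc/stat. */
static int read_ticks(unsigned long long total[], unsigned long long idle[]) {
    FILE *f = fopen("/proc/stat", "r");
    char line[512];
    int n = 0;
    if (!f) return 0;
    while (fgets(line, sizeof line, f) && n < MAX_CPUS) {
        unsigned long long v[10] = {0};
        if (strncmp(line, "cpu", 3) != 0 || line[3] < '0' || line[3] > '9')
            continue;                      /* skip the aggregate "cpu" line and non-cpu lines */
        sscanf(line, "%*s %llu %llu %llu %llu %llu %llu %llu %llu %llu %llu",
               &v[0], &v[1], &v[2], &v[3], &v[4], &v[5], &v[6], &v[7], &v[8], &v[9]);
        idle[n] = v[3] + v[4];             /* idle + iowait */
        total[n] = 0;
        for (int i = 0; i < 10; i++) total[n] += v[i];
        n++;
    }
    fclose(f);
    return n;
}
int main(void) {
    unsigned long long t0[MAX_CPUS], i0[MAX_CPUS], t1[MAX_CPUS], i1[MAX_CPUS];
    for (int s = 0; s < 200; s++) {        /* ~10 seconds of samples */
        int n = read_ticks(t0, i0);
        usleep(50 * 1000);                 /* 50 ms sampling window */
        read_ticks(t1, i1);
        for (int c = 0; c < n; c++) {
            unsigned long long dt = t1[c] - t0[c], di = i1[c] - i0[c];
            double busy = dt ? 100.0 * (double)(dt - di) / (double)dt : 0.0;
            printf("cpu%d %5.1f%%  ", c, busy);
        }
        printf("\n");
    }
    return 0;
}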

When in doubt, Bethesda.
Seriously, their extension script system aside (and even that could have been done better), there is nothing about that game that isn't half-assed. Even their assets, for the most part.
The only thing that kept me in it for 1000+ hours is that it's addictive enough, which in itself is a lame reason to play a game.

Optimization in general is a catch-all concept too, much like the magic load percentage. You can optimize for an architecture in the sense of exploiting its strengths and avoiding its weaknesses, which can include using, or avoiding, certain instruction sets known to perform poorly on it. You can also optimize in the sense of using a better algorithm for the same purpose: less CPU/GPU work or less memory for the same end result. And you can optimize how the engine/game itself works and how its (software) subsystems interact, like minimizing the time threads spend waiting for other threads to finish (synchronization), or how effectively the work is divided across the CPU cores that are available (scaling with core count).
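As a small illustration of the "use the instruction sets the hardware actually has" part (just a sketch, not anyone's actual engine code, and the function names are made up): the same array sum written scalar and with SSE intrinsics. The SSE version handles four floats per instruction, which is exactly the hardware that sits idle if an engine sticks to old scalar or x87 code paths.
#include <stdio.h>
#include <xmmintrin.h>   /* SSE intrinsics */
static float sum_scalar(const float *a, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++) s += a[i];   /* one float per add */
    return s;
}
static float sum_sse(const float *a, size_t n) {
    __m128 acc = _mm_setzero_ps();
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(a + i));   /* four floats per add */
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float s = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for (; i < n; i++) s += a[i];                     /* leftover elements */
    return s;
}
int main(void) {
    enum { N = 1 << 20 };
    static float data[N];
    for (size_t i = 0; i < N; i++) data[i] = 1.0f;
    printf("scalar=%f sse=%f\n", sum_scalar(data, N), sum_sse(data, N));
    return 0;
}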

I covered the averaging: yes, it's updated over ~0.5 s, so you could be near 100% for half the frames in that window and 50% for the others and it averages out. But... that's not actually going to happen.

But IIRC, those monitors count wait states as CPU usage, which is why a 7600K can sit at 100% usage while a 7700K at the same clock speed gets a higher framerate, also at 100% usage.
So as far as mutexes go: no, because waiting for access to the same resource would be a wait state, which these counters treat as utilization.
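For what it's worth, here's a minimal sketch of one kind of waiting that definitely does register as utilization (Linux/pthreads assumed, names invented for the example): a thread spin-waiting on a flag pins a core at 100% in any monitor while making zero progress; a wait that actually puts the thread to sleep would look different in the monitor.
/* Build with -pthread. Watch the spinner's core sit at 100% for five seconds
 * even though it's doing nothing useful until the flag flips. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>
static atomic_int ready = 0;
static void *spinner(void *arg) {
    (void)arg;
    while (!atomic_load(&ready))
        ;                              /* busy wait: 100% "usage", zero progress */
    puts("spinner saw the flag");
    return NULL;
}
int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, spinner, NULL);
    sleep(5);                          /* the spinning core shows as fully loaded */
    atomic_store(&ready, 1);
    pthread_join(t, NULL);
    return 0;
}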

There's an article somewhere that goes into more detail about wait states and how they make "CPU usage" pretty bogus, since the CPU isn't really being used in any meaningful sense when it's just waiting.