Wasted CPU potential

It's a shame that such boosts are nearly impossible to get from binary distributions unless runtime dispatching is used. Dispatching is rarely used, and only by programmers who know how and when to use such instructions. GCC, on the other hand, seems to do a fine job with that.

Attached: marchPower.png (2670x1780, 331K)
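
For anyone wondering what dispatching looks like in practice, here is a minimal sketch, assuming GCC or Clang on x86. The function names (sum_u32 and friends) are made up for illustration. It builds the same loop twice, once for the baseline target and once with AVX2 allowed through the target attribute, then picks one at run time with __builtin_cpu_supports. On other architectures it just falls back to the generic version.

```c
#include <stddef.h>
#include <stdint.h>

/* Baseline version: compiled for whatever the default target is. */
static uint64_t sum_u32_generic(const uint32_t *v, size_t n) {
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
/* Same loop, but the compiler is allowed to use AVX2 in this one function. */
__attribute__((target("avx2")))
static uint64_t sum_u32_avx2(const uint32_t *v, size_t n) {
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}
#endif

/* Hypothetical entry point: choose an implementation at run time. */
uint64_t sum_u32(const uint32_t *v, size_t n) {
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();                 /* populate the CPU feature data */
    if (__builtin_cpu_supports("avx2"))   /* true only if this CPU has AVX2 */
        return sum_u32_avx2(v, n);
#endif
    return sum_u32_generic(v, n);
}
```

A binary built this way still runs on old CPUs and only takes the AVX2 path where it exists. Whether the AVX2 copy is actually faster for a given loop is something to measure, not assume.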

Autism.

Why not test unrolling loops and machine arch separately?
Which one is the relevant option in your application?

-march favors encoding, whereas loop unrolling favors the actual processing. However, the gain from -march is massive compared to loop unrolling. There might be other flags that give a further benefit.

Attached: comparison.png (2670x890, 160K)

Try march=native with O2. O3 can do "bad" optimizations by trying to be too clever. O2 is the sane option.

Irrelevant in real world usage

I think it doesn't do SSE if you just say -Ox or something. -march should enable SSE, and that is probably where all your gains come from in image processing or whatever this is.
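
One way to check instead of guessing: GCC and Clang predefine macros such as __SSE2__ and __AVX2__ for every instruction set the current flags enable. A small sketch (the function name is made up) that reports them is below. Note that SSE2 is part of the x86-64 baseline, so on a 64-bit build __SSE2__ is defined even without any -march flag; the newer sets like AVX2 are the ones that need -march or a specific -m option.

```c
#include <stdio.h>

/* Prints each SIMD-related macro the compiler defined for this build
   and returns how many of the checked macros were present. */
int count_enabled_simd_macros(void) {
    int n = 0;
#ifdef __SSE__
    puts("__SSE__");     n++;
#endif
#ifdef __SSE2__
    puts("__SSE2__");    n++;
#endif
#ifdef __SSE4_2__
    puts("__SSE4_2__");  n++;
#endif
#ifdef __AVX__
    puts("__AVX__");     n++;
#endif
#ifdef __AVX2__
    puts("__AVX2__");    n++;
#endif
#ifdef __AVX512F__
    puts("__AVX512F__"); n++;
#endif
#ifdef __BMI2__
    puts("__BMI2__");    n++;
#endif
    return n;
}
```

Build it once with plain -O2 and once with -O2 -march=native and compare the output to see what the flag actually changes on your machine.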

This is why we all need to install Gentoo.

Why is march = native ? look for it.
Are you a hacker?

I know -O3 can be dangerous, but in this case it is far from that. That's why algorithms need to be tested to see whether they gain from it. Otherwise -O2 is the safest option and will never slow the program down.

Attached: O2march.png (1335x890, 135K)

HOLY COW I'M TOTALLY GOING SO FAST
OH FUCK

O3 should always produce faster code than O2. If it doesn't, then it's a bug and should be reported. The thing with O3, though, is that it assumes your program has no undefined behavior, which is why it can optimize better than O2, but it also means that if your code does contain undefined behavior, it can lead to unexpected results. Bottom line: -O3 is safe for your Gentoo PC at home, but for a server, or anywhere else security is a priority, you should go with -O2.

And with -Ofast?

Attached: Ofast.png (1335x890, 129K)

Nice font rendering, user. What resolution do you use? 4K?

That's on 4K with Ubuntu fonts and 163 dpi

windows devs

As a home Gentoo user, I'm interested in this. If my packages have an issue with -O3, would they fail to compile, or compile but show weird runtime behavior?

I once wrote a program that worked when compiled with -O2 but didn't work with -O3 (at runtime; both builds compiled).
It was quite clear that it was a bug in my code. The weird part is that it worked at all even though my code was flawed.
I have used -O3 ever since. If something doesn't work, I want to be able to catch it.

-march=btfo

>-march=btfo

it's -march=native + -Ofast

Arch is fast enough for me

-O3 can break a program if it uses things that the compiler cannot understand. For instance, my program here is supposed to show all the available instruction sets of the current CPU, but GCC for some reason optimized out the calls to cpuid, even though their results were used. To prevent that, I had to declare them volatile.
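
A minimal sketch of what that can look like, assuming GCC or Clang on x86 (the struct and function names are hypothetical, not from the program above). Marking the asm statement volatile tells the compiler it must not delete the statement or merge it with another identical-looking one. On non-x86 targets the helper just reports failure.

```c
#include <stdint.h>

typedef struct { uint32_t eax, ebx, ecx, edx; } cpuid_regs;

/* Runs CPUID for the given leaf/subleaf.
   Returns 1 and fills *out on x86 with GCC/Clang, otherwise returns 0. */
int query_cpuid(uint32_t leaf, uint32_t subleaf, cpuid_regs *out) {
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __asm__ volatile("cpuid"
                     : "=a"(out->eax), "=b"(out->ebx),
                       "=c"(out->ecx), "=d"(out->edx)
                     : "a"(leaf), "c"(subleaf));
    return 1;
#else
    out->eax = out->ebx = out->ecx = out->edx = 0;
    (void)leaf; (void)subleaf;
    return 0;
#endif
}

/* SSE2 is reported in leaf 1, EDX bit 26. */
int cpu_has_sse2(void) {
    cpuid_regs r;
    if (!query_cpuid(0, 0, &r) || r.eax < 1) return 0;  /* leaf 1 missing */
    query_cpuid(1, 0, &r);
    return (int)((r.edx >> 26) & 1u);
}

/* AVX2 is reported in leaf 7, subleaf 0, EBX bit 5. */
int cpu_has_avx2(void) {
    cpuid_regs r;
    if (!query_cpuid(0, 0, &r) || r.eax < 7) return 0;  /* leaf 7 missing */
    query_cpuid(7, 0, &r);
    return (int)((r.ebx >> 5) & 1u);
}
```

GCC also ships a cpuid.h header with __get_cpuid and __get_cpuid_count, which avoids hand-written asm altogether. Note that this only reads the CPU's feature bits; it does not check whether the OS has enabled the AVX register state.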

What's your CPU? If it's a Haswell, Ryzen or newer, chances are it's using some of the modern bit manipulation instruction sets, which come with more flexible shift, rotate and multiplication instructions.

Try -O3 -mbmi2, see what kind of boost that gives you
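
If you want to see whether the flag changes anything, a rotate is an easy test case. Below is a sketch of the usual portable rotate idiom (the function names are mine). GCC generally recognizes this pattern as a rotate, and with -mbmi2 it may pick the BMI2 forms such as rorx or shlx in some cases; that part is an assumption you should confirm by compiling with -S and reading the assembly.

```c
#include <stdint.h>

/* Rotate x left by n bits. Masking keeps both shift counts in 0..31,
   so there is no undefined behavior even when n is 0 or >= 32. */
uint32_t rotl32(uint32_t x, unsigned n) {
    n &= 31u;
    return (x << n) | (x >> ((32u - n) & 31u));
}

/* Reports whether this translation unit was built with BMI2 enabled. */
int built_with_bmi2(void) {
#ifdef __BMI2__
    return 1;
#else
    return 0;
#endif
}
```

Comparing the output of gcc -O3 -S with and without -mbmi2 on a file like this shows directly which instructions the option unlocks.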

With adjusting the "CPU potential" you open yourself up to lots and lots of headaches from things not compiling. Sometimes it's not worth the effort.

>-O3 can break a program if it uses things that the compiler cannot understand
It can only break programs when the code has undefined behavior or when you hit a compiler bug (unlikely). This is more likely to show up at -O3 than at -O2, but I've still seen it happen at -O2.
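
A small sketch of the kind of code that bites people, with made-up function names. Signed integer overflow is undefined behavior in C, so with optimization enabled the compiler is allowed to assume x + 1 never wraps and may fold the first check to a constant 0. The second function asks the same question without ever overflowing.

```c
#include <limits.h>

/* Broken overflow check: when x == INT_MAX, x + 1 overflows, which is
   undefined behavior. An optimizer may legally reduce this to "return 0". */
int next_is_smaller_ub(int x) {
    return x + 1 < x;
}

/* Well-defined version: compare against the limit before adding anything. */
int next_would_overflow(int x) {
    return x == INT_MAX;
}
```

Building with -fsanitize=undefined is one way to catch the first kind at run time, and -fwrapv makes signed overflow wrap if you really need that behavior.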

The Gentoo wiki says that many packages have trouble with -O3, likely due to devs doing dumb shit in those packages. It should work fine with conformant programs. You'd be best off enabling it on a per-package basis.