...as opposed to? Off the top of my head I can't think of a single acronym I could possibly confuse it with, and now you've got me wondering which common acronym I completely failed to learn.
Yes to -fprofile-use. That brought gfortran to essentially the same geometric mean as ifort with reasonable "good" flags for me, running a benchmark set which I think is supposed to show off proprietary compilers. ifort got little benefit from its equivalent.
Not sure if that would have an impact here. GCC is just unaware of the latency implications of store forwarding. I mean, it's definitely worth a shot, but you'd more or less just be hoping that the techniques you mentioned disable the right optimization pass.
That's true, although I'm curious whether any compiler really understands the microarchitecture at that level without being coerced by a compiler dev writing a pass (i.e. it won't happen straight from a pipeline description).
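For anyone following along who hasn't run into the store forwarding issue: here's a rough sketch of the kind of source pattern I understand it to be about. The function name is made up for illustration. On many x86 cores, a load that spans two earlier, narrower stores can't be served from the store buffer and has to wait for those stores to complete. An optimizing compiler may well keep all of this in registers, so treat it as a picture of the access pattern, not a reliable reproducer.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: two 4-byte stores followed by one 8-byte load
 * from the same bytes. If the values really go through memory, the
 * wide load overlaps two separate pending stores, which is the case
 * where store-to-load forwarding typically fails. */
uint64_t combine_via_memory(uint32_t first, uint32_t second) {
    uint32_t halves[2];
    halves[0] = first;   /* narrow store #1 */
    halves[1] = second;  /* narrow store #2 */

    uint64_t wide;
    /* Wide load covering both stores. Which half lands in the low
     * bits of the result depends on the machine's endianness. */
    memcpy(&wide, halves, sizeof wide);
    return wide;
}
```

Whether a given GCC version emits this store/load sequence or folds it into shifts and ORs depends on the flags and the surrounding code, so check the generated assembly before drawing conclusions.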
Do you know of any tools for generating this, possibly with runtime data? Been wanting to do this ever since I learned about the feature but I don't want to do this by hand for dependencies.
Edit: It is possible that I just don't understand how to actually implement PGO.
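In case it helps with the PGO part, here's a minimal sketch of the usual GCC workflow: build instrumented, run on representative input, rebuild using the recorded profile. The file name, the PGO_DEMO_MAIN macro and the workload are all made up for illustration; only -fprofile-generate and -fprofile-use are real GCC flags (gfortran accepts them too).

```c
/* pgo_demo.c (hypothetical file name)
 *
 * 1. Instrumented build:
 *      gcc -O2 -fprofile-generate -DPGO_DEMO_MAIN pgo_demo.c -o pgo_demo
 * 2. Training run (writes .gcda profile data when the program exits):
 *      ./pgo_demo
 * 3. Optimized rebuild that reads the profile:
 *      gcc -O2 -fprofile-use -DPGO_DEMO_MAIN pgo_demo.c -o pgo_demo
 */
#include <stddef.h>
#include <stdio.h>

/* Sums the values, clamping each one to `limit`. In the training run
 * below the clamp branch is almost never taken, which is the sort of
 * thing the profile records for the compiler. */
long sum_clamped(const int *values, size_t count, int limit) {
    long total = 0;
    for (size_t i = 0; i < count; i++) {
        if (values[i] > limit)
            total += limit;      /* cold path in the training data */
        else
            total += values[i];  /* hot path */
    }
    return total;
}

#ifdef PGO_DEMO_MAIN
int main(void) {
    enum { N = 1000000 };
    static int values[N];
    for (int i = 0; i < N; i++)
        values[i] = i % 100;     /* everything stays below the limit */
    values[N - 1] = 1000;        /* a single outlier hits the clamp */
    printf("%ld\n", sum_clamped(values, N, 500));
    return 0;
}
#endif
```

The profile is only as good as the training run, so step 2 should exercise the program the way it is really used. The source and the main optimization flags should also stay the same between steps 1 and 3, otherwise GCC may warn that the profile doesn't match.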
If the code slows down, it's usually because the compiler has generated a bunch of extra code, since it doesn't know which hot path to schedule for.
They will generate hundreds to thousands of instructions for factorial if you let them, because they turn it into a loop and then vectorize the loop.
Undefined Behaviour pub quiz question in there ^^
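For reference, a sketch of the kind of factorial being talked about, with both an unsigned and a signed version (function names are mine). As far as I know, recent GCC and Clang at -O3 will turn the recursion into a loop and then vectorize it, which is where the large instruction count comes from; paste it into a compiler explorer to see what your version does. The signed one is the pub quiz part: with a 32-bit int, anything above 12! overflows, and signed overflow is undefined behaviour in C.

```c
#include <stdint.h>

/* Unsigned version: overflow wraps modulo 2^32, so every input has a
 * defined (if mathematically meaningless) result. */
uint32_t factorial_u32(uint32_t n) {
    return n <= 1 ? 1u : n * factorial_u32(n - 1);
}

/* Signed version: assuming a 32-bit int, this is only defined for
 * n <= 12. For larger n the multiplication overflows, which is
 * undefined behaviour, so the compiler may assume it never happens. */
int factorial_int(int n) {
    return n <= 1 ? 1 : n * factorial_int(n - 1);
}
```

Compiling with -O2 instead of -O3, or adding -fno-tree-vectorize on GCC, is a quick way to compare against the non-vectorized loop.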