
How to keep the O3 goodness without pessimizations: Use likely/unlikely liberally, unreachable(), and PGO.
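For anyone who hasn't used these hints, here is a minimal sketch of what they tend to look like with GCC or Clang. The `likely`/`unlikely` macro names are a common convention built on `__builtin_expect`, not a standard API, and the functions and enum are hypothetical, made up purely for illustration.

```c
#include <stddef.h>

/* Conventional wrappers around a GCC/Clang builtin; the macro names are an
 * assumption here, not something the compiler provides by itself. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical example: sum an array, treating a negative entry as a rare
 * error. The hint tells the compiler to lay out the loop for the common case. */
long sum_nonnegative(const int *values, size_t count)
{
    long total = 0;
    for (size_t i = 0; i < count; i++) {
        if (unlikely(values[i] < 0))
            return -1;              /* cold path: bail out */
        total += values[i];
    }
    return total;
}

enum shape { SHAPE_SQUARE, SHAPE_TRIANGLE };

/* Hypothetical example: every enumerator is handled, so the code after the
 * switch is marked unreachable. Reaching it anyway is undefined behaviour. */
int side_count(enum shape s)
{
    switch (s) {
    case SHAPE_SQUARE:   return 4;
    case SHAPE_TRIANGLE: return 3;
    }
    __builtin_unreachable();
}
```

The hints only pay off when they are right: a wrong `unlikely` can make the real hot path slower, and a wrong `__builtin_unreachable()` is undefined behaviour rather than a mere slowdown.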

If the code slows down, it's usually because the compiler has generated a lot of extra code without knowing which hot path to schedule for.

Compilers will generate hundreds to thousands of instructions for factorial if you let them, because they turn it into a loop and then vectorize the loop.

Undefined Behaviour pub quiz question in there ^^



Just to clarify, in this context PGO = Profile-Guided Optimization [1].

[1]: https://en.m.wikipedia.org/wiki/Profile-guided_optimization


...as opposed to? Off the top of my head I can't seem to remember a single acronym I could possibly confuse it with, and now you got me wondering which common acronym I completely failed to learn.


Pose graph optimization


That seems like a bit of a reach given that the thread is obviously about GCC.


Yes to -fprofile-use. That brought gfortran to essentially the same geometric mean as ifort with reasonable "good" flags for me, running a benchmark set which I think is supposed to show off proprietary compilers. ifort got little benefit from its equivalent.


Not sure if that would have an impact here. GCC is just unaware of the latency implications of store forwarding. I mean, it's definitely worth a shot, but you'd more or less just be hoping that the techniques you mentioned disable the right optimization pass.


That's true, although I'm curious whether any compiler really understands the microarchitecture at that level without being coerced by a compiler dev writing a pass (i.e. it won't happen straight from a pipeline description).


Do you know of any tools for generating this, possibly with runtime data? Been wanting to do this ever since I learned about the feature but I don't want to do this by hand for dependencies.

Edit: It is possible that I just don't understand how to actually implement PGO


I don't know exactly what is required, but the basic procedure is:

  gcc -fprofile-generate ...
  ./a.out ...
  gcc -fprofile-use ...

I don't know the current state of https://gcc.gnu.org/wiki/AutoFDO
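
To make the three steps above a bit more concrete, here is a rough sketch of the kind of code they operate on. The file names (`hot.c`, `main.c`), the `app` binary, the `-O2` level, and the function itself are all hypothetical; only the `-fprofile-generate` and `-fprofile-use` flags come from the commands above.

```c
/* Hypothetical file: hot.c. Assumed build steps:
 *   gcc -O2 -fprofile-generate hot.c main.c -o app
 *   ./app <representative input>      (the run writes .gcda profile data)
 *   gcc -O2 -fprofile-use hot.c main.c -o app
 * main.c is assumed to exist and to call count_outliers() on real data.
 */
#include <stddef.h>

/* Counts values outside [lo, hi]. How often the branch is taken depends
 * entirely on the data, which the compiler cannot see at build time; the
 * instrumented run records the actual counts for the second compile. */
size_t count_outliers(const int *values, size_t count, int lo, int hi)
{
    size_t outliers = 0;
    for (size_t i = 0; i < count; i++) {
        if (values[i] < lo || values[i] > hi)
            outliers++;
    }
    return outliers;
}
```

The point is that nothing in the source is annotated by hand: the profile run supplies the branch statistics, so the result is only as good as the input used for that run.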



