
tbr1 · on April 22, 2022

I think this answer has several parts:

- I imagine the extra memory bandwidth of newer parts doesn't hurt. The example traces were taken on server-class Ice Lake machines, which just don't overflow for our typical workloads.

- We found that the specific IPT configuration matters a lot. Turning off return compression makes overflows more likely. magic-trace lets you vary this via the `-timing-resolution` parameter; more detail is available in the wiki. We don't typically see overflows under the default configuration, even on Broadwell server-class parts.

- Clark spent a week on an Intel NUC (a mobile Tiger Lake part) toiling away on decode error recovery. For the most part, the data lost are uninteresting branches, and only one of the two ends of a frame (the call into it or the return out of it) needs to survive the decode error for us to construct that frame.
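For context on the return compression point: with compression on, a return whose target matches the address pushed by its call is recorded as a single taken TNT bit, and the decoder predicts the target from its own copy of the call stack. With it off, every return carries its target in a multi-byte TIP packet, so the trace buffer fills faster. Below is a minimal sketch of the decoder side only; `ret_stack`, `on_call` and `on_compressed_ret` are hypothetical names, and the 64-entry depth and drop-oldest overflow behavior are assumptions, not a transcription of Intel's spec or of magic-trace's decoder.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical decoder-side model of Intel PT return compression.
   The depth of 64 mirrors the hardware call stack Intel documents;
   treat it as an assumption of this sketch. */
#define RET_STACK_DEPTH 64

typedef struct {
    uint64_t addrs[RET_STACK_DEPTH];
    int depth;
} ret_stack;

/* On a CALL, remember where the matching RET should land. If the stack
   is full, drop the oldest entry (assumed overflow behavior). */
static void on_call(ret_stack *s, uint64_t return_addr) {
    if (s->depth == RET_STACK_DEPTH) {
        memmove(&s->addrs[0], &s->addrs[1],
                (RET_STACK_DEPTH - 1) * sizeof s->addrs[0]);
        s->depth--;
    }
    s->addrs[s->depth++] = return_addr;
}

/* A compressed RET is a single taken TNT bit, so the decoder must
   predict its target from the stack. Returns 0 if it cannot. */
static uint64_t on_compressed_ret(ret_stack *s) {
    return s->depth > 0 ? s->addrs[--s->depth] : 0;
}
```

The one-bit encoding only works because both sides keep the same stack; that shared prediction is what disabling compression gives up.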

We also considered periodic stack sampling for error recovery, but didn't implement it because the decode error recovery we built turned out to be robust enough in practice.
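To illustrate why losing only one end of a frame is survivable, here is a simplified sketch (not magic-trace's actual recovery code). It assumes a hypothetical decoded event stream in which each call names its target and each return names the function it executes in. A frame whose call was lost is started at the earliest plausible time; a frame whose return was lost is closed when an enclosing frame returns, or at the end of the trace.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified trace model: a call into fn, or a return out
   of fn (the decoder knows which function a RET executes in). */
typedef enum { EV_CALL, EV_RET } ev_kind;
typedef struct { ev_kind kind; const char *fn; long t; } event;
typedef struct { const char *fn; long start, end; } frame;

#define MAX_DEPTH 64

/* Rebuilds frames even when some calls or returns were lost to decode
   errors. out must hold at least n frames. Returns the frame count. */
static int rebuild_frames(const event *ev, int n, long trace_start,
                          long trace_end, frame *out) {
    frame live[MAX_DEPTH];
    int depth = 0, nout = 0;
    for (int i = 0; i < n; i++) {
        if (ev[i].kind == EV_CALL) {
            if (depth < MAX_DEPTH) /* deeper calls are ignored here */
                live[depth++] = (frame){ev[i].fn, ev[i].t, 0};
            continue;
        }
        /* RET: look for the open frame whose call survived. */
        int j = depth - 1;
        while (j >= 0 && strcmp(live[j].fn, ev[i].fn) != 0) j--;
        if (j < 0) {
            /* Call was lost: use the earliest plausible start, i.e. the
               enclosing frame's start, or the start of the trace. */
            long start = depth > 0 ? live[depth - 1].start : trace_start;
            out[nout++] = (frame){ev[i].fn, start, ev[i].t};
            continue;
        }
        /* Close frame j and any frames above it whose returns were lost. */
        while (depth > j) {
            frame f = live[--depth];
            f.end = ev[i].t;
            out[nout++] = f;
        }
    }
    /* Calls with no surviving return are closed at the end of the trace. */
    while (depth > 0) {
        frame f = live[--depth];
        f.end = trace_end;
        out[nout++] = f;
    }
    return nout;
}
```

In the test scenario, the call into `parse` and the return out of `helper` are both lost, yet all three frames still come out, just with approximate boundaries.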

We had more trouble with runtimes that manipulate the stack pointer directly. (The kernel does this for the retpoline Spectre mitigation! But perf is smart and rewrites that part of the instruction stream into a jump for us.) There's code in magic-trace to special-case OCaml exceptions, for instance, and similar code is likely necessary for some other runtimes too (we have an open issue for Go's goroutine switching).
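An OCaml `raise` jumps straight to the nearest handler by restoring the stack pointer, so the frames in between never execute a return. C's `setjmp`/`longjmp` is a rough analogue, used here only to show the effect; `naive_depth` models a hypothetical tracer that sees calls and returns and nothing else.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical naive tracer: +1 on entry, -1 only on a normal return.
   Static storage, so its value is well defined across longjmp. */
static int naive_depth = 0;
static jmp_buf handler;

static void inner(void) {
    naive_depth++;
    /* Like an OCaml raise: jump straight to the handler, so neither
       inner nor middle ever executes a return instruction. */
    longjmp(handler, 1);
}

static void middle(void) {
    naive_depth++;
    inner();
    naive_depth--; /* never reached */
}

/* Returns the depth a call/return-only reconstruction would believe
   after the "exception" has been caught. */
static int run_with_handler(void) {
    naive_depth = 0;
    if (setjmp(handler) == 0) {
        middle();
    }
    return naive_depth;
}
```

After the jump, such a tracer still thinks two frames are live even though both are gone, which is the kind of mismatch runtime-specific handling has to correct.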

tbr1 · on April 22, 2022

DDIO operates mostly transparently to software, with the I/O controller feeding DMAs into a slice of L3. Hardware can opt out by setting PCIe TLP header hints, and you have some system-wide configurability via MSRs, but it's not something a userspace application can take into its own hands.

b20000 · on April 22, 2022

So is this taken advantage of by the Onload drivers for Solarflare cards, for example?

isogon · on April 24, 2022

Noticed this just now. It is.

tbr1 · on April 22, 2022

Absolutely, check out https://github.com/janestreet/magic-trace#privacy-policy and https://github.com/janestreet/magic-trace/wiki/Setting-up-a-.... With a bit of extra configuration, magic-trace can host its own UI locally. You just need to build the UI from source, and point magic-trace to it (via an environment variable).

tbr1 · on April 22, 2022

Yes, in fact this is how we've been narrowing down performance problems in it and its dependencies :)

- https://github.com/let-def/owee/issues/23

- https://github.com/janestreet/magic-trace/issues/93

tbr1 · on April 22, 2022

Absolutely! This is one of the main features of magic-trace, and in fact a primary use-case.

You can select a trigger symbol, and magic-trace will take a snapshot the next time that symbol is called. The trigger can be any function you want, so you can imagine writing code like

  if (something_really_wonky_happened) { take_magic_trace(); }

and asking magic-trace to take a snapshot of the past only when `take_magic_trace` is called.
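A minimal sketch of such a trigger in C, assuming GCC or Clang; `take_magic_trace` and `check_latency` are hypothetical names. The trigger is marked `noinline` so a real symbol and a real call exist for magic-trace to fire on, and the empty `volatile` asm keeps the optimizer from treating calls to an otherwise empty function as removable.

```c
#include <assert.h>

/* Hypothetical trigger: kept out of line so the symbol exists, and given
   a volatile asm body so calls to it are not optimized away. */
__attribute__((noinline)) void take_magic_trace(void) {
    __asm__ volatile("");
}

/* Hypothetical hot-path check standing in for
   "something_really_wonky_happened". Returns 1 if it fired the trigger. */
static int check_latency(long latency_us, long threshold_us) {
    assert(threshold_us > 0);
    if (latency_us > threshold_us) {
        take_magic_trace();
        return 1;
    }
    return 0;
}
```

Because the snapshot covers the past, the trigger only has to mark the moment; everything that led up to it is already in the trace buffer.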

tdiff · on April 22, 2022

Sounds great, thank you!

tbr1 · on April 22, 2022

It works best on compiled programs.

We do try to support scripted languages with JITs that can emit info about what symbol is located where [1]. Notably, this more or less works for Node.js. It'll work somewhat for Python, in that you'll see the Python interpreter frames (probably uninteresting), but you will see any FFI calls (e.g., into NumPy) with proper stacks.

[1]: https://github.com/torvalds/linux/blob/master/tools/perf/Doc...
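For JITs that follow perf's `/tmp/perf-<pid>.map` convention (Node.js writes such a file when run with `--perf-basic-prof`), each line is `START SIZE symbolname`, with START and SIZE in hex without a `0x` prefix. A minimal C sketch of emitting one entry for a freshly compiled function; `perf_map_line` and `append_perf_map` are hypothetical helpers, not part of any library.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Formats one perf map line: "START SIZE symbolname\n", hex, no 0x.
   Returns the number of characters written, or -1 if buf is too small. */
static int perf_map_line(char *buf, size_t len, uintptr_t start,
                         size_t size, const char *name) {
    int n = snprintf(buf, len, "%" PRIxPTR " %zx %s\n", start, size, name);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Appends an entry for JIT-compiled code to this process's map file. */
int append_perf_map(uintptr_t start, size_t size, const char *name) {
    char path[64], line[512];
    snprintf(path, sizeof path, "/tmp/perf-%d.map", (int)getpid());
    if (perf_map_line(line, sizeof line, start, size, name) < 0) return -1;
    FILE *f = fopen(path, "a");
    if (f == NULL) return -1;
    int ok = fputs(line, f) != EOF;
    if (fclose(f) != 0) ok = 0;
    return ok ? 0 : -1;
}
```

A JIT would call something like `append_perf_map` right after it finishes emitting a function, so that addresses in the trace can be mapped back to names.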

tbr1 · on April 22, 2022

It's worth noting that, overhead aside, function calls and returns alone are not enough to reconstruct the call stack: tail calls are just regular jump instructions.
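A toy illustration: with optimizations on, compilers commonly turn `return g(x);` in `f` into a jump to `g`, so `g` later returns straight to `f`'s caller. The sketch replays a hypothetical branch stream and assumes every `BR_JMP` targets a function entry (a real decoder would check the target against the symbol table). Without treating such jumps as tail calls, `g` never appears at all.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical decoded branch stream; target is NULL for returns. */
typedef enum { BR_CALL, BR_JMP, BR_RET } br_kind;
typedef struct { br_kind kind; const char *target; } branch;
typedef struct { const char *fn; int depth; } sighting;

/* Records each function that shows up on the reconstructed stack, with
   its depth. With use_tailcalls == 0 only CALL/RET are considered. */
static int reconstruct(const branch *b, int n, int use_tailcalls,
                       sighting *out) {
    int depth = 0, nout = 0;
    for (int i = 0; i < n; i++) {
        switch (b[i].kind) {
        case BR_CALL:
            depth++;
            out[nout++] = (sighting){b[i].target, depth};
            break;
        case BR_JMP:
            /* Tail call: the callee replaces the caller's frame, so it
               appears at the same depth instead of one deeper. */
            if (use_tailcalls && depth > 0)
                out[nout++] = (sighting){b[i].target, depth};
            break;
        case BR_RET:
            if (depth > 0) depth--;
            break;
        }
    }
    return nout;
}
```

For the stream "call f, jump to g, return", the call/return-only view reports just `f`, while treating the jump as a tail call also reports `g` in `f`'s slot.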

tbr1 · on April 22, 2022

We don't have plans to add ARM support largely because we have no in-house expertise with ARM. That said, ARM has CoreSight which sounds like it could support something like magic-trace in some form, and we'd definitely be open to community contributions for CoreSight support in magic-trace.

tbr1 · on April 22, 2022

It's all OCaml, GitHub is just misclassifying it as SML :)

mananaysiempre · on April 23, 2022

Hint[1] in case you’re ever in this situation:

  echo '*.ml  linguist-language=OCaml' >> .gitattributes
  echo '*.mli linguist-language=OCaml' >> .gitattributes

[1] https://github.com/github/linguist/blob/master/docs/override...

cgaebel · on April 23, 2022

Thank you for this. I've made the change, but it looks like it may be several days before GitHub gets around to refreshing the language statistics.

tbr1 · on April 22, 2022

We have a bit more color on compatibility in general up on <https://github.com/janestreet/magic-trace/wiki/How-could-mag...> for those interested.