"Anyway my impression is that DWARF is a large and complicated standard (and possibly the libraries people use to generate DWARF are subtly incompatible?), but it's what we have, so that's what we work with!"
Having written at least 4 complete DWARF readers and writers (GCC's location list support, GDB's expression evaluator, the thing that became google breakpad's debuginfo reader etc), it's really not that bad.
In fact, compared to pretty much any other debug format, it's wonderful. All of the forms are consistent, and outside of the index tables, and a few places where backwards compatibility was needed (ie the world moved from 32 bit to 64 bit, but before that, DWARF supported 64 bit sized debug info on 32 bit processors), the encoding is sane.
libdwarf, on the other hand, is ... not so much.
I love david a, and (AFAIK) he's been working on DWARF since SGI was at 1600 amphitheatre parkway, and keeping libdwarf up to date. It's one of those open source projects nobody ever realizes has been around 20 years and that someone has kept it working great (see ftp://ftp.sgi.com/sgi/dev/davea/objectinfo.html)
However, libdwarf is just not a pleasant interface to work with, IMHO.
It's also memory intensive.
If you just want a reader, the thing that made it into breakpad is probably a good reference (others may have better ones, i've thankfully been out of the debug info game for years):
https://chromium.googlesource.com/breakpad/breakpad/+/master...
It's mostly a callback interface, and has a function info reader meant as a demonstration.
It should work without trouble outside of breakpad (when it was contributed, I made it portable to be able to just compile standalone. it doesn't look like much has changed).
it does not support DWARF4/5, but nobody should need to care.
It also has no expression evaluator but i have a bunch of them if someone needs them :)
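To make the 32-bit/64-bit compatibility wrinkle mentioned above a bit more concrete, here is a minimal sketch of how a reader might decode the "initial length" field at the start of a DWARF unit. The function name `read_initial_length` and its return shape are made up for illustration and are not taken from breakpad or libdwarf; the escape value 0xffffffff and the reserved range are as I understand them from the DWARF 3+ specs.

```python
import struct

def read_initial_length(data, offset=0, little_endian=True):
    """Decode a DWARF initial length field (hypothetical helper).

    Returns (unit_length, offset_size, next_offset), where offset_size
    is 4 for the 32-bit DWARF format and 8 for the 64-bit format.
    """
    endian = "<" if little_endian else ">"
    (length32,) = struct.unpack_from(endian + "I", data, offset)
    if length32 == 0xFFFFFFFF:
        # Escape value: the real length follows as an 8-byte integer.
        (length64,) = struct.unpack_from(endian + "Q", data, offset + 4)
        return length64, 8, offset + 12
    if length32 >= 0xFFFFFFF0:
        # 0xfffffff0..0xfffffffe are reserved for future extensions.
        raise ValueError("reserved initial length value: %#x" % length32)
    return length32, 4, offset + 4

if __name__ == "__main__":
    print(read_initial_length(struct.pack("<I", 0x2A)))
    print(read_initial_length(struct.pack("<IQ", 0xFFFFFFFF, 0x100000000)))
```

The 4-or-8 result is what later decides how wide section offsets are inside that unit, which is why a reader has to carry it around.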
Nice writeup! I've done some work with libdwarf and I agree that it isn't very pleasant. Do you have any suggestions for better interfaces that support writing?
If you haven't, take some time to look through the list of their blog posts at http://jvns.ca/ (yes I know you can click there from the post but I'm trying to make it really easy) because they have written an incredible amount of very interesting blog posts.
Every time I use GDB, I feel like it was almost deliberately designed to make me hate using it, and those example outputs really show why; compare WinDBG's output dumping bytes (and corresponding ASCII in the usual hexdump format, something that GDB just doesn't seem to support at all...):
interesting to see folks like this :) btw, the paper "introduction to dwarf debugging format" (available here: http://www.dwarfstd.org/doc/Debugging%20using%20DWARF.pdf) is quite approachable. also, in case folks get carried away, there is always the "linkers and loaders" book for in depth analysis.
> If we want to find the address of a global variable in our program, all we need to do is look up the name of the variable in the symbol table, and then add that to the start of the range in /proc/whatever/maps, and we're done!
what are these? The r??p look like permissions, read/write/execute/something, but how do we know which one to look for the variable in?
(And what are the numbers afterwards? 00281000 on the second line is the length of the range on the first line, but then there's a gap of 00200000 between the end of the first range and the start of the second. 00286000 on the third line is again 00200000 less than the distance from the start of the first range and the start of the second.)
The man page for /proc (yes, there is one) explains what all those fields mean.
In this case, each of those mappings corresponds to one of the sections in the binary. The permissions indicate that the first one is executable code, the second is read-only data, and the third is writable (copy-on-write). The number after the permissions is an offset into the underlying file.
I suspect the article is glossing over some details e.g. how gdb figures out which mapping corresponds to which section, but it gets the basic idea across.
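For anyone who wants to poke at those fields, here is a rough sketch of parsing maps lines and doing the "add the symbol value to the start of the range" step. The sample lines, the path `/tmp/demo`, and the symbol value are invented for illustration, and the helper names are hypothetical. It also assumes a position-independent executable whose lowest mapping corresponds to virtual address 0 in the file; for a non-PIE binary the symbol value is already the absolute address.

```python
from typing import List, NamedTuple

class Mapping(NamedTuple):
    start: int
    end: int
    perms: str
    offset: int
    path: str

def parse_maps_line(line: str) -> Mapping:
    # Fields: address range, permissions, file offset, device, inode, path.
    fields = line.split(None, 5)
    start_s, end_s = fields[0].split("-")
    path = fields[5].strip() if len(fields) > 5 else ""
    return Mapping(int(start_s, 16), int(end_s, 16), fields[1],
                   int(fields[2], 16), path)

def runtime_address(mappings: List[Mapping], path: str, symbol_value: int) -> int:
    # Assumption: the lowest mapping of the file is the load base (PIE case).
    base = min(m.start for m in mappings if m.path == path)
    return base + symbol_value

def find_mapping(mappings: List[Mapping], addr: int) -> Mapping:
    # Which mapping (and so which permissions) does an address fall in?
    for m in mappings:
        if m.start <= addr < m.end:
            return m
    raise LookupError("address %#x is not mapped" % addr)

# Made-up sample in the same shape as /proc/<pid>/maps output.
SAMPLE = """\
55d4a1e00000-55d4a1e01000 r-xp 00000000 08:01 1234 /tmp/demo
55d4a2000000-55d4a2001000 r--p 00000000 08:01 1234 /tmp/demo
55d4a2001000-55d4a2002000 rw-p 00001000 08:01 1234 /tmp/demo
"""

if __name__ == "__main__":
    maps = [parse_maps_line(l) for l in SAMPLE.splitlines()]
    addr = runtime_address(maps, "/tmp/demo", 0x201010)  # hypothetical symbol value
    print(hex(addr), find_mapping(maps, addr).perms)
```

This partly answers the "which one do we look in" question: you don't pick a mapping up front, you compute the address from the base and then see which range it lands in.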
X86 debug registers (DR0-3, DR6, DR7) are also useful. They don't require code changes. As an added bonus, you can set a breakpoint for a variable access, which triggers whenever a certain variable is accessed.
Although unfortunately debug registers are limited to just 4 breakpoints at the same time.
Java stems from Sun. Sun made Solaris. And Solaris had an ideology about transparency and discoverability of running systems. They have some outright amazing tools for introspection into a binary that is running. This probably creates an environment in which the same would happen for the JVM. In fact, the `jstack` name seems very familiar to DTrace (and mdb(1)) users.
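As a rough illustration of why there are only four slots, here is a sketch of how the DR7 control value for a watchpoint could be put together: each of DR0-DR3 gets an enable bit and a 4-bit type/length field. The function and constant names are made up for this example, and it only computes the value; actually writing the registers (for example through ptrace on Linux) is outside this sketch.

```python
# R/W field encodings: what kind of access triggers the breakpoint.
RW_EXECUTE = 0b00
RW_WRITE = 0b01
RW_READWRITE = 0b11

# LEN field encodings, keyed by watched size in bytes.
LEN_ENCODING = {1: 0b00, 2: 0b01, 8: 0b10, 4: 0b11}

def dr7_enable(dr7: int, slot: int, rw: int, length: int) -> int:
    """Return dr7 with breakpoint `slot` (0-3) locally enabled (hypothetical helper)."""
    if slot not in range(4):
        raise ValueError("only DR0-DR3 hold breakpoint addresses")
    if length not in LEN_ENCODING:
        raise ValueError("length must be 1, 2, 4 or 8 bytes")
    if rw == RW_EXECUTE and length != 1:
        raise ValueError("execute breakpoints use the 1-byte length encoding")
    dr7 |= 1 << (slot * 2)          # local enable bit L0..L3
    shift = 16 + slot * 4           # start of this slot's R/W + LEN nibble
    dr7 &= ~(0xF << shift)          # clear any previous setting
    dr7 |= (rw | (LEN_ENCODING[length] << 2)) << shift
    return dr7

if __name__ == "__main__":
    # Watch writes to a 4-byte variable whose address would go in DR0.
    print(hex(dr7_enable(0, 0, RW_WRITE, 4)))
```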
Days since I last solved a problem on the JVM through a heapDump: 2.
Great question. I wondered the same thing a while ago, and tried to build one using SystemTap (https://github.com/emfree/pystap). Couple reasons why this isn't too easy:
* "Python" in general might mean you're on Linux/Windows/whatever, and it might mean CPython, PyPy, or some other runtime. But any out-of-process instrumentation is gonna have to be pretty platform/runtime specific.
* Even if we restrict ourselves to, say, CPython on Linux, the interpreter's internals aren't super friendly to this sort of inspection from the outside. You have to rely on and also work around implementation details.
Example: to get a Python call stack, you want to look at `PyThreadState_Current` (basically the same idea as `ruby_current_thread` in that excellent linked post of Julia's, I think). But this happens to be null whenever the GIL is released, e.g. when doing network I/O, and then you're kind of out of luck. So you'll already have trouble usefully profiling a single-threaded I/O-intensive program.
* Oh and you pretty much need debug symbols in your CPython binary (I think? Tell me if this isn't true!). Most production CPython builds don't have them. So you have to get the right binary, and rebuild any application dependencies with C extensions. Not hard but annoying.
There is potential though! With some work, we definitely could have a better story for out-of-process Python profiling a la Linux perf.
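For contrast, here is what the in-process version of "get every thread's Python call stack" looks like in CPython. To be clear, this is not the out-of-process approach discussed above: it uses the standard library's `sys._current_frames()` from inside the interpreter, so it sidesteps both the debug-symbol and the GIL problems, at the cost of having to run inside the target. The helper name `dump_stacks` is made up.

```python
import sys
import threading
import traceback

def dump_stacks():
    """Return {thread_id: [formatted stack lines]} for all Python threads."""
    stacks = {}
    # sys._current_frames() maps each thread id to its topmost frame.
    for thread_id, frame in sys._current_frames().items():
        stacks[thread_id] = traceback.format_stack(frame)
    return stacks

if __name__ == "__main__":
    for thread_id, lines in dump_stacks().items():
        print("Thread %d:" % thread_id)
        print("".join(lines))
```

An out-of-process tool has to reconstruct the same information by reading the interpreter's memory, which is where the implementation details mentioned above start to bite.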
Depends on how the language runtime defines its calling convention, does it follow System Linkage [1] or does it implement its own internal linkage ie Private Linkage?
If it follows standard System Linkage, it's easy to point gdb or any other system debugger or profiler at the application to debug and profile it.
Some runtimes have a mix of System and Private linkage, ie C functions will follow System Linkage but JIT'ed code frames might follow private linkage. This makes for difficult stack-walking by system native debuggers and profilers. You'd have to teach GDB via an extension how to walk the non-standard frames.
So yea, long story short, it depends on the linkage convention the implementers of the language runtime decided to follow.
I mean, GDB already has good scripts to extract CPython stuff from inferior processes, but there's no GDB API so that anyone could make a profiler out of it. At least I haven't found anything. (So GDB's scriptability seems to be very restricted.)
PuDB is really good IMO. But I suppose it may not qualify as out-of-process. In practice, I don't end up needing it anywhere near as often as I need gdb for c/c++, but it's handy to have just in case.
I don't think I could ever use gdb without google or a cheatsheet at hand, even after years of dealing with it! I'm not even ashamed because of it. It's what it is.
Can be used that way. But should it be? I for one am all for coming up with a gender neutral singular third person pronoun. But I would love it if it wasn't one that already means something completely different.
Yes it can be, and if that were the intention here (rather than a mistake) it is a good example of poor usage. A sentence such as "Julia Evans writes on their blog" introduces a lot of unnecessary ambiguity if "their" = "her".
I assumed I was being downvoted for being off-topic, but perhaps it was because my comment is construed as gender-normative. I was speaking about the misuse of the singular they as separate from gender identification--i.e. presuming you know the gender someone prefers--and I think that point is valid.
I haven't considered the situation you bring up. I am for the singular they when it is being used to describe an indeterminate person. Are you suggesting we use singular they for a particular person until we get explicit confirmation about their pronoun preference?
I can't imagine that making a good faith guess at an appropriate gender pronoun would ever draw criticism.
I wonder if you weren't downvoted for using the subjunctive mood in your comment ("if that were the intention here") which seems to imply that you believe it was not the intention and was instead a "mistake".
You don't fucking care. You write ‘her’ if it's a girl and ‘him’ if it's a boy, then if they ask for something different you apologize and correct yourself. This isn't tumblr.
Well exactly - you don't care and you just write 'their'. You're the one suggesting that we should go out of our way to do detective work to find some evidence of whether someone looks like a man or a woman. It's easier to just say 'their'.
Also, I've seen several comments here of someone making what I presume are honest mistakes in using the wrong pronoun and being publicly shamed for it.
I don't know if it's necessarily easier to read, you are introducing a lot of ambiguity. When you say "Julia's writing on their blog," it sounds like she's writing on a blog that belongs to other people.
For example, what would you refer to me as? "Nadya wrote on her blog" or "Nadya wrote on his blog"? (hint: You'd need to dig rather deep into my post history to find the "proper" one.)
@ruraljuror's example
Further context removes that ambiguity. Although you could rephrase it as "Julia wrote on their own blog" or even remove the pronoun altogether: "Julia wrote on Julia's blog" (which is only ambiguous if there is another Julia).
In that case I like ‘their own’ best, but when it can be inferred from the name then there's no reason to complicate things. If someone's special enough to use their own pronouns then they can ask for it to be corrected.
And Nadya is hard—I would go for gender neutral on that one. :-) (It's not that I dislike singular ‘they’, but sometimes it can make sentences awkward.)
I second this. To add: she writes with an enthusiastic tone, so it's entertaining, and I still learn something as a bonus. If I happen to already know the topic, then it's still entertaining to read and allows me to go back to basics with whatever she writes about.
I like Julia's posts enough to subscribe to her blog. Her posts on Hacker News also routinely generate extremely interesting, technical discussions. I enjoy both. Also, I don't know Julia except through her blog and twitter.
Sometimes I find it interesting and honestly other times I don't think it's in-depth enough to be either interesting or discussion-worthy. If I found every single post of theirs interesting I would just subscribe to their blog.
Yeah, some of her articles don't add much to what I already know, but I bet other people say the same thing about articles I find really interesting, and where I like that she starts with basics.
Many (maybe "more interesting") articles about the deep details of something don't get much attention because you need a lot of prior knowledge to understand what is going on. That hurdle almost never comes up with her articles. Introductory articles often get more attention that way.
And sometimes I notice that apparently I didn't know as much as I thought about the details...
It's not every single post. Some, like "How do HTTP requests get sent to the right place?", have been posted on HN (because anyone can do that) but didn't get more than a couple of votes. Others, like "PolyConf 2016," didn't get posted at all.
It's fine for you not to find every post on HN interesting or discussion-worthy. There are other people who are interested and having a good discussion.
HN's lack of sections is the real problem. So much interesting stuff gets submitted and drops off the first page of NEW because only so much can make it to the front page. Subreddits solve this. Another problem is that HN has outgrown its originally intended purpose. If HN wants to stay relevant, some of its features deserve a rethink.
It's amusing to see the downvotes for asking a question.
This just reinforces my suspicion that this blogger's friends and coworkers are the reason a single blog constantly ends up on the front page of HN regardless of merit.
If you have genuine reason to believe this is the case, please email hn@ycombinator.com and we can investigate, but please don't make such accusations in the discussion threads.