How does gdb work? (jvns.ca)
320 points by deafcalculus on Aug 10, 2016 | 56 comments


"Anyway my impression is that DWARF is a large and complicated standard (and possibly the libraries people use to generate DWARF are subtly incompatible?), but it's what we have, so that's what we work with!"

Having written at least 4 complete DWARF readers and writers (GCC's location list support, GDB's expression evaluator, the thing that became google breakpad's debuginfo reader etc), it's really not that bad.

In fact, compared to pretty much any other debug format, it's wonderful. All of the forms are consistent, and outside of the index tables, and a few places where backwards compatibility was needed (ie the world moved from 32 bit to 64 bit, but before that, DWARF supported 64 bit sized debug info on 32 bit processors), the encoding is sane.

libdwarf, on the other hand, is ... not so much. I love david a, and (AFAIK) he's been working on DWARF since SGI was at 1600 amphitheatre parkway, and keeping libdwarf up to date. It's one of those open source projects nobody ever realizes has been around 20 years and that someone has kept it working great (see ftp://ftp.sgi.com/sgi/dev/davea/objectinfo.html)

However, libdwarf is just not a pleasant interface to work with, IMHO.

It's also memory intensive.

If you just want a reader, the thing that made it into breakpad is probably a good reference (others may have better ones, i've thankfully been out of the debug info game for years):

https://chromium.googlesource.com/breakpad/breakpad/+/master...

It's mostly a callback interface, and has a function info reader meant as a demonstration. It should work without trouble outside of breakpad (when it was contributed, I made it portable so it could just compile standalone, and it doesn't look like much has changed).

it does not support DWARF4/5, but nobody should need to care.

It also has no expression evaluator but i have a bunch of them if someone needs them :)


Nice writeup! I've done some work with libdwarf and I agree that it isn't very pleasant. Do you have any suggestions for better interfaces that support writing?


How much of a writer do you want?

Just something that writes DIE's?

Or do you need line info, accelerator tables, etc?


I think just DIE's would be sufficient, definitely don't need accelerator table stuff.

I've actually considered looking more at the LLVM dwarf writing components. Do you have any experience with that?


Thank you. Most informative !


If you haven't, take some time to look through the list of their blog posts at http://jvns.ca/ (yes I know you can click there from the post but I'm trying to make it really easy) because they have written an incredible amount of very interesting blog posts.


Some of my personal favorites from her blog:

http://jvns.ca/blog/2016/03/16/tcpdump-is-amazing/ quick introduction to TCP dump

http://jvns.ca/blog/2014/09/27/how-does-sqlite-work-part-1-p... diving into SQLite and sharing the findings with the readers

http://jvns.ca/blog/2014/08/12/what-happens-if-you-write-a-t... implementing TCP in Python

Most of them have valuable HN discussions as well.


Thank you! Your comment prompted me to look closer at her blog. There's some great stuff there!


"They" in this case is one prolific lady called Julia Evans.


Every time I use GDB, I feel like it was almost deliberately designed to make me hate using it, and those example outputs really show why; compare WinDBG's output dumping bytes (and corresponding ASCII in the usual hexdump format, something that GDB just doesn't seem to support at all...):

http://3.bp.blogspot.com/-J5bsfRdkOdk/UsHqCho2huI/AAAAAAAACV...

and structures:

https://msdnshared.blob.core.windows.net/media/TNBlogsFS/Blo...


While I don't particularly like how gdb does this either, you can:

* "set print pretty on" , this makes structs readable

* "x/20b &var" to show a hex dump. But no ASCII, and you have to give it the count (e.g. 20 bytes..)
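For the missing ASCII column, here is a rough sketch of the usual hexdump layout in plain Python. The function name `hexdump` and its parameters are made up for illustration; this is not something gdb ships.

```python
def hexdump(data: bytes, base: int = 0, width: int = 16) -> str:
    """Format bytes as address, hex columns, and printable ASCII."""
    pad = width * 3 - 1  # width of a full hex column, used to align the ASCII part
    lines = []
    for off in range(0, len(data), width):
        chunk = data[off:off + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes are shown as '.', as in the usual hexdump layout.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{base + off:08x}  {hex_part:<{pad}}  |{ascii_part}|")
    return "\n".join(lines)


if __name__ == "__main__":
    print(hexdump(b"Hello, gdb!\x00\x01\x02 some more bytes", base=0x1000))
```

Inside gdb's Python scripting you could probably feed it something like `bytes(gdb.selected_inferior().read_memory(addr, n))`, but I haven't wired that part up here, so treat it as an assumption.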



interesting to see folks like this :) btw, the paper "introduction to dwarf debugging format" (available here: http://www.dwarfstd.org/doc/Debugging%20using%20DWARF.pdf) is quite approachable. also, in case folks get carried away, there is always the "linkers and loaders" book for in depth analysis.

edit-001 : minor formatting updates


> If we want to find the address of a global variable in our program, all we need to do is look up the name of the variable in the symbol table, and then add that to the start of the range in /proc/whatever/maps, and we're done!

We just saw three ranges in /proc/whatever/maps:

    5598a9605000-5598a9886000 r-xp 00000000 [...]
    5598a9a86000-5598a9a8b000 r--p 00281000 [...]
    5598a9a8b000-5598a9a8d000 rw-p 00286000 [...]

What are these? The r??p look like permissions, read/write/execute/something, but how do we know which one to look for the variable in?

(And what are the numbers afterwards? 00281000 on the second line is the length of the range on the first line, but then there's a gap of 00200000 between the end of the first range and the start of the second. 00286000 on the third line is again 00200000 less than the distance between the start of the first range and the start of the third.)


The man page for /proc (yes, there is one) explains what all those fields mean.

In this case, each of those mappings corresponds to one of the sections in the binary. The permissions indicate that the first one is executable code, the second is read-only data, and the third is writable (copy-on-write). The number after the permissions is an offset into the underlying file.

I suspect the article is glossing over some details e.g. how gdb figures out which mapping corresponds to which section, but it gets the basic idea across.
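To make the fields concrete, here is a small sketch that parses the three lines quoted above (with the elided `[...]` columns left out) and redoes the arithmetic from the question. `Mapping` and `parse_maps_line` are names invented for this example.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    start: int
    end: int
    perms: str
    offset: int  # offset into the backing file


def parse_maps_line(line: str) -> Mapping:
    """Parse the first three fields of a /proc/<pid>/maps line."""
    addr_range, perms, offset = line.split()[:3]
    start, end = (int(x, 16) for x in addr_range.split("-"))
    return Mapping(start, end, perms, int(offset, 16))


# The three lines from the parent comment, truncated fields dropped.
LINES = [
    "5598a9605000-5598a9886000 r-xp 00000000",
    "5598a9a86000-5598a9a8b000 r--p 00281000",
    "5598a9a8b000-5598a9a8d000 rw-p 00286000",
]
MAPPINGS = [parse_maps_line(line) for line in LINES]

if __name__ == "__main__":
    for m in MAPPINGS:
        print(f"{m.perms} size={m.end - m.start:#x} "
              f"file_offset={m.offset:#x} start-offset={m.start - m.offset:#x}")
```

The second and third mappings give the same `start - offset` value, which is 0x200000 above the start of the first mapping. That is the same 00200000 gap noticed in the question; my guess is that it comes from how the binary's segments are laid out, but the numbers alone don't prove that.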


Wow in all these years using Linux I never thought of looking for the man page for /proc. Lots of info there...


And the next time she will find out how gdb breakpoints work, int3.

Something like http://www.cs.columbia.edu/~junfeng/09sp-w4118/lectures/int3...
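The core idea of an int3 breakpoint can be sketched without touching a real process: save the byte at the target address, overwrite it with 0xCC, and put the original back later. The `FakeMemory` class below is only a stand-in for the debuggee's code; a real debugger would do the reads and writes through ptrace and also has to rewind the instruction pointer after the trap, which this toy skips.

```python
INT3 = 0xCC  # x86 one-byte breakpoint instruction


class FakeMemory:
    """Stand-in for the debuggee's code; not real process memory."""

    def __init__(self, code: bytes):
        self.code = bytearray(code)
        self.saved = {}  # address -> original byte

    def set_breakpoint(self, addr: int) -> None:
        # Remember the original byte so it can be restored later.
        self.saved[addr] = self.code[addr]
        self.code[addr] = INT3

    def clear_breakpoint(self, addr: int) -> None:
        self.code[addr] = self.saved.pop(addr)


if __name__ == "__main__":
    mem = FakeMemory(b"\x55\x48\x89\xe5\xc3")  # push rbp; mov rbp, rsp; ret
    mem.set_breakpoint(1)
    print(mem.code.hex())
    mem.clear_breakpoint(1)
    print(mem.code.hex())
```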


X86 debug registers (DR0-3, DR6, DR7) are also useful. They don't require code changes. As an added bonus, you can set a breakpoint for a variable access, which triggers whenever a certain variable is accessed.

Unfortunately, though, the debug registers are limited to just 4 simultaneous breakpoints.

http://wiki.osdev.org/CPU_Registers_x86#Debug_Registers
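As a rough illustration of why there are only four slots, here is how I understand the DR7 control bits to be laid out: one local-enable bit per slot in the low byte, then a 2-bit R/W field and a 2-bit LEN field per slot starting at bit 16. The helper `dr7_for` is made up for this sketch, and actually loading the value into the register (via ptrace or from kernel mode) is not shown.

```python
# R/W field values: what kind of access triggers the breakpoint.
RW_EXECUTE, RW_WRITE, RW_READ_WRITE = 0b00, 0b01, 0b11
# LEN field encoding for the watched region size in bytes.
LEN_BITS = {1: 0b00, 2: 0b01, 4: 0b11, 8: 0b10}


def dr7_for(slot: int, rw: int, length: int) -> int:
    """Return DR7 bits enabling one hardware breakpoint in DR<slot>."""
    if not 0 <= slot <= 3:
        raise ValueError("x86 has only four debug address registers (DR0-DR3)")
    enable = 1 << (slot * 2)                         # local-enable bit L<slot>
    rw_field = rw << (16 + slot * 4)                 # R/W<slot>
    len_field = LEN_BITS[length] << (18 + slot * 4)  # LEN<slot>
    return enable | rw_field | len_field


if __name__ == "__main__":
    # Watch 4-byte writes using slot 0.
    print(hex(dr7_for(0, RW_WRITE, 4)))
```

The address of the variable itself would go into the matching DR0-DR3 register; the sketch only covers the control word.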


I'd love to see a similar dissection of WinDBG, how it works and why.


Speaking of GDB and jvns, how come there's no good out of process profiler for Python?

http://jvns.ca/blog/2016/06/12/a-weird-system-call-process-v...

I love the JVM's easy traceability, though that involves safepoints, so that's not completely out of process either.


Pure guesswork:

Java stems from Sun. Sun made Solaris. And Solaris had an ideology about transparency and discoverability of running systems. They have some outright amazing tools for introspection into a binary that is running. This probably creates an environment in which the same would happen for the JVM. In fact, the `jstack` name seems very familiar to DTrace (and mdb(1)) users.

Days since I last solved a problem on the JVM through a heapDump: 2.


Great question. I wondered the same thing a while ago, and tried to build one using SystemTap (https://github.com/emfree/pystap). Couple reasons why this isn't too easy:

* "Python" in general might mean you're on Linux/Windows/whatever, and it might mean CPython, PyPy, or some other runtime. But any out-of-process instrumentation is gonna have to be pretty platform/runtime specific.

* Even if we restrict ourselves to, say, CPython on Linux, the interpreter's internals aren't super friendly to this sort of inspection from the outside. You have to rely on and also work around implementation details.

Example: to get a Python call stack, you want to look at `PyThreadState_Current` (basically the same idea as `ruby_current_thread` in that excellent linked post of Julia's, I think). But this happens to be null whenever the GIL is released, e.g. when doing network I/O, and then you're kind of out of luck. So you'll already have trouble usefully profiling a single-threaded I/O-intensive program.

* Oh and you pretty much need debug symbols in your CPython binary (I think? Tell me if this isn't true!). Most production CPython builds don't have them. So you have to get the right binary, and rebuild any application dependencies with C extensions. Not hard but annoying.

There is potential though! With some work, we definitely could have a better story for out-of-process Python profiling a la Linux perf.


Depends on how the language runtime defines its calling convention, does it follow System Linkage [1] or does it implement its own internal linkage ie Private Linkage?

If it follows standard System Linkage, it's easy to point gdb or any other system debugger or profiler to debug and profile the application.

Some runtimes have a mix of System and Private linkage, ie C functions will follow System Linkage but JIT'ed code frames might follow private linkage. This makes for difficult stack-walking by system native debuggers and profilers. You'd have to teach GDB via an extension how to walk the non-standard frames.

So yea, long story short, it depends on the linkage convention the implementers of the language runtime decided to follow.

[1] http://www.x86-64.org/documentation/abi.pdf


I mean, GDB already has good scripts to extract CPython stuff from inferior processes, but there's no GDB API so that anyone could make a profiler out of it. At least I haven't found anything. (So GDB's scriptability seems to be very restricted.)


PuDB is really good IMO. But I suppose it may not qualify as out-of-process. In practice, I don't end up needing it anywhere near as often as I need gdb for c/c++, but it's handy to have just in case.


I'd love to know the answer to this.


It does? Well at least as long as you don't mention threads. Or try the cross host debugging "experience".


I don't think I could ever use gdb without google or a cheatsheet at hand, even after years of dealing with it! I'm not even ashamed because of it. It's what it is.


yes, and "they" can be used to describe a single person: https://en.wikipedia.org/wiki/Singular_they


We detached this subthread from https://news.ycombinator.com/item?id=12259967 and marked it off-topic.


Can be used that way. But should it be? I for one am all for coming up with a gender neutral singular third person pronoun. But I would love it if it wasn't one that already means something completely different.


Yes it can be, and if that were the intention here (rather than a mistake) it is a good example of poor usage. A sentence such as "Julia Evans writes on their blog" introduces a lot of unnecessary ambiguity if "their" = "her".


How are you supposed to find out the pronoun that someone prefers in order to write that though?


I assumed I was being downvoted for being off-topic, but perhaps it was because my comment is construed as gender-normative. I was speaking about the misuse of the singular they as separate from gender identification--i.e. presuming you know the gender someone prefers--and I think that point is valid.

I haven't considered the situation you bring up. I am for the singular they when it is being used to describe an indeterminate person. Are you suggesting we use singular they for a particular person until we get explicit confirmation about their pronoun preference?


I can't imagine that making a good faith guess at an appropriate gender pronoun would ever draw criticism.

I wonder if you weren't downvoted for using the subjunctive mood in your comment ("if that were the intention here") which seems to imply that you believe it was not the intention and was instead a "mistake".


You don't fucking care. You write ‘her’ if it's a girl and ‘him’ if it's a boy, then if they ask for something different you apologize and correct yourself. This isn't tumblr.


Well exactly - you don't care and you just write 'their'. You're the one suggesting that we should go out of our way to do detective work to find some evidence of whether someone looks like a man or a woman. It's easier to just say 'their'.

Also, I've seen several comments here of someone making what I presume are honest mistakes in using the wrong pronoun and being publicly shamed for it.


I don't know if it's necessarily easier to read, you are introducing a lot of ambiguity. When you say "Julia's writing on their blog," it sounds like she's writing on a blog that belongs to other people.


> it sounds like she's writing on a blog that belongs to other people

It doesn't to me. Maybe it varies by region.


How do you judge if it is a boy or a girl?

For example, what would you refer to me as? "Nadya wrote on her blog" or "Nadya wrote on his blog"? (hint: You'd need to dig rather deep into my post history to find the "proper" one.)

@ruraljuror's example

Further context removes that ambiguity. Although you could rephrase it as "Julia wrote on their own blog" or even remove the pronoun altogether: "Julia wrote on Julia's blog" (which is only ambiguous if there is another Julia).


In that case I like ‘their own’ best, but when it can be inferred from the name then there's no reason to complicate things. If someone's special enough to use their own pronouns then they can ask for it to be corrected.

And Nadya is hard—I would go for gender neutral on that one. :-) (It's not that I dislike singular ‘they’, but sometimes it can make sentences awkward.)


I have a question, why is it that every time this person/company updates their blog with a new entry it ends up on Hackernews?

In my opinion this starts to feel very self-promotional.

As someone who enjoys variety and diversity I think it's a valid question.


Almost all of her posts are about demystifying the "magical" parts of a modern system using code and hacking on things.

There are lots of books we could all read instead, but she uses a great mix of explanations and code and has an approachable reading style.

She also uses tools that we might want to use, so it's less abstract.


I second this. To add: she writes with an enthusiastic tone, hence entertaining and I still learn something as a bonus. If I happen to already know the topic, then it's still entertaining to read and allows me to go back to basics with whatever she writes about.


I like Julia's posts enough to subscribe to her blog. Her posts on Hacker News also routinely generate extremely interesting, technical discussions. I enjoy both. Also, I don't know Julia except through her blog and twitter.


Perhaps because they post a lot of interesting stuff.

This article isn't stuff I didn't already know in some form, but now it's made me want to write a debugger! :)


User deafcalculus submits stories from a variety of sources.

The domain jvns.ca gets submitted by a variety of people. (Lots from ingve, but that user submits a lot of other stuff too.)

So it's probably that people post it, and other people upvote it.

Do you think it's something that is interesting?


Sometimes I find it interesting and honestly other times I don't think it's in-depth enough to be either interesting or discussion-worthy. If I found every single post of theirs interesting I would just subscribe to their blog.


Yeah, some of her articles don't add much to what I already know, but I bet other people say the same thing about articles I find really interesting, and where I like that she starts with basics.

Many (maybe "more interesting") articles about deep details of something don't get much attention because you need a lot of previous knowledge to understand what is going on, that hurdle almost never happens with her articles. Introductory articles often get more attention that way.

And sometimes I notice that apparently I didn't know as much as I thought about the details...


It's not every single post. "How do HTTP requests get sent to the right place?" has been posted on HN (because anyone can do that) but has not gotten more than a couple of votes. Some, like "PolyConf 2016," didn't get posted at all.

It's fine for you not to find every post on HN interesting or discussion-worthy. There are other people who are interested and having a good discussion.


HN's lack of sections is the real problem. So much interesting stuff gets submitted and drops off the first page of NEW because there is only so much stuff that can get to the front page. subreddits solve this, and another problem is that HN's outgrown its initially designated purpose. If HN wants to stay relevant, it deserves to have some features rethought.


It's amusing to see the downvotes for asking a question.

This just reinforces my suspicion that this blogger's friends and coworkers are the reason a single blog constantly ends up on the front page of HN regardless of merit.


"Why is this here" questions are very boring and always get downvotes.

"I suspect vote rigging" accusations aren't nice. You should send them to mods at the email address rather than post them to the thread.


If you have genuine reason to believe this is the case, please email hn@ycombinator.com and we can investigate, but please don't make such accusations in the discussion threads.


I don't know that person at all, but I've noticed that she writes good posts that are worth reading.


Hehe, "regardless of merit".

You really think her blog posts have undeserved merit? :-)



