Actually Useful Stack Traces

Stack traces, backtraces—whatever you call them, I never liked them. Every time I saw one, the only thought in my mind was: "And now what?" Then I discovered a particular implementation, and I felt like a kid in a candy store. I guess I know what made it so useful for me, and thus, so exciting. We will take a look at a visual demo and talk about how one can build an interactive debugger on top of it; but first, we need to discuss three issues.

The problem is simple: the usability of stack traces sucks. Let's take a look at primary suck points, and we will quickly understand how to avoid them and how to get a good UX:

Stack trace is a text
It's either convoluted or lacking
It's upside down

Why is it upside down? That's a very good question at the very right time. As you can see, the narrative of this post goes from top to bottom and most recent things appear … at the bottom. Your focus naturally follows it. The same happens, when you read output of the program. Let's take a look at this:

/home/bob/tmp/demo.js:10
  notDefined();
  ^

ReferenceError: notDefined is not defined
    at thirdFunction (/home/bob/tmp/demo.js:10:3)
    at secondFunction (/home/bob/tmp/demo.js:6:3)
    at firstFunction (/home/bob/tmp/demo.js:2:3)
    at Object.<anonymous> (/home/bob/tmp/demo.js:13:1)
    at Module._compile (node:internal/modules/cjs/loader:1565:14)
    at Object..js (node:internal/modules/cjs/loader:1708:10)
    at Module.load (node:internal/modules/cjs/loader:1318:32)
    at Function._load (node:internal/modules/cjs/loader:1128:12)
    at TracingChannel.traceSync (node:diagnostics_channel:322:14)
    at wrapModuleLoad (node:internal/modules/cjs/loader:219:24)

The recency and relevance of the information drops rapidly with every new line. Which is kind of the opposite of the reading flow direction we were discussing a couple of sentences ago. Surprisingly, most of the mainstream languages (except maybe Python) do this. From usability standpoint it doesn't make much sense: when program finished printing the trace, I have to scroll up to get to the most interesting stack frames. In this tiny example it may not seem that critical, but in real world scenarios stack traces can become beefy very quickly, so it will become a pain in the back. And last but not least: this order of frames is important for making a debugger, but we will cover it closer to the end of the post.

Are there any cases, when it makes sense to print it upside down? Yes. Will I tell you about them? No.

Okay, we covered the printing direction, now let's talk about the content. To represent the state of the stack we need to communicate a lot of information: function names, values of arguments, frame numbers, local variables, corresponding source code location. Presenting it to the developer in a meaningful way is an impossible challenge: it will be always a tradeoff between comprehensibility and readability.

Yes, there is no universal solution here: you either provide too much and it becomes noisy incomprehensible mess or not enough and dear stack trace reader can't do a thing with it. And here we come to a very important point: I want to do a thing with it and don't want to be flooded with wall of text at the same time.

Let's take a look at almost good stack trace and explore this exact problem. One small note here, for function calls it uses a slightly special notation (function-name arg1 arg2) instead of more usual function_name(arg1, arg2), be aware.

In ice-9/boot-9.scm:
  1755:12 11 (with-exception-handler _ _ #:unwind? _ # _)
In unknown file:
          10 (apply-smob/0 #<thunk 7f46684f4300>)
In ice-9/boot-9.scm:
    724:2  9 (call-with-prompt _ _ #<procedure default-prompt-handle…>)
In ice-9/eval.scm:
    619:8  8 (_ #(#(#<directory (guile-user) 7f46684f7c80>)))
In ice-9/command-line.scm:
   185:19  7 (_ #<input: custom-port 7f46684f1850>)
In unknown file:
           6 (eval (begin (use-modules (ares suitbl)) (#)) #<directo…>)
In ares/suitbl.scm:
    499:6  5 (test-runner _)
   264:31  4 (_ ((type . print-test-suite) (test-suite #<proced…> …)))
In ice-9/boot-9.scm:
   260:13  3 (for-each #<procedure 7f465be50c60 at ares/suitbl.scm:…> …)
In ares/suitbl.scm:
   325:13  2 (test-reporter-hierarchy ((type . print-test-suite) (…)))
    291:2  1 (tests->pretty-string #f)
In ice-9/boot-9.scm:
    218:9  0 (map #<procedure 7f465bbab160 at ares/suitbl.scm:292:3…> …)

ice-9/boot-9.scm:218:9: In procedure map:
In procedure map: Not a list: #f

First of all, it's printed from the beginning to the end, the most recent and relevant calls are at the bottom of the output, just above the problem message, and this is already a huge improvement over the initial example. The ordering will matter even more for debugger: when you jump from break point to break point the first part of the stack remains the same and only the tail changes and this order allows to easily redraw the tail.

Second of all, I can see arguments to the function. Here #f means False, #<procedure ...> is an object of type procedure and ((type . print-test-suite) (test-suite #<proced…> …)) is a list of two pairs: type to print-test-suite and test-suite to the list of procedures (you probably noticed omitted dot and missing parenthesis, yeah). Not the most convenient notation, but at least the arguments are here.

If you stare at this stack trace long enough, you can even understand (to some degree) what was happening before we got into troubles.

BTW, do you find it a bit noisy? Those file names, line and column numbers — visual clutter I don't care about. Yes, I would like to jump to the source code and look around, but we can do so without clogging the output. Let's go back for a moment and try to debug our problem.

From the exception message I can guess that we passed #f instead of a list to the map function. And here is the first issue: we are lucky to have useful exception message, but we don't see arguments passed to the map in the stack trace, because they were truncated to fit the terminal width.

We still can guess that the root cause is somewhere earlier because tests->pretty-string also got #f as an argument. The problem may be in the test-reporter-hierarchy function, and again, we can't see arguments passed to it, moreover, we don't see local variables. Thus it's unclear why tests->pretty-string got called with #f, so we are basically stuck and can't proceed our investigation using only the stack trace.

We try to make stack trace useful — it gets overcrowded, we try to make it readable — it becomes lacking. Are we doomed and can't do anything about it? No. The primary problem is that we try to fit everything into the text representation, but computers are much more than just text printing machines.

I will skip the part where we discuss obvious solution: to make stack trace ultimately informative and create a special tool, which parses it and provides a rich interactive GUI/TUI. Instead, we can go straight and store a structured data about stack frames in a format more suitable than the plain text and continue the discussion from here.

Okay, imagine, we discarded the text-only representation, now let's make a slick interactive UI. That means, while we still store all auxiliary information, we don't need to make it visible to the readers and flood their brain with it.

(test-runner ((type . run-scheduled-tests)))
(loop (#<procedure test-reporter-hierarchy (message)>))
(for-each #<procedure 7fd42192d3a0 at ./ares/suitbl.scm:257:14 (r)> _)
(test-reporter-hierarchy ((type . print-test-suite) (test-suite . (simple-test))))
(prettify-list #f)
(map _ #f)

Origin: "map"
Not a list: #f

This is much cleaner version of the previous stack trace. The click on the function name leads to the source code location, where this function gets called. I can use keybindings to jump back and forth between stack frames and now can easily understand the flow of the code. We've reduced feedback loop (the time from seeing a stack trace and finding a place of potential failure) from seconds and minutes to fractions of a second. No hassle, no guessing, just pure usability.

It's not a theoretical reasoning, my very smart and diligent intern Noé and I already implemented it in Ares/Arei IDE and this is a video of how it works: the video.

The tool has already saved me hours of my life and I enjoy every moment I spend with it. And yes, it has some room for improvement, so let's quickly go through the things around the corner:

Value Inspector :: instead of using a text representation for highly nested data structures, we can just put a clickable link that will bring up a value inspector.
Local Variables :: when we jump from frame to frame, it would be cool to see the lexical scope of the function in that stack frame.
Code Evaluation :: evaluating code in the context of exception is a sick functionality, almost a miracle, but you won't believe it's possible anyway. When we get it implemented, this "just a stack viewer" will turn into the most powerful debugging machine known to human race, hehe.

Ares/Arei Guile Scheme IDE is a Free and Open Source Software, so feel free to play around, study, improve or whatever you want to do with it.

TLDR and my recommendation: if you have to make text-only stack traces, provide as much information as possible and sacrifice readability: it's easy to filter out the noise, but extremely hard to get the needed info when data is missing. If you have access to computers not only through a teletype console, just provide a nice interactive UI and make your stack traces readable, actionable and actually useful.