A lightning round on debugging techniques
- Saturday March 09 2024
- golang
- Series: what-you-were-not-taught-in-undergrad
A series about topics I wish my undergraduate education had covered. These topics may seem quite elementary to many readers, but that is the point.
One thing I wish my undergraduate education had spent more time covering was effective techniques for debugging. Even in a theoretical scenario where you never introduce bugs into your own code, you still need to know effective techniques for debugging others' code.
To that end, I've organized the techniques I use most often in my day-to-day work. All examples here are in Go. I rarely attach a debugger to a process, so the techniques described here rely on source-level modifications that help identify the root cause of a bug. This has the added advantage that they can be used anywhere, not just in your local development environment.
Failure oriented debugging
Failure oriented debugging is the practice of introducing one or more deliberate failures into a codebase. This is done in order to determine if a particular code path ever executes in relation to a bug.
Utility
By adding a specific failure, you can identify whether a code path is related to the bug. This technique can also be used to identify dead code: if the failure never triggers under any input you exercise, the code is probably dead.
Example code
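A minimal sketch of the idea, using a hypothetical `applyDiscount` function rather than any real codebase. The deliberate `panic` answers one question: does the bulk branch ever run? The `reaches` helper is also an assumption, added only so the result can be shown without crashing the demo.

```go
package main

import "fmt"

// applyDiscount is a hypothetical function under investigation.
// The question being asked: does the bulk branch ever execute?
func applyDiscount(quantity, unitPrice int) int {
	if quantity >= 100 {
		// Deliberate failure: if this branch runs, the program stops here
		// and the stack trace shows exactly who called it.
		panic("DEBUG: bulk discount branch reached")
	}
	return quantity * unitPrice
}

// reaches reports whether calling fn triggers the deliberate failure.
// It exists only for this demo; while debugging you would normally let
// the panic crash the program and read the stack trace.
func reaches(fn func()) (hit bool) {
	defer func() {
		if r := recover(); r != nil {
			hit = true
		}
	}()
	fn()
	return false
}

func main() {
	fmt.Println(reaches(func() { applyDiscount(5, 2) }))   // false
	fmt.Println(reaches(func() { applyDiscount(500, 2) })) // true
}
```

In practice the panic message should be easy to search for, so the deliberate failure is not left behind by accident once the investigation is done.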
Omission oriented debugging
Omission oriented debugging is the practice of temporarily removing a code path to determine if that code path plays any role in a software bug.
Utility
When software executes without any form of error, crash, or other significant diagnostic output, it may not be obvious which section of code is responsible for producing the output. In the context of debugging, that output is incorrect. By iteratively removing code paths you can identify which ones are at fault. This practice is closely related to the idea of programming by permutation, but the intent is not to produce a functional piece of software. Instead, this method allows you to identify the relevant sections of code.
One important thing about this technique is that developers are often predisposed to make a very narrow omission as part of the debugging process. This sometimes works, but from a time perspective it can be more practical to disable a large section of code initially. If this causes a relevant change in output, you can then narrow your change to approximately half of that code. You can repeat this until you either understand the problem or the problem reappears. This is just a binary search and can be very time efficient.
Example code
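A rough sketch of the halving process, assuming a hypothetical four-step text pipeline in which exactly one step is at fault. All names here (`step`, `exampleSteps`, `runOmitting`, `findCulprit`, `keepsLetterE`) are invented for illustration. When doing this by hand you would comment out blocks of code; the loop below only automates the same search.

```go
package main

import (
	"fmt"
	"strings"
)

// step is one stage of a hypothetical text-processing pipeline.
type step struct {
	name string
	fn   func(string) string
}

// exampleSteps returns the pipeline under investigation. The last step
// holds the planted bug: it deletes every "e" from the text.
func exampleSteps() []step {
	return []step{
		{"trim", strings.TrimSpace},
		{"lowercase", strings.ToLower},
		{"collapse spaces", func(s string) string {
			return strings.Join(strings.Fields(s), " ")
		}},
		{"drop e", func(s string) string {
			return strings.ReplaceAll(s, "e", "")
		}},
	}
}

// runOmitting runs the pipeline but skips every step with index in [from, to).
func runOmitting(steps []step, from, to int, in string) string {
	out := in
	for i, s := range steps {
		if i >= from && i < to {
			continue // temporarily omitted while debugging
		}
		out = s.fn(out)
	}
	return out
}

// keepsLetterE is the symptom check: correct output still contains an "e".
func keepsLetterE(s string) bool { return strings.Contains(s, "e") }

// findCulprit halves the suspect range until one step remains. It assumes
// exactly one step is at fault and that omitting it makes ok return true.
func findCulprit(steps []step, in string, ok func(string) bool) int {
	lo, hi := 0, len(steps)
	for hi-lo > 1 {
		mid := lo + (hi-lo)/2
		if ok(runOmitting(steps, lo, mid, in)) {
			hi = mid // the symptom vanished, so the fault is in [lo, mid)
		} else {
			lo = mid // the symptom persists, so the fault is in [mid, hi)
		}
	}
	return lo
}

func main() {
	steps := exampleSteps()
	i := findCulprit(steps, "  Hello   There ", keepsLetterE)
	fmt.Println(steps[i].name) // prints "drop e"
}
```

The single-culprit assumption matters: if two steps interact to cause the bug, halving can point at either one, and the omitted range needs to be widened again.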
Sentinel value debugging
Practically speaking, all software systems eventually communicate some data to another system, even if the external system is just the filesystem of the operating system. Sentinel value debugging is the modification of a program to produce a specific value which it otherwise never would.
Utility
When performing analysis of existing code you may arrive at a scenario where individual values used at the source code level appear meaningful in isolation. They may be constants stored in the source code or could be values loaded at runtime from the filesystem or network. Lacking a complete understanding of the algorithm implemented by the source code, it may not be immediately apparent how the individual values are used to produce the output. By introducing a unique value or other pattern into the inputs, you can identify where it appears in the output. This serves as an initial point for developing a better understanding of the algorithms used.
If introducing a specific value into the output of one section of code causes a failure in another section of code, you are simultaneously using failure oriented debugging. A common example is introducing a nil pointer, or a numerical zero that ends up as a divisor.
The secondary utility of this technique is that if modifying a value produces no change in the output, that value may not actually have any meaning.
Example code
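A small sketch of the technique, assuming a hypothetical `settings` struct and `banner` function whose inner workings are not yet understood. Each field is replaced in turn with a value that could never occur naturally, and the output is searched for it. The sentinel string and every name here are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// settings is a hypothetical set of inputs whose roles are unclear.
type settings struct {
	Host  string
	Label string
}

// banner stands in for the code being analyzed.
func banner(s settings) string {
	return "connecting to " + s.Host
}

// sentinel is a value that would never appear in real data, so any
// occurrence in the output must have come from the modified input.
const sentinel = "ZZ-SENTINEL-ZZ"

// reachesOutput applies mutate to otherwise normal settings and reports
// whether the sentinel shows up in the output of banner.
func reachesOutput(mutate func(*settings)) bool {
	s := settings{Host: "db.example", Label: "primary"}
	mutate(&s)
	return strings.Contains(banner(s), sentinel)
}

func main() {
	fmt.Println(reachesOutput(func(s *settings) { s.Host = sentinel }))  // true
	fmt.Println(reachesOutput(func(s *settings) { s.Label = sentinel })) // false
}
```

The second result illustrates the secondary use: changing `Label` never affects the output, which suggests the field is unused by this code path.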
Fixing the example program
The example program I've used is a simple program which produces a scrolling display of text in your terminal.
You can run it like this:
echo -n 'now is the good time for all programmers to stop adding bugs to software' | go run word_carousel_fixed.go
The program works fine for long inputs, but short inputs always crash:
$ echo -n 'hello' | go run word_carousel.go
panic: runtime error: index out of range [5] with length 5

goroutine 1 [running]:
main.wordCarousel({0xc0000a2e90, 0x5, 0xc0000a2f20?}, 0xc0000a2e58, 0x15)
	/home/ericu/www.hydrogen18.com/site/debugging-lightning-round/word_carousel.go:18 +0xf0
main.main()
	/home/ericu/www.hydrogen18.com/site/debugging-lightning-round/word_carousel.go:39 +0x19b
exit status 2
This happens because the program tries to display 21 characters at a time (the 0x15 argument in the stack trace). This does not work for short inputs, because there are not 21 characters to display. The fix is to shorten the displayed length to the input length whenever the input is shorter than 21 characters.
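The clamping described above might look roughly like the following. This is not the code from word_carousel.go: the `frame` and `clampWidth` functions and the wrap-around indexing are assumptions made only to show the shape of the fix.

```go
package main

import "fmt"

// clampWidth returns the number of characters to display: the preferred
// width, or the text length when the text is shorter than that.
func clampWidth(textLen, width int) int {
	if textLen < width {
		return textLen
	}
	return width
}

// frame returns the characters visible at scroll position start.
// Wrapping around the end of the text is an assumption of this sketch.
func frame(text []byte, start, width int) []byte {
	width = clampWidth(len(text), width) // the fix: never show more than we have
	out := make([]byte, width)
	for i := 0; i < width; i++ {
		out[i] = text[(start+i)%len(text)]
	}
	return out
}

func main() {
	// A 5-character input with the preferred width of 21 no longer panics.
	fmt.Println(string(frame([]byte("hello"), 2, 21))) // prints "llohe"
}
```

With an empty input the clamped width is zero, so the loop body never runs and nothing is indexed.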