NASA’s Deep Space One probe

Scientific Debugging, Part 3: Space and Time

Engineering Insights

Talin · Published in Machine Words · Jan 11, 2019 · 5 min read

This is part three of a three-part series on scientific debugging, that is, debugging software using the scientific method. You may want to read part one and part two before you read this part.

Debugging…in spaaaaaaaace!

Sometimes it’s not so easy to conduct experiments on your software. In some cases, the computer that you need to debug is very far away…250 million miles away!

NASA’s Deep Space One probe marked the first time an autonomous spacecraft used an AI-based “expert system” to plan its mission goals.

Normally, when you want a space probe to do something, your engineers write a program, upload that program to the spacecraft, and execute it. DS1’s planning software (developed by a team supervised by my good friend Dr. Barney Pell), however, was smart enough that you could simply upload a list of goals: it would check the available resources (fuel, battery power, and so on) as well as the spacecraft’s current position and velocity, and figure out the minimum set of valve turnings, electrical switch activations, and thruster firings needed to achieve those goals.

Unfortunately, there was an error in the low-level operating system scheduler (not part of the AI planner) that was causing the system to lock up. The mission engineers at NASA Ames needed a way to debug the system quickly: they only had a short time window to solve the problem, and there was a 45-minute lightspeed delay communicating with the spacecraft. There would only be a limited number of opportunities to interact with the software.

Fortunately, the spacecraft’s Lisp REPL (Read-Eval-Print Loop) was still functional, so the engineers could send complex command lines to be executed. They worked hard to come up with a set of experiments that would yield the maximum amount of information in the smallest number of radio transmissions. Each uploaded command line had to contain tests for multiple hypotheses; the sequence of commands was, in effect, a decision tree. Using these techniques, they managed to solve the problem after just two commands!

This is a fascinating story, and I highly recommend watching the video that tells it.

Slow Experiments

Now, you may be thinking “well, that’s an interesting story, but it’s not likely to be relevant in my workplace.”

Not so fast! There are lots of cases where running an experiment may take days or even weeks. You may have to wait a long time before you see the results.

An example of this is a web server in a production environment. Suppose you have an error that is only happening on production systems; so far you have not been able to reproduce it on your test or staging servers. This means you need to add experimental code to the production server to diagnose the problem.

Unfortunately, most businesses won’t allow their engineers to simply log in to production servers and start making random tweaks to the code, unless it’s a dire emergency. And with good reason! There’s a pretty high probability of making a mistake that will seriously impact users. After all, the most likely time for introducing a new bug is when you are rushing to fix a different bug.

Instead, your experimental code is going to have to go through the regular release cycle for your organization, including QA and staging. That may take a week or longer, depending on your release cycle.

Ideally you’ll want to minimize the number of deployment cycles needed to debug your problem. Just like the engineers at NASA, you’ll want to collect as much data as you can for each experiment that you run. Your debugging code should gather as many facts as possible and organize them in a way that will allow you to test multiple theories in each iteration.
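As an illustration, here is a minimal sketch in Python of what such fact-gathering instrumentation might look like. The objects involved (request, cache, db_pool) are hypothetical stand-ins for whatever your server actually has; the point is that a single structured log entry can carry evidence for several hypotheses at once, so one deployment answers several questions.

```python
import json
import logging
import platform
import time

log = logging.getLogger("diagnostics")

def log_diagnostics(request, cache, db_pool):
    """Record one structured snapshot that can test several hypotheses
    at once. `request`, `cache`, and `db_pool` are hypothetical objects
    standing in for your server's real ones."""
    facts = {
        "timestamp": time.time(),
        "host": platform.node(),                          # hypothesis: only some hosts fail
        "user_agent": request.headers.get("User-Agent"),  # hypothesis: client-specific
        "payload_size": len(request.body),                # hypothesis: large payloads fail
        "cache_hit": cache.last_was_hit,                  # hypothesis: stale cache entries
        "db_pool_in_use": db_pool.in_use,                 # hypothesis: pool exhaustion
    }
    log.info("diag %s", json.dumps(facts))
```

One deployment of a logger like this lets you rule several theories in or out at once when the reports come back, instead of burning a release cycle per theory.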

Now, what if this production bug only shows up rarely, such as an exception that happens for some users but not others? In cases like this, the best solution is probably to install an exception-logging and analysis package like BugSnag, which will record exceptions and produce reports and charts showing which errors are happening most frequently.
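Under the hood, such packages do something along these lines. This is a simplified Python stand-in, not BugSnag’s actual API: capture each exception, fingerprint it by type and location, and count occurrences so the most frequent errors rise to the top.

```python
import collections
import functools
import logging
import traceback

error_counts = collections.Counter()

def capture_errors(handler):
    """Decorator that records and counts exceptions by fingerprint,
    roughly what an exception-analysis service does for you."""
    @functools.wraps(handler)
    def wrapper(*args, **kwargs):
        try:
            return handler(*args, **kwargs)
        except Exception as exc:
            # Fingerprint: exception type plus the line that raised it.
            frame = traceback.extract_tb(exc.__traceback__)[-1]
            fingerprint = f"{type(exc).__name__} at {frame.filename}:{frame.lineno}"
            error_counts[fingerprint] += 1
            logging.exception("captured %s", fingerprint)
            raise  # let normal error handling proceed
    return wrapper
```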

But that still won’t get you a rapid turnaround; you may fix a reported bug, but you may not see the result until several weeks later when you examine the updated reports.

Performance Tuning

Another challenging kind of bug is a performance bottleneck: the program is displaying the correct result, but is doing so far too slowly. Often, slowness is the fault of one small component of a large system, and fixing it means identifying which component is slow and then optimizing it. In other cases, slowness may be the result of some aspect of the larger architecture or some quirk of the data; certain inputs may cause the program to run much more slowly than normal.

Here the symptoms of the bug are temporal rather than visual; we are concerned primarily with time. And humans aren’t very good at accurately gauging time intervals. We’ll want precise measurements, so we’ll devise a test and then record the elapsed time. Of course, execution speed can vary based on external conditions, so our measurements won’t be exact. We may want to do multiple test runs and combine the results.
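A minimal timing harness in Python might look like the sketch below: run the operation several times, use a monotonic high-resolution clock, and report the median so a single noisy run doesn’t skew the result.

```python
import statistics
import time

def measure(operation, runs=10):
    """Time `operation` (a zero-argument callable) over several runs.
    perf_counter is monotonic and high-resolution, so it's suited to
    measuring short intervals."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        samples.append(time.perf_counter() - start)
    return {
        "median": statistics.median(samples),
        "min": min(samples),
        "max": max(samples),
    }

# Example: compare two candidate implementations of the slow step.
print(measure(lambda: sorted(range(100_000))))
```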

Simple performance bottlenecks can be analyzed by a code profiler or other performance measurement tool. However, that will only tell you which functions are slow; it will not necessarily tell you the specific conditions under which slowness occurs.
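In Python, for instance, the standard-library profiler gives you that per-function breakdown; `run_workload` here is just a stand-in for the code path being investigated.

```python
import cProfile
import pstats

def run_workload():
    # Stand-in for the code path under investigation.
    total = 0
    for i in range(1_000_000):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
run_workload()
profiler.disable()

# Show the ten entries with the largest cumulative time.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)
```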

For example, suppose you have a keyed collection class that stores a large number of values; and let’s imagine that you notice that after about ten minutes the insertion operation becomes very slow. Curiously, you also notice that for some types of input data, the slowness doesn’t happen until much later, perhaps an hour. Simple profiling doesn’t help, since the same lines of code are either slow or fast depending on the input.

In this case, we can form experiments by synthesizing various kinds of input data. An example hypothesis might be “string keys are slower than integer keys”, or “integer keys that are clumped unevenly cause slowness more quickly than integer keys with an even distribution”, or “the slowness happens when the hash buckets get full”. We can generate data tailored to each hypothesis, feed that data to the method under test, and then measure the result. Each experiment will reveal more about the program’s behavior, bringing us closer to an answer.
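Here is a sketch of that kind of experiment, with a plain dict standing in for the hypothetical collection class: generate a key set tailored to each hypothesis, then time bulk insertion of each one.

```python
import random
import time

def timed_insert(keys):
    """Insert every key into a fresh collection and return elapsed seconds.
    A plain dict stands in here for the collection class under test."""
    collection = {}
    start = time.perf_counter()
    for k in keys:
        collection[k] = None
    return time.perf_counter() - start

N = 500_000
datasets = {
    # Hypothesis 1: string keys are slower than integer keys.
    "string keys": [f"key-{i}" for i in range(N)],
    "integer keys": list(range(N)),
    # Hypothesis 2: unevenly clumped integers degrade sooner than
    # evenly distributed ones.
    "clumped ints": [random.randrange(1_000) * 1_000_000 + i for i in range(N)],
    "uniform ints": random.sample(range(N * 10), N),
}

for name, keys in datasets.items():
    print(f"{name:>14}: {timed_insert(keys):.3f}s")
```

Whichever dataset reproduces the slowdown points to the hypothesis worth pursuing next, and the same harness can then be refined for follow-up experiments.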

Conclusion

The examples in this series of articles are intended to illustrate the diverse kinds of hypotheses and experiments that you can generate as part of your scientific reasoning process. Although some of the examples are simple and artificial, the thinking and the deductive logic behind them are real, and are used in debugging real problems.

It seems appropriate to end with a quote from the immortal Sherlock Holmes:

When you have eliminated the impossible, whatever remains, however improbable, must be the truth.


Talin · Machine Words

I’m not a mad scientist. I’m a mad natural philosopher.