I was talking to my advisor yesterday about the software engineering class that he’s teaching. It seems like no one really knows what to teach in these classes lately. Methodologies like the waterfall model have gone out of fashion, and academia isn’t quite ready to embrace newer ideas like extreme programming, patterns, and all that agile development business. It’s hard to blame the professors, given how little evidence there is that any of these techniques is actually beneficial.
I’m actually not all that interested in what gets taught in introductory software engineering classes. Most students probably learn far more from the project than from the lectures. But I am interested in improving my own programming abilities. And unfortunately there aren’t any classes for advanced software engineers. So here is my question: what sort of material should go in such a class? That is, what would you try to teach someone who has already been programming for five years or so? Or to put it yet another way, how do you turn good programmers into great ones?
I’m not really thinking of a lecture-style class, but more of a web resource. In some ways, I think that a site like stackoverflow.com follows this idea. But it doesn’t really have the format that I’m interested in. I think something like a Wiki would be more valuable. But more likely than not it would have to be moderated, perhaps by the community; the people with the best ideas are rarely the loudest.
Since I’m too lazy to create such a site, here are some topics that I’ve found valuable in the past. I think they’d be interesting material for such a course.
- Language and syntax matter. Sometimes inventing your own little language can really boost productivity. Many people have written about this before, so I won’t belabor the point.
- Tools also matter a lot. If you’re writing C code, Valgrind will change your life.
- Over-design is a far more pernicious evil than under-design in the long run. Unfortunately, this is a lesson that I have had to learn over and over, and I still don’t know how to teach it.
- A few years ago, my friend Chet introduced me to some great papers from the ’80s on system design. They are case studies of systems of that day. Mostly they’re written as interviews with the designers and they detail what worked and what didn’t work. Although they’re old, the material is still surprisingly relevant. All of them were written by David Gifford and Alfred Spector. A list of them appears at the bottom of Spector’s web page. Unfortunately, you need ACM access to read them.
- A blog post by my friend Dave perfectly illustrates another topic: the importance of logging, tracing, and visualization.
I’d like to elaborate on the last point in the rest of this post. Dave created a tool that gives you a nice picture of how Mozilla’s TraceMonkey JavaScript JIT spends its time. It makes it much easier to quickly get a sense of what’s going on and where the performance problems are.
A more advanced version of this idea appears in IBM’s TuningFork project. Originally, TuningFork was used to visualize the performance of the Metronome real-time garbage collector. I worked with it in that context as an intern, and found it incredibly valuable. TuningFork (TF) is now freely available, so you can use it for all sorts of projects.
Tools like TuningFork are valuable whenever your software is doing lots of complicated stuff very quickly. Without a tool, the only way you know whether your software is correct is by ensuring that the final result is correct. With TF, you can watch your program’s execution and make sure that it’s doing all the right steps. And since you can track how long these steps take, you can debug performance problems.
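To make the idea of watching steps and their timings concrete, here is a minimal sketch in Python. It has nothing to do with TuningFork's actual API; the names `traced`, `events`, and `run` are made up for illustration, and a real tool would record far more than a name and a duration.

```python
import time
from contextlib import contextmanager

# Illustrative only: a global list of (step name, duration in seconds),
# appended in the order the steps finish.
events = []

@contextmanager
def traced(name):
    """Record how long the enclosed block takes under the given step name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        events.append((name, time.perf_counter() - start))

def run():
    """A toy program with two steps that we want to observe."""
    with traced("parse"):
        data = [int(x) for x in "3 1 2".split()]
    with traced("sort"):
        data.sort()
    return data

if __name__ == "__main__":
    print(run())
    for name, seconds in events:
        print(f"{name}: {seconds * 1e6:.1f} us")
```

Checking the final result only tells you that `run()` returned a sorted list. The event list also tells you that both steps happened, in which order, and roughly how long each one took.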
Some people argue that printf is the only kind of debugging that they need. Although it’s extreme, this is not an entirely invalid point. It’s interesting to examine how a tool like TF improves on printf. The first and most obvious way is that graphs are usually easier for your brain to process than text. But this reasoning isn’t always applicable, since sometimes text is the most effective way to represent data. In those cases, I think that there’s one additional advantage that custom tracing tools have over printf: they can summarize the data and report it in different ways depending on what the user is interested in at the time. You can do this with printf, too, but it requires changing your printf code each time you want to view the data in a different way, and that gets unwieldy very quickly.
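Here is a rough sketch of what I mean by reporting the same data in different ways. The trace records, their fields, and the function names are all hypothetical. The point is only that the data is recorded once in a structured form, and each view is a small query over it rather than a change to the logging code.

```python
from collections import Counter, defaultdict

# Hypothetical trace records:
# (timestamp in ms, component, event, duration in ms)
TRACE = [
    (0, "parser", "parse", 4),
    (4, "jit", "compile", 10),
    (14, "gc", "collect", 3),
    (17, "jit", "compile", 6),
    (23, "gc", "collect", 5),
]

COMPONENT, EVENT, DURATION = 1, 2, 3  # field indexes into a record

def count_by(records, field):
    """View 1: how many records there are for each value of a field."""
    return Counter(r[field] for r in records)

def total_time_by(records, field):
    """View 2: total duration spent for each value of a field."""
    totals = defaultdict(int)
    for r in records:
        totals[r[field]] += r[DURATION]
    return dict(totals)

def slowest(records, n=1):
    """View 3: the n records with the longest duration."""
    return sorted(records, key=lambda r: r[DURATION], reverse=True)[:n]

if __name__ == "__main__":
    print(count_by(TRACE, COMPONENT))
    print(total_time_by(TRACE, COMPONENT))
    print(slowest(TRACE))
```

With printf, each of these three views would mean editing the print statements and re-running the program. Here they are three functions applied to the same recorded trace.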
In my own work, I’ve found that a very simple form of summarization is surprisingly effective. I write all my trace data out in a tree-structured form. Then my trace viewer allows me to hide and show subtrees. (It also permits searching, but you can do that with printf too.) The ability to inspect a particular subtree and ignore the rest is pretty useful. Rather than wading through a giant log file, you can selectively view only the sections that matter. I’ve also added features so that I can highlight a particular part of the trace and immediately re-run it inside the debugger, allowing me to see stack traces and inspect data.
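The hide-and-show part is simple enough to sketch in a few lines. This is not my actual viewer, just a toy version of the idea in Python; `Node`, `render`, `find`, and the example trace are invented for illustration, and the debugger integration is left out entirely.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One entry in a tree-structured trace."""
    label: str
    children: list = field(default_factory=list)
    collapsed: bool = False

def render(node, depth=0):
    """Return the visible lines of the trace, skipping collapsed subtrees."""
    # "+" marks a collapsed node that is hiding children; "-" is everything else.
    marker = "+ " if node.collapsed and node.children else "- "
    lines = ["  " * depth + marker + node.label]
    if not node.collapsed:
        for child in node.children:
            lines.extend(render(child, depth + 1))
    return lines

def find(node, text):
    """Yield every node whose label contains text, collapsed or not."""
    if text in node.label:
        yield node
    for child in node.children:
        yield from find(child, text)

def example_trace():
    """A small hypothetical trace of one request."""
    return Node("request", [
        Node("parse", [Node("tokenize"), Node("build AST")]),
        Node("execute", [Node("lookup"), Node("call")]),
    ])

if __name__ == "__main__":
    trace = example_trace()
    trace.children[0].collapsed = True  # hide the "parse" subtree
    print("\n".join(render(trace)))
```

Collapsing the `parse` node leaves a single `+ parse` line in the output, while the `execute` subtree stays fully visible. Search still looks inside collapsed subtrees, so you can find something first and then expand only the part that contains it.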
I’d like to end the post with a link to a slightly different tool. It’s from a company called Azul Systems, and they call it Real-Time Performance Monitor (RTPM). Unfortunately, you have to sign a contract with Azul for something like $1,000,000 just to be able to use the tool. But they have a video that shows how the tool is used. Simply watching it was a learning experience for me, since it made me realize how effective performance monitoring tools can be. This one is the best I’ve ever seen. Although the video is long, I promise you’ll be amazed at what they can do.
Tags: programming, tracing