I discovered a neat technique in program analysis today, so I’ll try to explain it here. The first section gives a general overview of inferring integer properties of programs, and the second part describes what I found.
Numerical Domains and Difference Constraints
Inferring numerical properties of programs is a pretty important problem in program analysis. Here’s is an example.
if (x > 0) { y = x; } else { y = 10; }
We can infer from that after this program executes, y > 0. Why? Because in the true branch, y is assigned x, which we know to be positive. In the false branch, y is assigned 10, which is also positive.
There are a variety of techniques to infer properties like this. The most common ones are based on a technique called abstract interpretation. The most well-known technique of all is called the polyhedron domain, discovered by Cousot and Halbwachs. It is quite complex to implement, but it is guaranteed to infer all possible linear inequalities satisfied by a program. Unfortunately, it’s also very slow.
A more efficient and simpler technique goes by the name of “difference constraints” or “bounded difference matrices” or the like. In this post I will call it DC. DC can infer any invariant of the form x – y <= c or x <=c or x >= c. Most of the invariants that programmers rely on are of this form, so DC is kind of a sweet spot domain. (Note that constraints like x == y can be converted to the conjunction of x -y <= 0 and y – x <= 0.)
The domain works by putting all constraints into the form x – y <= c. Constraints of the form x <= c or x >= c become x – 0 <= c or 0 – x <= -c (i.e., zero is just treated as a variable). Then it saturates the constraint system by generating all constraints that follow from the known ones by linear combination. For example, if you know that x – y <= 5 and that y – z <= -3, then you can infer that x – z <= 2. The algorithm for saturation is pretty simple: you express the constraints as a weighted directed graph (where the variables are nodes and the constraints are edges) and then use the Floyd-Warshall all-pairs shortest path algorithm.
Once the constraint system has been saturated, you can perform all sorts of operations. First, it’s easy to check if the constraints are satisfiable by looking for a constraint of the form x – x <= c, where c < 0. If you want to see if one set of constraints C1 implies another set C2, you just take every constraint x – y <= c in C2 and make sure that C1 has a constraint of the form x – y <= c’, where c’ <= c.
Probably the most common operation is assignment. If you know that some set of constraints C is true before an assignment, then you want to know the set C’ that holds after the assignment. Assume the assignment is x := e. First we add a new constraint “tmp == e” to the system (using the translation of equality above) and then we saturate the system. Then we eliminate any constraints that talk about x from the system, and rename tmp to x.
Another important operation is the join. This operation takes the constraints generated by two branches of a conditional statement (call them C1 and C2) and tries to find the best set of constraints C that abstracts both branches. For this to be true, the constraints in C1 must imply C and the constraints in C2 must also imply C. To generate C, we take all pairs of variables x and y and look for a constraint of the form x – y <= c1 in C1 and x – y <= c2 in C2. Then we add x – y <= max(c1, c2) to C.
Sparsity
The problem with DC is that saturation is expensive: it’s an O(n^3) algorithm. My goal was to fix this problem, at least in the common case. In the common case, the initial set of constraints is sparse: when you form the directed graph of constraints, there aren’t very many edges. The problem is that saturation adds a lot more edges, and those edges make later operations much more expensive.
In my experience with difference constraints, most of the extra edges added by saturation are caused by the zero node (remember that we treat zero as a variable). As an example, consider these constraints: 0 <= x <= 10, 5 <= y <= 20, 10 <= z <= 30. If we draw the graph for this system, there will be edges between x and the zero node, between y and the zero node, and between z and the zero node. But once we saturate, there will be edges between all pairs of variables saying things like y – z <= 10 and z – y <= 25.
It turns out that for some operations like assignment and checking satisfiability, there’s no need to saturate across the zero node. That is, we allow saturation to take the linear combination of x – y <= c1 and y – z <= c2 as long as y is not the zero node. It’s pretty easy to modify the Floyd-Warshall algorithm to do this. And when you do, the edges that were added in the example in the last paragraph will no longer be added, so everything will be faster and we won’t have lost any precision.
Things are a bit more complicated for join. We can still do the same trick of restricting saturation, but now we will actually lose the ability to infer some invariants. Consider the case where we know x = 1 and y = 2 on one branch and x = 5 and y = 6 on the other. We would like to infer y – x = 1. With normal saturation we will, but not with restricted saturation.
Here’s the solution. We use restricted saturation to get a joined result C from the inputs C1 and C2. Then we iterate over all pairs of variables x and y. We find the shortest path from x to y in C1 by looking for an edge directly from x to y as well as edges from x to the zero node and zero to y. Call this distance c1. Do the same in C2 to get distance c2. Now do the same in C to get c. If max(c1, c2) < c, then we simply add an edge in C directly from x to y with distance max(c1, c2). Otherwise we do nothing.
This technique achieves the same precision as using full saturation, but it adds many fewer edges in practice. My hope is that will be much faster as well, although I haven’t tested it much yet.
Here’s one way to think of this technique. Saturation produces the transitive closure of our constraint graph. However, after we perform operations like join, we’d like to take the transitive reduction so that they have a minimal representation. Unfortunately, the transitive reduction is expensive to compute and isn’t even unique for our graphs. But since we know that zero is the crucial node in most of our graphs, we can compute a sort of transitive reduction, except that the result is only reduced at the zero node.