Tuesday, November 9, 2010

On the Notion of Ghost Variables

When one reads the literature of formal methods, especially that of formal verification, one encounters the notion of ghost variable also called model variable. In short, they are variables added to a program for the sake of verification to express simply and concisely what is the data abstraction a certain module is supposed to be implemented. They reflect very well the a posteriori nature of verification because they add the variables to a presumably existing program. They are characteristically absent from the regular data flow and control flow and, as such, their values need not be stored during the execution of the program. They are mere abstractions.

The naming seems to come from the fact that the predominant role the verification community gives to a program variable is to be an abstraction for the storage of a category of important values. A variable for which the values need not be stored then feels kind of odd: it looks like a variable but it's not really one. Hence the special attribute is added to warn anyone reading the program that this is a strange sort of variables.

We can take a different view with respect to variables though. If we try to understand a problem to then create a program to solve it and a compelling argument for trusting the results of the latter, we need a way to speak of the problem. Namely, it is important to identify the central abstractions, the objects of central importance and give them a name so that their properties of interest can be spelled out using that name rather than mainly in terms of their relation to other objects. One such example is given by:

Jim and Julia's brother play football together. During one match, Jim tackled Julia's brother and hurt him badly.

Compare it to:

Jack and Julia are siblings. Jim and Jack play football together. During one match, Jim tackled Jack and hurt him badly.

Notice how disentangled the second version of the story is. Every sentence states a fact and, if we are interested in the accident, we realize that Julia has very little role to play in it and the first sentence can be ignored altogether without missing any information. In the first version of the story, Julia and her relationship to one of the players is dragged all through the story although, as far as we can see, she plays no role in it herself. More about the issue of naming in mathematical arguments (which is relevant in software design) can be found in [0], in chapter 15 which is dedicated to the subject.

The point I am coming at with all this rambling on naming is that, during the analysis of a programming problem and the design of a solution, the introduction of variables should be seen as the introduction of names for the relevant quantities. The fact that the binary representation of the exact set of variables introduced should not be of immediate concern. A confidence that an efficient representation can be found later on is healthy but anymore thoughts about them should be postponed. What Dijkstra calls a change of coordinates in [1] can be applied later to improve efficiency. In the mean time, the prime concern is to establish sufficient conditions for an implementation to be correct. It is also possible that the set of variables chosen during the design process don't have to be transformed in order to get an efficient implementation and, again, it is of no concern while we come up with them.

When I say those problems are of no concern, it means we should concentrate on them but also, we should not let the vocabulary that we use reflect a distinction that we are not ready to make. For that purpose, I would abandon the notion of model or ghost variables or, at least, downplay there role considerably and introduce the notion of stored variables. Those are the variables of the final implementation for which the value has an impact over the control flow and the data flow directed to the output variables. By symmetry, we can call the others model or ghost variables but, since we reduce their importance to the final implementation stage and that, even there, they are the ones to be left out rather than kept, omitting to name them would not be a big methodological impediment.

Regards, Simon Hudon ETH Zürich Tuesday the 9th of November 2010

References

  [0] On the Shape of Mathematical Arguments, A.J.M. van Gasteren   [1] EWD1032 - Decomposing an Integer in a Sum of two Squares, E.W. Dijkstra, available at here