At the moment, the widespread use of libraries is characterized by the reuse of code in various contexts outside the one which gave it birth. It has been made much easier by the development of .NET --which I salute as being a great technical achievement-- because .NET allows the combination of code written in different languages. I believe it is a stepping stone with respect to exposing programming languages as reasoning aids rather than execution models and it is only hampered by Microsoft's lack of commitment towards portability.
However, I suggest that, although it is an important step, it is not satisfactory as far as the concern for reuse goes. I propose that we start thinking of the reuse of the thoughts behind the code and that they are the things of foremost importance. It does not mean that reusing code is not important -- assuming we preserve the documentation of its interfaces-- but it does not go far enough. Indeed, in code, all the different ideas justifying the design choices are mangled together and one can hardly look at each aspect in isolation. In the community of programming languages, several attempts have been made to provide better modules but, so far, we still see a lot of code where different concerns end up intertwined with one another and I expect it will remain the case.
I think that a good modularity cannot be achieved in (what is understood today as) a programming language because of the universal assumption that the code a programmer types has to be directly (through compilation) executable.
My favorite example comes from parallel programming. In the field of programming language design, many mechanisms have been proposed to make the task of parallel programming easier. It includes semaphores, monitors and message passing. They make it easier to avoid low level design errors like race conditions but they still don't address design issues. In each case, the code is still organized in processes; since the behavior of a parallel program is really a property of the combination of the processes, a change to said behavior requires a change of many modules when the modular boundaries are drawn around processes.
Aside: I consider threads to be a special kind of processes and I don't think that, at the design level, differentiating between the two is of any help at all. I will therefore not mention threads again. (end of aside)
As a comparison, UNITY ([CM88]) uses action systems as the basic unit of composition. An action system has some local variables and can share variables with its environment. It also has a set of atomic actions that can modify the variables. The important thing is that it is possible to make assumptions about how the data is going to be manipulated, to enforce the assumptions and to use them to prove some safety and progress properties. When someone wants to change the functionality of a program, either he can add a new module built around the new properties or he can add the properties to an existing module, if it is close enough to what the module was built for. The key here is to see that programs properties (specifically safety and progress properties) are closely mapped to modules so we have a very good encapsulation.
At the end of a development, a UNITY system consists of a set of actions and a set of variables both partitioned between modules. If someone wants to execute the system, he needs first to choose the platform on which he wants to execute it (e.g. it could be a network of computers or a network of processes residing on one computer) and then partition the various variables and actions of the system between the components of the platform without consideration for the modular boundaries. Some variables might become channels between processes or between computers, others may be stored in the memory of a computer, private to one process or shared and possibly protected by locks.
In sequential programming, the control flow is something that can be closely tied to program properties and it can therefore be safely attributed to a modular unit --such as a procedure-- because the properties of interest --preconditions, postconditions and termination-- closely apply to syntactic structures describing the control flow. That is to say that pre- and postconditions are paired and, between them lies a sequence of statement which are structured in a proper fashion for proving correctness. Similarly, loops are associated with invariants and variants and nothing outside of the loop is relevant for reasoning about said loop.
In contrast, [FvG99] shows the kind of proof obligation one has to contend with when working with processes in parallel programming. In their theory, each assertion has local correctness, which is analogous to partial correctness in sequential programming, and a global correctness which relies on the structure of the other processes. At this point, it is unthinkable to take one process away and to expect that the properties one has proven about it will hold in a different environment. In short, the interface between processes, when made explicit, is too far from a thin interface to allow processes to properly encapsulate anything.
In UNITY, the fact that it is based on action systems rather than processes makes it possible to attribute a precise and small set of properties to a module and work in isolation on it. The usual process is called refinement and it makes all the intermediate design decisions explicit. It makes the effort spent on the development easier to transpose to different situation even when none of the code produced is actually useful.
The UNITY method has proof rules and formal semantics and the most rigorous approaches will take advantage of them. However, it is not necessary to use the whole battery to reap the benefits of the method. Since modularity is something they did well with UNITY, using it and its temporal logic as a guideline for design can already be beneficial in pointing out what we must be careful about.
In short, I believe that it is a property of paramount importance for modular boundaries to confine each programming task to the construction or modification of only one module. The best possible way to achieve that is to choose a unit of modularity for which the proofs of correctness need only rely on very few, simple and chosen a priori --that is, chosen as much as possible independently of any implementation-- assumptions of the surrounding environment. Some might say that it is an obvious goal but as far as I can see, very few modular decomposition mechanisms are satisfactory in that respect and I think that most language designers are still confined by "how the program is going to be executed" so are unable to meet my criteria.
*
I know of another effort directed at the reuse of design: the so-called design patterns. The idea is a good one but its realization is much too weak. They are basically pattens of modular decomposition which have proved useful in many situations. However, there is no basis to state precisely what properties make them useful or how they meet those properties. In short, they cannot have an existence of their own because no notion of correctness apply to them. I do recognize it as a step in the right direction but it seems over-hyped for what it provides. I would welcome the publication of such books where designs are proven correct, where it is (formally) precise where the modules fit in a larger design and how one should proceed to code them correctly.
Next, I would have like to address some of the objections to formal methods that I have heard again and again and that used to unsettle me. However, this post is already pretty long and, by now, it is long overdue. I will, therefore, leave it for a post which, hopefully, will be coming soon.
Simon Hudon
April 10th, 2011
Meilen
References
[CM88] K.M. Chandy and J. Misra, Parallel Program Design:
A Foundation, Addison-Wesley, 1988
[FvG99] W.H.J. Feijen, A.J.M. van Gasteren, On a Method of
Multiprogramming, Springer-Verlag New York, Inc.,
1999