Saturday, November 28, 2009

Back in the 60s

Since the beginning of Computing Science and computer programming, there has been an important evolution in the methods used to design software. We're now much better at producing software than we used to be. When I turn around and look at the Internet trend, though, I have the impression of looking at a software scene of at least 40 years ago. Whereas people are now capable of choosing a programming language for desktop applications on account of the abstractions it allows them to formulate and use, it seems that web programming is still tightly tied to the individual idiosyncrasies of the browser that will be used to run the programs. Whenever I complain about the hazards of using Javascript, I get the answer that this is what gets executed in the browsers!
Since I'm very enthusiastic about the research that I do, I don't usually refrain from mentioning the sorry state of affairs of software and the thought that formal methods can be of great use there. One year ago, I met a friend of a friend who is enthusiastic about web programming. When I suggested that we should be very careful about the properties that we select for our programs and the assumptions that their validity relies on, he pointed out that this would be of very little help for web programming because of the important differences that exist between the browsers. Continuing in that direction, I got the strong feeling that he considered the variety of platforms to be one of the most important and most interesting challenges of web programming.
I could not help but be reminded of the accounts of the software scene of the 60s that I have read. The accidental complexity is so important that people mistake it for the core of the problem. In that respect, I guess it is no surprise that web applications are of such poor quality. What triggered the present note is my suffering from the poor quality and random behavior of Facebook and Google Wave and, while I type, I am reminded that Blogger is not much better.
A final word, lest I be mistakenly assumed to be satisfied with the state of the art in non-browser-based applications: they appear much better in comparison to web applications, but they have important shortcomings too and I think that they are part of the problem of the web scene.
Simon Hudon
Meilen
November 28th

Wednesday, November 18, 2009

More of the Same

I just submitted a post on language design and I feel that this one is closely related. I just attended a talk in the context of my software verification course. We are now studying techniques related to data flow analysis and a speaker presented to us how he used such techniques to measure the quality of a test suite. A priori, I am not very much into testing but, still, I think there is something to be done with the topic. As far as I know, the only research in that respect that treats testing in a decent way considers the specifications, even though it also uses the program's structure as a guide for testing. Now that I think of it, this is the shortcoming that ticked me off the most. The work was based on Java code and there was no hint of a suggestion that considering an invariant or any other statement of abstract properties of the data or of the program would enhance the assessment of the test suites. The code was taken as it stands, assuming that the mind of the programmers is impenetrable. To avoid repeating myself, I will simply say that the overwhelming feeling I got while listening was: "this is advanced computing theory applied to obsolete programming techniques".
I have no difficulty explaining its survival though: it does not suggest the need for anybody's education and it produces a tool which can be used without thought. In other words, it's a fancy way of patting industrial managers and programmers alike on the back. Nothing is more welcome than being told by an automated tool that you're doing a good job. In my eyes, this is just more of the same.
Simon Hudon
Zürich
November 20th, 2009

Wednesday, November 4, 2009

Bad Separation of Concerns

In the course of my master's, I have to attend a seminar and prepare one talk. Last week, the talk I listened to was about using pattern matching in object-oriented programming languages. I thought, as the talk went on, that all the yardsticks they used to evaluate the design of their language construct were wrong. It struggled, in my opinion in vain, to improve expressiveness while retaining the best possible performance and remaining "concrete".
In that respect, I tend to take for granted the necessity of expressing precise specifications with every piece of code. This means that, for me, the most important thing is that my specification language be expressive enough, but still very simple, to allow me to reason about my problems, and that my implementation language be as simple as possible while allowing me to express any algorithm I wish. In my experience, the specification language should involve as few operational elements as possible, for simplicity. As for the implementation language, it should be possible to implement it on any reasonable architecture.
Before anyone objects that it is not customary in industry to use precise specification notations or to keep multiple representations, at different degrees of abstraction, of a single solution, I have to point out that this is an academic result and it does not have to follow industrial customs. Actually, I would go one step further and state that, if, in academia, we wait for techniques to become widely adopted before using them, we are lagging behind and risk doing no more than producing more of the same. Academia, being more remote from the marketplace than industry, is supposed to be less affected by the political issues that keep industry behind the state of the art. And this is not badmouthing the industry: it is understandable that they have more concerns than only technological ones. Let's get back to the paper.
In their design, they seemed to tackle both problems, expressiveness and implementation, at once, and the result was not pretty. For example, in their pattern matching, they had to take into consideration that evaluating the expression being matched (which, in their case, is an absolute necessity) could very well have side effects, and this was an important concern of theirs. I think that this possibility was popularized by C and I think it is an important setback with respect to allowing programmers to think about their programs. The discipline of separating queries from commands, which is part of Eiffel's style, is already much better in this respect.
They also had to deal with constructors in order to match objects like you would in Haskell. This is very ugly. Pattern matching is a very elegant way of writing expressive specifications, but object creations do not fit in it. This is because an expression creating an object carries an implicit reference to memory allocation and cannot be dissociated from it. People can't forget that the object has a memory representation and that a comparison in the context of pattern matching will have a given cost. I prefer to use mathematical expressions to express the value contained in the object and to allocate the object only if necessary.
Aside: In a recent programming experiment where I derived my code from its proof of correctness, I was parsing a string and, since all the properties of a language can be so elegantly formulated in terms of strings of characters, I used strings to represent the state of the computation. I had my input string S, the processed string x and the leftover string z, and they were linked by the invariant:
S = x ++ z
with ++ the string concatenation operator. At each iteration, I split z into two parts such that
z = [ y ] ++ w
and scanning one character was encoded as:
x, z := x ++ [ y ], w
If you can't help yourself and have to imagine the memory footprint and performance of this algorithm, you might be scandalized by how inefficient it is. It is not meant to be executed as such; it is only a very simple way of defining what the algorithm does. With one data refinement, I went to a representation where the use of memory is limited to the array representing the input, an integer index indicating which character y is, and two booleans representing the acceptance state of the algorithm. There is no efficiency problem with that, and my proofs of correctness and of data refinement are very simple; I consider them the clearest documentation for the algorithm.
end of aside
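The two levels of the aside can be put side by side in code. Here is a minimal sketch in Python with names of my own choosing; the original development was carried out in a specification notation, not in a programming language:

```python
# Abstract level: the state is the pair (x, z), kept under the
# invariant S == x + z; one scanning step is  x, z := x ++ [y], w.
def scan_abstract(S):
    x, z = "", S
    states = [(x, z)]
    while z:
        y, w = z[0], z[1:]    # split z = [y] ++ w
        x, z = x + y, w       # scan one character
        states.append((x, z))
    return states

# Refined level: the pair (x, z) is replaced by a single index i into S.
# The coupling invariant of the data refinement is
#   x == S[:i]  and  z == S[i:]
def scan_refined(S):
    return list(range(len(S) + 1))
```

The coupling invariant is what keeps the refinement proof simple: every abstract state (x, z) is recovered from the refined index i as (S[:i], S[i:]), so each abstract step corresponds to incrementing i by one.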
In summary, I think the right way of seeing that problem is:
Pattern matching is a powerful way of associating properties or code with the shape of the data and we need to be able to express it. Since we can use it with code, it would be important to have programming techniques to make it executable, that is, techniques usable by a programmer to transform a specification using pattern matching into executable code implementing it. If the methods are especially effective and simple, we could have them implemented in our compiler, but this is far from being a design consideration for the construct.
* * *
More recently, in my compiler design class, we were presented with a research programming language which is said to be "relationship-based". I think the idea is a good one: they factor inter-class relationships out of classes to treat them as separate modules. The question of whether each such relationship deserves its own module is worthy of debate but, for now, what bothers me is that they treat it as a programming language construct, that is to say, they keep an "implementability anchor" in their heads while they design the notation. One of the consequences is that they will refrain from adding anything which they aren't sure they can compile automatically. I would rather see this emerge as a specification structuring artifact and see some programming techniques emerge in connection with it, so that the most common ones can be implemented quite straightforwardly. For the rest, it's open to experimentation and, in all cases, a case-by-case implementation would allow optimizations that systematic compilation wouldn't be able to perform.
I think the big problem here is still the use of current practices as a guideline for academic research: "since programmers don't write specifications, we won't provide a means to write good specifications, even though progress would require it". I put it in my "more of the same" bin: look at the original ideas and forget about the language. Actually, I will do so when my semester is over.
Simon Hudon
ETH Zürich
finished on November 18th