Simon's blog prime: scientific thoughts

Showing posts with label scientific thoughts. Show all posts

Monday, June 6, 2011

Decidability

The proof of the undecidability of the halting problem has always fascinated me. It is a formidable result to behold, at least the first time around. It relies on the construction of a program that cannot exist because of a contradiction drawn through self-reference. Here is an interesting rendering of the argument by Christopher Strachey: An Impossible Program.

However, an interesting paper written by Eric Hehner (Problems with the Halting Problem) sheds doubt on the correctness of the usual proof. He proposes that the contradiction in the proof is due to self reference rather than because of the structure of the halting problem. It is a good read and I strongly recommend it.

Simon Hudon
Meilen
June 6th 2011

Wednesday, November 4, 2009

Bad Separation of Concerns

In the course of my master's, I have to attend a seminar and prepare one talk. Last week, the talk I listened to was about using pattern matching in object oriented programming languages. I thought, as the talk went on, that all the yardstick they took to evaluate the design of their language construct was all wrong. It struggled, in my opinion in vain, to improve the expressiveness while retaining the best possible performances and remaining "concrete". In that respect, I tend to take for granted the necessity of expressing precise specifications with every piece of code. This means that, for me, the most important is that my specification language be expressive enough, but still very simple, to allow me to reason about my problems and my implementation language be as simple as possible and allow me to express any algorithm I wish. In my experience, the specification language should involve as little operational elements as possible for simplicity. As for the implementation language, it should be possible to implement it on any reasonable architecture. Before I'm objected that it is not customary in industry to use precise specification notations or to keep multiple representations, at different degrees of abstraction, of a single solution, I have to point out that this is an academic result and it does not have to follow industrial customs. Actually, I would go one step further and state that, if, in academia, we wait for techniques to become widely adopted before using them, we are lagging behind and risk doing no more than producing more of the same. Academia being more remote from the market place than industry are supposed to be less affected by the political issues that keep industry behind the state of the art. And this is not badmouthing the industry: it is understandable that they have more concerns than only technological ones. Let's get back to the paper. In their design, they seemed to tackle both problems of expressiveness and implementation at once and the result was not pretty. For example, in their pattern matching, they had to take into consideration that the evaluation of the expression (which, in their case, is an absolute necessity) they were dealing with could very well have side effects and this was an important concern of theirs. I think that this possibility was popularized by C and I think this is an important set back with respect to allowing programmers to think about their programs. I think, already, the discipline of separating queries from commands, which is part of Eiffel's style, is much better in this respect. They also had to deal with constructors in order to match objects like you would in Haskell. This is very ugly. Pattern matching is a very elegant way of writing expressive specifications but object creations do not fit in it. This is because an expression creating an object has an implicit reference to memory allocation and it can't be dissociated to it. People can't forget that the object has a memory representation and that a comparison in the context of pattern matching will have a given cost. I prefer to use mathematical expressions to express the value contained by the object and allocate the object only if necessary.

Aside In a recent programming experiment where I derived my code from its proof of correctness, I was parsing a string and, since all the properties of a language can so elegantly be formulated in terms of strings of character, I used strings to represent the state of the computation. I had my input string S, the processed string x and the left over string z and they were linked by the invariant:

S = x ++ z

with ++ the string concatenation operator. At each iteration, I split z in two parts such that

z = [ y ] ++ w

and scanning one character was encoded as:

x, z := x ++ [ y ], w

If you can't help yourself and have to imagine the memory footprint and performances of this algorithm, you might be scandalized that this is really inefficient. This is not meant to be executed as such; it is only a very simple way of understanding defining what the algorithm does. With one data refinement, I went to a representation where the use of memory is limited to the array representing the input, an integer index indicating which character y is and two booleans to represent the acceptance state of the algorithm. There is no efficiency problem with that, and my proof of correctness and my proof of data refinement are very simple and I consider them the clearest documentation for the algorithm.

end of aside

In summary, I think the right way of seeing that problem is:

Pattern Matching is a power way of associating properties or code with the shape of the data and we need to be able to express it. Since we can use it with code, it would be important to have programming techniques to make the executable, that is, techniques usable by a programmer to transform a specification using pattern matching into executable code implementing it. If the methods are especially effective and simple, we could have it implemented in our compiler but this is far from being a design consideration for the construct.

* * *

More recently, in my compiler design class, we were presented a research programming language which is said to be "relationship-based". I think the idea is a good one: they factor the inter-class relationship out of classes to treat them as separate modules. The question whether each such relationship deserve its own module is worthy of debate but, for now, what bothers me is that they treat it as a programming language construct, that is to say they keep an "implementability anchor" in their heads while they design the notation. One of the consequence is that they will refrain from adding anything for which they aren't sure they can compile it automatically. I would rather see this emerge as a specification structuring artifact and see some programming techniques emerge in connection to it so that the most common one can be implemented quite straightforwardly. For the rest, it's open to experimentation and, in all cases, a case by case implementation would allow optimization that systematic compilation wouldn't be able to do.

I think the big problem here is still the taking of current practices as a guideline for academic research: "since programmers don't write specifications, we won't provide a means to write good specifications even though progress would require it". I put it in my "more of the same" bin, look at the original ideas and forget about the language. Actually, I will do so when my semester is over.

Simon Hudon

ETH Zürich

finished on November 18th

Tuesday, October 13, 2009

"For someone whose only tool is a hammer, every problem looks like a nail" (EWD 690)

This note is about paradigms. In short, I would define a paradigm as being the belief that a particular abstraction or theory fits a certain context. We can see, for example, in programming, the paradigm of object oriented programming as being the belief that focusing one's analysis and design efforts on the relations between different pieces of data is an efficient way to cope with the variation in requirements. Also, we can see that applying game theory to economic behavior gives rise to the paradigm that we might call "the rational choice". I think a paradigm is a very useful belief to have because it allows one to deal in simple terms with problems that might get complicated. In my opinion, a healthy use of a paradigm is like controlled schizophrenia: when you deal with a problem within the realm of applicability of the paradigm, it is useful and often necessary to assume its postulate without questions. This is very efficient when dealing with related problems. However, one should be able to defend the use of the paradigm and more specifically, should consciously stop believing in the postulates while doing so. I find it frustrating when discussing with somebody when I realize that, not only can I identify the paradigm he adopts (which is not bad) but I realize that he is unable to step out of it even after several direct attacks at his assumptions. This happens both in technical discussions about computing and in humanities. On the other hand, it is also fruitless to discuss hard problems with somebody who can't assume the postulates once they have been defended. In technical terms, this might be seen as somebody that keeps reasoning about a method call in terms of the instructions that gets executed as a substitute instead of reasoning in terms of an abstract specification. In the first case, the person is hardly schizophrenic at all (what a fraud!) and in the second case the schizophrenia is uncontrolled so the person's reasoning is all over the place: he can't focus on one aspect of his problems at a time because he can't see a clear separation between the two. As a second example of uncontrolled schizophrenia, I see some people rely strongly on set theory and have to understand everything in terms of set. Some of them for example, can't bear to use sequences for their reasoning because the proofs with sequences are so complicated and messy. This is a recent conviction of mine but I keep seeing examples where having a clear and simple abstraction would be much more effective. In the case of sequences, we need a set of postulates that describe sequences as objects of their own right and don't drag sets or anything else into the picture unless a useful relationship can be defined. And even then, the relationship is not central to understanding the abstraction. For soundness purposes, it might be useful to translate postulates of sequence theory in set theory. This becomes metamathematics though. It does not affect the use of sequences.

Sunday, October 11, 2009

What is Abstraction?

I've been talking about the sin of confusing vagueness for abstraction for a while but I haven't stopped to define more precisely what it means. So far, the best I have done to give an understanding of what abstraction is is to say that it is a simplification of an object and that being abstract does not preclude being absolutely precise. Let's try to be more precise on what constitutes an abstraction and what does not. First, an abstraction is a simplification but a simplification does not qualify automatically to be called an abstraction. So, what is missing? If we wanted a completely precise expression, we would say that A is an abstraction for B with respect to a set of properties. The set of properties is usually universal in the context in which abstraction is used. For example, for software engineers, it will be interesting to see a specification as an abstraction for a set of implementations with respect to functional properties. It means that seeing the effects of a particular implementation, if we can always follow them by reading a specification and not get any surprise, the implementation is correct with respect to that specification. For somebody working with algorithms, the performances will be more important so it may be useful to have an abstraction with respect to running time but one that does not preserve functional properties. It is in this respect that I got convinced that having a small theoretical kernel for a modeling language (in Abrial's sense) is not appropriate useful or abstract. For instance, in Event-B, everything is understood in the context of set theory and I was opposed that using sequences was too difficult because of the complications of the underlying set-theoretic model. The key here is that the objection denotes a failure to create a level of abstraction where sequences are understood for what they provide. Providing a set-theoretic model for them is not useful for their use; it is useful to convince someone that the set of axioms describing sequences is sound. After that, we must understand sequences on their own. This is why, I believe, abstract data types are very important for the use of formal methods. Simon Hudon October 11th Meilen

Saturday, September 5, 2009

Collective Knowledge

At the moment, I have a few drafts I am writing but they are not progressing very fast and I am offering this short reflection. Looking around and reading different kinds of research, I can't help but feel that there exists a hierarchy of collective knowledge. Science is part of that hierarchy but it is not the whole story. As somebody who is strongly mathematically inclined, I have a tendency to view formal mathematics as the only means for expressing one's understanding but this position does not hold water. Since I am also very interested in methodology, namely, how we program and how we do mathematics, I have to have considerations for human activities and those are particularly hard to formalize and one might wonder what need we have of formalizing them. It must be honestly revealed that the activity of formalization demands tremendous efforts which are not carried out merely for the beauty of formal theories (although, many don't need much more motivation for doing it). The point is that, at any time before embarking on a formalization project, it is worth it to wonder if it is worth embarking on. On one hand, there is a class of programming languages, of which C++ and PL/I are nice representative, for which the formalization is not worth the effort because they are so complex and there documentation is so ambiguous that the formalization amount to reinventing them. Also, while reinventing them for the sake of formalization, for survival's sake, one should have the reflex of questioning every design decision before deciding to adopt them and, doing so, one would not end up with C++ or PL/I but with a language à la Wirth. If we carried that judgment to a further conclusion, it can also mean that they are not worth using either. On the other hand, there are subjects which are worth studying but the gain of a formalization is not immediately obvious. We have other tools than formalization to be rigorous and precise in our discourse and should not neglect them (to name only one: we have our natural language for which an exceptional mastery can do wonders). Once the benefit of formalization is seen, though, one should not hesitate. At the moment of writing, I am thinking about the general literature about programming methodology (aka software engineering). I have the feeling that although some publications concentrate on giving examples of the use of a particular method, some succeed at being very rigorous by being modest and accepting that they are merely working with one example. Also, their rigor seem to rely on the analysis of their example, of what helped and what did not. On the hand, some very disappointing papers on the topic try to be more rigorous by building a tool, having a handful of programmers use them, measure (in some vague sense) the improvement in their productivity and publish the statistic. Without a theory to compare those numbers to, they are meaningless and I can't help but see it as another instance of the general hype around statistic which I would summarize with "a number is better than no numbers at all". Simon Hudon Zürich, September 5th 2009

Friday, August 14, 2009

Pondering about Reasoning Errors

This note is mostly about teaching and, in general, communicating with other human beings. I have thought about writing it for a while but I was afraid not to have enough material. I'm writing it now because of new observations and I don't care too much about the length, today. In logic, we have the two important notions of soundness and of completeness. In short, a logical system is sound if you can only use it to prove statements that are true (but maybe not all of them) and complete if you can prove all statements that are true (but maybe also some false statements). I mention them because they appear more and more relevant to the way I choose to explain or with which I choose to help people. In such cases, it is necessary to make assumptions in order to explain or skip notions. Two kinds of errors are of interest for me here: either I assume that the other person does not know something and choose to explain it while he or she already know about it or I assume that he or she does and skip it while he or she does not know about it. In the first case, the reaction can be to be insulted because the attitude seems patronizing because the notion seems obvious. In the second case, the other person can miss a notion and be too afraid to appear stupid to ask about it. While I was an intern at A2M (a game developer in Montreal), my supervisor, which I regard with high respect, taught me that it is desirable to make communications as simple and concise as possible and to trust that the other person (in this situation an engineer or a scientist) will not be shy to ask about the missing parts. During the last hour, I've been thinking about this post while doing my groceries and I realized that there are a handful of people that I trust enough to be very straightforward in my communications because I've been talking to them for a long enough period to see that they are active while listening. For the others, it does not mean they are any less. It just means that I don't know if they will ask questions and, from time to time, I don't want to risk them misunderstanding me. In no case is this a sign that I mistrust someone's intelligence because I noticed recently that I was too explicit even with some people which I have in high esteem. This is pretty short but it's all I could summon before my stomach started threatening me. Simon August 14th 2009 Zürich

Simon's blog prime