Three famous quotes

These three quotes are logically related to each other. Have you figured out their logical connections? ;-)

UNIX is simple, it just takes a genius to understand its simplicity. —Dennis Ritchie

This suit of clothes is invisible to those unfit for their positions, stupid, or incompetent. —the emperor’s weavers

If you can’t explain it to a six year old, you don’t understand it yourself. —Albert Einstein

RubySonar: a type inferencer and indexer for Ruby

(This is a reposting of a blog post I wrote on Sourcegraph's website)

Since the last enhancement to our Python analysis, we have been working on improving the Ruby part of Sourcegraph by building a similar static analysis tool for Ruby. The result is a new open-source project, RubySonar. With its help, we are now happy to see significantly improved results for Ruby repos: in general we find over ten times more symbols, provide more examples, and offer more accurate jump-to-definition. We also provide local variable highlighting, which was not available in our previous Ruby results. With the improved speed of analysis, we can now process the full Ruby standard library, Ruby on Rails, and Ruby apps such as Homebrew.

RubySonar’s analysis is interprocedural and is sensitive to both data-flow and control-flow, which makes it highly accurate. Although we currently don’t show the types, RubySonar internally uses the same type inference technique as PySonar, and thus can resolve some of the difficult cases that can challenge a good Ruby IDE.

Although Ruby and Python seem to be very similar languages, lots of work had to be done to adapt PySonar’s code to Ruby. More work remains before RubySonar can achieve the same coverage as PySonar2, but the Ruby results are already a lot more usable. We have been careful to keep things simple and easy to extend, so we are optimistic about further improvements to the Ruby analysis.

Tests and static analysis

Ever since I made a static analysis tool for Python called PySonar, I have been asked the question: “What is the difference between testing and static analysis?” When I worked at Coverity, my coworkers told me that they also spent quite some time explaining the difference to people. My answer to this question evolves as my understanding of the area deepens. Recently I replied to a comment asking a similar question, so I think it’s a good time to write down a systematic answer.

Static analysis is static, tests are dynamic

Static analysis and tests are similar in their purposes. They are both tools for improving code quality. But they are very different in nature: static analysis is (of course) static, but tests are dynamic. “Static” basically means “without running the program”.

Static analysis is similar to the compiler’s type checker, but usually a lot more powerful. Static analysis finds more than type errors. It can find defects such as resource leaks, array indexes out of bounds, security risks, etc. Advanced static analysis tools may contain some of the capabilities of an automatic theorem prover. In essence, a type checker can be considered a static analysis with coarse precision.

Static analysis is like predicting the future, while testing is like doing small experiments in real life. Static analysis has a “reasoning power” that tests don’t have, so it can find problems that tests may never detect. For example, a security static analysis may show you how your website can be hacked after a series of events that you may never have thought of.

On the other hand, tests just run the program with certain inputs. They are fully dynamic, so you can’t test all cases, only some of them. But because tests run dynamically, they may detect bugs that static analysis can’t find. For example, tests may find that your autopilot program produces wrong results at a certain altitude and speed. Static analysis tools can’t check this kind of complex dynamic property, because they don’t have access to the actual running situation.

But notice that although tests can tell you that your algorithm is wrong, they can’t tell you that it is correct. Guaranteeing the correctness of programs is much harder than testing or static analysis. You need a mechanical proof of the program’s correctness, which at the moment means a theorem prover (or proof assistant) such as Coq, Isabelle or ACL2, lots of knowledge of math and logic, lots of experience dealing with those tools’ quirks, and lots of time. Even with all that you may not be able to prove it, because your program can easily encode something like the Collatz conjecture. So a program’s passing the tests doesn’t mean it is correct. It only means that you haven’t done terribly stupid things.

Difference in manual labor

Testing requires lots of manual work. Tests for “silly bugs” (such as null pointer dereferences) are very boring and tedious to write. Because of the design flaws of many programming languages, those things can happen anywhere in the code, so you need good coverage in order to prevent them.

It is not enough to make sure that every line of the code is covered by the tests; you need good path coverage. But in the worst case, the number of execution paths of a program is exponential in its size, so it is almost impossible to get good path coverage no matter how careful you are.
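To see how quickly paths multiply, here is a tiny made-up Python example (the function and numbers are invented purely for illustration): each independent branch doubles the number of execution paths.

```python
# Each independent 'if' doubles the number of execution paths,
# so k branches give 2**k paths in the worst case.
def f(a, b, c):
    x = 0
    if a:
        x += 1
    if b:
        x += 2
    if c:
        x += 4
    return x

# Covering every path needs one test per combination of branch outcomes:
inputs = [(a, b, c) for a in (False, True)
                    for b in (False, True)
                    for c in (False, True)]
print(len(inputs))   # 8 paths for just 3 branches
```

With only a handful of branches this is manageable, but a real program with hundreds of branches has astronomically many paths, which is why hand-written path coverage doesn't scale.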

On the other hand, static analysis is fully automatic. It explores all paths in the program systematically, so you get very high path coverage for free. Because of the exponential complexity of exploring the paths, static analysis tools may use heuristics to cut down running time, so the coverage may not be 100%, but it’s still enormously higher than any human test writer can achieve.

Static analysis is symbolic

Even when you get good path coverage in tests, you may still miss lots of bugs. Because you can only pass specific values into the tests, the code can still crash on values that you haven’t tested. In comparison, static analysis processes the code symbolically. It doesn’t assume specific values for variables; it reasons about all possible values of every variable. This is a bit like a computer algebra system (e.g. Mathematica), although it doesn’t do sophisticated math.

The most powerful static analysis tools can keep track of the ranges of the numbers that variables represent, so they may statically detect bugs such as “array index out of bounds”. Tests may detect those bugs too, but only if you pass them specific values that hit the boundary conditions. Those tests are painful to write, because the indexes may come out of a series of arithmetic operations, and you will have a hard time finding the cases where the final result hits the boundary.
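As a small invented illustration (the function and values here are mine, not from any real tool), this is the kind of boundary bug that is painful to hit with concrete tests but easy for a range-tracking analysis to flag:

```python
# The index is computed by arithmetic, so a test writer has to find inputs
# whose *result* lands exactly on len(items) -- easy to miss.
def last_window(items, start, width):
    end = start + width            # off-by-one: should be start + width - 1
    return items[end]

items = [10, 20, 30, 40]
print(last_window(items, 0, 2))    # 30 -- many "ordinary" tests pass
# last_window(items, 2, 2)         # IndexError: 2 + 2 == len(items)
```

A range-tracking analysis can conclude symbolically that `start + width` may equal `len(items)` and warn without needing the lucky input.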

Static analysis has false positives

Some static analysis tools are designed to be conservative. That is, whenever the tool is unsure, it assumes that the worst can happen and issues a warning: “You may have a problem here.” Thus in principle it can warn you whenever some code may cause trouble. But a lot of the time the warned-about bug can never actually happen; this is called a false positive. It is like a doctor diagnosing you with a disease you don’t have. Much of the work in building static analysis tools goes into reducing the false positive rate, so that users don’t lose faith in the diagnosis reports.

Tests don’t have false positives, because when they fail your program will surely fail under those conditions.

The value of static analysis

Although static analysis tools don’t have the power to guarantee the correctness of programs, they are the most powerful bug-finding tools that don’t need lots of manual labor. They can prevent lots of the silly bugs that we spend a lot of time and energy writing tests for. Some of those bugs are stupid but very easy to make. Once they happen they may crash an airplane or launch a missile. So static analysis is a very useful and valuable tool. It takes over the mindless and tedious jobs from human testers so that they can focus on more intellectual and interesting tests.

What makes Python static analysis interesting and hard

(The original version of this post appeared in my company’s blog. Here is just a personal copy of it.)

Python is a dynamic language in the authentic sense. It represents the world-view that types don’t really exist in reality, and this is probably right. Type systems are “thinking tools” that only exist in our heads. They help us reason about programs, but the world doesn’t really work their way. Types help prevent basic programming errors, but if not designed properly, a type system can be a big impedance to the programmer’s productivity. Years of cutting-edge research still haven’t produced a type system that won’t get in the programmers’ way, and you can still write useful programs without a static type system. This is why dynamic languages such as Python, Ruby and JavaScript are attractive to people, especially those who want to express themselves more directly and don’t like to go out of their way to make type systems happy.

But this directness and ease of use have a cost. Making IDEs and refactoring tools is a hard problem for Python. Type errors are hard to catch. For a simple example, the current state-of-the-art in Python IDEs is like the following.

I loaded this tiny program into the most recent version of JetBrains PyCharm (3.0.2), a Python IDE, and pressed the “show declaration” key on the field accessor foo, and it gave me three irrelevant choices, without including the right one.

Despite this limitation, PyCharm is still a really nice tool. I like PyCharm and use it myself. But this problem of finding the correct declaration of foo is harder than you might have thought. Without knowing what x is, how would you find the declaration of foo? This requires an “interprocedural analysis” where you track all the places where function f can be called. This could be many levels deep into the call chain. And because you can use functions as values, you sometimes don’t even know which function a name represents. You have to keep track of the type information that flows around this intricate data and control flow. Doing interprocedural analysis is hard in general, and it is even harder in a dynamic language with features such as first-class functions.
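The original post showed the program as a screenshot, which is not reproduced here. Based on the description (the names f, x and foo come from the text; the classes are my own guesses for illustration), it was roughly of this shape:

```python
# Reconstructed for illustration: finding the declaration of 'foo'
# requires knowing what 'x' is, which requires tracking all calls to 'f'.
class A:
    def __init__(self):
        self.foo = "A's foo"

class B:
    def __init__(self):
        self.foo = "B's foo"

def f(x):
    return x.foo        # which declaration of 'foo' is this?

print(f(A()))           # only interprocedural analysis links this use to A.foo
```

Without following the call `f(A())`, a tool has no way to know that `x.foo` refers to the field declared in class A rather than B.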

Finding type errors in Python is also hard. PyCharm would not report any problems for the following piece of code, but when you run it, you get a type error because you are adding the string "hi" and the integer 1. The if statement sets x to different types in different branches (int and str), and this confuses PyCharm’s type checker.
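The code in question was also shown as an image in the original post. Judging from the description here, and from the variables x and z mentioned later in the post, it was roughly the following:

```python
# Reconstructed for illustration: x gets different types in different branches.
def g(z):
    if z > 0:
        x = 1          # x is an int here
    else:
        x = "hi"       # x is a str here
    return x + 1       # TypeError when the else branch is taken

print(g(1))            # 2
# g(0)                 # TypeError: can only concatenate str (not "int") to str
```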

This might not be a big problem for an IDE, but as a semantic code search and browsing engine, we at Sourcegraph must do better, because if we can’t resolve the names, we will not have any usage information about that piece of code. We want to get as much information about code as possible. For this purpose, we use PySonar, a static analysis tool for Python. PySonar can handle the above cases, and a lot more complicated ones.

PySonar is at its core an “abstract interpreter”. It basically works like a Python interpreter, except that it doesn’t really execute the program. It implements almost everything in Python: modules, classes, objects, functions (closures), etc. It transfers “abstract values” (types) instead of actual values. It explores all possible execution paths of the program, and it always terminates. This simple technique works surprisingly well.

Working on PySonar is interesting because you think about the world philosophically with a “multiple worlds” model. If you don’t know the answer to a decisive question in your life, what can you tell about the future no matter what the answer is? The trick is to use as much information as you can get, put it into the possible “branches”, and try to deduce conclusions from what you have. This is basically what mathematical proofs are about: I don’t know what the natural number N is, but a natural number can only be 1) zero, or 2) M+1 where M is some other natural number. So as long as I know that 1) P holds for zero, and 2) P holds for M+1 whenever P holds for M, then I know that P holds for all natural numbers.

Similarly, in the above example, PySonar knows that if we don’t know what z is, x could be either an integer or a string. A program is safe only if it is safe in all possible worlds, thus PySonar knows that things could go wrong when we try to execute x+1. This kind of reasoning is done by building so-called “models”. Models abstractly represent things inside the program’s execution. Old models transform into new models when actions happen in the program, and the “world” of models splits at branching statements.
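As a toy sketch of this branch-splitting idea (my own illustration, not PySonar's actual code), one can represent abstract values as sets of possible types and union the worlds at join points:

```python
# Toy sketch of branch-splitting over abstract type values
# (not PySonar's actual implementation).
INT, STR = 'int', 'str'

def branch(then_types, else_types):
    # at the join point after an 'if', x may come from either world
    return then_types | else_types

x = branch({INT}, {STR})       # models: if z: x = 1  else: x = "hi"

def check_add_int(types):
    # x + 1 is safe only if x is an int in *all* possible worlds
    return types <= {INT}

print(sorted(x))               # ['int', 'str']
print(check_add_int(x))        # False -> report a potential type error at x + 1
```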

The new PySonar2 is a lot better than its predecessor. Its interprocedural analysis finds more declarations, and more precisely. It also generates type information, which can be helpful for understanding programs. This type information is already available on Sourcegraph for many Python repositories.

The recent additions to PySonar2 are intersection types, a more principled way of handling union types, and lots of improvements to its performance and simplicity. I’m also working on porting the same techniques to Ruby.

PySonar2 successfully integrated with Sourcegraph.com

(The above picture was taken from: http://sourcegraph.com/github.com/mitsuhiko/flask)

I recently joined a newly founded company called Sourcegraph.com. We build an intelligent code search engine using some of the most powerful programming language technologies.

The difference between Sourcegraph and other code search sites is that Sourcegraph truly understands code and produces accurate results. Sourcegraph lets you semantically search and browse open-source code on the web in a similar fashion to what you do locally with an IDE. It also finds users of your code worldwide, and shows exactly how they use your code.

For example, the following is what Sourcegraph shows you about the Flask framework’s Flask.route method.

Flask.route sourcegraph page

code examples of Flask.route

PySonar2 integration

Two weeks after joining, I successfully integrated PySonar2's type inference and indexing features with Sourcegraph.com, so that it can now show inferred types for Python functions. This is an advanced feature currently not available in any Python IDE.

For example, the following is the “Top Functions” page for the Flask framework, showing the PySonar2 inferred types. Notice that the parameters are not type-annotated. PySonar2 infers the types by an advanced interprocedural analysis.

How to use sourcegraph.com to browse your GitHub repo

You can browse your GitHub repo by prepending “http://sourcegraph.com/” to its address. For example, if your repo address is github.com/myname/myrepo, then you put this address into the browser:

http://sourcegraph.com/github.com/myname/myrepo

Currently this trick works only for GitHub addresses. If Sourcegraph hasn’t yet processed the repo, it will queue it and start analyzing as soon as possible. Usually this finishes within a few minutes up to half an hour, depending on how large the repository is and how busy the servers are. Sourcegraph currently supports Go, JavaScript, Python and Ruby. More languages are under development.

All our features are still in beta and may contain quite a few bugs and many areas for improvement. You are very welcome to send us bug reports and feature requests.

A pure logic negation operator for miniKanren

Have you ever noticed that some examples from The Reasoned Schemer are not so reasonable? If so, you may want to read this post.

Motivation

miniKanren is an educational logic programming language designed by Dan Friedman, William Byrd and Oleg Kiselyov. For teaching logic programming, they also co-authored the book The Reasoned Schemer (TRS). As a person who benefited hugely from this book (and every other book in the “little book” series), I highly recommend TRS to you.

While elegantly designed, miniKanren doesn’t have a “pure” negation operator. There is a ‘conda’ operator which is similar to Prolog’s cut, but it is not pure: once it is used, we may miss possible answers even if they exist. Thus although the ‘conda’ operator exists, it is not recommended for serious use.

But now we have a problem: if we can’t have a ‘cond’ operator with implicit negation of the conditions of the previous lines, we will have trouble interpreting the results of some code from The Reasoned Schemer (TRS). For example, on “Frame 30” of TRS, we have the following program, which invokes rembero, a “logic function” for deleting an item from a list.

The definition of rembero is (notice the ‘conde’ operator):

(define rembero
  (lambda (x l out)
    (conde
      ((nullo l) (== '() out))
      ((caro l x) (cdro l out))
      ((fresh (res)
         (fresh (d)
           (cdro l d)
           (rembero x d res))
         (fresh (a)
           (caro l a)
           (conso a res out)))))))

If it is used this way:

(run* (out)
 (fresh (y)
   (rembero y `(a b ,y d peas e) out)))

Running it, we get the following 7 answers:

;; =>
;; ((b a d peas e)               ; y == a
;;  (a b d peas e)               ; y == b
;;  (a b d peas e)               ; y == y
;;  (a b d peas e)               ; unreasonable beyond this point
;;  (a b peas d e)
;;  (a b e d peas)
;;  (a b _.0 d peas e))

Have you ever been surprised that there are 7 answers? Is it really possible that y fails to remove itself, but goes on to remove ‘d’, ‘peas’ or ‘e’ (Answers 4, 5 and 6), or fails to remove all of them (Answer 7)? Have you noticed that only the first 3 answers are reasonable, and the last 4 shouldn’t really happen?

For this particular example, the result from The Reasoned Schemer was not so reasonable.

A pure negation operator

As a student in Dan’s class (B521), I was puzzled by the above results. I asked Dan and Will why this happens. They told me that this is because the ‘conde’ operator of miniKanren is not exactly like ‘cond’ of Scheme. In Scheme, every line in the ‘cond’ expression implicitly negates all the previous conditions. That is to say, we execute the second line only if the first condition fails, we execute the third line only if the conditions on the first and second lines both fail, and so on.

On the other hand, the ‘conde’ operator of miniKanren doesn’t implicitly insert the negation of the conditions on the previous lines. The reason that miniKanren doesn’t do this is because there is no easy way of doing “negation” in logic programming. According to Will Byrd, this is a thorny subject that has been researched for over 30 years.

As a daredevil who never believes how difficult things are, I thought: “Why not try my luck and see how far I can get against these 30 years of research?” Out of this evil-minded motivation, I independently reimplemented miniKanren and added a negation operator to it (named “noto”, naturally). Unlike ‘conda’ and Prolog’s cut, noto is pure in the sense that it doesn’t cut out possible answers if they exist. Using noto, I defined a new conditional construct named ‘condc’, which implicitly inserts negations of all previous conditions on each line. It is designed to behave as an exact logic counterpart of Scheme’s ‘cond’.

If we use the ‘condc’ operator to redefine rembero (only one character is changed), we get the following (more reasonable) results:

;; redefine rembero using condc operator
(define rembero
  (lambda (x l out)
    (condc
      ((nullo l) (== '() out))
      ((caro l x) (cdro l out))
      ((fresh (res)
         (fresh (d)
           (cdro l d)
           (rembero x d res))
         (fresh (a)
           (caro l a)
           (conso a res out)))))))

(run* (out)
 (fresh (y)
   (rembero y `(a b ,y d peas e) out)))

;; =>
;; (((b a d peas e) ())
;;  ((a b d peas e) ())
;;  ((a b d peas e)
;;   (constraints:
;;    ((noto (caro (b #1(y) d peas e) #1(y)))
;;     (noto (caro (a b #1(y) d peas e) #1(y)))))))

Notice that we get only 3 answers (instead of 7), plus two constraints for the third answer. In fact each answer is paired with a constraint list, but the constraint lists are empty for the first two answers. This is why they are displayed as ((b a d peas e) ()) and ((a b d peas e) ()).

Now I briefly describe what these three answers mean. The first answer (b a d peas e) happens when “y is a”, thus it removes the first item (a) from the list. The second answer (a b d peas e) happens when “y is b”, thus it removes the second item (b) from the list. If you are confused why we still have a ‘b’ here after ‘a’, this is because the third item (y) is now ‘b’!

The third answer is more interesting. It not only has an answer (a b d peas e), but also has two constraints attached to this answer:

(noto (caro (b #1(y) d peas e) #1(y)))
(noto (caro (a b #1(y) d peas e) #1(y)))

(Here #1(y) is a special notation to say that y is a logic variable.)

These constraints are in conjunctive form. They are saying: If we are to have this answer, neither (caro (b y d peas e) y) nor (caro (a b y d peas e) y) should hold, which is basically saying “y is neither a nor b”. This is correct, because if y is either a or b, we would not have reached this answer because y would have removed one of the first two items, and the iteration would have stopped.

We have no more answers beyond the third, because under no condition can y remove ‘d’, ‘peas’ or ‘e’: the logic variable y will definitely remove y, no matter what it is! The iteration will definitely stop at the point where “y meets y”. Clear?

How does it work?

The principle behind the negation operator (“noto”) is to propagate the negation of goals as constraints (as in constraint logic programming) down the execution paths of the miniKanren program.

Before I tell you further details, I want to describe the intuition behind its design. Looking at the details without knowing the design principles will not be very useful. To see how you can design a negation operator, just think about how you can make the goal (noto G) succeed. First of all, you want to make the goal G fail, so that (noto G) can succeed, right? But G may contain unbound logic variables, and you can’t just randomly assign them values. This is why a more elaborate mechanism was devised. It is there to ensure the soundness of the logic.

So now we can proceed to look at the details of how this works:

  1. When the negation of a goal G is first encountered, as (noto G), a specially designed “evil unifier” (unify-evil) is invoked. As its name suggests, unify-evil works similarly to unify, but in a “negative way”. The goal of unify-evil is to take every chance to make the goal G fail. Basically, it tries its best to find values that can be bound to the free logic variables, such that G can fail. But notice that unify-evil doesn’t permanently associate those values with the logic variables. It just tries out those values, and as soon as it knows that G can fail, it dumps those associations. Thus the free logic variables remain free.
  2. If unify-evil cannot make G fail no matter how hard it tries, then we know that G will succeed, thus we know that the goal (noto G) will fail. This means, we have failed to produce answers on this path. We should backtrack and explore other paths.
  3. If unify-evil succeeds in making G fail, then (noto G) has a chance to succeed. But at this moment it is too early to declare success, because the unbound logic variables may pick up some other values later, which could make G succeed, and consequently make (noto G) fail.
  4. Because of (3), we will have to propagate the negation of G as a constraint down the path of execution, checking that G fails every time when we have new information about the unbound logic variables (e.g. some fresh variables are later bound).
  5. If at the end of the execution path unify-evil can still succeed in making G fail, then we can safely declare the success of (noto G). This is because the free logic variables will have no more chances to make G succeed. This (noto G), if it is not subsumed by the current substitution state, should be included in the final answer.
  6. A reified value, together with the non-subsumed constraints on the logic variables, will be output together as the answer.
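To make steps 3 and 4 a bit more concrete, here is a deliberately over-simplified toy in Python (goals in real miniKanren are not plain predicates like this, and unify-evil is far more involved): a negation constraint is kept around and re-checked every time the substitution grows.

```python
# Toy illustration only: a 'noto' constraint is a deferred check that is
# re-run every time the substitution (the set of variable bindings) grows.
def noto(goal):
    def constraint(subst):
        return not goal(subst)   # (noto G) holds as long as G fails
    return constraint

# goal G: "x is bound to 'a'"
G = lambda subst: subst.get('x') == 'a'
c = noto(G)

print(c({}))             # True: x unbound, G fails for now, keep the constraint
print(c({'x': 'b'}))     # True: binding x to 'b' still keeps G failing
print(c({'x': 'a'}))     # False: this binding would satisfy G, so backtrack
```

The real mechanism additionally has to probe whether some *future* binding could make G succeed (that is unify-evil's job); this sketch only shows the "re-check on every new binding" part.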

From the above mechanism, can you see why the example of rembero produces these results?

(run* (out)
 (fresh (y)
   (rembero y `(a b ,y d peas e) out)))

;; =>
;; (((b a d peas e) ())
;;  ((a b d peas e) ())
;;  ((a b d peas e)
;;   (constraints:
;;    ((noto (caro (b #1(y) d peas e) #1(y)))
;;    (noto (caro (a b #1(y) d peas e) #1(y)))))))

Why doesn’t the second answer have a constraint which says “y is not a”? This is because for the second answer, we already know that “y is b”, which implies “y is not a”. The system is intelligent enough to omit “y is not a” from the answer’s constraints, because it knows that it is subsumed under the current substitution (“y is b”).

Limitations

Nested negations do not work properly, so if you have (noto (noto (== x 10))), you are not guaranteed to have x bound to 10. A later version of the negation operator did make this work, but it caused non-termination problems, and I ran out of allocated time soon after that. More work needs to be done to make nested negations work.

Several years after this experiment, I had an interesting discussion with Oleg Kiselyov on the topic. An excerpt of our conversation is included as comments at the bottom of the code. In his words, although the implementation of noto works to some degree, it is not perfect. To the best of his knowledge, no negation operator works perfectly to this day.

So, did I beat 30 years of hardcore research? Probably not. But consider this: it took me less than a month to think of and implement all this, working completely independently, day and night. This happened in 2008 when I first learned miniKanren and logic programming. Today, as a mature programming languages researcher, I take it as an amusement to revisit this work and see how far I can go down the path I started exploring five years ago.

Code

The reimplemented miniKanren, together with the negation operator, has been available on my GitHub for years without being noticed. I have now made it an independent project and hope to have the time (and public pressure) to develop it further. I also hope to gather ideas from real logic programming gurus about other ways of implementing pure negation operators.

If you are interested in playing with it, or you want to research on this topic, my code is here for free:

http://github.com/yinwang0/ykanren

psydiff: a structural comparison tool for Python

(click on the above picture for a demo)

Psydiff is a structural comparison tool for Python, written in Python itself. The main algorithm and UI of psydiff are almost the same as ydiff’s.

If interested, you can see a demo of it here (psydiff comparing itself):

http://www.yinwang.org/resources/pydiff1-pydiff2.html

All the source code can be downloaded from my GitHub repo:

http://github.com/yinwang0/psydiff

It’s still in an early stage of development. I appreciate your bug reports, feature requests and contributions.

Special thanks to Ronny Pfannschmidt for code contribution and suggestions.

(Note: This project was named “pydiff”, but I found that the name had already been taken by another project. Thus it is now renamed to “psydiff”, meaning “PYthon Structural diff” or “PSYcho diff” :=)

Null reference may not be a mistake


The null pointer is considered to be a “billion-dollar mistake” by Tony Hoare, but was he really saying that the null pointer should never be used? After years of using languages both with null pointers (e.g. Java) and without them (e.g. Haskell), I have found that null pointers are much easier and more natural to use than their counterparts (e.g. Haskell’s Maybe type). I had been wondering why there is such a notion of a “billion-dollar mistake” until I saw the original video where Tony Hoare calls it his mistake. In fact, he didn’t really say that null pointers should not be used, so I realized that I had made a mistake by taking the words “billion-dollar mistake” literally.

From this video, you can see that introducing null references was not really a mistake. On the contrary, null references are helpful and sometimes indispensable (consider how much trouble it is in C++ when a function may or may not return a string). The mistake is not in the existence of null pointers, but in how the type system handles them. Unfortunately most languages (C++, Java) don’t handle them correctly.

Every class type A in Java is in fact a union type {A, null}, because you can use null anywhere an object of class A is expected. This is equivalent to the Maybe type of Haskell (where null in Java corresponds to Nothing in Haskell). So the trouble really is that an annotation like {String, null} should be distinguished from String, so that it is clear what can possibly end up in the value. Unfortunately most languages don’t provide a convenient union type in which you can put String and null together (Typed Racket is an exception). If Java were to have union types, we could say something like:

{String, null} findName1() {
  if (...) {
    return "okay";
  } else {
    return null;
  }
}

This is saying: findName may return a name which is a String, or it may return nothing. In comparison, we can say something slightly different:

String findName2() {
    ...
    return "okay";
}

By distinguishing the return types of findName1() and findName2(), the type system knows that you should check for null when you call findName1(), but you don’t need to check for null when you call findName2(). So you have to write something like:

String s = findName1();  
if (s != null) {
  x = s.length();      // use s as a String only after null check
}

But you may write:

String s = findName2();
x = s.length();

For the latter, you don’t have to check whether s is null because we know definitely that findName2() will return a String which is not null.

In fact this approach is hinted at by Tony Hoare in the above video at 00:24.00. He said that null should be a class. Indeed, the union type {String, null} treats null as having the same status as String: it is a class.

But we soon realize that it doesn’t really matter whether null is a class or not, since the class Null would have only one value: null. So any language with null references could work equally well given a correct type system.

Advanced static analysis tools can already help alleviate this issue by essentially inferring union types like {String, null} even when the programmer writes just String. But a type annotation system which allows programmers to specify union types directly would certainly make type checking easier, and also make programs easier to read.
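As an aside (my own analogy, not something from Hoare's talk), Python's typing module already lets you write this distinction down, and checkers such as mypy enforce the null test before use:

```python
# Analogy in Python: Optional[str] is Union[str, None], i.e. {String, null}.
from typing import Optional

def find_name1(ok: bool) -> Optional[str]:
    return "okay" if ok else None

def find_name2() -> str:
    return "okay"

s = find_name1(True)
if s is not None:          # a checker requires this test before using s as str
    print(len(s))          # 4

print(len(find_name2()))   # 4 -- no check needed: the type rules out None
```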

How to reduce the contagious power of ‘const’

For the purpose of optimization and (possibly?) safety, the C++ programming language decided that users should manually add ‘const’ annotations to variables that they know will/should never be modified. While I admit the possible usefulness of this annotation, I often see it misused, and this often results in a contagious effect where unnecessarily many variables and parameters are annotated as ‘const’.

In this post, I reason philosophically about the true meaning of the ‘const’ annotation, and propose some tentative principles regarding the use of it. By following them, I hope to reduce the contagious power of ‘const’ to the minimum.

First, the rule for ‘const’ annotations for variables:

Rule 1. Declare a variable as ‘const’ if you want it to be inlined, or you don’t want it to be modified.

By this principle, these examples are legitimate:

const int max = 1000;
const char* name = "permanent name";

These declarations can be global or local variables. Declaring them as ‘const’ can open up inlining opportunities, as well as prevent unwanted modifications.

Now we just have one other rule. This one concerns the ‘const’ annotations on functions:

Rule 2. As a function, be self-disciplined about what you take, but be relaxed on what you give.

This means: put ‘const’ on a parameter whenever you are sure that you will not modify the input argument, but never put ‘const’ on the return type unless the compiler asks you to.

The reason behind this may take a little bit of thinking. By putting ‘const’ on the input parameter, the function is basically declaring: “Whoever passes me this argument, I promise not to modify it.” “I’m side-effect free on this parameter.” “I’m pure on this parameter.”

This is a good thing, because you give the callers of the function more freedom of what they can give you. If you don’t declare your “pureness” (on the parameter), some callers may not be able to pass you certain data because they don’t want their data to be accidentally modified by you. They need their original values intact, because they are going to use them after you return.

So in general, this is a good thing to do in C++:

  void foo(const char* in);

But there is a catch here: whether you can put ‘const’ here depends on the function body and all the function calls inside. Whenever you put ‘const’ on the parameter type, you have to make sure that any functions you call don’t modify this parameter either. That is to say this is a transitive property. For example, something like this will not work:

  void foo(const char* in) {
    bar(in);
  }

  void bar(char* in) { ... }

That is because bar may modify its parameter, which is also foo’s parameter.

Is this to say that you should declare bar as void bar(const char* in)? It all depends on whether bar actually modifies its argument or not. If it does, then there is no way you can use ‘const’, and consequently you cannot declare foo as taking a const char* either. Then foo should be declared as “void foo(char* in)”, without the ‘const’.

There is no way you should use ‘const’ in foo then, because the helper bar modifies the data. You have to say this honestly: “I may modify the contents of in, because one of the functions I call modifies it.”

This is where it can get painful in C++, because the other functions you call may be written by other people and they may not follow the same principles here. They may not have ‘const’ on their parameter types even though they haven’t modified the parameter in their function body. This is not their fault, because adding those ‘const’ annotations is so boring. Some automatic inference can certainly be helpful in these cases, but unfortunately inference for ‘const’ is not provided by C++.

On the other hand, why should you NOT put ‘const’ on the return type? This is because if you do, then you are basically announcing: “Whoever calls me cannot modify the value that I return.”

This is like saying in your will: “I will give my house to my son, but he is not allowed to modify it.” Well, the law may give you the right to specify this, but what good does it do your son? You are limiting the happiness you can provide. Moreover, can this kind of restriction really be enforced? Who has the time or the right to keep an eye on your son and restrict his actions on the house? This just cannot work.

Coming back to the context of programming, what’s the point of not allowing the callers to modify what you GIVE them? By returning, you are giving up control of the return value to the caller. It should be up to the caller, and not you, to decide whether to modify the return value or not. Even if you put ‘const’ annotations in the return type, they are usually ignored by the callers. They just use a const_cast whenever the ‘const’ gets in their way. This makes the ‘const’ annotation on return types virtually useless.

If you put ‘const’ on the return type and the callers respect it, then it will contaminate the whole data path wherever the return value goes. This is nasty and unnecessary.

So in general, this is not a good thing to do:

  const char* bar();

Because your caller would have to be like this:

void baz() {
  const char* ret = bar();
  bizaar(ret);
}

And the receiver of the return value (bizaar) must be defined as something like:

  void bizaar(const char* in) {
    bizaar2(in);
  }

And now bizaar2 needs to be declared as:

  void bizaar2(const char* in);

And so on… You see how the contagion started? It started because bar() returned a const value, which is unnecessarily restrictive. So in general, it is not a good idea to return ‘const’.

I’ll leave this as a thought exercise: when can we reasonably use ‘const’ on the return value?

Why is indexing faster than binary search

We all know that indexing into an array takes only O(1) time, while searching for a number in a sorted array takes O(n) time with linear search, and O(log n) time with binary search. But why is indexing so fast? What is the secret sauce?

The reason is really about the nature of indexing — how it is implemented in a circuit. In order to illustrate this, let me show you an “addressing circuit”.

addressing circuit

Here, A and B are the two-bit address lines; they represent the indices 00, 01, 10, 11. The outputs Z, Y, X and W are the selectors of the items in the array. Notice that an output selector is enabled only when both of the input lines of the corresponding AND gate are “1”.

Now, ignore the input B and just look at A. See how its signal flows through the direct wires and the inverters. We can make the following observations:

  • When A is “1”, the AND gates of X and W will receive a “1” on one of their input ports, while the AND gates of Z and Y will receive a “0” on one of their input ports.
  • On the other hand, if A is “0”, the AND gates of X and W will receive a “0” on one of their input ports, while the AND gates of Z and Y will receive a “1” on one of their input ports.

From the above, I hope you have seen that the value of A partially selects half of the AND gates — it is either the set {X, W} or {Z, Y}. By “partially select”, I mean they are not fully selected, because we haven’t taken B into account. At this point, I hope you have noticed that A is in fact doing one step of a “binary search”.

Now we do a similar thing, but this time focus on B and ignore A. You should see a similar thing: depending on the value of B, either we partially select {Y, W}, or we partially select {Z, X}. So we can also think of B as doing one step of a “binary search”.

Now, we see that A and B each perform a step of a binary search, and it is interesting to see that B’s selection will cut A’s selection evenly, whether A is 0 or 1. We can say the same thing vice versa: A’s selection will cut B’s selection evenly, whether B is 0 or 1.

Also notice that the selections by A and B can happen at the same time. That means, when they work simultaneously, it takes just O(1) time to do a binary search through an array of length 4. If we generalize this circuit to N bits of input, then within O(1) time we can do a binary search through an array of length 2^N.

This explains why indexing an array is faster than binary search: it is a parallel binary search where all (log n) steps happen at the same time.
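The addressing circuit described above can be simulated in a few lines of code. A sketch, using the gate wiring from the text (Z, Y, X, W selected by addresses 00, 01, 10, 11; the function name decoder is mine):

```python
def decoder(a: int, b: int) -> list:
    """A 2-to-4 addressing circuit: two address bits select one of four items.

    In hardware all four AND gates evaluate simultaneously, which is why
    the selection takes O(1) time rather than log-n sequential steps.
    """
    na, nb = 1 - a, 1 - b    # the inverters on the address lines
    z = na & nb              # selected by address 00
    y = na & b               # selected by address 01
    x = a & nb               # selected by address 10
    w = a & b                # selected by address 11
    return [z, y, x, w]

# exactly one output line is enabled for each address
assert decoder(0, 0) == [1, 0, 0, 0]
assert decoder(1, 1) == [0, 0, 0, 1]
```

Each address bit cuts the set of candidate gates in half, and both cuts happen at once; that is the “parallel binary search”.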

On pureness

It is usually a pleasure to use a functional programming language, but it is doubtful whether we should care about the “pureness” of the language. To write programs in a completely pure functional programming language (e.g. Haskell) is much like living in a wired world.

There are no electromagnetic waves nor sound waves in such a world, so you don’t have wifi, cell phones, satellite televisions, radios, etc. You don’t even have light or sound. In other words, everything is blind and deaf.

All information must pass through wires or pipes, connected by switch boxes which we call “monads”. You must carefully route all the wires and pipes and connect them correctly before any information processing device, including your eyes and ears, can properly work.

So I think a language will need some parts that are not pure, in order to naturally represent effects such as electromagnetic waves. Normally those effects are called “benign effects”.

Undecidability proof of the halting problem without diagonalization

As a TA for a graduate-level Theory of Computation course, I don’t understand why we have to teach the diagonalization proof of the undecidability of the halting problem. It is much easier if we just use an ordinary argument about functions in a programming language (in a theory of computation context, functions correspond to Turing machines). To improve the situation, I came up with a simpler formulation based on the view of “Turing machines as functions”.

Suppose that we have a function Accept(M,w) which takes two arguments, a function M and its input w, and can tell you whether or not M(w) will output True. Basically, suppose that Accept is the hypothetical solver for the halting problem. To connect to the usual theory of computation terminology, the function M corresponds to a Turing machine, True corresponds to reaching an accept state, and False corresponds to a reject state. Note that although the argument M may go into an infinite loop, Accept never will: Accept always returns True or False in a finite amount of time. Accept must determine this without actually running M, because M may go into an infinite loop and never return. This way of finding properties of programs without running them is usually called static analysis. It has the same essence as fortune-telling and all other sciences.

Now we come to our simple contradiction: does the following expression (call it F) return True or False?

Accept(λm.not(Accept(m,m)), λm.not(Accept(m,m)))

It turns out that this question cannot be answered at all! If F returns True, then when we actually apply the function λm.not(Accept(m,m)) to the argument λm.not(Accept(m,m)), it should return True, right? But when we apply it, we get

not(Accept(λm.not(Accept(m,m)), λm.not(Accept(m,m))))

Notice that it is exactly the negation of the original expression F. This means that F should return False (because the “actual run” returns False). On the other hand, if F returns False, then when we actually apply the function on its argument, it will return True. Thus the contradiction. QED.
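The same contradiction can be played out concretely in code. A sketch: accept below is a hypothetical stand-in for the halting-problem solver (any concrete guess works for the demonstration; here it always answers True), and D is the λm.not(Accept(m,m)) from above:

```python
def accept(m, w):
    """Hypothetical halting-problem solver. Any concrete implementation
    will do for this demo; suppose it always answers True."""
    return True

# D is the function (lambda m: not accept(m, m)) from the text
D = lambda m: not accept(m, m)

# Whatever accept claims about D applied to itself, actually running
# D on itself produces the opposite answer:
assert D(D) == (not accept(D, D))
assert accept(D, D) != D(D)   # so accept's answer is necessarily wrong here
```

Since the same two assertions hold no matter how accept is implemented, no implementation of accept can be correct on this input.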

Some may argue that I’m implicitly using “diagonalization” here. Well, you can say that they are equivalent in effect, but that doesn’t mean “diagonalization” is a good name for what I’m doing. Actually I’m just making a loop in a “circuit”. I don’t see a diagonal, so why call it that? The diagonal argument and the “loop argument” may have the same essence, but they are very different in their comprehensibility. I feel that the loop way is much easier to understand than diagonalization.

An interesting final question: what is the connection between the above expressions and the Y combinator?

λf.(λx.f(x x)) (λx.f(x x))

A mind map guide for TeXmacs

TeXmacs was my best-kept secret for homework assignments and papers with lots of mathematical formulas. It is a state-of-the-art typesetting system with unparalleled beauty. Many people think that it is just a “frontend” of TeX/LaTeX (like LyX), but actually TeXmacs is fundamentally superior to TeX. It inherited some good designs from TeX, but then surpassed it. TeXmacs is fully independent software: you don’t need to install TeX/LaTeX in order to use it. It has typographical quality equal to or better than TeX’s, with a much more intuitive and beautifully designed interface.

The editor environment is just amazing. It is 100% WYSIWYG. That means the screen shows exactly what you would print out on paper. As of today, I don’t know of any other word processor (including MS Word) that can achieve this. If you haven’t used TeXmacs before, it will refresh your mind about what the word WYSIWYG really means.

I know exactly what you would argue now, TeXers, but listen—with all the beauty, you don’t lose detailed control of document structure. You will be surprised to find out that it is also WYTIWYG (What You Think Is What You Get), and it has exploited this notion to the extreme. You will sense the notion of “human centered design” (HCD) in it.

So, ready for a try? Just download it (free) and dive in. Over the years, I have accumulated lots of useful knowledge regarding its use and have collected them into a mind map. Click on the following image and you can see all my best tricks with this software. Have fun!

Understanding the yin-yang puzzle

I have a friend who is a very fast learner. I introduced him to the Scheme programming language a few days ago and he soon dug into this yin-yang puzzle:

(let* ((yin
        ((lambda (cc) (display #\@) cc) (call/cc (lambda (c) c))))
       (yang
        ((lambda (cc) (display #\*) cc) (call/cc (lambda (c) c)))))
  (yin yang))

This program prints out the infinite string @*@**@***@****@***** …

Why does this happen? It took me almost a full afternoon to figure out. It is always good to have friends who feed you questions and challenges. I’m going to document my findings here. I may make some animated slides (always a good idea) later, but for now a short note will suffice.

The Plan

You will probably never understand the program if you examine the contents of the stack. You need their high level semantic meaning to think clearly. The higher level meaning of the stack is the continuation. So instead of frightening you with the details of the stack, I’m going to use functions to show the continuations explicitly.

Also I will use a little bit of compiler optimizations along the way. The optimizations will simplify the program and show its deeper meanings.

Here is an overall plan of this article:

  1. Find the functional representations of the two continuations. Name them c1 and c2.
  2. Plug in c1 and c2 into the places of the two call/cc’s.
  3. Simplify the program using compiler optimizations.
  4. Perform behavioral analysis on the simplified program.

Transform and Remove Call/cc

So here we go. Let’s repeat the original program here:

(let* ((yin
        ((lambda (cc) (display #\@) cc) (call/cc (lambda (c) c))))   ; B1
       (yang
        ((lambda (cc) (display #\*) cc) (call/cc (lambda (c) c)))))  ; B2
  (yin yang))

Note that there are two binding clauses, B1 and B2, each prints a char and binds a continuation to a variable.

B1 binds yin to the “current continuation” of the first call/cc. Let’s call this continuation c1. Its function representation is:

(lambda (k)
  (let* ((yin
          ((lambda (cc) (display #\@) cc) k))           
         (yang
          ((lambda (cc) (display #\*) cc) (call/cc (lambda (c) c)))))          
    (yin yang)))

We can simplify it using some compiler optimization tricks. I will go very slowly here just in case you are not familiar with compiler optimizations.

First, since Scheme is a strict language, k will be a pure value (with no side-effects), so we can lift the side-effect (display #\@) out:

(lambda (k)
  (display #\@)
  (let* ((yin
          ((lambda (cc) cc) k))           
         (yang
          ((lambda (cc) (display #\*) cc) (call/cc (lambda (c) c)))))          
    (yin yang)))

Now we can “partial evaluate” ((lambda (cc) cc) k) to k:

(lambda (k)
  (display #\@)
  (let* ((yin k)           
         (yang
          ((lambda (cc) (display #\*) cc) (call/cc (lambda (c) c)))))          
    (yin yang)))

Then we can copy propagate the value of yin, k, to the body of let*, (yin yang). And now we have only one binding left:

(lambda (k)
  (display #\@)
  (let ((yang
         ((lambda (cc) (display #\*) cc) (call/cc (lambda (c) c)))))
                                         ^^^^^^^^^^^^^^^^^^^^^^^^
    (k yang)))

Now we try to figure out the underlined continuation. It should be:

(lambda (j)
  (let ((yang
         ((lambda (cc) (display #\*) cc) j)))
    (k yang)))

You see why? Since the “current continuation” means “what is going to happen with this value”, it doesn’t include the computation before it, namely (display #\@). Now we do a similar optimization on this continuation: lift out the side-effect (display #\*), then do some partial evaluation and copy propagation.

The result of this inner continuation is:

(lambda (j)
  (display #\*)
  (k j))

Now we can plug it back:

(lambda (k)
  (display #\@)
  (let ((yang
         ((lambda (cc) (display #\*) cc) 
          (lambda (j)
            (display #\*)
            (k j)))))
    (k yang)))

Do a similar sequence of optimizations: lift out the first (display #\*), partial evaluation, copy propagation. And we have the final result for the value of yin. Let’s name it c1 with a definition:

(define c1
 (lambda (k)
   (display #\@)
   (display #\*)
   (k (lambda (j)
        (display #\*)
        (k j)))))

Now, the original program would look like the following. The program would have by now printed a @ before binding yin to c1, so we make a sequence and include the display term in the front of it.

(begin
  (display #\@)
  (let* ((yin c1)
         (yang
          ((lambda (cc) (display #\*) cc) (call/cc (lambda (c) c)))))
    (yin yang)))

Try it. It will behave the same as the original program. Now we do another copy propagation to get rid of the first binding:

(begin
  (display #\@)
  (let ((yang
         ((lambda (cc) (display #\*) cc) (call/cc (lambda (c) c)))))
    (c1 yang)))

Since the call/cc here will not cause any side-effects (why?), we can lift (display #\*) out:

(begin
  (display #\@)
  (display #\*)
  (let ((yang
         ((lambda (cc) cc) (call/cc (lambda (c) c)))))
                           ^^^^^^^^^^^^^^^^^^^^^^^^
    (c1 yang)))

Now we just have to find what this underlined continuation is, and it will become the value of yang. It is:

(lambda (k)
  (let ((yang
         ((lambda (cc) cc) k)))
    (c1 yang)))

After our routine sequence of optimizations we have the value of yang. Let’s define it as c2:

(define c2
  (lambda (k)
    (display #\*)
    (c1 k)))

Plug c1 and c2 back into the original program:

(begin
  (display #\@)
  (display #\*)
  (let* ((yin c1)
         (yang c2))
    (yin yang)))

And do a copy propagation:

(begin
  (display #\@)
  (display #\*)
  (c1 c2))

Transformed Program

Now we have our final transformed and optimized program. Let’s put the definitions of c1 and c2 here for an overview:

(define c1
 (lambda (k)
   (display #\@)
   (display #\*)
   (k (lambda (j)
        (display #\*)
        (k j)))))

(define c2
  (lambda (k)
    (display #\*)
    (c1 k)))

(begin
  (display #\@)
  (display #\*)
  (c1 c2))

For simplicity, let’s inline c2 into the main program. Now c2 disappears.

(define c1
 (lambda (k)
   (display #\@)
   (display #\*)
   (k (lambda (j)
        (display #\*)
        (k j)))))

(begin
  (display #\@)
  (display #\*)
  (c1 (lambda (k)
        (display #\*)
        (c1 k))))

Try it. It will still behave the same as the original program.
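If you don’t have a Scheme at hand, here is the same transformed program transcribed into Python (a sketch; the out list and the structure of c1/c2 mirror the definitions above, but Python lacks tail calls, so the infinite loop is cut off by the recursion limit instead of running forever):

```python
import sys

out = []  # collect the printed characters instead of displaying them

def c1(k):
    out.append("@")
    out.append("*")
    def k2(j):            # (lambda (j) (display #\*) (k j))
        out.append("*")
        k(j)
    k(k2)

def c2(k):                # (lambda (k) (display #\*) (c1 k))
    out.append("*")
    c1(k)

def main():
    out.append("@")       # the (begin (display #\@) (display #\*) (c1 c2))
    out.append("*")
    c1(c2)

sys.setrecursionlimit(200)
try:
    main()                # would run forever; Python stops it at the stack limit
except RecursionError:
    pass

print("".join(out))       # @*@**@***@**** ...
```

The output prefix matches the original program, which is a nice sanity check on the transformation.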

Behavioral Analysis

Now the program doesn’t contain any call/cc’s and is much easier to understand. Let’s try to figure out how it executes:

  1. First, @* will be printed, and we will invoke:
(c1 (lambda (k)
      (display #\*)
      (c1 k)))
  2. Inside the invocation, @* will be printed by the body of c1, and k is now bound to (lambda (k) (display #\*) (c1 k)). So the program proceeds to:
((lambda (k)
   (display #\*)
   (c1 k))
 (lambda (j)
   (display #\*)
   ((lambda (k)
      (display #\*)
      (c1 k)) 
    j)))

It will print a *, and becomes:

(c1
 (lambda (j)
   (display #\*)
   ((lambda (k)
      (display #\*)
      (c1 k)) 
    j)))

Now we have seen @*@**

  3. Notice that we are back to a call to c1! This is a good sign of recursion. But this time the argument is different. If we simplify it a little, we get:
(c1
 (lambda (j)
   (display #\*)
   (display #\*)
   (c1 j)))
  4. I think you see what’s going on here. This time, c1 is called with
(lambda (j)
   (display #\*)
   (display #\*)
   (c1 j))

which means “When called, display TWO *’s, and then behave like c1 on the argument”. If we go on, the argument to c1 will get longer and longer, each time with an additional (display #\*). It will print @***, and then @****, and then @*****, and so on.

  5. Let’s introspect a bit. Which part of the program is responsible for creating the ever-longer displays, and why? It is this piece from the definition of c1:
(k (lambda (j)
     (display #\*)
     (k j)))

Here k is c1’s argument. Notice that k is always of the form:

(lambda (j)
   (display #\*)
   (display #\*)
   ...
   (c1 j))

When k is applied, it will print the corresponding number of *’s inside it, and then behave like c1. The argument to k is:

(lambda (j)
  (display #\*)
  (k j))

What does this mean? It says: “When called, print a * and then behave like k on the argument.” This is how you get a “new k”, with one more display of *.

This will go on indefinitely. This is why you see the infinite string: @*@**@***@****@***** …

On software design patterns

In this post I try to answer the controversial question

Are software design patterns good or bad?

This used to be a long post, but I have since condensed it into a very short one, thanks to the pictures. One picture is worth ten thousand words. Hopefully you will have understood most of Richard Gabriel’s 239-page book after looking at the following two pictures.

  1. Here are some nice design patterns. You can use them to decorate the floors in your house.

  2. Here is a masterpiece. Can you construct it using the above patterns, or any patterns at all?

I think the answer is already obvious. One cannot produce a masterpiece by just putting patterns together. Artists know that techniques are never as important as vision. That is why they spend their lives studying the nature of things such as the human body. Once they get hold of the essence, they use whatever techniques at hand to express what they want. They can also change the techniques or invent new ones as needed.

The same holds for computer programming. We need to keep in mind that design patterns are means and not ends. True programmers are never constrained by existing patterns. They observe and grasp the essence of things, inventing new techniques and patterns as needed.

Concurrent stack does not exist

Some weeks ago I read this thought-provoking article by Nir Shavit, Data Structures in the Multicore Age. The topic is the efficiency of concurrent data structures. The reading set off a lot of thinking, but the most interesting point to me is that the author started out trying to use a “concurrent stack”, but ended up happily using a pool.

“In the end, we gave up the stack’s LIFO ordering in the name of performance.”

Now there can be several interesting questions to ask. Can we give up some property of a data structure if it is crucial to the correctness of the program? If we can give up the LIFO ordering, then why did we think we need a stack? But the most interesting question is probably:

Does a “concurrent stack” really exist?

I think the answer is No — a “concurrent stack” hasn’t really existed and will never exist. Here is why:

  1. If the threads are allowed to push and pop concurrently, then the “stack” can’t really maintain a LIFO order, because the order of the “ins” and “outs” is then indeterminate and contingent on the relative execution speeds of the threads.
  2. If the executions of two threads strictly interleave, so that each time a “pusher” pushes an element a “popper” immediately pops it out, then this is a FIFO order: a queue.
  3. If a pusher pushes all the elements before a popper starts to pop all of them out, then this is indeed a LIFO order, a stack. But notice that there is no concurrency in this case — the executions of the threads must be completely sequential, one after another.
  4. If two threads interleave randomly, or there are more than two threads accessing the “stack” at the same time, then nothing can be said about the access order.
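Cases 2 and 3 are easy to check in code. A sketch, simulating the two schedules deterministically in a single thread:

```python
# Case 2: pusher and popper strictly interleave. Every pushed element is
# popped immediately, so the pop order equals the push order -- FIFO,
# i.e. a queue, not a stack.
stack, popped = [], []
for item in [1, 2, 3, 4]:
    stack.append(item)          # pusher pushes
    popped.append(stack.pop())  # popper immediately pops

assert popped == [1, 2, 3, 4]   # first in, first out

# Case 3: all pushes happen before any pop -- LIFO, a genuine stack,
# but the schedule is completely sequential.
stack2, popped2 = [], []
for item in [1, 2, 3, 4]:
    stack2.append(item)
while stack2:
    popped2.append(stack2.pop())

assert popped2 == [4, 3, 2, 1]  # last in, first out
```

The LIFO order only appears in the schedule with no concurrency at all.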

From the above, we can see that there is a fundamental conflict between the two notions, “concurrency” and a “stack”.

If a “stack” can be accessed concurrently, then there is no way we can maintain a LIFO order. On the other hand, if we enforce a LIFO order, then the stack cannot be accessed concurrently.

If we have realized that the essence of a stack is a continuation, and a continuation (by itself) is sequential, then it is no surprise we arrive at this conclusion.

Since a LIFO order is essential to a stack, we can’t call the data structure a “stack” if we can’t maintain a LIFO order.

We can’t call a data structure a “stack” just because it has the methods named “push” and “pop” — we have to look at what the methods actually do.

Even if we continue to think that we are using a stack, the threads are in effect just distributing messages, with the operations “push = send” and “pop = receive”. So in essence this data structure is a pool. This exactly justifies the author’s relaxation to a pool, although no actual relaxation happens — the data structure has been a pool from the beginning. It was just disguised as a stack and less efficient.

So we see that the concept of a “concurrent stack” does not really exist. We also see that some data structures we have in the sequential world may not have concurrent counterparts.

How to reinvent the Y combinator

I always believed that one can never learn the essence of anything without reinventing it. Once you reinvent things, you will never forget them — because otherwise you can just re-reinvent them.

Today I found that I had forgotten how to derive the definition of the Y combinator. I learned it several years ago from an online article, but now the search “Y combinator” only brings me news about start-ups (sigh). I tried for two hours, but still couldn’t make the leap from the “poor man’s Y” to “Y”. Finally, I opened my good old friend The Little Schemer. Alas. Chapter 9 tells me exactly how to reinvent Y. Thank you Dan Friedman and Matthias Felleisen!

To prevent myself from forgetting how to derive Y again, I made a slide to record my understanding of it. I hope it can be of help to people (including the future me). So here it is.
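For reference, the end product of that derivation can be transcribed into Python. A sketch of the applicative-order Y (the names Y and fact are mine):

```python
# The applicative-order Y combinator: Y(f) is a fixed point of f,
# so a recursive function no longer needs to refer to itself by name.
# The eta-expansion (lambda v: x(x)(v)) delays evaluation, which is
# required in a strict language like Python or Scheme.
Y = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# Factorial written without any self-reference:
fact = Y(lambda self: lambda n: 1 if n == 0 else n * self(n - 1))

assert fact(5) == 120
```

The same expression works verbatim in Scheme with lambdas, since both languages are strict.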

Exercise: the Y combinator derived from this tutorial only works for direct recursion. Try to derive the Y combinator for mutually recursive functions, for example the following two functions even and odd.

(define even
  (lambda (x)
    (cond
     [(zero? x) #t]
     [(= 1 x) #f]
     [else (odd (sub1 x))])))

(define odd
  (lambda (x)
    (cond
     [(zero? x) #f]
     [(= 1 x) #t]
     [else (even (sub1 x))])))

A reformulation of reducibility

I found the theory of computation books very imprecise about their descriptions of Turing machines and reductions. They usually start with something precise and mathematical, for example they would define a Turing machine as a 7-tuple, but after that, everything about decidability and reduction is described with impenetrable paragraphs in natural languages.

I believe that the use of natural languages leads to most of the confusion in theory of computation, because natural languages are inherently imprecise and ambiguous. There are too many ways to say the same thing. For example, you can find these sentences, which mean exactly the same thing, in the books:

  • “Run M on w”
  • “Run M on input w”
  • “Simulate M on w”

The pronouns and references are even more confusing. For example:

“Use the description of M and w to construct the TM M1 just described.”

What is “just described”, and do M and w here mean the same things as in the earlier paragraph?

Another serious problem is that natural languages are very weak at representing the notion of “interpretation”, which underlies much of theory of computation. For example, a Turing machine simulates another Turing machine, which again contains and simulates yet another one.

After some thought, I found that we could use something precise such as mathematical notations combined with programming languages to describe the concepts. As an example, I’ll show here how the notion of reduction can be defined precisely as a homomorphism which can be instantiated for reducing one problem to another.

Definition 1 (reduction): A reduction (as in theory of computation) is a homomorphism (as in universal algebra):

Reduce(TM, I) = (TM', I')

satisfying the property

TM @ I = TM' @ I'

where

  • TM = Turing machine which we reduce from
  • TM' = Turing machine which we reduce to
  • I = input string of TM
  • I' = input string of TM’
  • @ = simulation, similar to the Scheme code ((eval TM) I)
  • TM @ I = result from simulating TM on I (accept or reject)
  • TM' @ I' = result from simulating TM’ on I’ (accept or reject)

End of Definition 1.

Notice that the simulation (TM @ I) may go into an infinite loop and never produce any result. Now I show how to use this homomorphism to describe the reduction from ATM ==> HALT, where

  • ATM = the “acceptance problem” (deciding whether a Turing machine M accepts string w)
  • HALT = the “halting problem” (deciding whether a Turing machine M halts on string w)

For convenience, we let

  • DATM = “the decider of ATM”
  • DHALT = “the decider of HALT”

Now the reduction can be fully described by the following homomorphism:

Reduce(DATM, (M,w)) = (DHALT, (M',w))
where
  M' = <if (M @ w) then accept else loop>
satisfying
  DATM @ (M,w) = DHALT @ (M',w)

Yes, that’s an all-inclusive formal proof that HALT is undecidable. It even includes the notion of “reduction” itself.

Let me explain it a bit. The first line says that there exists a function (named Reduce) from the pair (DATM, (M,w)) to another pair (DHALT, (M',w)), where M' = <if (M @ w) then accept else loop> is a description of a Turing machine.

Now let’s look at the last line:

DATM @ (M,w) = DHALT @ (M',w)

It says: if we have a decider for HALT (DHALT), then we can use it to define DATM, thus deciding ATM.

Why is this a valid definition for DATM? Because from the definition of M'

<if (M @ w) then accept else loop>

we know that:

  • If (M @ w) accepts, M' accepts, thus DHALT @ (M',w) accepts
  • If (M @ w) rejects, M' loops, thus DHALT @ (M',w) rejects
  • If (M @ w) loops, M' loops, thus DHALT @ (M',w) rejects

Notice from the cases above that DHALT @ (M',w) will accept if and only if M accepts w. Thus this defines a decider for ATM.

So if DHALT exists, then we can have DATM. But this contradicts the fact that DATM cannot exist, so DHALT must not exist.

If you wonder how this proof corresponds to Definition 1, here are the details of how it is instantiated:

  • TM = DATM (nonexistent)
  • TM' = DHALT (hypothetical)
  • I = (M,w) where M is the description of a Turing machine which we want to know whether it accepts input w.
  • I' = (M',w) where M' is <if (M @ w) then accept else loop>
  • TM @ I = DATM @ (M,w) (running the decider DATM on input (M,w))
  • TM' @ I' = DHALT @ (M',w) (running the decider DHALT on (M',w))

This is a lot more concise, accurate and easier to understand than a paragraph:

F = "On input <M,w>:
  1. Construct the following machine M'
     M' = "On input x:
        1. Run M on x.
        2. If M accepts, accept.
        3. If M rejects, enter a loop."
  2. Output <M',w>."
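The construction of M' above can also be transcribed as code. A sketch: M is modeled as a Python function that returns True for accept and False for reject (a looping M simply never returns), and make_M_prime is my name for the constructor:

```python
def make_M_prime(M):
    """Build the machine M' = <if (M @ x) then accept else loop>."""
    def M_prime(x):
        if M(x):            # if M accepts x ...
            return True     # ... M' accepts, so M' halts
        while True:         # if M rejects, M' loops forever,
            pass            # so M' does not halt
    return M_prime

# A toy M that accepts everything: its M' halts (accepts) on any input,
# which is exactly what a hypothetical DHALT would report.
M = lambda x: True
M_prime = make_M_prime(M)
assert M_prime("w") is True
```

M' halts exactly when M accepts, which is the correspondence DATM @ (M,w) = DHALT @ (M',w) from the proof.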

ADTs and objects

After reading William Cook’s essay On Understanding Data Abstraction, Revisited, let me try to condense the difference between abstract data types (ADTs) and objects into a few words.

(To avoid ambiguity, I use “instances” to stand for data created by instantiating ADTs)

  • “Instances” created by the same ADT share the same functions (methods). Functions may be parameterized, but the code is the same in each instance. Because of this, they need to share the same representation.
  • “Objects” (as proposed by Cook) don’t necessarily share function code. Each object may have a completely different set of functions (with matching names and types). This is in essence the same as “call-backs” in GUI widgets. Because of this diversity, each object may have completely different representation.
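The contrast can be made concrete with a small Python sketch (my own illustration, not Cook's code): the ADT-style set is a class whose instances all share one method body and one representation, while the object-style sets are plain records of closures, so each "object" may carry a completely different representation as long as it answers contains.

```python
# ADT style: every instance shares the same method code and the
# same representation (here, a sorted Python list).
class IntSetADT:
    def __init__(self, elems=()):
        self.elems = sorted(set(elems))  # shared representation
    def contains(self, n):
        return n in self.elems           # shared code

# Object style (in Cook's sense): an "object" is just a record of
# functions; different objects may use entirely different representations.
def list_set(elems):
    return {'contains': lambda n: n in elems}

def even_set():
    # No stored elements at all: the representation is the predicate itself.
    return {'contains': lambda n: n % 2 == 0}

s1 = IntSetADT([1, 2, 3])
s2 = list_set([1, 2, 3])
s3 = even_set()
print(s1.contains(2), s2['contains'](2), s3['contains'](4))  # True True True
```

Note that even_set could never be an "instance" of the list-backed ADT: it answers contains without storing any elements.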

Ironically, mainstream object-oriented languages like Java and C++ usually implement the former. In Java you can somehow achieve the latter with interfaces and anonymous classes, but it is awkward. JavaScript’s prototype-based system seems closer to the essence of the latter, but it still doesn’t feel natural.

But differing from Cook’s view, I think it is better not to consider binary operations like set union as methods. Union may take two sets with different representations and create a new one, so it can operate using only the “true methods” of both arguments (say, iterators and membership tests).
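This view of union can be sketched too (again a Python illustration of my own, self-contained): union is a free-standing function that relies only on each argument's contains method, so it works across different representations and returns a new object of the same interface.

```python
# union as a free-standing binary operation: it only calls the
# "true methods" (here, contains) of its two arguments, so the two
# sets may have completely different representations.
def union(s, t):
    return {'contains': lambda n: s['contains'](n) or t['contains'](n)}

def list_set(elems):
    return {'contains': lambda n: n in elems}

evens = {'contains': lambda n: n % 2 == 0}   # set defined by a predicate

u = union(list_set([1, 3]), evens)
print(u['contains'](1), u['contains'](4), u['contains'](5))  # True True False
```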

A bug in GHC’s type system

Several days ago, I implemented an experimental type inference system with first-class polymorphism. When comparing it with other systems, I found a possible bug in GHC’s type system regarding universal quantification. The phenomenon was confirmed and reproduced by people on the #haskell IRC channel for GHC versions above 7.0.1. The code that causes trouble is:

gen :: [forall a. a -> a]
gen = [id]
test1 = head gen 1

Obviously this program should typecheck, since:

  • id has the type forall a. a -> a.
  • A list gen containing id should have type [forall a. a -> a] (as in the annotation).
  • head has the type forall a. [a] -> a.
  • head gen should have the type forall a. a -> a.
  • head gen should be able to be applied to 1.

But GHC rejected this program with a strange error:

Couldn't match expected type `t1 -> t0'
with actual type `forall a. a -> a'
Expected type: [t1 -> t0]
Actual type: [forall a. a -> a]
In the first argument of `head', namely `gen'
In the expression: head gen 1

On the other hand, it works if (head gen) is bound at let:

test2 = let hg = head gen in hg 1

It doesn’t break the soundness of the type system, since it only rejects some correct programs, but these kinds of peculiarities of type systems can be painful when they add up. I guess this may be caused by the interaction between GHC’s internal type system and the legacy let-polymorphism.