Decoupling Type Classes

(I posted this idea to Haskell Cafe some days ago and got interesting respons from Dominique Devriese, who implemented instance arguments [1,2] (Agda’s equivalence of type classes). The disucssion has been really interesting, so I decided to collect and organize the ideas into a coherent post here. Some syntax is graciously borrowed from their work.)

This post relates the three ideas together: type classes, implicit parameters, and typed dynamic scoping. The first two are researched into some degree but their relationship and design are still subject to debates. The latter is my own analogy which appears to be really helpful clarifying those design decisions.

For the impatient, I have bulletted points with all the main ideas in this post here:

  • We can achieve the same or more expressiveness of type classes without using dictionaries. We just need implicit parameters.
  • Implicit parameters and their types can be fully inferred.
  • Implicit parameters can express MPTCs.
  • Type classes can be thought of as dynamic typing, though more disciplined.

To Group, or not to group.

Haskell’s type classes tie the two concepts “overloading” and “grouping” together, but in fact they are not necessarily related. Decoupling those two concepts has several benefits:

  • It makes the type system more flexible.
  • It makes the implementation more modular.
  • It frees us from the need to pass a whole dictionary when we only use one function in it.

I’m curious why we need to group those functions into a type class, instead of using them directly as overloaded functions. So I built an experimental type system which “decouples” the dictionary. Instead of passing a dictionary, it passes individual “implicit parameters” to the function. Each implicit parameter corresponds to an “undefined variable” in the function. Those implicit parameters are type inferenced and can contain type parameters just as the functions in a type class. Similarly, they are resolved by their types in the call site’s scope.

The convenience of this approach compared to Haskell’s type classes is that we no longer require a user of a type class to define all the methods in a type class. For example, a user could just define a method + without defining other methods in the Num class: -, *, … He can use the method + independently. For example, if + is defined on the String type to be concatenation, we can use + in another function:

weirdConcat x y = x + y + y

This has a utility, because the methods in the Num class don’t “make sense” for Strings except +, but the current type class design requires us to define them. This is burdensome. By separating the Num class into individual overloaed functions, we enables this kind of overloading without any extra work.

Then a question arises: how do we relate the types of functions if we no longer have this grouping? For example, for the Monad class, we expect return to be of type (a -> m a) if (>>=) is of type (m a -> (a -> m b) -> m b).

The answer is to use a polymorphic record to group the functions when needed, then reference the functions through the record.

Thus we have delegated the task of overloading to separate overloaded functions, and the task of grouping to records. This is a more modular design compared to the original type class system of Haskell.

Expressing MPTC

There is another benefit of this decoupling: it is expressive enough to subsume MPTC. Here is why: because the methods are no longer grouped, there is no longer a “common” type parameter to the methods. Thus we can easily have more than one parameter in the individual methods and conveniently use them as MPTC methods.

For example,

overload g
 ... ...
f x (Int y) = g x y

Notice that in my system, there are no explicit declarations containing type variables. The declaration “overload g” is all that is needed. The system infers f’s type as:

a -> Int -> {{g: a -> Int -> b}} -> b

Here it automatically infers the type for g (a -> Int -> b) just from its usage inside f, as if we have written a type class definition like:

class G a where
 g :: a -> Int -> b

So in this system, not only we don’t need to defined type classes, we don’t even need to declare the types of the overloaded functions! We can infer them from their usage and different calls to the same function don’t even need to have the same type! All it takes is:

overload g

And even this is not really necessary. It is for sanity purposes – to avoid inadvertent overloading of unbound variables.

So if g is used as:

f x y (Int z) = g x z y

then we infer its type as:

a -> b -> Int -> {{ g :: a -> Int -> b -> c}} -> c

and g will be equivalent to the one you would have defined in a MPTC method.

Some Implementation Details

Here is how it can be implemented. When we see an “undefined” variable in a function definition which has been declared as “overloaded function”, we store the function name, and the type variables that are associated with it. For example,

overload g -- (explicitly declare g as an overloaded function)

f x y (String s) = ...
...
let z = g x s y in
...
...

We don’t know what x and y are, but we know from the body of f that their types satisfy this pattern: g a String b. Thus we store this pattern constraint as an extra (implicit) argument in the type of f:

f :: a -> b -> String {{g: g a String b}}

We may have multiple such arguments.

At the call sites of f, we look for a function g in the scope that satisfies the pattern g a String b, but we don’t pass on the substitution, so they remain polymorphic. Once found, the function is passed as an extra parameter to f. This is essentially dictionary passing, but without grouping. It can be also more efficient because the parameters may be stored in registers.

Here g is explicitly declared as “overloaded”, although my experimental system doesn’t need this. Any undefined variable inside function body automatically becomes overloaded. This may cause unintended overloading and it catches bugs late. That’s why we need the “overload” declarations.

But the automatic overloading of the undefined may be useful in certain situations. For example, if we are going to use Haskell as a shell language. Every “command” must be evaluated when we type them. If we have mutually recursive definitions, the shell will report “undefined variables” either way we order the functions. The automatic overloading may solve this problem. The undefined variables will temporarily exist as automatic overloaded functions. Once we actually define a function with the same name AND satisfies the type constraints, they become implicit parameters to the function we defined before. If we call a function whose implicit parameters are not associated, the shell reports error very similar to Haskell’s “type a is not of class Num …”

Type Classes are Typed Dynamic Scoping

It appears to be helpful to think of the calls to overloaded functions as referencing dynamically scoped variables. They are dispatched depending on the bindings we have in the call site’s scope (and not the scope where the method is defined!). So it is very much similar to the much-hated dynamic scoping. But the dispatching is more disciplined — it doesn’t just match the name. It must match both the name and the inferred type. So in general it is useful and doesn’t cause trouble.

This intuition also explains why local instances shouldn’t be allowed, because if we capture the variables at the definition site, the method call becomes “statically scoped”.

Let me post an example from Oleg:

foo x =
 let overload bar (x:Int) = x + 1
 in \() -> bar x

baz =
 let overload bar (x:int) = x + 2
 in foo (1::Int)

Here at the call site foo (1::Int), bar should be resolved to the one inside baz, and not the one inside foo. My analogy to “dynamic scoping” is useful to understand the reason here. If we capture the bar inside foo’s definition site, it would correspond to “lexical scoping”.

Footnotes:

[1] https://lirias.kuleuven.be/handle/123456789/304985

[2] http://wiki.portal.chalmers.se/agda/pmwiki.php?n=ReferenceManual.InstanceArguments


Register Allocation Is Not Graph Coloring

After years of wheel-reinventing (examples), I finally did some possibly “new” research :-P Under the supervision of Prof. R. Kent Dybvig, I designed a new method for register allocation, a fundamental problem in compiler construction.

Different from earlier methods, it completely departs from the graph coloring paradigm, and is based on the so-called “model transformer semantics”. I claim that register allocation is in essence not a graph coloring problems, but rather like a cache replacement or scheduling problem. I’m amazed by how simple my method is, and am motivated to submit to a conference such as PLDI. But because of my limited experience, I have no idea whether someone else has done this kind of research before. So I’m just going to post my draft paper here and welcome any early comments about it.

Here is the abstract of the paper:

Register allocation has long been formulated as a graph coloring problem, coloring the conflict graph with physical registers. Such a formulation does not fully capture the goal of the allocation, which is to minimize the traffic between registers and memory. Linear scan has been proposed as an alternative to graph coloring, but in essence, it can be viewed as a greedy algorithm for graph coloring: coloring the vertices not in the order of their degrees, but in the order of their occurence in the program. Thus it suffers from almost the same constraints as graph coloring. In this article, I propose a new method of register allocation based on the ideas of model transformer semantics (MTS) and static cache replacement (SCR). Model transformer semantics captures the semantics of registers and the stack. Static cache replacement relaxes the assumptions made by graph coloring and linear scan, aiming directly at reducing register-memory traffic. The method explores a much larger solution space than that of graph coloring and linear scan, thus providing more opportunities of optimization. It seamlessly performs live range splitting, an optimization found in extensions to graph coloring and linear scan. Also, it simplifies the compiler, and its semantics-based approach provides possibilities of simplifying the formal verification of compilers.

Its full text can be found here. I also have given a pl-wonks talk about it in Nov 2011. Here are the slides [PPT] [PDF].


ydiff: How We Compare Programs in The Future

(Click on the above picture to see it in action. See the end of the post for more demos!)

I neglected to write a proper introduction to ydiff, but I think it is about time to write something about it.

Motivation

Although I call it a “semantic diff tool”, the purpose of designing ydiff is more far-reaching than it seems. I have long been imagining a world where programs are not represented as text, but as data structures (see this post). They will be editted not with text editors, but with structural editors, which create and manipulate the abstract syntax trees (AST) of programs directly. I have been talking to people about this idea, but very few thought this as an important change, until recently I found a post that points out the importance of this idea. I’m happy that another person realized that.

So let’s assume that structural editing is the future of programming. How do we compare programs and do version control in a world where programs are stored not as text streams, but as structured data? The line-by-line diff utilities and version control systems will simply stop to work, because we no longer have “lines” or “words” in our programs!

Well, ydiff is my answer to this futuristic problem. It is a diff tool (and hopefully evolving into a full-fledged version control system) designed for the upcoming “structural editing age” of programming.

Currently ydiff takes pains to first parse program text into ASTs and do structural comparison on the trees, but in the future, programs will be stored as data structures. The parsing step will be gracefully skipped, and ydiff will be readily extensible to all programming languages.

The following is the introduction to ydiff and the demos taken from my homepage

Why ydiff?

We very often want to know how our code is modified. This is normally done by using the diff program. While effective for finding differences between close revisions, the result from diff can be very hard to read for revisions over a long time span. The worst case is when the code is massively rearranged or refactored, where diff may consider every line to have been changed.

ydiff can help in this kind of situation. Different from line-based or character-based diff, ydiff is “language-based”. It parses the programs and then performs structural comparison on the parse trees. In addition to the usual functionalities of diff, it has the following properties:

  • Language-awareness. ydiff truly understands language constructs and will not make non-sensical comparisons. For example it will never compare a string “10000″ with an integer 10000 even though they look similar. Also, it tries to find definitions with the same name before it attempts to destruct and compare definitions of different names.
  • Refactor detection. It can find renamed, moved, reordered, wrapped, lifted, combined or fragmented code. Refactored code can be detected however deep they are into the structures.
  • Format insensitivity. The comparison result will not be affected by linebreaks or whitespaces.
  • Comprehensible output. The output of ydiff is designed for human understanding. The interactive UI helps the user navigate and understand changes efficiently.

These properties make ydiff helpful for understanding changes. It may also be possibly used for detecting plagiarism in programming classes or copyright infringement of code.

Supported Languages

Currently ydiff supports the following languages:

  • Scheme
  • Emacs Lisp
  • Python
  • Javascript
  • C++

The supporting part of a language is just a parser for that language, into a unified parse tree data structure. I have designed a parser combinator library to make parser-writing less painful, but they are still quite tricky to write. Hopefully in the future, the programs will be edited by structural editors and stored as data structures, so no parsers will be needed any more!

Demos

Before going further into the design details, you are invited to take a look at the demos. They are interactive HTML files with a nice UI design. The left and right windows are always locked in their relative position. A mouse click on changed, moved or unchanged nodes will highlight the matched nodes and scroll the other window to match. After that, the windows will be locked into their new relative position for browsing.

Okay, here are the demos.

  • Scheme (mk-mk-c.html)

    Comparison of the original miniKanren from Professor Dan Friendman and the version I modified in order to support condc, a “negation operator”. Pay attention to the function unify, whose major part is moved into unify-good.

  • Emacs Lisp (paredit20-paredit22.html)

    Comparison of two versions (v20 and v22) of Taylor Campbell’s paredit-mode, a structural editing mode of emacs for Lisp programs.

  • Arbitrary S-expressions (pass1-pass2.html)

    Trying to find a bug in an optimizing pass of my Scheme compiler by comparing the IRs (represented as S-expressions).

  • Python (demo1-demo2.html)

    Comparison of two implementations of a small list library that I wrote, which implements Lisp-style lists. The first implementation uses recursive procedures while the second uses Python’s generator syntax and is iterative. Pay some attention to append, whose code is moved inside another function appendAll.

  • Javascript (nav-nav-div.html)

    Comparison between two major revisions of the UI code of ydiff itself.

  • C++ demo1 (d8-3404-d8-8424.html)
    and C++ demo2 (simulator-mips-simulator-arm.html)

    There are two demos for C++. The first demo compares two versions of the d8 Javascript debugger from the V8 project (v3404 from 2009 and v8424 from 2011). The second demo compares V8′s simulators for two different processors (MIPS and ARM).

    The d8 demo is especially interesting because by clicking on the lines of the method Shell::Initialize in the old version, it can be clearly observed that its body has been distributed into several procedures in the new version:

    Shell::Initialize
    Shell::CreateGlobalTemplate
    Shell::RenewEvaluationContext
    Shell::InstallUtilityScript
    

    Also the major part of Shell::Main is moved into the helper Shell::RunMain.

Algorithms and Technical Details

There are basically two main ideas inside ydiff: Tree Editing Distance and Substructure Extraction, plus the use of the neat idea of parser combinators (found in Parsec).

  • Tree Editing Distance. Tree Editing Distance measures difference between two trees. In our case, it is the abstract syntax tree. When we encounter two trees, how do we compare them? If the first nodes of them appear to be the same, then we simply proceed to the next right? But what if the first nodes are different? There are three possibilities:
    1. The programmer inserted a node into the first tree.
    2. The programmer deleted a node from the first tree.
    3. The programmer modified a node in the first tree.

    We have to consider those possibilities for each level of nodes, so this naturally becomes a dynamic programming problem. Normally dynamic programming is done by cleverly find out the order to evaluate the problem space, but ydiff uses another important technique: memoization. There is a two dimensional hash table which stores the “delta” (changes) of two nodes, indexed by the two nodes themselves. The dynamic programming is then done in a recursive way, instead of the usual iterative way. The hash table automatically reduces the cost of recursion to the magnitude of iteration, by avoiding redundant evaluation of the nodes that we already compared.

  • Substructure Extraction. Tree Editing Distance can only find the “edits” — insertions, deletions and modifications, but it cannot find refactorization of code. If a piece of code is lifted out of a scope, the Tree Editing Distance function will produce two big changes: a deletion of a big node (for example, an if-statement), and an insertion of a slightly smaller node (with the “if (condition) { … }” wrap stripped off). Notice that although we only made a small change to the program, the resulting delta can be large. What we really want is that it shows a deletion of the wrap (“if (condition) { … }”), but a movement of the parts inside the wrap. This is called Substructure Extraction.

    Substructure Extraction works by recursively compare the substructures of two nodes, finding possible matches between one node and the subnodes of another. This substructural comparison is only done on the pairs of the resulting insertions and deletions from the Tree Editing Distance function. This little trick captures almost all possible refactorizations between two programs, however deep they are. This is demonstrated in the following picture, where ydiff successfully detects the movement of the function append into another function appendAll.

  • Parser Combinators. The parsers of ydiff are written with the help of a parser combinator library (in Scheme). In addition to the common features of Parsec, it detects left recursion and reports a “stack trace” of the left recursion, so that the parser writer can find the origin of left-recursion and fix the problem accordingly. This helps a lot when constructing the parsers. For examples, I constructed the JavaScript parser in one day, and the C++ parser in just two days. There are corner cases missing from the C++ parser and especially the macro part, but it parses significant pieces of code correctly (for example Google’s V8 project code).

Get the code

ydiff is an open source project. You can follow its development on github: yinwang0/ydiff or get its source code from there.


A Minimal Boot Sector

I have an ambitious personal project of constructing a safe operating system in a safe high-level programming language. Most of my current efforts are on the design of a type system and a compiler for the new language. As the language and its compiler take significant effort to design and implement, I started experimenting with the OS early on, so as to keep myself well-motivated.

The natural first step of building an operating system is to find a way to run programs on “bare hardware”. The task turns out to be quite easy despite its daunting first impression. After researching on the net and trying out various different tutorials on operating systems writing, I decided that all of them are too complicated for my purpose. Most of them assume using a C-like languages as the implementing language, and thus contain code that fits into that kind of framework. Since I will not be using C, I have more freedom of choosing how my boot sector works. After learning the good parts from the tutorials and applying my own simplifications, I arrived at my first boot sector. It is very simple and does very little (It just boots the machine and displays a colorful banner.), but it illustrates my design philosophy quite well.

The criteria I had for the boot sector are:

  • Minimal. It should not do more than illustrating how to boot a computer.
  • Interesting. It should do something interesting so that I can know that the computer is indeed booted. Currently it just displays some colorful text.
  • Independent. Because my OS will be a departure from most popular software systems in existence, it should use as little “library code” as possible. Examples of “library code” include:
    • The BIOS service routines such as “int 10H”. Instead of using them to display text, I decided to write into the VGA memory directly.
    • Complex instructions such as string operations (movsb, scasb, etc.). These instructions are complicated. They clobber certain registers and are not necessarily more efficient. In other words, I’m pursuing a design similar to RISC.
    • The processor’s “push”, “pop” and “ret” instructions. These instructions forces certain language-specific calling conventions and should be avoided.
    • Segmentation and paging. These processor features forces C-style memory management. The high-level programming language I’m designing will hopefully eliminate the need for C entirely. It will take care of memory management itself at object-level. Paging hardware and TLB will be no longer needed in the system.

The code of the boot sector turns out to be very short (only 21 lines of code excluding comments and blank lines).

org 7C00H                      ; the program will be loaded at 7C00H

start:
  mov eax, string_start
  mov ch, 1                    ; ch contains color of text
  mov ebx, 0B8000H + 718H      ; B8000H is VGA memory
                               ; 718H is offset to "center"

print:
  mov cl, [eax]                ; load char into cl
  mov [ebx], cx                ; store [color:char] from cx into VGA
  add ch, 1                    ; change color to (ch+1) mod 16
  and ch, 0x0F
  add eax, 1                   ; advance string pointer
  add ebx, 2                   ; advance VGA pointer
  cmp eax, string_end          ; until the end of string
  jg stop
  jmp print

stop:
  jmp stop                     ; infinite loop after printing

  string_start db 'My colorful new OS!'
  string_end equ $

  times 510-($-$$) db 0        ; pad remainder of boot sector with 0s
  dw 0xAA55                    ; standard PC boot signature

It is in NASM syntax and needs nasm to be assembled into machine code. After that it can be booted by QEMU, a processor emulator. The only two necessary command lines are (assuming the code is stored in a file named myfirst.asm):

nasm -f bin -o myfirst.bin myfirst.asm
qemu -hda myfirst.bin

Of course, you can also burn the disk image (myfirst.bin) onto a CD and boot a real machine from it!


Why We Don’t Need Sum Types

I discussed this idea with many people but convinced few of them, but they found no reasonable disproof of it either. I’m just going to record it here for my own records.

In a new type system I’m designing, I was trying to find a good reason to get rid of sum types (as are commonly used in ML and Haskell). Well, I don’t hate them that much, but for simplicity’s sake, I always try to get rid of things unless there are undeniable reasons that they must exist. I think sum types have a large overlapping of functionality with product types and cause inelegance and inefficiency. So I hope to replace them with “open unions”, which are more orthogonal with respect to product types. I seem to have found a good idea where sum types originated and the reason why we don’t need them in a programming language.

It is quite evident that sum types were orginated from some branch of mathematics. In those mathematical context, there are no (named) product types. All we have are Cartesian products. Cartesian products have no constructors (or type tags) in them. How do we distinguish a student represented as (Name x Weight) with a professor, also represented as (Name x Weight)? We put tags on them before putting them into a “disjoint union”. This is called an “injection”. We do this every time when putting anything into a disjoint union, and we “project” the elements out when we need their values.

From a programming language’s point of view, the injections and projections are very inefficient and inconvenient. This is equivalent to constructing new objects whenever we need to put them into a list, and destructing them when we need their components. Then construct new objects (again!) when we need to put them into another list, and so on …

In order to avoid these shortcomings, programming language researchers decided to attach type tags to the objects when they are created. The tags stay with the objects throughout their life-time. The result is the so-called “record” or “product” types. But somehow the PL researchers seemed to have been influenced by mathematics’ way. They decided that each variant of a “union” needs to define a new constructor (injection). Thus sum types were born. For example,

data T1 = Foo Int | Bar String

Here Foo and Bar are in essence injections. If you take a closer look, there isn’t any difference between a sum type and a disjoint union. Why do we need the injections and projections in a language where type tags always tell us what an object is? Another problem is that a constructor can only belong to one sum type. There is no way we can reuse it in another sum type. For example, there is no way you can define another sum type like:

data T2 = Foo Bool | Baz Float

because Foo is already defined in T1. What I wanted to express is that Foo belongs to the union T1 and it also belongs to the union T2. This is a very reasonable request and I can imagine how often I need it, but type systems of ML and Haskell don’t allow me to do that.

If we relax the requirement that each variant of a sum type must define a new constructor, we get “open unions”. Open unions have been implemented in OCaml (in the name “polymorphic variants”) and Typed Racket and seem to be well-behaved. They are more general and orthogonal with respect to product types. We can always choose to put another level of type tags upon the variants, therefore we lose nothing if we get rid of sum types.

ACKNOWLEDGEMENTS: Thanks to Oleg Kiselyov and Ryan Newton for pointing out that OCaml has open unions since a long time ago.


Boolean Expression Simplification

A friend of mine posed an interesting question to me a few days ago. He was trying to make a search engine for an advertisement system where each advertisement belongs to several categories (or keywords). He would like to implement a search system that can efficiently find advertisements using a simple expression language containing boolean expressions. For example, he would like to have expressions like ((“football” or “basketball”) and “baby”). For efficiency, he would like that the boolean expressions be simplified before they are used to collect streams of data items.

After a while, we decided that the boolean expressions should be converted into a “sum of products” (SoP) form so as to make the search efficient. So I made a prototype of the algorithm which can turn boolean expressions (containing “and”, “or” and “not”) into SoP form.

It turns out to be quite simple if we do it recursively. The basic idea is to:

  1. Push ‘and’ into ‘or’.
  2. Push ‘not’ into ‘and’ and ‘or’
  3. Do (1) and (2) repeatedly until no more simplifications can be made.

It is easy to see that we will have a SoP form after this transformation, because the rules (1) and (2) will be applicable if there are and’s outside of or’s, or there are not’s outside of and’s and not’s. Thus we are assured to have a SoP form when this process terminates.

The rest of the problem is how to represent the final result. I proposed to group and terms so as to reduce nesting. For example, the form (and (and a b) c) is not as good as (and a b c), because the latter contains the implicit information that a, b and c are interchangeable in their order of evaluation, so we may achieve accelerations in the algorithm by searching for keyword which contains fewer items first. This simplification in the result can be achieved with a rewriting procedure which is “context-aware”. We pass the context types into the recursive calls, and the recursion gives us different results in accordance with the context types.

The algorithm for doing these simplifications is prototyped in Scheme and uploaded to my GitHub repository:

http://github.com/yinwang0/experimental-code/blob/master/boolean-simp.ss


Restart

It has been a month since I stopped blogging. Life has been so productive with limited human interactions. Only me and my computer. I have to admit that sometimes one has to be anti-social in order to bring the most out of oneself :-)

There is enough to brag about my accomplishments of this month:

  • As an exploration of type system design, I made inferencers for various type systems including an enhanced version of Haskell’s type classes and various type systems that I came up by myself. Finally, I arrived at a preliminary design of a new type system which I hope to be the only type system that I have no complaints about.
  • I started writing a Python version of PySonar, a static analyzer which checks for type errors in Python programs. It is still in its early stage, but the most critical parts are already there. After looking at the literature, I found that my “home-made” analysis has been called “control flow analysis” (CFA) and incidentally, my analysis appears to be among the most advanced variants in this family of stuff. I will consider publish some of the ideas in it.
  • I created ydiff, a “semantic diff” tool for comparing programs. It uses a pretty accurate Tree Editing Distance (TED) function to compare parse trees. It is language-aware, refactor-aware, has a nice interface, and extensible. The currently supported languages are Scheme, Emacs Lisp, Python, JavaScript and C++. I’m looking to extend it to support all major programming languages.
  • To support ydiff, I wrote a Parsec-style parser combinator library. With its help, I wrote parsers for S-expressions, JavaScript and C++. Notably, the C++ parser was constructed in just two days, defeating the common belief that parsing C++ is hard.
  • I improved a home-made “register coalescing” pass in my Scheme compiler, which reduced the generated code size by 16%.

End of Blogging

I have stopped writing blogs.


Excuses for Writing Illiterate Programs

A friend and I had a discussion about the current trends of programming paradigms. At some point, the topic turned to the future of literate programming (LP). We learned from a recent interview that it hasn’t been widely adopted. “It has tens of thousands of fans, but not millions,” said Donald Knuth, the inventor of the term “literate programming”.

I personally have written a non-trivial literate program in CWEB a few years back, and since then became a fan of literate programming. But to tell you the truth, like many other people, I failed to persevere. As a hindsight, there is some good reasons.

Literate programming contradicts the nature of programming

There is deep truth in The Art of Computer Programming: programming is an art, and thus not literature (nor science). In what medium do artists communicate? Pictures. So, here is a picture for this activity:

You may have noticed that my drawings are not quite as good as my programs, but however awful they are, they still have the advantage that they transmit information through a parallel and high-bandwidth channel (as indicated by the wide arrows). There is no predefined order in which we appreciate art. This is very different from reading, which is in a more or less sequential fashion, and information takes significantly more time to get through:

Literature has those undesirable perperties because the sequential nature of “natural languages”. I put it in quotation marks because I have trouble understanding why they are called natural while they are obviously (and arbitrarily) man-made :-)

Whatever, here is what I guess how the history of language design goes…

At first there weren’t written languages. Our ancestors, just evolved from apes, communicated with each other using speech. Speech transmits as sound waves, and sound waves are sequential and slow (compared to light). Spoken languages have the undesirable property that they cannot be recorded for later use (considering the technologies at that time), so after some million years, people invented alphabets and written languages, so that they could store their folklores. Having trouble imagining that they could have computers some thousand years later, they decided that the written languages should be “persistent storage” of speech, so they modeled them in a similar sequential fashion as their spoken languages.

Some thousand years later, people invented modern mathematics, a great breakthrough. However again, because they couldn’t imagine of computers, they modeled mathematics in a similar fashion as natural langauges. They prove theorems in natural languages augmented with greek alphabets and strange symbols. They use different font styles to represent types. Because of the sequential nature of natural languages, they tend to repeat themselves a lot, and very often they fail to connect premises to conclusions. And they have always been confusing about scopes of variables. Consequently they have trouble convincing people even though their conclusions are perfectly correct, but they didn’t try to improve their language. Instead, they call those who cannot understand their proofs lacking “mathematical sophistication”.

Recently, some really smart people invented computers. As those computers become advanced, they can run many programs in parallel. However, humans are so intrenched in their stone-age ways of communcation that they decided that their new baby, programming languages, should also have syntax and grammar, just like natural languages. And some people even think that programs should be rearraged into the order of an essay, because that is the way how humans understand things. But they haven’t noticed that natural languages are not the only way humans can communicate. Computers are fundamentally different from most of their old media. They are able to display interactive text and graphics concurrently on the display, getting input from the user through all sorts of channels. These capabilities are very similar to art, which people have been using to communicate their most complicated concepts.

Writing a program is like creating a sculpture, you shuffle the materials around, carve and paste, trying to model a concept. Like an artist, when you are in the process of writing a program, you don’t really care whether other people understand it or not. What you really care is whether the result truly matches the concept in your mind, and whether the concept truly reflects how things work. In this process, the order of composition imposed by literature and the concerns about other people’s understanding will inevitably disrupt the “flow” in the programmer’s mind, resulting in lengthy, inelegant and complicated programs.

Admittedly, communication is necessary, but it should happen only after the artifact has been polished and ready for the world to see. A beautiful program is usually indescribable in words but can be easily grasped by people by looking at it directly. Turing it into a literate program will cause it to lose its structural beauty. An ugly program will not be understandable however hard you explain, and turning it into a literate program will only make it worse.

Programming languages are a lot more structured than natural languages and a lot more powerful in describing complicated concepts. We can understand programs a lot better by looking at them directly. If we re-order the blocks into the fashion of literature, we lose most of the structural information. For example, we see chapters and sections instead of functions and objects, and we lose their relative position in scope, because they are detached from the structure where they naturally belong. Thus programming languages, and not natural languages, should be our major way of thinking about programming.

With the help of computers, we would like to communicate with other programmers in a parallel fashion. For example editing programs in an interactive, networked, co-operatative, and semantics-aware editor, like this:

Notice that, like art, the information here is transmitted in parallel and orderlessly, because our fellow programmers can understand the programs by looking at outlines of them and expand the outlines into details as needed. They can also jump to definitions and cross-references and even start the program and trace its execution. Comments in natural languages are helpful, but they should be scattered around the program, directly attached to the most relevant programming language structure, and generally do not form a sequential document of their own. In other words, natural languages should be secondary in a program. A nice feature would be “foldable comments”, so that programmers can write detailed descriptions of a piece of code while giving the readers an option to ignore them.

The Dilemma of Literate Programming

If you have found the first excuse unconvincing (although I think there is some deep truth in it), I have another one from the viewpoints of programming languages design.

There are mainly two kinds of LP systems. The first kind is the traditional Knuth WEB style systems (WEB, CWEB, noweb, FunnelWeb …). These systems allow programmers to fragment a program into small pieces, reorder the pieces into an order that is “convenient to the human mind”. Later on, a program “weave” generates printable documentation for the program; a program “tangle” assemble the fragments into a machine-executable program. However, these systems have a well-known mistake in programming languages design—treating programs as text. WEB programs often break up procedure bodies, thus breaking up the scopes. It’s way too easy to mess up the scopes and those bugs are very subtle and hard to find. This practice is very much like macros in early Lisps and C, extremely dangerous and error-prone.

Another kind of LP systems (Literate Haskell, CoffeeScript etc.) don’t break up the procedures. But because we already have the freedom to rearrange the order of procedures in any programming language, those systems are nothing more a fancy comment facility (like JavaDoc). In views of literate programmers, those systems are “misconceptions” of LP, or “semi-literate” programming systems.

You see the dilemma? If you break up the procedures, you violate the hard-learned lessons in programming languages design; if you don’t break them up, you are a misconception ;-)

A proposed solution

There is a conflict between the elegance of the program and how to help other people understand it. The literate programming way asks the programmer to get out their ways to make the readers happy, while completely illiterate programs are hard to understand. There is actually a way in between: give the programmer the freedom to model the program as close as they like, while giving the reader the order to understand the pieces. We will see that this “linked literate” programs has the best of both worlds.


Layout Syntax Considered Harmful

Although the idea of layout syntax—using whitespace characters to delimit blocks—has been promoted by several languages (notably Python and Haskell), I feel that this kind of syntax brings more trouble than benefits.

It takes just one keystroke to produce a serious bug.

Programs in layout syntax are fragile. For example, consider the following two Python programs:

# correct definition
def member(x, ls):
    for y in ls:
        if x == y:
            return True
    return False         # correct indentation
# incorrect definition
def member(x, ls):
    for y in ls:
        if x == y:
            return True
        return False     # incorrect indentation

The second definition has been produced from the first by an inadvertent TAB key, which indented the return statement one more level to the right. While the two definitions differ only in one indentation, they produce totally different results. The first definition is correct, while the second has serious bugs:

  1. If ls is non-empty, it will always return False whether x is an element of ls or not.
  2. If ls is empty, it will return None (instead of False) because there is no return statement after the for-loop, a “missing return statement” bug.

Although this is just a minimal example, the bug may take quite some time to show up and be fixed. In order to prevent this kind of bugs from happening, I often find myself tediously moving the cursor up-and-down in a straight line in order to align the statements.

Let’s see why this bug cannot happen in a language which does not use layout syntax. We now invent an alternative syntax for Python. Now the pervious program looks like:

def member(x, ls) {
    for y in ls {
        if x == y {
            return True
        }
    }
    return False
}

Given the correct definition, can you imagine how you could reproduce the bug with just one keystroke? It is almost impossible. To see why:

  1. The return statement can never get into the loop with a change in indentation.
  2. It takes at least two edits and one movement in the editor to move the return statement into the loop. There are two alternatives to choose from:
    • Cut return False. Move the cursor into the closing curly brace of the for-loop. Paste.

    • Delete the closing curly brace of the for-loop. Move the cursor beyond return False. Insert an closing curly brace.

Either way, you must be deliberate and precise in order to reproduce the bug. Otherwise the parser would have complained (for example, if you just delete a closing curly brace).

However, the situation is very different with layout syntax, where one TAB key press produces the same amount of change as the above three operations. The change happens quickly and the program remains grammatically correct, obscuring the presence of a bug.

The situation is a little better for Haskell, because incorrect indentations often cause type errors and the programmer will be alerted, but in either languages, it takes quite some time to fix this kind of bugs.

Unconvincing advantages

It is often claimed that layout syntax has the following advantages over curly braces:

  • Your programs become a lot shorter because less curly braces are used.
  • You write less clutter such as curly braces and semicolons, and that beautifies your code.

I found that neither of the two advantages convincing. For the first part, Python and Haskell programs are indeed several times shorter than equivalent Java or C programs, but this cannot really be creditted to layout syntax.

We need to have some blank lines even in a Python or Haskell program, between definitions and sometimes in the middle of a block. The blank lines naturally denote groups of statements. So if we count the number of additional lines introduced by curly braces, we will find that there aren’t many. Curly braces also naturally denote statement groups, so not only they don’t look bad, they are helpful.

In Python and Haskell, it is the semantics (first-class functions etc.) that is making the programs short, not layout syntax. If we had an alternative syntax of Java which uses layout, then Java programs would still be several times longer than equivalent Scala programs. Java programs are longer not because they use curly braces, but because they don’t have things such as first-class functions, so they have to use some tedious design patterns.

Second, layout syntax does not really save “clutter”. Even in a language with layout syntax, we may still need to write almost the same amount of (if not more) clutter. The following is a random piece of code taken from the current release of GHC. We can still see lots of curly braces and semicolons in it. I guess layout syntax actually caused troubles, so the authors of GHC decided that they will just start to write curly braces ;-)

tcInstanceMethodBody skol_info tyvars dfun_ev_vars
                     meth_id local_meth_id
		     meth_sig_fn specs
                     (L loc bind)
  = do	{       -- Typecheck the binding, first extending the envt
		-- so that when tcInstSig looks up the local_meth_id to find
		-- its signature, we'll find it in the environment
          let lm_bind = L loc (bind { fun_id = L loc (idName local_meth_id) })
                             -- Substitute the local_meth_name for the binder
			     -- NB: the binding is always a FunBind

	; (ev_binds, (tc_bind, _))
               <- checkConstraints skol_info tyvars dfun_ev_vars $
		  tcExtendIdEnv [local_meth_id] $
	          tcPolyBinds TopLevel meth_sig_fn no_prag_fn
		  	     NonRecursive NonRecursive
		  	     [lm_bind]

        ; let full_bind = AbsBinds { abs_tvs = tyvars, abs_ev_vars = dfun_ev_vars
                                   , abs_exports = [(tyvars, meth_id, local_meth_id, specs)]
                                   , abs_ev_binds = ev_binds
                                   , abs_binds = tc_bind }

        ; return (L loc full_bind) }
  where
    no_prag_fn  _ = []		-- No pragmas for local_meth_id;
    		    		-- they are all for meth_id
Better ways to save clutter

Even if we do hate curly braces, there are better ways to reduce or even completely eliminate them. For a trivial “solution”, we could just use a dim color for curly braces and semicolons in an editor, so that they become less noticeable. For example the above Python program in that curly bracy syntax could look like this in your editor ;-)

def member(x, ls) {
    for y in ls {
        if x == y {
            return True
        }
    }
    return False
}

Better still, we could use a structural editor that lets us manipulate the AST (abstract syntax tree) directly. Those editors could provide several options of denoting blocks. You can switch between colored blocks, curly braces, or nothing at all. You can switch the look of your code at any time, instantly. People have implemented such editors, for example this editor designed by Kirill Osenkov!


Re-indentation hassle

In a language that doesn’t use layout syntax (Java, C, Scheme, …), no re-indentation is really needed when the code changes. The programmer can move a block of code by a simple copy-and-paste and continue solving the real problem. Re-indentation can always be done later and can be done automatically.

But in a language that uses layout syntax, re-indentation is mandatory, and worse, it can only be done manually. Layout syntax completely disables any editor’s auto-indent function. One may think that we might be able to invent a smarter editor that can auto-indent code for those languages. This simply cannot be done. I think this is already evident from the analysis of the above example. The two programs differ only in indentation, but they have completely different meanings. Both are legitimate programs and the editor has no way to tell which is the one you want unless it can read your mind.

Some people say that because those languages have advanced semantics, programs are so short that we don’t need to re-indent code very often. But experiences prove to me that the need for changing and rewriting code can never be eliminated. Writing code is like writing a book, you can always find pieces that need change or even complete rewriting. Usually changes in the following category will cause re-indentation:

  • Scope changes. There are lots of examples in this category: lifting an internal function to top level or push a top-level function into an internal function, surrounding a block with let-bindings, loops, conditional statements, try-except or lifting a block out of them, factoring out duplicated patterns, lifting “invariant code” out of recursion, … to name a few. These will necessarily change the indentation of a block of code, and each line needs to be re-indented.
  • Align code. For example in Haskell, when we align the arrows (->) of a case expression or the equal signs (=) of a function definition, we will notice that we have to re-indent most of the right-hand-sides, because they often contain multi-line expressions but the editors (for example, Emacs’ align-regexp function) only move the lines that contains the arrows or equal signs.
  • Renaming. We seldom choose the best names on the first shot, and good names make programs self-explanatory, so renaming is a very important and commonplace action. But in the following simple Haskell program, if we change the name from “helloworld” to “hello” and don’t re-indent the rest of the lines, we will get a parse error.
    helloworld z = let x = 1
                       y = 2 in
                     x+y+z

    Because the code becomes the following after the renaming, and the second line will no longer be aligned to “x = …”, and that confuses the parser.

    hello z = let x = 1
                       y = 2 in
                     x+y+z

    A similar thing happens when we lengthen the name to something like “helloworldcup”. Try it yourself. From this example, I hope you see how simple things are made frustratingly complicated by layout syntax. If you haven’t been convinced, try adding more lines to the above let expression :-)

The interruption from re-indenting code is usually not just one or two seconds, but often tens of seconds, even minutes. The programmer has to put down real problems at hand and turn to the mind-dead re-indenting job. This disrupts the “flow”, which is essential for producing creative and elegant code.

Syntax considered harmful

I believe that syntax, although an important aspect of natural languages, should not play a significant role in programming languages. It has already brought us too much trouble and frustration and wasted too much of our energy. You can read my other blog post detailing the harm of syntax in general and why we should remove the concept of “syntax” as a whole from programming languages.

Syntax has prevented lots of new design possibilities in programming languages. You may have heard language designers say: “Hey this is a nice feature, but the syntax of my language doesn’t leave me any room for it!” But layout syntax pushes us to this direction even more. It forces us to consciously and constantly think about syntax, drawing our attention away from semantics design. Because layout poses certain constraints on how code must be formatted, it makes a language even less extensible. Thus I consider layout syntax to be the most harmful type of syntax.


Follow

Get every new post delivered to your Inbox.