“Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface”. This the widely accepted Unix Philosophy
. There is probably nothing wrong with the first two clauses; those are the basic principles of modular design. But the last clause (about text streams), although being accepted by many people, is not quite right. Here I hope to explore a bit the reason behind it, and from that propose a simple and once-and-for-all solution to the parsing problem.
Text is a universal interface, but not a very good universal interface
We all have observed how typical Unix programs work together:
- Each program read text from from some input stream (a “file” in the virtual file system), computes some result, format the result and print the resulting text to the output stream.
- Programs can be connected using pipes, or they write into and read from files, so that they can communicate.
All seem to work well? But how does the second program extract information from the text stream? Here is exactly where the pain starts. In order to put information into a text stream, the first program has to encode it, and the second program has to decode the original information from the received text.
The encoding part is easy, we just have to decide on some syntax, for example “each line contains comma-separated fields”. But when decoding, we will need something like regular expressions if the encoding of information is simple enough, but we will need a parser if we have to communicate something of a recursive nature, such as program text. For regular expressions, we could use Perl, AWK, or a regexp library in any language, but for the parser, we have no choice but painstakingly writing it.
Even with modern parsing technologies, writing a parser is still way too much pain. They are hard to learn and hard to tweak. The language at hand may have a horribly designed syntax, making it very hard to parse. Ambiguity, left recursion,… Those problems have never stopped harassing compiler writers.
The need for a standard binary format
Those problems suggests that it is problematic to use text to represent data structures such as abstract syntax trees (ASTs). It is just unreasonably hard to decode the information that we put into it a minute ago! Indeed, text is a universal interface, but it is not the only one. There are many other universal interfaces around. Why should I stick to text as prescribed by the Unix Philosophy?
If we think of it in a type theoretic way, text streams are only one type: string. We have many other types: integers, booleans, records, … Why should we encode all other types into strings? We need a more general and efficient format that can represent all types, instead of converting them into strings.
In fact, text is part of a more general concept which I will call standard binary format in this post. Some people hate the term “binary formats”, but they haven’t realized that all data in computers and networks are in binary formats. The so-called “plain text” is just binary bits encoded in ASCII or Unicode, so they are also “binary formats” in essence.
Just think about this question: If there were no standardization of text (such as ASCII or Unicode) until today, would anybody still use text as an interface? Obviously No! If everybody uses a different encoding of the alphabet, we will have no way to display text, not to mention to use it as an “interface” for other kinds of information.
So the point is not really about “text” or binary, it is all about standardization. If we have a “standard binary format” for all types of data (including text), we would have a much better universal interface. Data structures will be passed around as they are, and there will be no need for complicated decoding procedure. This is a principle originated from the functional programming community: “Avoid encoding.”
We have many candidates for standard binary format: XML, JSON, protocol buffers, S-expressions, … I’ll try to analyze their capabilities and limitations later, but let’s assume such a format can be designed and standardized.
Programs should not be encoded as text either
Now that we have a standard binary format for all data structures (I’m cheating, but let’s assume that we have it now), we can also use it to represent ASTs. We will need extensions for existing editors or browsers. The editors or browsers will “render” the ASTs into various styles. They could be either text, graphics, sounds or some other sensory signals. The editors will allow the programmer to manipulate the ASTs directly using natural and intuitive commands, and they will save the ASTs in a standard binary format. By doing so, we completely eliminate the need for parsers because compilers just read in the ASTs directly. We also completely eliminate the concept of “syntax”. Compiler and IDE writers will be living a much happier life and programmer productivity will be greatly enhanced, because we will easily have much better editor support for ALL programming languages.
Don’t get me wrong. I’m not saying that parsing is not an interesting problem. In fact, it is fascinating. Maybe it is the pure interest in the problem itself that is driven language designers to use syntaxes are harder rather than easier to parse, thus creating problems rather than solve problems ;-)
You may think that extending editors and browsers is too much work, but actually it can be quite easy. In fact, a standard binary format for ASTs can be much simpler than formats like JPEG. Every editor and browser should be easily extensible to it. The only harder problem is how to make people agree on a standard. We may need some sort of committee to do the standardization. But aren’t we doing that for each new language, to make us agree on a syntax? This solution only requires us to do the standardization for ASTs once, for all languages.
This is the ultimate solution to the parsing problem, because it eradicate the root of the problem by changing the way we represent and manipulate programs. It is a trivial solution from a research point of view, but it demonstrates how a hard problem can be solved in a trivial way. However, the implication of this solution is definitely non-trivial. It will open up many surprisingly interesting possibilities.
UPDATE 5/13/2011: I just learned that I’m not alone. Many people have already realized the problem and started solving it long before. JetBrains released its meta-programming tool MPS, which generates structural editors from a structured description of languages (encoded by XML). Other tools in this area includes Intentional Software, Microsoft Software Factories etc. Take a look at [this post] by Martin Fowler.
April 14th, 2011 at 1:54 pm
I agree with you in principle. Microsoft agrees with you in principle: their new PowerShell stuff attempts to do this by unifying pipes on .NET objects, as I understand it. On the other hand, I think that doing this is a lot harder than it appears. One of the benefits of the UNIX text-stream API is that it is easy to write programs. It might be troublesome to use them at some point, but the ability to write one off programs should not be under-estimated. The quick and dirty solution that works for your specific case is better than an over-engineered general solution that you will never use. This is why Scheme’s syntactc abstraction is nice. You can do the quick and dirty without worrying about ruining the lives of other programmers. You do not have to put the entire language at risk to temporarily create a new language feature that suits your specific sub-domain.
If we are going to do something like this for shells, then it needs to be done in such a way that creating the one off script is just as easy as it is today, if not more so.
April 15th, 2011 at 1:21 am
Hi Aaron,
Let me reply your comment with two separate messages because it contains two different subjects. I took a look at PowerShell and yes, it passes objects through pipes. But why would we need pipes at all? If our “shell language” is the same as our main programming language (as Scheme :-)), then
Cmd1 | Cmd2
is just equivalent to the nested function calls
Cmd2( Cmd1() )
In this world, we don’t really need shells or pipes. Objects can be just passed among functions as arguments.
So I think the reason we need pipes is just because of the exact problem of the Unix philosophy—passing data as streams, plus another: Unix (C) programs’ main entrance has the fixed type “[String] -> Int”. You can see that from the main function of every C program. I think the need for pipes arise exactly because of the main function’s inability to take arguments other than strings.
We need pipes because by using pipes we can print into output streams and that can be redirected into another command as input stream. This equivalent to passing data through an argument if the main function can take a variety of argument types.
Similarly using PowerShell, we are assuming that commands produce an object (and not text) stream, which is still a stream. If we get rid of this restriction in the main function, we don’t need shells or pipes, or, PowerShell, any more.
April 15th, 2011 at 2:59 am
Pipes are not there because of the api, IMO, but because of the general usage pattern of shells. In APL, for example, the guaranteed right to left evaluation order generally makes it possible to pass data without using composition elements like pipes. On the other hand, the syntax requires certain restrictions to make this more feasible. In systems like Scheme, you don’t have those restrictions, and this makes the parentheses required. Either way, you have to have some syntactic method of controlling where a functions arguments end, and where the next call begins. In the example you gave, you just change the pipe syntax from a single character to a set of two parentheses. The pipe is still there, but the syntax is different. Whatever you decide to go with, you still have to have a means of specifying control flow.
April 15th, 2011 at 6:10 am
I don’t think this is just about syntax. To my understanding, using object pipes, there must be ONE upstream program which passes ALL output to ONE downstream program, but normal argument passing can be more flexible. To see it, how do you express the following code:
Cmd1(Cmd2(Cmd3(), Cmd4()), Cmd5())
in the object pipe syntax? I guess it can get quite awkward.
April 15th, 2011 at 1:46 am
Aaron,
Now let me answer your another question. Yes it is easy to write programs using text streams, but it is also easy to write programs in another way. Let’s imagine some editors which can manipulate ASTs directly and see how that brings benefits that a purely text-based editor cannot. The idea is inspired by Epigram (http://www.e-pig.org) which implements a type-directed semi-automatic programming environment.
To write a function, you press some hotkey or type some magic word such as “fun..”. The editor will recognize the programmer’s intention of writing a FUNCTION. It looks up the grammar of the language (assuming a language like Java) and finds that a FUNCTION can be constructed as follows:
FUNCTION ::= TYPE NAME [ARGUMENT] BODY
That is, a function contains a return type, a name of the function, and a list of arguments and a body. The editor will then go into a “function editing environment” in which it generates four “empty bubbles”, and prompts the programmer to fill in the contents. Because the [ARGUMENT] part is a list, its bubble contains a “extension block”, the programmer can click on the extension block or use another hotkey to extend the list.
The programmer use the TAB key or mouse cursors to move among the bubbles, and type into them. The bubbles will check the input according to the grammar for any incompatibility. Comments can be put almost anywhere by using a special hotkey and rendered differently. The editor constructs and manipulates an internal AST as the programmer enter into the bubbles. This AST is fully synchronized with the current screen rendering of the partial program.
The editor will type check the partial program that the programmer has entered and detect semantic errors. Once an error is found in a bubble, the contents in that bubble will be rejected and the bubble will turn red, prompting the programmer to correct the problem.
The editor is easily extensible to new languages by using BNF-like grammar files. When the programmer finishes editing, the internal AST is saved directly as an object in the standard binary format for ASTs.
April 15th, 2011 at 3:07 am
I have thought about these sorts of editors for a while, but I have not seen an implementation of them yet that actually improves my world.
Show me one that is actually more efficient in daily use than something like Emacs, NEdit, or Vi, and I will seriously consider this, because I feel like this is a real innovation that needs to happen, but I just haven’t seen anything that does it. We need more than just some minor AST editors that happent o compensate for poor language syntax that is hard to remember and use. We need something that fundamentally improves how we work with programs, and to me, this means rethinking how we think about syntax; something that an AST editor does not do.
April 15th, 2011 at 6:14 am
To get a feel of such an editor, you may want to try “paredit mode” for Lisp/Scheme (http://mumble.net/~campbell/emacs/paredit.el). It is actually doing some tricks on the text, but when using it you do feel that you are getting hold of the s-exps and not just text streams. It took some getting used to, but once you get used to it, you will never return to the normal scheme-mode. I can’t write Scheme programs without it now.
April 16th, 2011 at 8:08 am
Yin,
I have tried paredit, and it is a great editor, but it is nice because it doesn’t actually alter the way that we do code entry that much. It allows us to continue typing in a stream of characters and it just happens to provide additional structural editing commands. This is very nice, but it is also not what I would call a full blown structural editor. Indeed, Paredit does have problems because it isn’t a structural editing mode.
Regarding pipes, while the normal usage of pipes is a one input one output set, this is purely because this is how most compositions work. Like the composition function or operator in many programming language or math, this appears to be a common use case. That does not, of course, mean that you cannot pipe more data into a program through other means. A program can have a potentially infinite source of text streams on most UNIX shells if one is willing to encant the proper characters.
On the other hand, is this not beside the point? Isn’t the important thing here the switch away from text streams, not a switch away from unary procedures? It’s not a question of how many streams we have coming in, but of what types of streams these are. Here, I think my complaint still stands. It is still fairly easy in most languages to write text streams. It is not so easy to write and read structural data in most languages. This is not true for a language like Scheme, but for a paradigm like this to be really useful, it has to work across languages. How would we, for example, map a tree structure program onto APL data structures? How would the *shell* know what to do? I certainly don’t want to have to worry about that when I am writing my programs.
January 3rd, 2012 at 5:04 pm
[...] long been imagining a world where programs are not represented as text, but as data structures (see this post). They will be edited not with text editors, but with structural editors which creates and [...]
February 12th, 2012 at 3:54 pm
[...] Last Friday, I gave a PL-wonks talk on “Structural Version Control”. The talk was about the design of version control systems when Structural Programming becomes prevalent. By “Structural Programming”, I mean a new kind of programming environment where programs are no longer edited or stored as text, but directly as parse trees. I claim that this change will revolutionize our programming tools. I have reiterated this opinion in many of my other posts. [...]