Category Archives: algorithms

Why is indexing faster than binary search

We all know that indexing into an array takes only O(1) time, while searching for a number in a sorted array takes O(n) time with linear search, and O(log n) time with binary search. But why is indexing so fast? What is the secret sauce?

The reason is really about the nature of indexing — how it is implemented in a circuit. In order to illustrate this, let me show you an “addressing circuit”.

addressing circuit

Here, A and B are the two-bit address lines; they represent the indices 00, 01, 10 and 11. The outputs Z, Y, X and W are the selectors of the items in the array. Notice that an output selector is enabled only when both of the input lines of the corresponding AND gate are “1”.

Now, ignore the input B and just look at A. See how its signal flows through the direct wires and the inverters. We can make the following observations:

  • When A is “1”, the AND gates of X and W receive a “1” on one of their input ports, while the AND gates of Z and Y receive a “0” on one of their input ports.
  • On the other hand, when A is “0”, the AND gates of X and W receive a “0” on one of their input ports, while the AND gates of Z and Y receive a “1” on one of their input ports.

From the above, I hope you have seen that the value of A partially selects half of the AND gates — it is either the set {X, W} or {Z, Y}. By “partially select”, I mean they are not fully selected, because we haven’t taken B into account. At this point, I hope you have noticed that A is in fact doing one step of a “binary search”.

Now we do a similar thing, but this time focus on B and ignore A. You should see a similar thing: depending on the value of B, either we partially select {Y, W}, or we partially select {Z, X}. So we can also think of B as doing one step of a “binary search”.

Now, we see that A and B are each a step of a binary search, and it is interesting to see that B’s selection will cut A’s selection evenly, whether A is 0 or 1. The same holds vice versa: A’s selection will cut B’s selection evenly, whether B is 0 or 1.

Also notice that the selections by A and B can happen at the same time. That means, when they work simultaneously, it takes just O(1) time to do a binary search through an array of length 4. If we generalize this circuit to N bits of input, then within O(1) time, we can do a binary search through an array of length 2^N.

This explains why indexing an array is faster than binary search: indexing is a parallel binary search in which all log n steps happen at the same time.
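To make the idea concrete, here is a minimal software sketch (in Python, not from the original post) of the 2-bit decoder above: each output is an AND of the address lines or their inversions, and in real hardware all four gates evaluate simultaneously.

```python
# A software model of the 2-bit addressing circuit. In hardware the
# four AND gates evaluate in parallel; here we simply compute them all.

def decode(a, b):
    """Return the one-hot selector tuple (Z, Y, X, W) for address bits (a, b)."""
    not_a, not_b = 1 - a, 1 - b
    z = not_a & not_b  # selected by address 00
    y = not_a & b      # selected by address 01
    x = a & not_b      # selected by address 10
    w = a & b          # selected by address 11
    return (z, y, x, w)

assert decode(0, 0) == (1, 0, 0, 0)  # only Z fires
assert decode(1, 1) == (0, 0, 0, 1)  # only W fires
```

Exactly one selector is high for each of the four addresses: the item that remains after both “partial selections” are combined.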

Comments Off on Why is indexing faster than binary search

Posted by on April 2, 2013 in algorithms, architecture, concurrency


A Concise Solution to the P=NP? Problem

To solve P=NP? is to answer the following question:

Can we solve strictly more problems with a computer that does not exist?

Thus we have proved that this problem is nonsense. QED.

Seriously, this blog post took me years to conceive, and a week to write. Please don’t take it lightly. If you need a more “authoritative” proof, please refer to this.

Comments Off on A Concise Solution to the P=NP? Problem

Posted by on March 27, 2013 in algorithms, complexity


ydiff: a structural program comparison tool

(Click on the above picture to see it in action. See the end of the post for more demos)


I have been imagining a world where programs are not represented as text, but as data structures. They will be edited not with text editors, but with structural editors, which create and manipulate the abstract syntax trees (AST) directly. In such a programming environment, the line-by-line diff utilities and version control tools will stop working, because there are no longer “lines” or “words” in programs.

ydiff is a proof-of-concept for handling this situation. It is a diff tool designed for “structural programming”.

Currently ydiff takes pains to parse program text into ASTs, but hopefully in the future programs will be stored directly as data structures so that the parsing step can be skipped. This will enable this kind of tool to extend to all programming languages effortlessly.


  • Language-aware. ydiff parses programs, understands basic language constructs, and will not make nonsensical comparisons. For example, it will not compare a string “10000” with an integer 10000 even though they look very similar. Also, it tries to match functions with the same name before it attempts to take apart and compare functions of different names.
  • Format insensitive. The comparison result is not affected by differences in whitespace, line breaks or indentation. For example, ydiff will not produce a large diff just because you surrounded a block of code with if (condition) {…}.
  • Moved code detection. ydiff can find refactored code — renamed, moved, reordered, wrapped, lifted, combined or fragmented code. Refactored code can be detected no matter how deeply it is nested in the structure.
  • Human-friendly output. The output of ydiff is designed for human understanding. The interactive UI helps the user navigate and understand changes efficiently.

These properties make ydiff helpful for understanding changes. It may also be used for detecting plagiarism in programming classes or copyright infringement of code. For large-scale use cases you may be more interested in MOSS, but ydiff is fundamentally more accurate because it parses programs.


Here are some interactive HTML demos with a pretty nice UI design. The left and right windows are always locked in their relative position. A mouse click on changed, moved or unchanged nodes will highlight the matched nodes and scroll the other window to match. After that, the windows will be locked into their new relative position for browsing.

Okay, here are the demos.

  • Scheme Demo1. ydiff’s algorithm diffing itself (between the first version on GitHub and the latest version).
  • Scheme Demo 2. Comparison of the original miniKanren from Professor Dan Friedman and the version I modified to support condc, a “negation operator”. Pay attention to the function unify, whose major part is moved into unify-good.
  • Emacs Lisp. Comparison of two versions (v20 and v23) of Taylor Campbell’s paredit-mode, a structural editing mode of emacs for Lisp programs.
  • Clojure. Compare the first commit of Typed Clojure with its recent version.
  • Arbitrary S-expressions. Trying to find a bug in an optimizing pass of my Scheme compiler by comparing the IRs (represented as S-expressions).
  • Python. ydiff has a completely separate implementation in Python (named “PyDiff”), which can diff two Python programs. This is a comparison of two implementations of a small list library that I wrote, which implements Lisp-style lists. The first implementation uses recursive procedures while the second uses Python’s generator syntax and is iterative. Pay some attention to append, whose code is moved inside another function appendAll.
  • Javascript. Comparison between two major revisions of the UI code of ydiff itself.
  • C++ demo1 and C++ demo2. There are two demos for C++. The first compares two versions of the d8 Javascript debugger from the V8 project (v3404 from 2009 and v8424 from 2011). The second compares V8’s simulators for two different processors (MIPS and ARM). The d8 demo is especially interesting: by clicking on the lines of the method Shell::Initialize in the old version, it can be clearly observed that its body has been distributed into several procedures in the new version:

    Also the major part of Shell::Main is moved into the helper Shell::RunMain.

Get the code

ydiff is an open source project. You can follow its development on GitHub: yinwang0/ydiff, or get its source code from there.

Comments Off on ydiff: a structural program comparison tool

Posted by on January 3, 2012 in algorithms, programming languages, tools


PySonar: a type inferencer and indexer for Python


PySonar is a type inferencer and indexer for Python. It includes a powerful type system and a sophisticated inter-procedural analysis. Compared to style-checking tools or IDEs, PySonar analyzes programs in deeper ways and produces more accurate results. PySonar resolves more names than typical IDEs. The current resolution rate is about 97% for Python’s standard library.


To get a quick feel about what PySonar can do, here is a sample analysis result for a small fraction of Python’s standard library.

What’s in there

  1. A powerful type system. In addition to the usual types found in programming languages, PySonar’s type system has union types and intersection types — two of the most powerful elements I have found during my PL research. They are rarely found in programming languages; I know of only two languages with statically checked union types: Typed Racket and Ceylon. Unlike those languages, PySonar works without any type annotations: it infers all the types by inter-procedural analysis.
  2. Control-flow aware interprocedural analysis. Because Python has very dynamic and polymorphic semantics and doesn’t contain type annotations, a modular type inference system such as the Hindley-Milner system will not work. I actually implemented a HM-like system in the first version of PySonar, but it didn’t work well. As a consequence, all types are inferred by an inter-procedural analysis which follows the control-flow and some other aspects of the semantics.
  3. Handling of Python’s dynamism. Static analysis for Python is hard because the language has many dynamic features. They help make programs concise and flexible, but they also make automated reasoning about Python programs hard. Some of these features can be reasonably handled, but others cannot. For code whose behavior cannot be determined statically, PySonar attempts to report all known possibilities. For example, it can infer a union type that contains all the types a value may possibly have.
  4. High-accuracy semantic indexing. PySonar can build code indexes that respect scopes and types. Because it performs inter-procedural analysis, it is often able to find the definitions of attributes inside function parameters. This works across functions, classes and modules. The following image shows that it can accurately locate the field x.z, which refers to the “z” fields in classes B1 and C1, but not A1.

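As a hypothetical illustration of point 1 (the function and values here are invented for this post, and this is not PySonar’s actual output format), consider a function whose branches return different types; with no annotations to go on, an inter-procedural analyzer must infer the union of both possibilities, roughly int | str:

```python
# Hypothetical example: each branch returns a different type, so an
# analyzer like PySonar would infer the return type as a union of
# int and str, without any annotations in the source.
def parse_port(s):
    if s.isdigit():
        return int(s)           # this branch yields an int
    return "invalid: " + s      # this branch yields a str

assert parse_port("8080") == 8080
assert parse_port("http") == "invalid: http"
```

Both branches are reachable, so neither type alone describes the function; the union covers every value the caller might receive.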

The code is open source from my GitHub repository.


Here are some of PySonar’s users:

  • Google. Google uses PySonar 1.0 to index millions of lines of Python code, serving internal code search and analysis services such as Grok and Code Search.
  • SourceGraph. SourceGraph is a semantic code search engine. They use PySonar to index hundreds of thousands of open-source Python repositories. They started using PySonar 1.0 as the Python analysis for their site; I recently joined them and finished integrating PySonar 2.0.

Comments Off on PySonar: a type inferencer and indexer for Python

Posted by on September 12, 2010 in algorithms, programming languages, semantics, static analysis, types

