Half of what I say is meaningless
But I say it just to reach you, Julia.
-The Beatles. Julia. 1968

The forthcoming Julia programming language has scientists, number crunchers, and all kinds of data junkies excited over its promise to bring expressive and flexible syntax without compromising performance. Traditionally, numerical programming has been done in either

a) compiled programming languages like C and Fortran that offer top tier performance at the cost of additional complexity and greater learning difficulty, or

b) interpreted languages like MATLAB, Python/NumPy, or R that offer an expressive syntax friendlier to scientists, but often at the cost of performance.

Broadly speaking, the crux of the matter is that programming languages in class (b) “interpret” each line of code dynamically as the program runs its course, every time the program runs. In contrast, languages in class (a) require an additional compilation step before the program runs the first time, in which the compiler produces highly optimized machine code.

With compiled languages, the compiler is able to perform a static analysis of the code that allows many optimizations which are fundamentally not possible in an interpreted language, such as being able to assume the type of a particular variable (i.e., is variable “foo” an integer, or is it a string? and so on) throughout the life of a program. It is this “interpreter overhead” that causes programs written in class (b) languages to often run much slower than class (a) languages, unless care is taken to write programs in a particular style that avoids such overhead.

What is meant by taking care to avoid interpreter overhead? This is a complex topic, but one of the most important concepts is that iterating over large arrays using “for” loops is slow. Take for example, the following Python functions that sums up the elements of an array, named “a”:

def array_sum_slow(a):
    total = 0
    for i in range(len(a)):
        total += a[i]
    return total

 

Running on a Macbook Pro over NumPy arrays of 1 million elements, that function runs in about 0.2 seconds. How can we speed that up? The answer is the same in Python/NumPy as it is in most numerical programming languages: we must “vectorize” the code. This simply means that we use functions built into the language that are designed to operate over an entire array, instead of a single element. These functions are often implemented under the hood in lower level programming languages to achieve performance, but those details can be happily ignored by the Python programmer. The vectorized version looks like this:

def array_sum_fast(a):
    return a.sum()

 

Now we accomplish the same work in 0.001 seconds, better than a 100x speedup. Nice!

So that’s great, and it explains the popularity of languages like Python/NumPy and MATLAB with numerical programmers over the years. You get the expressiveness of the higher level language without the performance overhead, as long as you follow some basic coding style principles to achieve good performance.

So what’s the problem? Why isn’t that good enough?

Well, for the simple case of summing the elements of an array, it is good enough. But what happens when the particular array manipulation we need to do is too difficult to express in a vectorized way, and could be written much more easily using iteration? Or perhaps more importantly, what happens when NumPy or other open source libraries don’t offer a vectorized function to perform the particular task at hand?

The answer is that you must now implement your own library. In the world of Python/NumPy, that means either writing quite complex C code, or learning to use a special tool like Cython which can simplify the task but still adds complexity.

This is where Julia becomes exciting. Because Julia uses a hybrid design that is in between compiled and interpreted (known as Just In Time compilation, or JIT), when you want to write a performant library in the Julia ecosystem, you just keep writing Julia! No other languages or tools necessary.

(Note, another approach to the same problem is to re-implement the Python programming language using a JIT design, which is what the PyPy project has accomplished, but here we will focus on Julia).

The details of JIT compilation are complex, but in essence it is a blend between compiled and interpreted designs that essentially produces optimized machine code “on the fly”. No separate compilation step is necessary, and the language syntax can achieve much of the simplicity of an interpreted language while at the same time coming close to the performance of compiled languages like C.

So, let’s test this out. Let’s pretend existing image filtering libraries aren’t giving us quite what we want, and we decide to implement our own. Perhaps our first task is to implement a simple image blur kernel. This involves multiplying the 9×9 set of pixels surrounding each pixel in an image array by a 9×9 “kernel”, and replacing the value of the current pixel by the result of this operation. It’s probably possible to write a vectorized implementation of this, but the iterative solution is likely easier to write for most of us.

Here is one possible iterative solution, in Python:

def convolve_slow(image, kernel):
    result = image.copy()
    d = image
    for i in range(d.shape[0]):
        for j in range(d.shape[1]):
            try:
                cur = np.array(
                        [ [d[i-1, j-1], d[i, j-1], d[i+1, j-1]],
                          [d[i-1,   j], d[i,   j], d[i+1,   j]],
                          [d[i-1, j+1], d[i, j+1], d[i+1, j+1]],
                        ])
                result[i, j] = (cur * kernel).sum()
            except IndexError:
                pass # skip boundaries
 
    return result

 

Running over a 900×450 image, that solution takes a whopping 13 seconds!

Here is the result of our blur filter. The image is a global infrared satellite composite from the AVHRR imager aboard the NOAA-19 satellite, courtesy of the UW-Madison CIMSS Climate Data Portal.

Here is the original image:

Here is the blurred image:

So the algorithm is slow, but it did do its job!

Since we’ve only implemented a basic blur filter, we can directly compare it to the highly optimized Python library function “scipy.ndimage.convolve”, which accomplishes the same thing in only about 0.02 seconds.

But we want to implement our own image library, not just use someone else’s, remember? In Python, this is where we start learning C or digging through the documentation of helper tools like Cython or Numba.

The promise of Julia is that the solution we just wrote could be performant enough to be our final implementation. We just write the easy, iterative solution (without switching to a different programming language!) and we are done.

How does that work in practice using the latest version of Julia (v0.5.0)? Here is the simple, iterative solution in Julia:

function convolve_slow(image, kernel)
    result = copy(image)
    d = image
    for i = 2:size(d, 1) - 1
        for j = 2:size(d, 2) - 1
            cur = [ d[i-1, j-1] d[i, j-1] d[i+1, j-1];
                    d[i-1,   j] d[i,   j] d[i+1,   j];
                    d[i-1, j+1] d[i, j+1] d[i+1, j+1]
                  ]
            result[i, j] = sum(cur .* kernel)
        end
    end
    result
end

 

On the same 900×450 image, that runs in about 0.38 sec, or about 30x faster than the Python version (that is after letting the JIT interpreter “warm up”, i.e. we run the code several times to allow the interpreter to produce optimized machine code)! That’s great! But, it’s still not nearly as fast as the highly optimized library version of the same image filter. And that probably shouldn’t surprise us too much, as image filtering is a common numerical computing task whose libraries have been optimized over decades.

So, in this extremely simple comparison, we find that the current implementation of the Julia programming language is beginning to come through on its promise of providing great performance without forcing the programmer to make compromises like implementing libraries in lower level languages or being limited to a “vectorized” coding style. That said, there is definitely still room for improvement as we approach the first stable release of the language.

Lastly, don’t take our word for it! Download Julia, try it yourself, and let us know what type of benchmarks you get yourself! You can also try out the code used for this blog post by cloning this GitHub repository.