That Better Way... D?

June 10, 2012
categories: languages, programming

Although I'm a plasma physicist by training, I spend the majority of my research time at the moment programming whilst waiting for simulations to run. This is almost all numerical: a fair-sized simulation code which runs on supercomputers like HPC-FF, and a bunch of smallish (< 2000 line) codes to produce inputs and process outputs into pretty pictures. Most of this is in C++, with a fair bit of IDL and Python. Whilst I'm generally happy with these tools, like many people I'm also unable to resist a shiny thing and am always searching for that elusive Better Way. This time I may have found it, and it's called D.

Any discussion with physicists about programming always comes around to Fortran. It is very widely used in scientific, and particularly High Performance, computing: many researchers use it in my department, and it's taught to undergraduates and graduate students (I teach the latter, for my sins). I don't really want to add to the endless C++ / Fortran debate, but that's not going to stop a good rant so here's my 2 cents anyway.

There is no doubt that Fortran is outstanding in its number-crunching domain: the Fortran standards committee consider this their main audience, built-in support for multidimensional arrays makes implementing many mathematical algorithms straightforward (particularly linear algebra), and given its long history there are a lot of high quality libraries to choose from. It is also (in my experience) much easier for a novice physicist to learn Fortran than C or C++. The terrible Fortran codes written by scientists which give the language a bad reputation are mainly due to lack of training, rather than the language. Computed GOTOs and COMMON blocks were bad enough, but I shudder to think what could have been perpetrated if C++ had been used instead.

So if modern Fortran is so amazing, why don't I use it much? Short answer is I find it clunky and just plain ugly on some reptilian level. Long answer is that whilst Fortran is great in its limited domain of numerical array manipulation, it is lacking many features for reusable and maintainable software development. Maybe I've just been spoiled by C++ and Python (there's little spoiling to be had in IDL), but I'm reluctant to give up so many of the language features I've got used to. It's true that modern Fortran is a vast improvement on the old FORTRAN, and Fortran 2003 finally introduced Object Oriented Programming with inheritance and polymorphism. Unfortunately, there is still no freely available compiler which implements all of the features of F2003, although gfortran comes close. Even with the latest standard, Fortran 2008, there is still no support for modern features like try...catch style exception handling (GOTO "exception" handling is just horrific), functional tools like lambda functions and closures, or templates.

Yes, that's right. Templates. They're a poor man's metaprogramming tool compared with what's available in other languages like Lisp or Haskell, but the ability to write code once which will operate on arbitrary user-defined types is very powerful. They have enabled large libraries of generic code to be written like the STL and Boost. Whatever the shortcomings of the STL, these libraries greatly improve productivity as we no longer need to reinvent our own flawed version of the wheel all the time.

A very simple example of templates is a function to swap two values. In C++ this can be implemented as

template<class T>
void swap(T &a, T &b) { 
  T c(a);
  a = b;
  b = c;
}

which will automatically create a new swap function for any new type it's given. There's no need to re-write the same function over and over again with different types, but specialised functions can still be written for optimisation if needed. This is using templates as a sophisticated C macro system; the real power comes with (mis-)using templates for things like expression templates, which enable matrix performance comparable to Fortran (for example, using the Blitz++ library), not to mention the mind-bending techniques described in Modern C++ Design.
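As a minimal sketch of the "specialised functions" point, the listing below pairs the generic swap with an overload for one particular type. The namespace demo and the type Field are hypothetical names invented for this illustration; the namespace is only there to keep these functions apart from std::swap.

```cpp
#include <vector>

namespace demo {  // hypothetical namespace, avoids clashing with std::swap

// Generic version from the text: one copy construction, two assignments.
template<class T>
void swap(T &a, T &b) {
  T c(a);
  a = b;
  b = c;
}

// Hypothetical user-defined type holding a potentially large array.
struct Field {
  std::vector<double> values;
};

// Overload for Field: exchanges the vectors' internal storage
// rather than copying every element. Overload resolution prefers
// this non-template function when both arguments are Field.
inline void swap(Field &a, Field &b) {
  a.values.swap(b.values);
}

}  // namespace demo
```

Calling demo::swap(i, j) on two ints instantiates the template, whereas demo::swap(f, g) on two Field objects picks the overload, so the calling code looks the same in both cases.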

Unfortunately, all this power comes at a terrible price. One of the biggest problems with C++ templates is that it's very easy to write code which is completely incomprehensible. More seriously, users are exposed to the internals of template libraries, rather than a nice interface. This can be illustrated by trying to compile the following example, which applies the STL sort algorithm to a vector (dynamic array) of a type "mytype" which we define:

#include <vector>
#include <algorithm>
using namespace std;

struct mytype {
  int data;
};

int main() {
  vector<mytype> vec;
  sort(vec.begin(), vec.end());   
}

Using g++, this produces several pages of error messages, starting

In file included from /usr/include/c++/4.6/algorithm:63:0,
                 from test.cxx:2:
/usr/include/c++/4.6/bits/stl_algo.h: In function ‘void std::__insertion_sort(_RandomAccessIterator, 
_RandomAccessIterator) [with _RandomAccessIterator = __gnu_cxx::__normal_iterator<mytype*, std::vector<mytype> >]’:
/usr/include/c++/4.6/bits/stl_algo.h:2181:4:   
instantiated from ‘void std::__final_insertion_sort(_RandomAccessIterator, _RandomAccessIterator) 
[with _RandomAccessIterator = __gnu_cxx::__normal_iterator<mytype*, std::vector<mytype> >]’
/usr/include/c++/4.6/bits/stl_algo.h:5409:4:   instantiated from ‘void std::sort(_RAIter, _RAIter)
[with _RAIter = __gnu_cxx::__normal_iterator<mytype*, std::vector<mytype> >]’
test.cxx:11:30:   instantiated from here
/usr/include/c++/4.6/bits/stl_algo.h:2107:4: error: no match for ‘operator<’ in ‘__i.__gnu_cxx::__normal_iterator
<_Iterator, _Container>::operator* [with _Iterator = mytype*, _Container = std::vector<mytype>, 
__gnu_cxx::__normal_iterator<_Iterator, _Container>::reference = mytype&]() 
< __first.__gnu_cxx::__normal_iterator<_Iterator, _Container>::operator* [with _Iterator = mytype*, 
_Container = std::vector<mytype>, __gnu_cxx::__normal_iterator<_Iterator, _Container>::reference = mytype&]()’
...

which is apparently caused by an error on line 2107 of the library file stl_algo.h. As a user of the library I have no idea what that does, and little desire to go digging through it to find out. Buried in the noise is the useful information "no match for ‘operator<’ in ...", which is hinting at the real problem: we have created a type but not defined a function to test if one instance is less than another, and so sorting them doesn't make sense.
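For completeness, here is one way the example could be made to compile, assuming that ordering by the data member is what is wanted. The helper sorted_copy is a hypothetical name added only so the sketch has something to call.

```cpp
#include <algorithm>
#include <vector>

struct mytype {
  int data;
};

// Defines what "less than" means for mytype: compare the data members.
// This is the comparison std::sort uses when no predicate is given.
inline bool operator<(const mytype &a, const mytype &b) {
  return a.data < b.data;
}

// Hypothetical helper: takes the vector by value and returns it sorted.
inline std::vector<mytype> sorted_copy(std::vector<mytype> vec) {
  std::sort(vec.begin(), vec.end());
  return vec;
}
```

With operator< in place, the original sort(vec.begin(), vec.end()) call compiles unchanged. An alternative is to leave the type alone and pass a comparison function as a third argument to std::sort.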

This is a pretty simple example; trying to decipher an error message when more complicated template manipulations go wrong can make you hate your life. Surely there has to be a way to enjoy the advanced features of C++ without the pain? Surely there must be a better way?

The D language

My favourite Better Way at the moment is D. Walter Bright and Andrei Alexandrescu have taken the mishmash of legacy and experiments that is C++, redesigned it, and made it Great. Not only does D have a lot of nice new features (some of which are in C++11 in a more cryptic form), but it is much more coherent, clear, and intuitive than C++. D blurs the line between compile-time and run-time with features like mixins, compile-time evaluation, and dynamic function dispatch, making code which would be very hard (or impossible) to write in C++ quite straightforward. The result is a language with all the advantages of a compiled language and strong type system, and much of the flexibility of a dynamic language like Python. This may sound overblown, and it's not all good, but it certainly is a breath of fresh air compared with C++. As an example of how templates have been re-designed, the original swap example doesn't change much:

void swap(T)(ref T a, ref T b) {
  T c = a;
  a = b;
  b = c;
}

The most noticeable difference is the lack of < > brackets, which were a terrible choice and have caused no end of trouble. Instead, round brackets are used to declare template parameters, and !() to pass them explicitly, as in swap!(int)(a, b). Types can be inspected easily at compile time, making much of the gymnastics needed in C++ templates unnecessary. Crucially, the error message problem is fixed. The equivalent of the sort example in D is

import std.algorithm;

struct mytype {
  int data;
};

int main() {
  mytype[] vec;
  sort(vec);
}   

Not only is the code clearer and more concise, but the DMD and gdc compilers produce the error message

Error: static assert  "Invalid predicate passed to sort: a < b"
test.d(9):        instantiated from here: sort!("a < b",cast(SwapStrategy)0,mytype[])

which I think you'll agree is much easier to understand than the C++ equivalent. The compiler can do this because templates can specify conditions that their arguments must satisfy, which the compiler then checks. In this case, the condition is that there must be an "a < b" operation which produces a true/false answer. This makes debugging template code much easier, and means that I can use the sort function without being exposed to its internals when something goes wrong. Similar functionality was at one time proposed for C++11 in the form of Concepts, but was dropped from the standard.
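To make the idea of an up-front check concrete, the following is a rough C++11 approximation using static_assert and a hand-written type trait. It is only a sketch of the technique, not how the D library or any C++ standard library implements it; has_less and checked_sort are hypothetical names made up for this illustration.

```cpp
#include <algorithm>
#include <type_traits>
#include <utility>
#include <vector>

// Compile-time test: is "a < b" valid for two values of type T,
// and can the result be converted to bool?
template<class T>
class has_less {
  // Chosen when the comparison compiles (otherwise removed by SFINAE).
  template<class U>
  static auto test(int) -> decltype(
      static_cast<bool>(std::declval<const U &>() < std::declval<const U &>()),
      std::true_type());
  // Fallback when the comparison does not compile.
  template<class U>
  static std::false_type test(...);

public:
  static const bool value =
      std::is_same<decltype(test<T>(0)), std::true_type>::value;
};

// Hypothetical wrapper around std::sort which states its requirement
// on the element type before handing over to the library.
template<class T>
void checked_sort(std::vector<T> &vec) {
  static_assert(has_less<T>::value,
                "checked_sort: element type must support a < b");
  std::sort(vec.begin(), vec.end());
}

// The type from the text, still without an operator<.
struct mytype {
  int data;
};
```

Calling checked_sort on a vector of mytype should then report the static_assert message, although depending on the compiler the library errors may still follow it. Compared with the D version, the amount of scaffolding needed for one check is rather the point.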

Or maybe not

The flaw in all of this is that what makes a language useful is not just nice syntax (and D is very nice), but libraries. D can call C code which helps a lot, but the process is not entirely straightforward: most C libraries use macros to a greater or lesser degree, and these must be translated into D functions. A bigger issue at the moment is that there is only limited support for multidimensional arrays, either built-in or through libraries comparable to Blitz++ in C++. There are efforts to develop scientific libraries for D such as SciD, and my own attempt at HPC for D. I just hope these improve fast enough, otherwise I might be forced to admit that Fortran is the Better Way after all.