A few weeks ago I asked people a question on Twitter:
So, for people working on developer tools (compilers, debuggers, IDEs, many other things), how did you get into it?— Jordan Rose (@UINT_MIN) November 2, 2015
I got many interesting responses, but one of them asked if I had any recommendations for books on compilers.
Oh gosh. Books? Books about computer science? I’m actually not very well-read: while I’ve been programming since I was a kid, I “only” have an undergraduate formal education. That means the set of books I’ve read about CS is rather haphazard. But I suppose I do have a few things that come to mind.
Abelson and Sussman’s Scheme-based textbook, Structure and Interpretation of Computer Programs, was originally designed for the first computer science course you’d take at MIT; when I was at Berkeley, it was the first course you’d take once you knew a bit about “programming”. SICP (pronounced “sick-P”) is divided into five chapters that build a solid foundation for many of the main ideas in programming. Chapter 4 is all about interpreters, beginning with a Scheme interpreter written in Scheme and then moving into several interesting variations. Chapter 5 takes this further by describing an abstract machine, and then turning the interpreter into a compiler for this machine.
If I had to choose the most important message from SICP (and from the Berkeley “61” series of lower-division CS classes), it’s that nothing is magic. Scheme is a small language, and yet you can use its relatively limited primitives to accomplish things that would be entire features on their own in a larger language like Swift.
(If I had to pick a favorite topic in the book, though, the metacircular evaluator—the Scheme-in-Scheme interpreter—loses out to streams, the book’s treatment of lazy-but-infinite sequences. This was my first encounter with infinite data structures.)
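For a rough flavor of what a lazy-but-infinite sequence looks like, here is a minimal sketch in Python rather than Scheme. It is not code from the book: the `Stream` class and the helper names (`integers_from`, `stream_map`, `stream_filter`, `take`) are made up for illustration, loosely mirroring the idea of pairing a first element with a delayed computation of the rest.

```python
class Stream:
    """A first element plus a promise (a zero-argument function) for the rest."""

    def __init__(self, first, rest_thunk):
        self.first = first
        self._rest_thunk = rest_thunk
        self._rest = None
        self._forced = False

    @property
    def rest(self):
        # Force the promise at most once and cache the result.
        if not self._forced:
            self._rest = self._rest_thunk()
            self._forced = True
            self._rest_thunk = None
        return self._rest


def integers_from(n):
    """The infinite stream n, n+1, n+2, ... (nothing past n is computed yet)."""
    return Stream(n, lambda: integers_from(n + 1))


def stream_map(f, s):
    """Apply f to every element, lazily."""
    return Stream(f(s.first), lambda: stream_map(f, s.rest))


def stream_filter(pred, s):
    """Keep only elements satisfying pred, lazily."""
    while not pred(s.first):
        s = s.rest
    return Stream(s.first, lambda: stream_filter(pred, s.rest))


def take(s, n):
    """Collect the first n elements into an ordinary list."""
    out = []
    while len(out) < n:
        out.append(s.first)
        if len(out) < n:
            s = s.rest
    return out


evens = stream_filter(lambda x: x % 2 == 0, integers_from(1))
print(take(stream_map(lambda x: x * x, evens), 5))  # [4, 16, 36, 64, 100]
```

The stream is conceptually infinite, but only the elements that `take` actually asks for are ever computed.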
SICP alone is very dense reading, so you may instead want something to follow along with. Berkeley’s old “CS61A” course materials are still online, as is the new version of the course that uses Python. And finally, though I’m not sure it’s still active, you can also check out Understudy, an iOS app that tries to pair newer students with those further along in a course. Think of them as your Scheme senpai.
Meyer’s Object-Oriented Software Construction is the first book I read that was really about programming language design, and I think it may still be the only book I’ve encountered so far that really talks about programming language design. The book walks through the creation of a language from the ground up, justifying each feature along the way. There’s some talk about how to use the features as well—it is called “OO software construction”, after all—but what got me hooked was the language design parts. I may not agree with all of the choices, but for once someone was bothering to explain each one, in a way that didn’t depend on external literature or history.
(I haven’t picked up a copy of OOSC in almost a decade, so forgive me if I’ve misremembered some of the details.)
There are lots of books on compilers out there, but most of them really are about compilers: different techniques for parsing source text into abstract syntax trees (“ASTs”), a discussion of the formal semantics of various constructs (“semantic analysis”), and then a lowering to machine code (“code generation”). All of this is important, but (a) not all of it is practical in real compilers[1], and (b) it felt like there was no sense of design; it’s just different techniques for getting from point A to point B. Problem-solving.
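To make the three textbook stages concrete, here is a toy sketch in Python. It makes several simplifying assumptions: parsing is borrowed from Python's standard `ast` module instead of being written by hand, the "language" is only integer arithmetic with variables, and the target is a made-up stack machine, not real machine code. The function names (`parse`, `check`, `generate`, `run`) and the instruction names are invented for this illustration.

```python
import ast

# Operators the toy language supports, mapped to made-up instruction names.
OPS = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}


def parse(source):
    """Parsing: source text -> AST (here, Python's own expression AST)."""
    return ast.parse(source, mode="eval").body


def check(node, known_names):
    """Semantic analysis: reject names and constructs the toy language lacks."""
    if isinstance(node, ast.Constant):
        if isinstance(node.value, bool) or not isinstance(node.value, int):
            raise TypeError(f"only integer literals are supported, got {node.value!r}")
    elif isinstance(node, ast.Name):
        if node.id not in known_names:
            raise NameError(f"unknown variable {node.id!r}")
    elif isinstance(node, ast.BinOp) and type(node.op) in OPS:
        check(node.left, known_names)
        check(node.right, known_names)
    else:
        raise SyntaxError(f"unsupported construct: {type(node).__name__}")


def generate(node, code):
    """Code generation: AST -> a list of stack-machine instructions."""
    if isinstance(node, ast.Constant):
        code.append(("PUSH", node.value))
    elif isinstance(node, ast.Name):
        code.append(("LOAD", node.id))
    else:  # a BinOp; check() has already ruled out everything else
        generate(node.left, code)
        generate(node.right, code)
        code.append((OPS[type(node.op)],))
    return code


def compile_expr(source, known_names):
    tree = parse(source)
    check(tree, known_names)
    return generate(tree, [])


def run(code, env):
    """A tiny interpreter for the generated instructions."""
    stack = []
    for instr in code:
        op = instr[0]
        if op == "PUSH":
            stack.append(instr[1])
        elif op == "LOAD":
            stack.append(env[instr[1]])
        else:
            right = stack.pop()
            left = stack.pop()
            results = {"ADD": left + right, "SUB": left - right, "MUL": left * right}
            stack.append(results[op])
    return stack.pop()


program = compile_expr("x * (2 + 3)", {"x"})
print(program)                 # LOAD x, PUSH 2, PUSH 3, ADD, MUL
print(run(program, {"x": 4}))  # 20
```

Each stage is only a few lines here because the language is tiny; the point is the shape of the pipeline, not the techniques a real compiler would use at each step.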
I don’t want to slander compiler work; I was just tired of the tried-and-true academic approach to compilers. And it turned out I was more interested in program semantics and language design anyway.
So let’s circle back to compilers. The LLVM project has a series of tutorials on how to build a toy language called “Kaleidoscope” on top of LLVM’s libraries. This allows Kaleidoscope to be compiled to machine code or even immediately executed; it’s the same way that Clang and Swift are built on top of LLVM.
This isn’t a book, but a great thing about the tutorial is it shows how “writing a compiler” isn’t some gargantuan task beyond the abilities of mere mortals. It is in C++, though, so you’ll have to get used to that if it’s not a language you’re familiar with.
There are probably other compiler tutorials out there that are even more compact and/or even more accessible, possibly by virtue of not using C++. Kaleidoscope’s just the first one that comes to mind, but because it sits on top of LLVM and generates actual object files it’s well defended against criticisms that this isn’t what real compilers are like.
I’m sure there are more great books about both programming language design and compilers and other developer tools, not to mention myriad interesting papers on compiler research. And then of course there’s the rest of the internet. Here are a few things that didn’t get full coverage above:
- From Mathematics to Generic Programming: What OOSC does for language design, FM2GP does for library design: it builds several algorithms from the ground up in simple C++ and shows how to make them both reusable and efficient…weaving in both math and history of math along the way. It’s not a light read, and we’re getting further from the original question about “compilers”, but I will say that understanding the notion of concepts in particular is a big part of understanding why the Swift standard library is designed the way it is, not just C++’s.
- Lambda the Ultimate: A blog aggregating interesting articles about programming languages from across the internet. (Warning: often fairly technical and on the theory side of things.)
- Embedded in Academia: John Regehr is a CS professor who’s well-involved in the LLVM projects; he often posts about interesting features of C and C++ compilers (and the languages themselves).
- NSBlog: Mike Ash’s blog isn’t really about compilers, but his “Friday Q&A” series does talk a lot about how various things are implemented, delving into libraries and language runtimes in detailed but accessible depth.
But I’d also love to know what language and compiler resources you’ve found interesting, so I can pick them up. Comment below or on Twitter, and I’ll add them here!
Several people mentioned “the Dragon Book”, which can refer to either Principles of Compiler Design or the newer Compilers: Principles, Techniques, and Tools. The former is actually one of the books I considered overly academic and theory/structure-based, but it’s also probably at the top of that list. At some point I’ll psych myself up for a reread.
John Regehr (mentioned above) and a few others suggest Muchnick’s Advanced Compiler Design and Implementation. (There doesn’t seem to be a website for this book, but it’s still being sold and I assume you can find it in academic libraries.)
Mastenbrook also recommends Friedman and Wand’s Essentials of Programming Languages.
In the comments below, Chris suggests Wirth’s Compiler Construction (PDF) over the Dragon Book.
Chris also points out Linkers and Loaders, which I recommend as well. Great little book about static and dynamic linking, and you can read it all for free online if you want (in draft form).
[1] If you ever take a college course on compilers, you will almost certainly be introduced to yacc (or its GNU equivalent, bison). At a high level, these tools take an abstract grammar and generate a fairly efficient parser from it. However, both Clang and Swift use hand-written “recursive descent” parsers, basically the most dead-simple, inefficient parsing algorithm you can think of. Why? Three reasons, I think:
- The “efficiency” gains we’re talking about hardly ever apply in practice. When they are relevant, they’re easy enough to special-case.
- The languages we’re parsing are messy and don’t fit into a tidy grammar.
- Most importantly, a generated parser is great for perfect input, but doesn’t handle incorrect or incomplete code very well. A good compiler should have great error handling. (Insert your own dig at Swift here.)
An undergrad college course will give you a good understanding of how we want things to work, but as with pretty much every area of CS (and probably most majors outside of CS as well), the real world turns out to be more complicated. Optimizing for the perfect case is, IMHO, the wrong tradeoff.
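As a rough illustration of why hand-written recursive descent makes error handling approachable, here is a minimal sketch in Python for a toy arithmetic grammar. It is not how Clang or Swift are actually written; the grammar, the `Parser` class, and the recovery choices (pretend a missing `)` was there, skip an unexpected token and substitute 0) are all assumptions made for the example. Each grammar rule becomes one method, so there is an obvious place to put a specific message and a recovery strategy.

```python
def tokenize(text):
    """Split text into (kind, value, position) triples, ending with an 'end' token."""
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif "0" <= ch <= "9":
            j = i
            while j < len(text) and "0" <= text[j] <= "9":
                j += 1
            tokens.append(("num", text[i:j], i))
            i = j
        else:
            tokens.append(("op", ch, i))
            i += 1
    tokens.append(("end", "", len(text)))
    return tokens


class Parser:
    # Grammar (one method per rule):
    #   expr   := term (('+' | '-') term)*
    #   term   := factor ('*' factor)*
    #   factor := NUMBER | '(' expr ')'
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0
        self.errors = []  # collected diagnostics; parsing keeps going

    def peek(self):
        return self.tokens[self.pos]

    def advance(self):
        tok = self.tokens[self.pos]
        if tok[0] != "end":
            self.pos += 1
        return tok

    def error(self, message):
        _, value, where = self.peek()
        found = repr(value) if value else "end of input"
        self.errors.append(f"column {where + 1}: {message}, found {found}")

    def parse(self):
        value = self.expr()
        if self.peek()[0] != "end":
            self.error("expected an operator or end of input")
        return value

    def expr(self):
        value = self.term()
        while self.peek()[1] in ("+", "-"):
            op = self.advance()[1]
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek()[1] == "*":
            self.advance()
            value *= self.factor()
        return value

    def factor(self):
        kind, text, _ = self.peek()
        if kind == "num":
            self.advance()
            return int(text)
        if text == "(":
            self.advance()
            value = self.expr()
            if self.peek()[1] == ")":
                self.advance()
            else:
                self.error("expected ')'")  # recover: act as if it were there
            return value
        self.error("expected a number or '('")
        if kind != "end" and text != ")":
            self.advance()  # skip the bad token so parsing can continue
        return 0  # placeholder so the rest of the input still gets checked


p = Parser("2 * (3 + 4")
print(p.parse())  # 14
print(p.errors)   # ["column 11: expected ')', found end of input"]
```

Even with the closing parenthesis missing, the parser reports a pointed message and still produces a result for the rest of the input, which is the kind of behavior that is awkward to bolt onto a generated parser.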