So You Want to Be a (Compiler) Wizard
A month or so ago, @__biancat (whose username I can’t help but read as “Bian-cat” even though it’s probably “Bianca T.”) suggested I write up some ideas for getting into compilers and programming languages.
It turns out I’m happy to expound on this, and none of it needs a formal CS education either.[1] Unfortunately, pretty much all of these ideas require some amount of free time. I’ll come back to that at the end of the post.
P.S. I know the title is a bit off-brand, but I couldn’t resist the reference.
These are things you can do on your own. I’ve arranged them roughly in order of difficulty and time commitment, although of course the language / environment you pick will affect things.
Learn regular expressions. This isn’t exactly a project, and it’s not how real compilers work, but regular expressions get you thinking about parsing and also about languages, and they’re handy to know regardless. I found Regex Crossword to be a fun way to practice.
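For a taste of what regular expressions look like from a program, here is a minimal sketch in C using the POSIX regex.h functions regcomp, regexec, and regfree. The helper name matches and the calculator_input pattern are made up for illustration; the pattern only accepts digits joined by +, -, or *, with no spaces or parentheses.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Returns 1 if `text` matches the POSIX extended regular expression
 * `pattern`, 0 if it does not. The pattern is assumed to be valid. */
int matches(const char *pattern, const char *text) {
    regex_t compiled;
    int status = regcomp(&compiled, pattern, REG_EXTENDED | REG_NOSUB);
    assert(status == 0 && "pattern failed to compile");
    status = regexec(&compiled, text, 0, NULL, 0);
    regfree(&compiled);
    return status == 0;
}

/* Hypothetical pattern: one or more digits, then any number of
 * "operator, digits" pairs. The '-' comes first inside the brackets so
 * that it is treated as a literal character rather than a range. */
const char *const calculator_input = "^[0-9]+([-+*][0-9]+)*$";
```

Calling matches(calculator_input, "1+2*3-4") accepts the string, while "1+" and "1 + 2" are rejected. Note that this only recognizes the shape of the input; it says nothing about what the expression means.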
Make a calculator. Not a graphical one, but something that can take input like “1+2*3-4” and produce “3”. Or even “5” (what you get by evaluating strictly left to right), as a start. If you don’t feel like doing the parsing part, just build a tree-like data structure that can represent mathematical expressions and write an evaluate function. A calculator is like a very tiny interpreter.
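The "skip the parsing" version might look something like the sketch below: a tree of nodes plus a recursive evaluate function. The type and function names (Node, NodeKind, evaluate_example, and so on) are invented for this example, and the trees are built by hand rather than parsed.

```c
#include <assert.h>
#include <stddef.h>

/* A node is either a literal number or a binary operation on two subtrees. */
typedef enum { NODE_NUMBER, NODE_ADD, NODE_SUBTRACT, NODE_MULTIPLY } NodeKind;

typedef struct Node {
    NodeKind kind;
    long value;                 /* used only when kind == NODE_NUMBER */
    const struct Node *left;    /* used only for binary operations */
    const struct Node *right;
} Node;

/* Recursively evaluates the expression tree rooted at `node`. */
long evaluate(const Node *node) {
    assert(node != NULL);
    switch (node->kind) {
    case NODE_NUMBER:   return node->value;
    case NODE_ADD:      return evaluate(node->left) + evaluate(node->right);
    case NODE_SUBTRACT: return evaluate(node->left) - evaluate(node->right);
    case NODE_MULTIPLY: return evaluate(node->left) * evaluate(node->right);
    }
    assert(0 && "unknown node kind");
    return 0;
}

/* "1+2*3-4" with the usual precedence: (1 + (2 * 3)) - 4. */
long evaluate_example(void) {
    const Node one = { NODE_NUMBER, 1, NULL, NULL };
    const Node two = { NODE_NUMBER, 2, NULL, NULL };
    const Node three = { NODE_NUMBER, 3, NULL, NULL };
    const Node four = { NODE_NUMBER, 4, NULL, NULL };
    const Node product = { NODE_MULTIPLY, 0, &two, &three };
    const Node sum = { NODE_ADD, 0, &one, &product };
    const Node difference = { NODE_SUBTRACT, 0, &sum, &four };
    return evaluate(&difference);
}

/* The same input read strictly left to right: ((1 + 2) * 3) - 4. */
long evaluate_left_to_right_example(void) {
    const Node one = { NODE_NUMBER, 1, NULL, NULL };
    const Node two = { NODE_NUMBER, 2, NULL, NULL };
    const Node three = { NODE_NUMBER, 3, NULL, NULL };
    const Node four = { NODE_NUMBER, 4, NULL, NULL };
    const Node sum = { NODE_ADD, 0, &one, &two };
    const Node product = { NODE_MULTIPLY, 0, &sum, &three };
    const Node difference = { NODE_SUBTRACT, 0, &product, &four };
    return evaluate(&difference);
}
```

The first tree evaluates to 3 and the second to 5. The only difference between them is the shape of the tree, which is exactly the decision a parser has to make.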
Make a text adventure. This is one of my favorite ways to play with a new language, but it’s also a lot like an interpreter. Just start with the basics: a bunch of places and descriptions, and a “read-evaluate-print loop” that handles directions. From there you can add other actions, items, whatever.
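As a rough sketch of "the basics", assuming a made-up three-room map: the rooms are a table of descriptions and exits, evaluate_command is the "evaluate" step, and run_adventure is the read-evaluate-print loop. All of the names and the map itself are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical three-room map. Each room lists the index of the room
 * reached by going north, south, east, or west; NO_EXIT means a wall. */
enum { NORTH, SOUTH, EAST, WEST, DIRECTION_COUNT };
enum { NO_EXIT = -1 };

typedef struct {
    const char *description;
    int exits[DIRECTION_COUNT];
} Room;

static const Room rooms[] = {
    /* 0 */ { "You are in a dusty library.",     { 1, NO_EXIT, NO_EXIT, NO_EXIT } },
    /* 1 */ { "You are in a long hallway.",      { NO_EXIT, 0, 2, NO_EXIT } },
    /* 2 */ { "You are in a tiny broom closet.", { NO_EXIT, NO_EXIT, NO_EXIT, 1 } },
};

static const char *const direction_names[DIRECTION_COUNT] = {
    "north", "south", "east", "west"
};

/* "Evaluate" step: returns the room reached by `command` from `current`,
 * or `current` itself if the command is unknown or there is no exit. */
int evaluate_command(int current, const char *command) {
    assert(current >= 0 && (size_t)current < sizeof rooms / sizeof rooms[0]);
    for (int d = 0; d < DIRECTION_COUNT; ++d) {
        if (strcmp(command, direction_names[d]) == 0 &&
            rooms[current].exits[d] != NO_EXIT) {
            return rooms[current].exits[d];
        }
    }
    return current;
}

/* Read-evaluate-print loop: reads one command per line from `in` until
 * end of input or "quit", printing the current room to `out` each time.
 * Returns the index of the room the player ended in. */
int run_adventure(FILE *in, FILE *out) {
    int current = 0;
    char line[64];
    fprintf(out, "%s\n", rooms[current].description);
    while (fgets(line, sizeof line, in) != NULL) {
        line[strcspn(line, "\n")] = '\0';   /* strip the trailing newline */
        if (strcmp(line, "quit") == 0) break;
        int next = evaluate_command(current, line);
        if (next == current) fprintf(out, "You can't go that way.\n");
        current = next;
        fprintf(out, "%s\n", rooms[current].description);
    }
    return current;
}
```

To play it, call run_adventure(stdin, stdout) from main. Keeping the evaluate step separate from the loop makes it easy to test without typing anything, and gives you one obvious place to add items and other actions later.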
Implement snprintf in C. For those who haven’t used C before, snprintf is a function that produces formatted output based on a format string and a variable number of arguments. Doing it in C forces you to deal with constraints you may not have had to deal with in a higher-level language. Don’t skip out on writing unit tests! (And don’t bother with floating-point numbers; just handle
    const size_t bufferLength = 128;
    char buffer[bufferLength];
    snprintf(buffer, bufferLength, "%s %d %s", "first", 2, "last");
    assert(0 == strcmp(buffer, "first 2 last"));
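If you want a starting point, here is one possible skeleton, offered as a sketch and not a reference solution. It assumes only %s, %d, and %% need to be supported (the conversions the test above exercises, plus the escape), and the names mini_snprintf, Output, and the put_ helpers are invented for this example. Like the real snprintf, it never writes more than the buffer can hold, always terminates the string when the buffer is non-empty, and returns the length the full output would have had.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Bookkeeping for the output buffer. `length` counts every character the
 * full output would contain, even those that did not fit. */
typedef struct {
    char *buffer;
    size_t capacity;
    size_t length;
} Output;

static void put_char(Output *out, char c) {
    /* Keep one slot free for the terminating NUL. */
    if (out->length + 1 < out->capacity) out->buffer[out->length] = c;
    out->length++;
}

static void put_string(Output *out, const char *s) {
    while (*s != '\0') put_char(out, *s++);
}

static void put_int(Output *out, int value) {
    /* Work in unsigned arithmetic so that INT_MIN does not overflow. */
    unsigned int magnitude = (unsigned int)value;
    if (value < 0) {
        put_char(out, '-');
        magnitude = 0u - magnitude;
    }
    char digits[32];    /* far more than an int needs in practice */
    size_t count = 0;
    do {
        digits[count++] = (char)('0' + magnitude % 10);
        magnitude /= 10;
    } while (magnitude != 0);
    while (count > 0) put_char(out, digits[--count]);   /* undo the reversal */
}

/* Hypothetical name. Supports only %s, %d, and %%; any other conversion
 * is copied through literally without consuming an argument. */
int mini_snprintf(char *buffer, size_t capacity, const char *format, ...) {
    assert(format != NULL);
    Output out = { buffer, capacity, 0 };
    va_list args;
    va_start(args, format);
    for (const char *p = format; *p != '\0'; ++p) {
        if (*p != '%') { put_char(&out, *p); continue; }
        ++p;
        if (*p == 's') put_string(&out, va_arg(args, const char *));
        else if (*p == 'd') put_int(&out, va_arg(args, int));
        else if (*p == '%') put_char(&out, '%');
        else if (*p == '\0') { put_char(&out, '%'); break; }
        else { put_char(&out, '%'); put_char(&out, *p); }
    }
    va_end(args);
    if (capacity > 0) {
        size_t end = out.length < capacity ? out.length : capacity - 1;
        buffer[end] = '\0';
    }
    return (int)out.length;
}

/* Repeats the check from the post, using the sketch instead of snprintf. */
int mini_snprintf_matches_example(void) {
    char buffer[128];
    mini_snprintf(buffer, sizeof buffer, "%s %d %s", "first", 2, "last");
    return strcmp(buffer, "first 2 last") == 0;
}
```

Width, precision, and flags are all left out, and that is where most of the real work (and most of the unit tests) would go.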
Implement snprintf in assembly… for the exact same reason. Pretty much no one programs in assembly any more, and that’s generally a good thing, but this will (a) force you to learn a new and very suboptimal language, (b) get you to learn a little about your CPU[2], and (c) help you later on if you ever need to debug a compiled program without debug info. Bonus points if you can get your assembly version to work correctly with C.
(This was an assignment in one of my lower-div classes at college.)
Learn LISP, Scheme, or Racket, then make a tiny Scheme interpreter. The benefit of doing this in one of these languages is twofold: first, the syntax is very simple, which lets you focus on semantics; and second, making an interpreter for a language in that language feels much more impressive than otherwise. (This is the fourth chapter of the classic programming book Structure and Interpretation of Computer Programs.)
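To show why the simple syntax helps, here is a sketch of an evaluator for parenthesized prefix arithmetic. It is written in C only to match the other examples in this post (the project above is more fun in Scheme itself), and it is nowhere near a Scheme interpreter: there are no variables, no lambda, and no environments. The function names are hypothetical, and the input is assumed to be well-formed.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Advances *cursor past any whitespace. */
static void skip_spaces(const char **cursor) {
    while (isspace((unsigned char)**cursor)) ++*cursor;
}

/* Evaluates one expression starting at *cursor and leaves *cursor just
 * past it. An expression is an integer or "(op expr expr ...)" where op
 * is +, -, or * and at least one operand is present. */
long evaluate_expression(const char **cursor) {
    skip_spaces(cursor);
    if (**cursor != '(') {
        char *end;
        long value = strtol(*cursor, &end, 10);
        assert(end != *cursor && "expected a number");
        *cursor = end;
        return value;
    }
    ++*cursor;                          /* consume '(' */
    skip_spaces(cursor);
    char op = *(*cursor)++;             /* consume the operator */
    assert(op == '+' || op == '-' || op == '*');
    long result = evaluate_expression(cursor);   /* first operand */
    skip_spaces(cursor);
    while (**cursor != ')') {
        assert(**cursor != '\0' && "missing ')'");
        long operand = evaluate_expression(cursor);
        if (op == '+') result += operand;
        else if (op == '-') result -= operand;
        else result *= operand;
        skip_spaces(cursor);
    }
    ++*cursor;                          /* consume ')' */
    return result;
}

/* Convenience wrapper for evaluating a whole string. */
long evaluate_string(const char *source) {
    return evaluate_expression(&source);
}
```

Because every operation is wrapped in its own parentheses, there is no precedence to work out: evaluate_string("(- (+ 1 (* 2 3)) 4)") gives 3 with a parser that is barely more than "read a number or recurse". One known difference from Scheme: a single-operand "(- 5)" returns 5 here instead of negating.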
Go through the Kaleidoscope tutorials. This is a walkthrough to build a real compiler for a tiny language, using LLVM. LLVM is the compiler “backend” behind Clang, the C compiler used by Apple and many others, and of course Swift as well, so it’s hard to say you’re not building a real compiler in this project. The original tutorial uses C++, but I know there are alternate versions of it for other languages around the internet.
Andi McClure has a good diagram about what LLVM provides for you on her blog.
At this point I’ve shaded off of generic projects and into specific ones, which I’ve talked about before.
Books and Online Resources
Contributing to Open Source
Personal projects are all well and good, but they’re a far cry from working on a large program with many contributors and lots of moving pieces. I definitely had a shock when I started my first internship, even after writing my own apps and working on plenty of projects for school, because you can’t possibly understand it all before you start making changes.
Don’t panic. That’s what good interfaces and good tests are for, and if either of those fails you, don’t be afraid to ask questions.
(There are also a lot of ways to contribute to a project other than writing code, but I’m going to assume you’re here because you want to write code. Otherwise you wouldn’t have stayed through the previous section.)
I’m not sure I have any specific advice that hasn’t already been said before somewhere on the internet (such as the site First Timers Only, or this Quora post), but here are some things I’d suggest keeping in mind:
Pick a bug and try to solve it. Look through the bug database and find a bug that you understand and that feels like it should be a small thing to fix. Some projects, including Swift, have dedicated “starter bugs” or “first-timers only” bugs that they expect someone with less experience to be able to tackle. You’re not limited to those, especially if you’re interested in a particular area of a project, but they’re a place to start looking.
When choosing a bug you might not even have looked at the code for that part of the project. That’s fine. Pick a bug, then jump into the code and go looking for the problem. If there’s something you don’t understand, but you don’t need it to fix the issue, feel free to ignore it. At the same time, though, a lot of bugs in a bug database are open because they’re trickier than they look, so be willing to set a bug back on the shelf if you’re not making progress.
Partial answers are better than none. While it’s not quite as satisfying as being able to close a bug, just taking, say, a crash, and figuring out why it’s happening can be incredibly useful, even if you don’t know how to fix it. (Example: “It crashes when I compile this file.” → “It’s crashing in TypeChecker::doTheThing.” → “The type of some property is null.” → “The type gets reset to null by accident in
Talk to people. Ask for feedback on patches. Ask questions when you’re stuck. Most projects have mailing lists and/or chat rooms (IRC, Slack, etc.); subscribe / sign in even if you don’t have a question at the moment. Answer other people’s questions when you can.
Don’t forget to be polite. In general, learn how to get answers, and please don’t ask a question before looking for an answer in the obvious places, like the documentation.
There’s a hidden motive here: regular contributors to the project recognize names that come up regularly. Asking questions demonstrates interest; contributing back demonstrates capability and value.
If you don’t get an answer, please don’t be offended. Not only are open-source contributors busy people (like everyone else), but usually participation on mailing lists, forums, and chat rooms isn’t an official part of their duties. It’s usually acceptable to “ping” an email after several days or a week to see if anyone has time now.
Open source projects are big. I said this already, but don’t get stuck on the size of the project. I’ve been working on Swift for years, and Clang for years before that, and there are still big chunks of both compilers that I don’t understand. You start in one area, and you learn what you need to as time goes on.
Don’t forget to follow contributor guidelines. It’s one of the more mundane things about working on a project with other people, but if you don’t follow the contribution process and coding standards of the project you’re working on, other people are less likely to take you seriously. (They’ll feel a little like you’re wasting their time.)
If you’re a student, consider Google Summer of Code. In short, if you can come up with a good project idea and convince an open-source project to take you on, Google will pay you to work on it for a summer. It’s like a lightweight, remote internship. (This is actually how I got into compilers, by working on the Clang Static Analyzer for GSoC. Not that I wasn’t already interested in programming languages.)
GSoC also isn’t the only non-internship game in town, so look around. Your particular project might even have its own program.
On “Free Time”
The sad thing is, there are 10 kinds of open-source developers: those who do open-source work for a company, and those who do it in their spare time. For those in the former group, open-source work is mostly the same as any other software work: we get paid for it, and we’re not expected to do it outside of work.
If you’re reading this post, you might be in that group—say, if this is your first compiler job—but more likely you’re looking to switch. That means you’re expected to do all of this in your free time, on top of your normal job and any other responsibilities or outside interests you might have.
I want to explicitly recognize this as a form of privilege. The people most likely to have spare time to do outside projects are people
- who don’t have families to take care of (kids, parents, whatever)
- who have a good, steady income (i.e. not learning while, say, balancing two part-time jobs)
- who live close to work (minimizing a commute)
etc. Additionally, people who’ve been programming for a long time will likely have an easier time picking up one more thing, meaning the process is also biased towards those who had the opportunity to learn to program when they were younger. In the US, that means men more than women, upper-class more than lower-class, and white more than black.
All of that is institutional bias, even before taking into account the individual, usually-unconscious biases that plague our industry (particularly from “people who look like me”)…not to mention Imposter Syndrome. And it stinks. But it can’t stop you—and rather than take my word for it, you can read Kronda Adair’s “Dear Marginalized People Coming Into Tech”.
Oh, one more thing to look for is that the project has community guidelines and a code of conduct. That doesn’t automatically mean there won’t be conduct problems, but it at least indicates that the community is considered important, not just the product.
This section is a short and simplified look at some of the problems with today’s open source model. For more depth, check out Ashe Dryden’s “The Ethics of Unpaid Labor and the OSS Community”.
Like last time, I’d love to hear of other good resources for starting out on compiler work, or for first-time open-source contributors. Comment below or on Twitter, and I’ll add them here. I also enjoy talking to people about their own experiences, and I’m happy to help out with non-technical questions on Twitter as well. (Technical questions should usually stay on the mailing lists or review discussion, so that other people can see them.)
Good luck, have fun. We can’t wait to have you.
[1] I have a B.A. in Computer Science, and took just one upper-division class about compilers and one grad class about programming languages (though some of my lower-division classes did have compiler-related work in them). My coworker Joe Groff doesn’t have any formal education, and he’s one of the best compiler people I know.
[2] In reality, CPUs have tons of tricks up their proverbial sleeves, which means even assembly doesn’t correspond precisely to how the program actually gets run. But it gives you an idea of common denominator capabilities, anyway.