My Little (String) Optimization, Part 2

Previously, I talked about how Clang is smart enough to optimize a series of comparisons against constant strings in C++ by starting out with a switch on the length. I left off with the idea that while this is good, you might be able to do better if your strings have a unique character at a certain offset. Today we’re going to see what that looks like.
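To make the "unique character at a certain offset" idea concrete, here is a minimal hand-written sketch. It assumes the five names from the first post's example, where the character at index 2 happens to differ for every name ('t', 'o', 's', 'r', 'a'). The function name `isOneOfTheStringsICareAboutByChar` is hypothetical, and this is not necessarily the code the rest of this post arrives at.

```cpp
#include <string>

// Hypothetical illustration: dispatch on the one character that
// distinguishes all five candidate names, then confirm with a full compare.
bool isOneOfTheStringsICareAboutByChar(const std::string &s) {
  // The shortest candidate ("Maria") has 5 characters, so anything
  // shorter cannot match, and index 2 is safe to read past this check.
  if (s.size() < 5) return false;

  switch (s[2]) {
    case 't': return s == "Battler";
    case 'o': return s == "George";
    case 's': return s == "Jessica";
    case 'r': return s == "Maria";
    case 'a': return s == "Beatrice";
    default:  return false;
  }
}
```

Each input ends up in at most one full string comparison, because the character at index 2 rules out every candidate but one.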

My Little Optimization: The Compiler Is Magic

Today I had the idea to play with a pretty simple optimization problem: you have a string, and you want to see if it matches one of a set of known strings (down to the same Unicode code points, not caring about canonical equivalence). The naive way to do this would be to walk through the known strings and simply compare against every single one:

#include <string>

// Naive version: compare against each known string in turn.
bool isOneOfTheStringsICareAbout(const std::string &s) {
  return s == "Battler" ||
         s == "George" ||
         s == "Jessica" ||
         s == "Maria" ||
         s == "Beatrice";
}
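For comparison, the length-first strategy can also be written out by hand. The sketch below is only an illustration of the idea, not Clang's actual output, and the function name `isOneOfTheStringsICareAboutByLength` is hypothetical. It relies on the lengths of the five names above: 5 for "Maria", 6 for "George", 7 for "Battler" and "Jessica", and 8 for "Beatrice".

```cpp
#include <string>

// Hypothetical illustration: switch on the length first, so most inputs
// are rejected or narrowed to one candidate before any characters are compared.
bool isOneOfTheStringsICareAboutByLength(const std::string &s) {
  switch (s.size()) {
    case 5:  return s == "Maria";
    case 6:  return s == "George";
    case 7:  return s == "Battler" || s == "Jessica";  // two names share this length
    case 8:  return s == "Beatrice";
    default: return false;  // no known string has this length
  }
}
```

Only the length-7 case still needs more than one comparison, which is where a distinguishing character could narrow things further.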