My Little (String) Optimization, Part 2

Previously, I talked about how Clang is smart enough to optimize a series of comparisons against constant strings in C++ by starting out with a switch on the length. I left off with the idea that while this is good, you might be able to do better if your strings have a unique character at a certain offset. Today we’re going to see what that looks like.

Once again, the goal is to take something like this:

bool isOneOfTheStringsICareAbout(std::string_view s) {
  return isInSet(s, "Battler", "George", "Jessica", "Maria", "Beatrice");
}

and end up with generated code that looks like this:

bool isOneOfTheStringsICareAbout(std::string_view s) {
  if (s.size() < 5) { return false; }
  switch (s[2]) {
  case 't': return s == "Battler";
  case 'o': return s == "George";
  case 's': return s == "Jessica";
  case 'r': return s == "Maria";
  case 'a': return s == "Beatrice";
  default: return false;
  }
}

That is, we only want to write the name of each string once, and we don’t want to hardcode which offset to check. My idea for how to approach this went something like this:

  1. Write a function to check if the character at offset I is unique for every string.
  2. Using this, find the first such I where this is true.
  3. Using this, do the comparison like last time, checking the Ith character.

The first step went smoothly. That code looks like this:

template <size_t I, size_t N>
__attribute__((always_inline))
static constexpr bool isUniqueCharacter(const char (&first)[N]) {
  return true;
}

template <size_t I, size_t N, size_t N2, size_t ...RestN>
__attribute__((always_inline))
static constexpr bool isUniqueCharacter(const char (&first)[N],
                                        const char (&next)[N2],
                                        const char (&...rest)[RestN]) {
  static_assert(I < N - 1, "too big");
  if (first[I] == next[I]) {
    return false;
  }
  return isUniqueCharacter<I>(first, rest...) &&
         isUniqueCharacter<I>(next, rest...);
}

This is a very silly algorithm that checks every string against every other string. If I wanted something that was going to be computed at run time, I’d do something cleverer, like setting bits in a 256-bit value, to see if there were any collisions. However, I’m still in the “getting something working” stages, and this is also going to run at compile-time anyway.

The next step was a little harder.

template <size_t I = 0, size_t ...Ns>
__attribute__((always_inline))
static constexpr size_t
uniquelyIdentifyingCharacter(const char (&...strs)[Ns]) {
  if constexpr (I + 1 >= std::min({Ns...})) {
    return std::string_view::npos;
  } else {
    if (isUniqueCharacter<I>(strs...)) {
      return I;
    } else {
      return uniquelyIdentifyingCharacter<I+1>(strs...);
    }
  }
}

If you just ignore that “if constexpr”, this function isn’t too complicated. “If the index we’re trying to look at is at the end of any of the strings, give up and return npos. Otherwise, if it’s a unique index, great! And finally, try the next index.”

So what’s the if constexpr for? That’s a C++17 feature that makes sure the compiler only looks at one branch of the if. And that’s important because of that static_assert we had back in isUniqueCharacter—without if constexpr, the compiler would try to generate isUniqueCharacter<I> even if I was well out of bounds. Heck, it would try to generate uniquelyIdentifyingCharacter<I+1> too—recursion with no base case!

There are ways in earlier versions of C++ to deal with this, but they’re all clunkier and I always have to look up how they work. if constexpr just lets me write the condition I want, as long as it really is a constant expression.

What I really wanted to do with uniquelyIdentifyingCharacter was static_assert if there’s no unique character offset for the strings that get passed in. Unfortunately, that would require making the second if into if constexpr, so that we could avoid recursing. That doesn’t work because arguments can’t be used in constant expressions, even if the arguments are going to be constant whenever we use this function. So I had to return a placeholder value, std::string_view::npos instead.

Anyway, this function works too! So now to just plug it in to the original isInSet from the first post:

__attribute__((always_inline))
static bool isInSetImpl(std::string_view s, size_t offset) {
  return false;
}

template <size_t N, size_t ...RestN>
__attribute__((always_inline))
static bool isInSetImpl(
    std::string_view s,
    size_t offset, // NEW!
    const char (&first)[N], // "reference to array of size N"
    const char (&...rest)[RestN] // "a bunch more references to arrays"
) {
  if (s[offset] == first[offset]) { // NEW!
    if (s == first) {
      return true;
    }
  }
  return isInSetImpl(s, offset, rest...);
}

// NEW!
template <size_t ...Ns>
__attribute__((always_inline))
static constexpr bool isInSet(
  std::string_view s,
  const char (&...strs)[Ns]
) {
  if (s.size() < std::min({Ns...}) - 1) {
    return false;
  }
  size_t offset = uniquelyIdentifyingCharacter(strs...);
  assert(offset != std::string_view::npos &&
         "no uniquely identifying character");
  return isInSetImpl(s, offset, strs...);
}

bool isOneOfTheStringsICareAbout(std::string_view s) {
  return isInSet(s, "Battler", "George", "Jessica", "Maria", "Beatrice");
}

And, well, hm. This does compile, and is even smart enough to only load the one character (the third letter of each name is unique), but it looks like Clang wants to make it into a series of ifs instead.

I’ll pause at this point to say a series of ifs isn’t bad. Unlike before, where the chain of ifs tested the size but might have to fall back to comparing the whole string, this time we’re just checking the single unique character. That’s something that a CPU can race through pretty quickly without touching memory or calling a function or anything.

But the goal of this exercise was to get something that looked like a switch, so I decided to go further. What if I held off on doing the full string comparison until the end?

// NEW!
__attribute__((always_inline))
static constexpr std::string_view findPossibleMatch(
  char otherChar,
  size_t offset
) {
  return std::string_view();
}

// NEW!
template <size_t N, size_t ...RestN>
__attribute__((always_inline))
static constexpr std::string_view findPossibleMatch(
    char searchChar,
    size_t offset,
    const char (&first)[N],
    const char (&...rest)[RestN]
) {
  if (searchChar == first[offset]) {
    return std::string_view(first, N-1);
  }
  return findPossibleMatch(searchChar, offset, rest...);
}

template <size_t ...Ns>
__attribute__((always_inline))
static constexpr bool isInSet(
  std::string_view s,
  const char (&...strs)[Ns]
) {
  if (s.size() < std::min({Ns...}) - 1) {
    return false;
  }
  size_t offset = uniquelyIdentifyingCharacter(strs...);
  assert(offset != std::string_view::npos &&
         "no uniquely identifying character");
  // NEW!
  std::string_view possibleMatch =
      findPossibleMatch(s[offset], offset, strs...);
  return s == possibleMatch;
}

bool isOneOfTheStringsICareAbout(std::string_view s) {
  return isInSet(s, "Battler", "George", "Jessica", "Maria", "Beatrice");
}

And this code ends up compiling to something interesting. Rather than a switch, it generates code that does something like this:

std::string_view stringsICareAbout[] = {
  "Battler",
  "George",
  "Jessica",
  "Maria",
  "Beatrice"
};
size_t index = determineIndexFromOffsetCharacter(s, offset);
return stringsICareAbout[index] == s;

Which brings us to an insightful comment by my coworker David Smith on the last post: trying to find a single byte that uniquely identifies the entire string is a specific instance of trying to find a perfect hash function. If determineIndexFromOffsetCharacter really existed, that’s what it would be doing: “if this value is in the set of strings I care about, here’s where it would be”.

In practice the compiler isn’t exactly doing determineIndexFromOffsetCharacter; instead, it just makes a table with some extra slots in it and…I’m actually not quite sure how it works, it might be doing an unnecessary comparison if the length is the same as one of the other strings. Still, it only ever does one full string comparison, which was the original goal.

It turns out this is incredibly order-dependent: if we put “Beatrice” first we actually do get a switch. Clearly the compiler thinks it should make different tradeoffs in the two different cases! I haven’t dug into why this is, but it’s something related to the last case getting used as a default instead of the empty string_view written in the source.

Anyway, that’s enough for now. As another coworker David Ungar pointed out, this sort of optimization depends on a lot more than just eyeballing the code that gets generated. But once again, it’s cool how much compilers can do these days.

Okay, okay, one more thing. You’ll notice I’ve been sneaking constexpr into all of these helper functions, meaning that if all the arguments are compile-time constants, the result should be too. And it turns out that if I mark isOneOfTheStringsICareAbout constexpr as well, this actually works:

static_assert(isOneOfTheStringsICareAbout("Beatrice"),
              "Beatrice is important");
static_assert(!isOneOfTheStringsICareAbout("Ange"),
              "Apparently Ange is not?");

…at least when using libc++, which marks up enough of string_view as constexpr to use in constant expressions like this. Pretty cool, right?