Swift Delight: Value Semantics Collections

Part of the Swift Regrets series.

Classes have reference semantics, structs made of primitives have value semantics, and structs made of collections can also have value semantics. And I can pass a collection around without worrying about someone modifying it. By contrast, Python has this pitfall that trips up newcomers: a default argument is evaluated only once, at function definition, not each time the function is called, and so a default list persists across calls, modifications included.

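def contrived(lst=[]):  # [] is evaluated once, when the function is defined
  return lst

contrived().append(1)  # mutates the one shared default list
contrived()            # now returns [1], not []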

Value semantics aren’t the only way to fix the Python pitfall, but they are one way. If Python’s lists used value semantics, every call would have its own list. (Of course, modifying a list would require extra indirection or assignment then, as opposed to just passing it in.)
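
For comparison, here's a rough Swift sketch of the same shape. The names `contrived` and `Playlist` are made up for illustration. Swift happens to evaluate default arguments on every call anyway, so the struct is there to show the value-semantics part on its own: a struct holding an array is copied as a whole, and changing the copy leaves the original alone.

```swift
// Hypothetical counterpart to the Python function above.
func contrived(_ list: [Int] = []) -> [Int] {
    var result = list   // parameters are immutable, so mutate a local copy
    result.append(1)
    return result
}

// A struct made of a collection still has value semantics.
struct Playlist {
    var tracks: [String] = []
}

let first = contrived()    // [1]
let second = contrived()   // [1] again; nothing persisted between calls

let original = Playlist(tracks: ["Intro"])
var remix = original            // a logically independent value
remix.tracks.append("Outro")    // changes `remix` only
```

Note the "extra indirection or assignment" cost mentioned above: the function has to hand back its modified list, because the caller's value never changes behind its back.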

In Objective-C and Java and C#, collections have reference semantics, so if you want to hang on to their current contents past the current call, you have to make a “defensive copy”. ObjC even gained a copy annotation for properties of these types. This was Fine but not great. Functional programmers, by contrast, have long known that it’s safe to share immutable data structures, where changing them really means “making a copy with something changed”. You can even sometimes share storage with the original value. And Rust extends that with “shared XOR mutable”. Either you can share something, or you can mutate it, but not both, and the compiler will check that for you. Or the library, if you need to do something the compiler can’t prove. If you can’t do either, you have to clone the value. Finally, C++ has all the pieces of this (const pointers and copy constructors), but no enforcement. Sure, you can pass everything around by value, but that’s wasting a lot of work. It’s much easier to share collections instead.
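
To make the defensive-copy problem concrete without leaving Swift, here's a sketch that fakes a reference-semantics collection with a class. `RefList`, `TrustingRoster`, and `CarefulRoster` are hypothetical names invented for this example, not real APIs.

```swift
// Hypothetical stand-in for a reference-semantics collection,
// in the spirit of NSMutableArray or java.util.ArrayList.
final class RefList {
    var items: [String]
    init(_ items: [String]) { self.items = items }
    func copy() -> RefList { RefList(items) }
}

// Stores whatever it was handed, so it shares state with the caller.
final class TrustingRoster {
    let names: RefList
    init(names: RefList) { self.names = names }
}

// Makes a defensive copy, so later changes by the caller can't leak in.
final class CarefulRoster {
    let names: RefList
    init(names: RefList) { self.names = names.copy() }
}

let shared = RefList(["Ada"])
let trusting = TrustingRoster(names: shared)
let careful = CarefulRoster(names: shared)

shared.items.append("Grace")   // the caller keeps mutating its own list

let trustingCount = trusting.names.items.count   // 2: the change leaked in
let carefulCount = careful.names.items.count     // 1: the snapshot held
```

The careful version is correct, but it pays for a copy every time, whether or not anyone ever mutates the original.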

Swift came into all this and said “okay, can we do better than defensive copying? without using the naive approach of eager copying, of course”. And…the answer is “yes, via reference counting”. Refcounting isn’t usually held up as one of Swift’s great features. It’s got trade-offs vs. a tracing garbage collector, with a huge downside being that you have to avoid/break reference cycles manually, or you leak memory. And those are usually non-local, so the compiler can’t help. But one upside is that you can ask whether an object is “uniquely referenced”. If it is, you can safely modify it! Cause it’s not shared!
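
The standard library exposes that question as `isKnownUniquelyReferenced`. A small sketch of asking it; `Buffer` and `uniquenessDemo` are made-up names, and the `withExtendedLifetime` call is only there so the second reference is definitely still alive at the moment of the check.

```swift
final class Buffer {
    var values: [Int] = []
}

func uniquenessDemo() -> (alone: Bool, shared: Bool) {
    var buffer = Buffer()
    // Only `buffer` refers to the object right now.
    let alone = isKnownUniquelyReferenced(&buffer)

    // A second strong reference to the same object.
    let other = buffer
    let shared = isKnownUniquelyReferenced(&buffer)

    // Keep `other` alive until here so the check above sees two references.
    withExtendedLifetime(other) {}
    return (alone, shared)
}

let result = uniquenessDemo()   // alone is true, shared is false
```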

“Wait, I thought it’s not safe to check retain counts?” A count of exactly 1 is a special case, as long as there are no weak/unowned references. [1]

What if the object isn’t uniquely referenced? Then you gotta copy it if you want to modify it.

But if you’re just passing it around, no need to copy. You can share the backing storage via refcounting. And voilà, copy-on-write!
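
Put together, the pattern looks roughly like this. It's a minimal sketch, not how the standard library implements `Array`: `IntStorage` and `CowList` are hypothetical, the storage cheats by holding an ordinary array, and `copiesMade` exists only so the example can show when a copy actually happened.

```swift
// Reference-counted backing storage (hypothetical). A real implementation
// would manage raw memory rather than wrap an Array.
final class IntStorage {
    var elements: [Int]
    init(_ elements: [Int]) { self.elements = elements }
}

struct CowList {
    private var storage = IntStorage([])
    // Counts real copies, for demonstration only.
    private(set) var copiesMade = 0

    var elements: [Int] { storage.elements }

    mutating func append(_ value: Int) {
        // Shared storage: copy before writing so other values are unaffected.
        if !isKnownUniquelyReferenced(&storage) {
            storage = IntStorage(storage.elements)
            copiesMade += 1
        }
        storage.elements.append(value)
    }
}

var a = CowList()
a.append(1)            // unique: mutate in place
a.append(2)            // still unique: no copy

var b = a              // just another reference to the same storage
b.append(3)            // shared: copy first, then mutate

let aElements = a.elements   // [1, 2]
let bElements = b.elements   // [1, 2, 3]
let aCopies = a.copiesMade   // 0
let bCopies = b.copiesMade   // 1
```

Keeping `storage` private matters here: if a direct reference to it escaped, the uniqueness check would no longer protect against shared mutation.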

So because Swift uses refcounting, it can provide CoW collections that avoid defensive copies, while Java and C# cannot (at least not without runtime changes). And that means the default collection types can have value semantics. No defensive copying, no accidental reuse.
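
You can watch the standard `Array` do this, at least in practice. The sketch below compares buffer addresses, which is an implementation detail rather than documented behavior, and is only reasonable here because the addresses are compared and never dereferenced. `bufferAddress` is a made-up helper.

```swift
// Hypothetical helper: the address of an array's element buffer,
// used for comparison only.
func bufferAddress(_ array: [Int]) -> UInt {
    array.withUnsafeBufferPointer { UInt(bitPattern: $0.baseAddress) }
}

let original = [1, 2, 3]
var copy = original                 // no elements copied yet

// Both values point at the same buffer.
let sharedBefore = bufferAddress(original) == bufferAddress(copy)

copy.append(4)                      // first write to a shared buffer triggers the copy

// Both buffers are alive now, so the addresses must differ.
let sharedAfter = bufferAddress(original) == bufferAddress(copy)
let untouched = original            // still [1, 2, 3]
```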

Rust does provide this functionality as Arc::get_mut/make_mut, but it’s not the usual collection type, which means…my idiomatic Rust code, in some cases, might be doing more copying than my idiomatic Swift. Of course there’s more going on there, but it’s a reminder that (1) higher-level abstractions sometimes enable more efficiency, and (2) “currency” types have a lot of influence.

P.S. I say this confidently now, but in the pre-Swift-1.0 days we worried that making every modification check for uniqueness would be too expensive…so we tried a half-measure. It wasn’t good. Scroll down to “Bonus Track” in Tammo Freese’s article from the first year of Swift.

P.P.S. Note that copy-on-write can be done in GCs; it’s avoiding copies when it’s not strictly necessary that’s harder. But consider Java’s String and StringBuilder: you can share String around all you want, and when you need to modify it you can make a StringBuilder. You could have a language that does that more implicitly.

  1. isKnownUniquelyReferenced could check for Swift-style unowned (and does check for Swift-style weak, I believe), but it can’t do anything about Unmanaged or ObjC weak, so you really do have to enforce this by not escaping any direct references to the storage, or else you’ll have shared mutation.