What's the best way to override CustomStringConvertible for a collection?

kansaichris · May 25, 2019, 7:34am

Imagine that you have a set of integers and you want to override the default implementation of CustomStringConvertible for a Set to use set-builder notation. You might try to do this with an extension like the following:

extension Set: CustomStringConvertible where Element == Int {
    var description: String {
        let values = self.map({ "\($0)" }).joined(separator: ", ")
        return "{\(values)}"
    }
}

Unfortunately, this fails to compile with the following error:

Conformance of 'Set' to protocol 'CustomStringConvertible' conflicts with that stated in the type's module 'Swift' and will be ignored; there cannot be more than one conformance, even with different conditional bounds

Well, it looks like that won't work because Set already conforms to CustomStringConvertible.

Okay, fine. We can still provide our own implementation of description without declaring a (conflicting) conformance to CustomStringConvertible:

extension Set where Element == Int {
    var description: String {
        let values = self.map({ "\($0)" }).joined(separator: ", ")
        return "{\(values)}"
    }
}

Now the compiler's happy, so let's kick the tires a bit:

let set: Set<Int> = [7, 3, 15, 31]

String(describing: set) // prints "[7, 3, 15, 31]"

Wait, what? Where are our curly braces?

set.description // prints "{7, 3, 15, 31}"

Oh, there they are. Hmm…

I know that static dispatch is used to call methods that are defined in a protocol extension rather than in the protocol itself, but description is defined in the original CustomStringConvertible protocol, so shouldn't init(describing:) use dynamic dispatch to call our own custom implementation?

All hope is not lost yet, though. We can also try defining an empty SetBuilderNotationRepresentable protocol:

protocol SetBuilderNotationRepresentable: CustomStringConvertible {}

Next, we'll make sets of integers conform to this new protocol:

extension Set: SetBuilderNotationRepresentable where Element == Int {}

Finally, we'll provide our own implementation of init(describing:):

extension String {
    init<S>(describing instance: S) where S: SetBuilderNotationRepresentable & Sequence {
        let values = instance.map({ "\($0)" }).joined(separator: ", ")
        self.init("{\(values)}")
    }
}

Now all that's left is to cross our fingers, and…

String(describing: set) // prints "{7, 3, 15, 31}"

Success!!

In all seriousness, though, this seems a bit excessive just to replace a pair of square brackets with curly brackets. Am I missing something here, or is this really the most "elegant" way to provide our own implementation of CustomStringConvertible for a collection?

Ben_Cohen · May 25, 2019, 3:40pm

The simplest answer to all of this is: you can't, and that's a feature not a bug. Set isn't your type and you don't get to change its already-defined behavior like this.

The capability to intercept and alter the behavior of a type to do something different to what that type is originally designed to do is powerful, but fraught with downsides. In this case, you've chosen a fairly benign thing to alter, but in other cases doing this could do all sorts of damage to assumed invariants of a type by changing its program-wide behavior.

The best way to alter a type to do something different like this is to wrap it in your own low-cost struct. Unfortunately this does mean writing a fair amount of boilerplate for forwarding – though less and less these days as we gain features like synthesized conformances and dynamic member lookup. Hopefully someday we'll get features that make it easier to create a newtype with customized behavior.
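
To make the wrapper idea concrete, here is a minimal sketch. The names BracedIntSet, base, and describe are made up for illustration, the elements are sorted only so the output is deterministic, and the key-path form of dynamic member lookup assumes Swift 5.1 or later:

```swift
// Hypothetical wrapper type; `BracedIntSet` and `base` are illustrative names.
@dynamicMemberLookup
struct BracedIntSet: CustomStringConvertible {
    var base: Set<Int>

    init(_ base: Set<Int>) {
        self.base = base
    }

    // A real conformance on our own type, so generic code will use it.
    // Sorting is an assumption made here to get a stable order.
    var description: String {
        let values = base.sorted().map { String($0) }.joined(separator: ", ")
        return "{\(values)}"
    }

    // Forwards read-only properties such as `count` and `isEmpty`
    // to the wrapped set.
    subscript<T>(dynamicMember keyPath: KeyPath<Set<Int>, T>) -> T {
        return base[keyPath: keyPath]
    }
}

// Generic code that only knows about CustomStringConvertible.
func describe<T: CustomStringConvertible>(_ value: T) -> String {
    return String(describing: value)
}

let wrapped = BracedIntSet([7, 3, 15, 31])
print(describe(wrapped)) // prints "{3, 7, 15, 31}"
print(wrapped.count)     // prints 4
```

Methods such as insert or contains would still need hand-written forwarding, which is the boilerplate mentioned above.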

Ben_Cohen · May 25, 2019, 4:03pm

Note, you haven't actually given Set a new CustomStringConvertible implementation here. What is happening is there is now a new overload of String.init(describing:) in your module. You then call it directly. You get the same behavior without defining an extra protocol:

extension String {
  init(describing instance: Set<Int>) {
    let values = instance.map({ "\($0)" }).joined(separator: ", ")
    self.init("{\(values)}")
  }
}

let str = String(describing: [1,2,3] as Set)
print(str) // prints "{1, 3, 2}"

But if you were to pass your type into code in another module that uses Set's CustomStringConvertible conformance (such as print), or a generic function in your own module that doesn't constrain to your new protocol, you will get the regular behavior:

func myPrint<T: CustomStringConvertible>(_ t: T) {
  let str = String(describing: t)
  print(str)
}

let intset: Set = [1, 2, 3]
myPrint(intset) // prints [1, 3, 2]

kansaichris · May 27, 2019, 5:43am

Great, thanks for confirming.

In retrospect, it does make sense that Swift wouldn't want you to change behavior built into the standard library. (Or, as @jrose said, Swift is a Safe, Fast, and Expressive language, and "Safe" is paramount.) I guess that protocol extensions are just so convenient and powerful that I get a bit carried away sometimes.

All that said, I'm still not 100% sure I understand why, given this extension:

extension Set where Element == Int {
    var description: String {
        let values = self.map({ "\($0)" }).joined(separator: ", ")
        return "{\(values)}"
    }
}

these two statements don't produce the same result:

String(describing: set) // prints "[7, 3, 15, 31]"
set.description // prints "{7, 3, 15, 31}"

Why does the former use the default implementation in the standard library but the latter uses the custom implementation in our extension? Is it just because init(describing:) takes a CustomStringConvertible argument but description is being called on a Set<Int>?

Ultimately, I think that the best route to take in this case would probably be to extend DefaultStringInterpolation and add an appendInterpolation(_:) method:

extension DefaultStringInterpolation {
    mutating func appendInterpolation(_ set: Set<Int>) {
        appendLiteral("{")
        appendLiteral(set.map({ "\($0)" }).joined(separator: ", "))
        appendLiteral("}")
    }
}

This provides the aesthetically pleasing syntax that I was trying to achieve at the call site in the first place:

"\(set)" // prints "{7, 3, 15, 31}"

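One caveat worth sketching: the interpolation overload is presumably chosen the same way as any other overload, at the call site, so it should only apply where the value is statically known to be a Set<Int>. The helper interpolate below is a hypothetical name, and a single-element set is used so that element order doesn't matter:

```swift
extension DefaultStringInterpolation {
    mutating func appendInterpolation(_ set: Set<Int>) {
        appendLiteral("{")
        appendLiteral(set.map { String($0) }.joined(separator: ", "))
        appendLiteral("}")
    }
}

// Hypothetical helper: inside this function the value is only known as some `T`,
// so the Set<Int> overload above is not a candidate.
func interpolate<T>(_ value: T) -> String {
    return "\(value)"
}

let single: Set<Int> = [7]

let direct = "\(single)"              // "{7}": static type is Set<Int>
let viaGeneric = interpolate(single)  // "[7]": falls back to the standard description
```

So this approach gives the nicer syntax at call sites that know the concrete type, but generic code still sees the square brackets.
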
I think that "a fair amount of boilerplate" may be a bit of an understatement for a custom struct that wraps Set.

For the sake of argument, though, let's suppose that I wanted to go to the trouble of creating my own custom wrapper type around Set (however ill-advised that may be). How would I go about creating this new type? For example, what would be the easiest way to get a list of all the methods, properties, and protocol conformances that I would have to implement in order to make my wrapper as interchangeable with an ordinary Set as possible?

As for the point that I haven't actually given Set a new CustomStringConvertible implementation: indeed. You know, that did occur to me in the hours after I published my original post, but I neglected to edit it later to clarify that point…

Ben_Cohen · May 27, 2019, 4:26pm

Yes, this is the distinction. There is a difference between protocol dispatch and overloading.

Protocol Dispatch

When you define a protocol, what you are defining is a table of functions that need to be filled in by the type that conforms to the protocol. In CustomStringConvertible's case, it's a table with a single entry: the function that gets the description property.
When you conform to a protocol, you create that table and fill it in. So Set conforms to CustomStringConvertible and fills it in with its copy of description. Once filled in, at the point Set is compiled, that table cannot be altered. And a type can have one and only one table for a protocol conformance. It can't have multiple tables depending on different circumstances, like different conditional conformances or multiple protocols that further refine the original protocol.
When you write generic code constrained to a protocol, you are writing code that takes that table and uses it. It looks up the table for the passed-in type and then calls the functions it finds in that table. Since you cannot alter the table, nothing you do outside that generic function can change the function that is going to be called.
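
As a rough mental model only (this is hand-written Swift, not what the compiler actually generates), the table can be pictured as a struct of functions that generic code receives alongside the value. DescriptionTable, libraryTable, describeUsingTable, and bracedDescription are all hypothetical names:

```swift
// Conceptual stand-in for the table of functions a conformance fills in.
struct DescriptionTable<T> {
    let description: (T) -> String
}

// The table "the library" filled in for Set<Int> when it was compiled.
// Sorting is an assumption made here to keep the output deterministic.
let libraryTable = DescriptionTable<Set<Int>>(description: { set in
    "[" + set.sorted().map { String($0) }.joined(separator: ", ") + "]"
})

// Generic-style code only ever calls through the table it is handed.
func describeUsingTable<T>(_ value: T, _ table: DescriptionTable<T>) -> String {
    return table.description(value)
}

// A separate function added later. Nothing connects it to the table,
// so code that goes through the table never reaches it.
func bracedDescription(_ set: Set<Int>) -> String {
    return "{" + set.sorted().map { String($0) }.joined(separator: ", ") + "}"
}

let numbers: Set<Int> = [7, 3, 15, 31]
print(describeUsingTable(numbers, libraryTable)) // prints "[3, 7, 15, 31]"
print(bracedDescription(numbers))                // prints "{3, 7, 15, 31}"
```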

So, once Set has conformed to CustomStringConvertible there is nothing you can do that will change the version of description that will be used inside a function that uses that protocol conformance.
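
A small sketch of that, reusing the extension from the question. A single-element set is used so element order doesn't matter, and the constant names are just for illustration:

```swift
extension Set where Element == Int {
    // Not part of any conformance: this is just another member named `description`.
    var description: String {
        let values = self.map { String($0) }.joined(separator: ", ")
        return "{\(values)}"
    }
}

let single: Set<Int> = [7]

// Static type is Set<Int>, so the member from the extension is picked at compile time.
let viaConcreteType = single.description

// Going through the protocol uses the table that Set filled in when it was compiled.
let viaProtocol = (single as CustomStringConvertible).description

// String(describing:) also goes through the conformance.
let viaDescribing = String(describing: single)

print(viaConcreteType) // prints "{7}"
print(viaProtocol)     // prints "[7]"
print(viaDescribing)   // prints "[7]"
```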

Overloading

Swift supports overloading. You can write two versions of a function, both with the same name, and when you call that function, Swift will pick which version to use. The rule of thumb for which one it will call is "the most specific" one. So for example:

// least specific – could be any T
func f<T>(_ t: T) { print(1) }
// slightly more specific, any T that conforms to CustomStringConvertible
func f<T: CustomStringConvertible>(_ t: T) { print(2) }
// more specific still: takes a Set, but containing any T
func f<T>(_ t: Set<T>) { print(3) }
// very specific: takes only a Set of Int
func f(_ t: Set<Int>) { print(4) }

The choice of which function gets called is made at the call site, at compile time. When you call a function, the compiler takes all the implementations of a function it can see at the time of compilation, ranks them, and picks the most specific one that will work:

let intset: Set = [1,2,3]
f(intset) // prints 4
let stringset: Set = ["a", "b"]
f(stringset) // prints 3
f("foo") // prints 2, String conforms to CustomStringConvertible
let tuple = (1,2) // tuples don't conform to protocols
f(tuple) // so this prints 1, the only option

Protocols and Overloads Together

Now suppose we write another function, g, that is generic:

func g<T: CustomStringConvertible>(_ t: T) {
  f(t)
}

g(intset) // prints 2

Why does this print 2? Why not 4?

It does so because within the function g, all that is known about T is that it is some type that conforms to CustomStringConvertible. But what type is not known. So, it calls the most specific function f that works for that set of constraints. Outside of g, it might be known that t happens to be a Set of Int, and so there is a more specific overload for that. But that information cannot be used inside g, which only knows about what it is told via its constraints.
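
If a generic function really does want the more specific overload, it has to recover the concrete type itself, for example with a conditional cast. The following is only a sketch: the overloads return their number instead of printing so the result is easy to check, and gWithCast is a hypothetical name:

```swift
func f<T>(_ t: T) -> Int { return 1 }
func f<T: CustomStringConvertible>(_ t: T) -> Int { return 2 }
func f<T>(_ t: Set<T>) -> Int { return 3 }
func f(_ t: Set<Int>) -> Int { return 4 }

// Same shape as g above: only the constraint on T is visible here.
func g<T: CustomStringConvertible>(_ t: T) -> Int {
    return f(t)
}

// Hypothetical variant that checks for the concrete type at runtime.
func gWithCast<T: CustomStringConvertible>(_ t: T) -> Int {
    if let ints = t as? Set<Int> {
        // Inside this branch the static type is Set<Int>,
        // so the most specific overload is visible to the compiler.
        return f(ints)
    }
    return f(t)
}

let intset: Set = [1, 2, 3]
print(g(intset))         // prints 2
print(gWithCast(intset)) // prints 4
print(gWithCast("foo"))  // prints 2
```

The cast is explicit and local to gWithCast, so the choice of overload is still made at compile time within each branch.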

Why can't g consider overloads beyond that? Well, from a practical perspective, the compilation and runtime for that kind of system would be way more complicated, and near-impossible to optimize. Swift would need to keep tables for every type with every possible overload. When taking type-based overloading into account, this would end up being a far more complicated equivalent of Objective-C's message sending. And everything would need to happen at runtime, because you can't know ahead of time if a library is loaded into an app that defines another method. This rules out most specialization and optimization (which types such as Set benefit from for efficient operation) entirely.

But more importantly, arbitrary code injection into functions is very much a non-goal in Swift. As powerful as it is, being able to override fundamental behaviors of a type is dangerous when applied to types that have not explicitly been designed to account for that possibility (and in a world where anything can be monkey patched, no types would reasonably be able to account for that possibility).

Even without the framework issue, the behavior of g above is a good thing for local reasoning. In a world where specific overloads were taken into account, knowing exactly what function would be called based on what type is passed in would be very hard to follow. Instead, you have a simple rule for which description is going to be called: the one the type gained when it conformed to CustomStringConvertible, every time.