Hi all,
So, I was thinking about Objective-C's class clusters last week, and how they were basically our "Protocol-Oriented Programming" before we had Protocol-Oriented Programming. A cluster like NSArray behaved very much like a Swift protocol: it established a basic set of primitive operations that each subclass was expected to implement, and provided a bunch of default behaviors built on top of those primitives. Most crucially, NSArrays were accepted by virtually every array-taking API on the system, which meant you could take a bridged CFArrayRef, any of the various NSArray subclasses included with the frameworks, your own homegrown array type, or even something crazy like the proxy that -[NSObject mutableArrayValueForKey:] returns, and use it exactly like any other array, without ever having to convert or translate it.
In Swift, of course, we have protocols like Collection which do much the same thing as the old class clusters. In some ways, though, they are not as useful, because most of the framework APIs don't accept them. Instead, these APIs require one specific type of collection, an Array, so anything else you happen to be using needs to be converted before it can be passed to any of them. Besides being less convenient, the conversions can carry a performance penalty.
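To make the comparison concrete, here is a minimal sketch of the Swift side. The `Evens` type and the `sumOfArray` function are made up for illustration (the latter stands in for an Array-only framework API); only the standard library is assumed.

```swift
// Hypothetical homegrown collection. Its "primitives" are startIndex,
// endIndex, and the subscript; the standard library's default
// implementations supply the rest.
struct Evens: RandomAccessCollection {
    let limit: Int                       // number of elements

    var startIndex: Int { return 0 }
    var endIndex: Int { return limit }

    subscript(position: Int) -> Int {
        precondition(position >= 0 && position < limit, "index out of range")
        return position * 2
    }
}

// Hypothetical stand-in for a framework API that insists on a concrete Array.
func sumOfArray(_ values: [Int]) -> Int {
    return values.reduce(0, +)
}

let evens = Evens(limit: 5)              // elements: 0, 2, 4, 6, 8

// Default behaviors built on the primitives; no extra code was written for these.
let lastValue = evens.last
let hasSix = evens.contains(6)

// The Array-only API forces a copy into an Array first.
let total = sumOfArray(Array(evens))
```

The `Array(evens)` call is the conversion step the paragraph above complains about: the collection is perfectly usable on its own, but has to be copied before an Array-only API will take it.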
Does it need to be this way? It seems to me that a method declared thus:
- (void)fooWithBar:(NSArray<SomeType *> *)bar;
could simply be imported as taking a Collection, like so:
func foo<C: Collection>(bar: C) where C.Element == SomeType
Similarly, methods taking NSString parameters could be imported as taking StringProtocol, methods taking NSSet parameters could be imported as taking Sequence & SetAlgebra (or something of that nature), methods taking NSData parameters could take a hypothetical DataProtocol, and perhaps we could make a protocol for Dictionary to belong to as well. When crossing the Objective-C bridge, these could all simply be bridged to custom subclasses of NSArray, NSString, et al. that wrap a Collection or a StringProtocol rather than an Array or a String. This seems not too difficult to implement: many of these wrappers exist already, and would only need to be changed to wrap the generic type instead of the concrete one. In return, the usability of Swift protocol types would be greatly increased across the board, and I don't think it would break much code, if any at all.
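As a rough sketch of what calling such imports could look like from Swift, the functions below are hand-written analogues, not real importer output. `SomeType` is a plain struct standing in for the Objective-C class, `shout` is a made-up name, and `foo` returns the element count purely so the calls can be checked.

```swift
// Stand-in for the Objective-C class SomeType used above.
struct SomeType {
    let id: Int
}

// Hand-written analogue of the proposed import of -fooWithBar:.
// It returns the element count only so the call can be inspected.
func foo<C: Collection>(bar: C) -> Int where C.Element == SomeType {
    return bar.count
}

// Hand-written analogue of an NSString-taking method imported as
// taking StringProtocol. `shout` is a hypothetical name.
func shout<S: StringProtocol>(_ text: S) -> String {
    return text.uppercased()
}

let items = [SomeType(id: 1), SomeType(id: 2), SomeType(id: 3)]

let fromArray = foo(bar: items)                              // Array
let fromSlice = foo(bar: items.dropFirst())                  // ArraySlice, no Array(...) copy
let fromSingle = foo(bar: CollectionOfOne(SomeType(id: 9)))  // any other Collection

let words = "hello world".split(separator: " ")              // [Substring]
let fromString = shout("hello")                              // String
let fromSubstring = shout(words[1])                          // Substring, no String(...) needed
```

How the bridge would wrap each of these on the Objective-C side is not shown here; the sketch only covers the Swift call sites.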
The next thing to think about would be return types, which is likely to be more controversial because it would require some non-trivial changes in usage patterns. However, I think it's at least worth discussing, because I can see some not insubstantial benefits. Methods that return generic types like NSArray would need to be left alone until we have generalized existentials in the language, but for non-generic types such as String, we could conceivably replace those with a protocol today:
- (NSString *)returnsAString
becomes:
func returnsAString() -> StringProtocol
Since StringProtocol contains an impressive amount of String's interface, many strings returned in this manner could be used as is, in an immutable context. In a mutable context, source changes would be required, as the string would need to be converted:
var mutable = String(returnedFromObjC)
However, strings returned in this manner would have one large benefit over the status quo: a significant reduction in the amount of bridging magic required. Since NSString can simply be extended to conform to StringProtocol, the only automatic conversion strictly needed would be a call to -copy on the returned object, to make sure it won't mutate on us after the fact; otherwise it could just be passed to us as is. In addition, the returned string could then be passed to another Objective-C API taking a StringProtocol without incurring the performance hit of the round-trip conversion that is currently required. This might also allow some simplification of String's internal implementation, improving the maintainability of the code. Finally, we could also extend CoreFoundation types such as CFString to conform to the protocols, and many of our bridging hassles for those types would go away as well.
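Here is a small sketch of the usage pattern described above. One caveat: StringProtocol has associated types, so a bare `-> StringProtocol` return type may not compile on every toolchain. This sketch assumes Swift 5.1 or later and uses an opaque `some StringProtocol` result as a stand-in. The function body is invented; nothing here actually crosses the Objective-C bridge.

```swift
// Hypothetical stand-in for the imported -returnsAString method.
// `some StringProtocol` (Swift 5.1+) is used instead of a bare protocol type.
func returnsAString() -> some StringProtocol {
    let value: String = "bridged value"
    return value
}

let returnedFromObjC = returnsAString()

// Immutable use: StringProtocol's interface is available directly.
let startsWithBridged = returnedFromObjC.hasPrefix("bridged")
let shouted = returnedFromObjC.uppercased()

// Mutable use: convert to a concrete String first, as in the line above.
var mutable = String(returnedFromObjC)
mutable.append("!")
```

The read-only calls need no source changes at all; only the mutating path pays for the explicit `String(...)` conversion.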
However, the one type that I feel would be overwhelmingly improved by a protocol-oriented Objective-C bridge is Data. Currently, any given Data struct can be, under the hood:
- A simple native-Swift wrapper around a memory buffer.
- A slice of the aforementioned wrapper.
- A bridged CFData object.
- A slice of a bridged CFData object.
- A contiguous dispatch_data_t object.
- A non-contiguous dispatch_data_t object.
- A slice of either a contiguous or a non-contiguous dispatch_data_t object.
- One of the various other NSData subclasses defined by the frameworks, such as NSConcreteData, NSPurgeableData, or NSSubrangeData.
- A custom user-defined subclass of NSData.
- A slice of another framework-defined or user-defined NSData subclass.
- Honestly there are probably more cases that I'm forgetting.
Having this one Data type take on such a staggering number of jobs has been notoriously difficult to implement, resulting in numerous bugs. Here is a list of bugs involving Data that I have run into, personally:
Note that this is not an exhaustive list. These are only the ones that I have managed to find on my own, and there have surely been others. Most of these bugs are not trivial, either: some of them caused crashes, and others, more insidiously, have been capable of silently corrupting data without any outward warning. Even once the bugs listed above are all fixed, it's difficult to be certain that there won't turn out to be more data-corrupting edge cases that we simply haven't discovered yet, because the Data type contains more magic than Willy Wonka's chocolate factory.
In addition, the panoply of implementations that Data may have makes it very hard to reason about its performance. Will processing a Data simply involve running through a memory buffer? Will it involve Objective-C message sends and/or autoreleases? Will it involve disk access? There is no way to tell from the type alone. If Data were a simple type with one implementation, we could use it and know exactly what to expect in cases where performance is important. In cases where it isn't, we could use the protocol.
There are other secondary hygienic improvements that could come from this as well, such as eliminating the current awkwardness that results when you have a DispatchData that you need to pass to a function that takes a Data, or vice-versa. By rewriting our functions to take a DataProtocol, this quickly becomes a non-issue.
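A minimal sketch of that idea follows, with a made-up protocol name (`HypotheticalDataProtocol`) and with plain byte arrays and array slices standing in for Data, DispatchData, and their slice types. It is meant only to show the shape of the API, not how the real types would conform.

```swift
// Hypothetical protocol: any random-access collection of bytes qualifies.
protocol HypotheticalDataProtocol: RandomAccessCollection where Element == UInt8 {}

// Stand-ins for the concrete data types and their separate slice types.
extension Array: HypotheticalDataProtocol where Element == UInt8 {}
extension ArraySlice: HypotheticalDataProtocol where Element == UInt8 {}

// A function written once against the protocol; callers never convert.
// The wrapping-add checksum is an arbitrary example operation.
func checksum<D: HypotheticalDataProtocol>(_ data: D) -> UInt8 {
    return data.reduce(0) { $0 &+ $1 }
}

let bytes: [UInt8] = [1, 2, 3, 4]

let wholeSum = checksum(bytes)         // concrete buffer
let sliceSum = checksum(bytes[1...])   // its slice type, passed as is
```

Because the slice is a distinct type here, code that needs predictable performance can ask for the concrete buffer type, and everything else can stay generic.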
Finally, separating out the slice types for Data (and DispatchData) would eliminate the land mine that currently exists in the form of integer index subscripts, whereby you can, for example, subscript with [0..<4] to get the first four bytes of a Data that you don't realize is a slice. Depending on the slice range and the indexes used, this can result in a crash or, worse, silent data corruption.
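For anyone who hasn't hit this yet, here is a small sketch of the hazard using Foundation's Data. The byte values are arbitrary, and the out-of-bounds subscript is left as a comment so the snippet runs.

```swift
import Foundation

let data = Data([10, 11, 12, 13, 14, 15, 16, 17])

// Slicing a Data yields another Data, so the type gives no hint that it is a slice.
let slice = data[4..<8]

// The slice keeps its parent's indices, so it starts at 4, not 0.
let start = slice.startIndex

// slice[0..<2] would be out of bounds here; it is deliberately not executed.

// Index-agnostic ways to get the first two bytes of whatever was passed in:
let viaPrefix = Array(slice.prefix(2))
let viaOffsets = Array(slice[slice.startIndex ..< slice.startIndex + 2])
```

Both workarounds are relative to `startIndex`, which is exactly what a function receiving "a Data" has no reason to suspect it needs.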
At one point I asked why Data was made to be its own slice, given the implementation difficulties that has posed. The answer given was that before this change was made, Data slices were scarcely used, because most of the frameworks require Data, and so developers were typically just copying the bytes every time. This is understandable, but I believe it is the wrong solution to the problem; basically we have a struct type, Data, which is doing the job of a protocol by acting like a class cluster.
We should just use a protocol.
What are your thoughts on the matter?