Allow `override` of `open` methods in & from extensions in same file as main class

jrose · April 6, 2020, 7:13pm

Way back in 2016 I was looking at allowing overridable methods in cross-module extensions (with an implementation similar to `objc_msgSend`), in the thread "[Pitch] Overridable Members in Extensions". I never got the chance to work on it, but at the time I included this:

> Note: it's already plan-of-record that if the extension is in the same module as the class, the methods will be treated as if they were declared in the class itself. This proposal only applies to extensions declared in a different module.

Whoops. That didn't happen. But I think it still could happen, for open methods and for non-open methods alike. The implementation would be something like this:

When emitting the class, scan all extensions for non-final members and add them to the dispatch table (vtable). This is a little expensive but not very expensive; these methods already need to be parsed to detect naming conflicts, and computing declaration types is a lot cheaper than type-checking function bodies. The implementations of these members are still emitted in their own file.

(I don't see any reason to limit this to one file except maybe compilation time, and in practice I don't think it'll make a significant difference anyway. Did you have a reason for "just the main file"?)
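To make the scan concrete, here's a toy model of it. This is only a sketch: `MemberDecl`, `ExtensionDecl`, `ClassDecl`, and `vtableSlots(for:)` are made-up names for illustration and are not the compiler's actual data structures or API.

```swift
// Toy model of declarations; hypothetical types, not the Swift compiler's.
struct MemberDecl {
    let name: String
    let isFinal: Bool
}

struct ExtensionDecl {
    let file: String
    let members: [MemberDecl]
}

struct ClassDecl {
    let name: String
    let members: [MemberDecl]
    let extensions: [ExtensionDecl]
}

/// Collects every non-final member from the class body and from all of its
/// same-module extensions, in declaration order, as vtable slot names.
func vtableSlots(for cls: ClassDecl) -> [String] {
    let all = cls.members + cls.extensions.flatMap { $0.members }
    return all.filter { !$0.isFinal }.map { $0.name }
}

let shape = ClassDecl(
    name: "Shape",
    members: [MemberDecl(name: "area()", isFinal: false)],
    extensions: [
        ExtensionDecl(file: "Shape+Describe.swift", members: [
            MemberDecl(name: "describe()", isFinal: false),
            MemberDecl(name: "debugID()", isFinal: true),
        ])
    ]
)

print(vtableSlots(for: shape))  // ["area()", "describe()"]
```

Only the declarations are consulted here, which mirrors the point above: building the table needs names and types, not type-checked bodies.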

So given this implementation strategy, should we make this change? The main advantage seems to be "my code would be clearer if I could put these methods in an extension, but they also need to be overridable, so I don't have a choice", with a follow-on advantage being "this is one less limitation on extensions to teach". In my mind, there aren't really any major disadvantages.
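For anyone who hasn't run into the limitation, here's a small sketch of how things stand today. `Shape`, `Square`, and `describe()` are made-up names; the commented-out override is what the change under discussion would allow.

```swift
class Shape {
    // Declared in the class body: gets a vtable entry and can be overridden.
    func area() -> Double { 0 }
}

extension Shape {
    // Declared in an extension: statically dispatched today, so subclasses
    // cannot override it (unless it is @objc).
    func describe() -> String { "Shape with area \(area())" }
}

class Square: Shape {
    let side: Double
    init(side: Double) { self.side = side }

    override func area() -> Double { side * side }

    // Rejected by the compiler today; would become legal under this change:
    // override func describe() -> String { "Square with side \(side)" }
}

let description = Square(side: 2).describe()
print(description)  // Shape with area 4.0
```

The only way to make `describe()` overridable right now is to move it into the body of `Shape`, which is exactly the "I don't have a choice" situation described above.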

There is one interesting point here, which is that "all Xs must be in the original declaration rather than extensions" can be useful for people reading the code, especially when they need to know the complete set of Xs to write the code properly. There are a few of these in particular:

- Stored properties for a struct must be in the main declaration
  - anyone writing a non-delegating initializer (within the module) needs to know all of these
  - unless they have initial values
- Stored properties for a class must be in the main declaration
  - anyone writing a designated initializer (must be in the main declaration also) needs to know all of these
  - unless they have initial values
- Designated initializers for a class must be in the main declaration
  - anyone subclassing the class who wants to inherit convenience initializers needs to know these
  - but not interesting if the class is final
- EDIT: a `deinit` for a class must be in the main declaration
  - this is mostly because there's only one; I don't think it's the same category as the rest of these, but I think it's worth mentioning here for completeness
- Cases for an enum must be in the enum
  - pretty much necessary for anyone switching over the enum
- Protocol requirements must be in the protocol
  - anyone adopting the protocol needs to know all of these
  - unless the requirement has a default (which is actually rather hard to find out without something like Apple's docs…)
- Overridable members need to be in the class
  - you pretty much never have to know all of these
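Two of these rules are easy to show in a few lines. This is just an illustrative sketch with made-up types (`Point`, `Direction`), covering the struct stored-property case and the enum case.

```swift
struct Point {
    var x: Double
    var y: Double
    var label = "origin"  // has an initial value, so initializers may skip it

    // A non-delegating initializer must set every stored property that lacks
    // an initial value, so its author needs to see the full list.
    init(x: Double, y: Double) {
        self.x = x
        self.y = y
    }
}

extension Point {
    // Allowed: computed properties may live in an extension.
    var magnitude: Double { (x * x + y * y).squareRoot() }

    // Not allowed today: extensions cannot contain stored properties.
    // var z: Double = 0
}

enum Direction { case north, south }

func opposite(_ d: Direction) -> Direction {
    // An exhaustive switch relies on every case being visible in the enum.
    switch d {
    case .north: return .south
    case .south: return .north
    }
}
```

In both cases the person writing the code has to account for every item, which is what makes "it's all in the main declaration" valuable. Overridable members don't put anyone in that position.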

It's true that you'd no longer be able to just look at the class declaration to see what methods are overridable, but I don't think that's a huge loss. It seems more valuable to me to allow overridable methods to live in different files and therefore be grouped with other code that might be related. So overall I'm in favor of this proposal. (It seems minorly telling that no one's spoken up against it either.)

You did also mention allowing stored properties, which I think is a proposal best done separately even if the implementation would be rather similar. I feel a bit iffy about doing this for structs, but I'm not sure there's any real reason to be concerned; someone outside the module should mostly be able to treat a struct's stored and computed properties uniformly. With classes in particular it's a more favorable tradeoff: the only people who have to know all of the stored properties are the ones implementing the designated initializers, which are always at least in the same module. I do expect some people will want the language to enforce that all the stored properties are in one place, though.