Deep Generic Specialization

Hello Community,

I recently built a hobby framework on the assumption that the compiler can perform even deeply nested generic specializations, but this does not seem to work here.

In my framework RedCat, I have implemented reducers as a protocol taking only the state as an associatedtype. Actions are given to the reducer using generics:

public protocol ErasedReducer {
   associatedtype State 
   func apply<Action : ActionProtocol>(_ action: Action, 
                                       to state: inout State,
                                       environment: Dependencies)
...
}

There are a couple of more concrete reducer types that actually do have an Action associatedtype:

public protocol ReducerProtocol : ErasedReducer {
   associatedtype Action : ActionProtocol
   func apply(_ action: Action, to state: inout State)
}

public extension ReducerProtocol {
   func apply<Action : ActionProtocol>(_ action: Action,
                                       to state: inout State,
                                       environment: Dependencies) {
      guard let action = action as? Self.Action else {
        return
      }
     apply(action, to: &state)
   }
}

Finally, composed reducers that can consume different actions essentially just forward apply calls to two sub-reducers.

When I mark ReducerProtocol's generic apply function as inlinable, then thould be possible to highly optimize the function to the point that it will only be called when the guard condition actually succeeds - which is statically knowable.

However, this seems to only work up to a certain level of nesting of reducers. When I jus tried to compile this example project with optimizations, the disassembly showed that the dynamic cast is still there. I confirmed this by setting break points at the return statements - the break points were actually hit (which wouldn't be the case in properly optimized code).

I'm not 100% sure if it's the nesting level, but there are things that point to this as an explanation. Here's my reasoning:

The point from which I call the generic apply function is actually some kind of erasure:

public extension ErasedReducer {
   func applyDynamic(_ action: ActionProtocol,
                     to state: inout State,
                     environment: Dependencies) {
...
   }
}

but this should be ok, because this only calls the following:

extension ActionProtocol {
   @inlinable
   func apply<R : ErasedReducer>(using: R,
                                 state: inout State,
                                 env: Dependencies){
...
 } 
}

which in turn calls the reducer's generic apply method using self. And I actually confirmed the guard is actually properly eliminated using a composition of just two reducers. It doesn't work for deeply nested reducers though and I can't figure out why.

How would I work around this? What to do now?

Edit: Correction: originally, I did not test this with the composition of two reducers, but I tried to create a minimal failing example by writing a 50 lines swift package containing only the relevant code, but the generic specialization succeeded. I concluded that it had to do with the level of nesting, but it may have other reasons. For instance, the 50 lines package was in one file only. Also, the top level protocol did not have any associatedtypes.

For completeness' sake, here's the gist of composed reducers:

public extension ErasedReducer {
   func compose<Next: ErasedReducer>(with next: Next)
   -> ComposedReducer<Self, Next> {
      ComposedReducer(r1: self, r2: next)
  }
}

public stuct ComposedReducer<R1 : ErasedReducer, R2 : ErasedReducer> : 
ErasedReducer where R1.State == R2.State {

   @usablefrominline
   let r1: R1
   @usablefrominline
   let r2: R2

   @inlinable
   func apply<Action: ActionProtocol>(_ action: Action, 
                                      to state: inout State,
                                      environment: Dependencies) {
      r1.apply(action, to: &state, environment: environment)
      r2.apply(action, to: &state, environment: environment)
   }

}

Ok, damn. It has nothing to do with the depth of the reducer, it does already fail with just two composed almost trivial reducers:


enum AppState {
    
    case string(String)
    case int(Int)
    
    static let reducer = toString.compose(with: toInt)
    
    static let toString = ClosureReducer {
        (_: Actions.String, state: inout AppState) in
        state = .string("Hello, World")
    }
    
    static let toInt = ClosureReducer {
        (_: Actions.Int, state: inout AppState) in
        state = .int(42)
    }
    
}


extension Actions {
    
    struct String : ActionProtocol {}
    struct Int : ActionProtocol {}
    
}

ClosureReducers conform to a reducer protocol that gets its generic apply method with the dynamic downcast and the guard mentioned in OP. I tested this in a trivial app with optimization compiler flags.

Edit: ok, I found that the implementation of the "trivial app" matters. I essentially sent the two actions to the composed reducer using two buttons, but when I did this using a closure, the function could not be optimized, whereas when making them instance methods of my ContentView and handing those methods to the buttons, elimination did happen.

However, this was only the case after moving back to @inlinable rather than @inline(__always) (which I experimented with out of desparation).

So now I'm back again at this observation: for small reducers, the function will get optimized, for longer ones, it doesn't.

My understanding is that the compiler uses a heuristic to not inline functions that are just too long. This makes sense in usual scenarios. And actually I don't give a damn about inlining, what I want is generic specialization and optimization - and I want this for all action types that I actually send to my reducer (which is only knowable in the project where I import RedCat, not in RedCat itself).

The example project is quite large, and I probably won't be looking too much into it.

FWIW, as? is a dynamic operation. Instead, I've been using something akin to

guard Action.self === Self.Action.self else {}

which has quite a different meaning (esp. around classes), but I do consistently get specialization with it. (It's still not guaranteed, though.)

2 Likes

Thanks for the answer, I will try. I'm aware that what I'm doing is a dynamic operation, but isn't that the case for virtually anything inside a function body?

Example project: yes, in the current form of the framework, the problem seems to only arise for deeply nested reducers, so it is necessarily large. It wasn't planned as a showcase for the problem though, but as a showcase for how to use the framework.

Works like a charm, thanks!

Now, I discovered a new problem: thanks to specialization, most reducers should now be empty at any given call. But my ComposedReducer doesn't see that. Hence, a lot of empty functions are called :frowning: But hey, progress is progress :smiley:

If no-op is the only valid behaviour for type mismatch, I'd also suggest that you remove ErasedReducer protocol.

Instead, have erasure as a struct that retain the type of Action:

struct AnyReducer<Action: ActionProtocol>: Reducer {
  ...
}

which may or may not include initializer from reducer with different Action, depending on how important it is to your logic.

Hmm, I don't know if I understand what you mean.

Well, *.self is known at compile time should the function be specialized. So I guess the compiler would have an easier time with it.

On that note, it may not work if you need to support any non-final class. Since if Action.self is A.self, the action itself may also be any subclass of A.

TBH, I don't know why you'd accept different Action other than the associated type either. Since it should be obvious that you can't call it given you propagate the Action all the way through.

The reason why my top level ErasedReducer has no associated action type is that I want to be able to compose reducers with different actions. The usual solution is to have all reducers use the same action type, which means you need an enum and do a lot of dynamic switching to figue out which reducer should respond. This is what I was trying to avoid, and my idea was that generic types and functions can be specialized so that after specialization the huge app reducer collapses to only the tiny little functions that you actually need at runtime - for any given action. Essentially, I want to achieve static dispatch.

In that case, I'd also suggest that you name the generic version and specialized version differently.

Maybe change ErasedReducer.apply to ErasedReducer.applyAny. You'd have easier time spotting wrong function call more easily.

PS

I don't really know why the compiler still call the empty function. Did you make sure the caller of apply is also @inlinable?

Changed it to applyErased. The calls are exactly as I would expect in non-optimized builds. The composed reducer's applyErased body calls applyErased of its two children (the type of which is known). All applyErased implementations inside the package are inlinable and all callers inside the framework and their callers are inlinable as well. Still, the no-ops don't seem to get removed.

Yeah, I'm just saying in general. I don't think it'd apply specifically to the current version. I still have no idea what would cause it. Tho if probably means the caller of applyErased is not specialized, or that it couldn't proof that it's using an empty applyErased in particular.

PS

On totally unrelated note, make sure to have applyErased call apply ;) You could save some type checking time that way.

applyErased calls apply in all reducers that have an associated action type. In ComposedReducer, this isn't possible because I can only assume the child reducers to be ErasedReducers. Otherwise I couldn't build a tree of ComposedReducers.

Welp, that's about all I can think of on top of my head :v:

Thanks for trying, the guard Action.self == Self.Action.self was actually helpful.

Well, then no static dispatch anytime soon :D Sad, the dynamic dispatch is the one thing that bothers me about most redux style architectures.

Okay, I looked into the disassembly again and I don't think that the individual ErasedReducers (not talking about the ComposedReducers) got specialized. Yesterday, I did confirm that the break point on the return statement in the guard block isn't hit anymore, but the way this is achieved was not by specialization apparently.

The disassembly contains a jne to the end of the function body, and the rest of the function body still seems to be intact. I can tell because in order to make it work, I put an as! after the successful guard, and the disassembly shows a dynamic cast. So the new guard condition did change how this function is treated by the optimizer, but it did not lead to specialization - or at the very least, the compiler did not further optimize the potentially specalized code.

Today I've been reading a bit about @_specialize and I found that not even this could deal with this problem, since you don't know in advance for which types you want to specialize this. Consequently, I've been thinking about the possibility of an unconditional argument for the @_specialize attribute - or even a standalone @unconditionalSpecialize attribute. Semantics: generate specialized versions for this function wherever possible in the current compilation context, i.e. for every call to this function visible from the place where the compiler is invoked.

What do you think? Reasonable idea? Useful idea? How the hell would I implement this (which is kind of required when suggesting a feature)? I Wanted to pose these questions in this thread (to someone who sees where I'm coming from) before starting yet another thread :D

If the compiler already specialize to the point of removing guard, it should have everything to remove casting as necessary. That is still doesn't do that then probably due to some ABI restriction, which I don't know too much of.

Plus specialization isn't free. Forcing specialization for every usage just invite explosion of binary size. Also that sounds like a lot of work to avoid writing the sieve yourself. Maybe it might work better if Reducer is a separated protocol rather than refining ErasedReducer.

Well, the theory is that after generic specialization, only the few reducers handling the specific action remains, as everything else can be removed. That means that the specialized methods will be very small. Also, since there's really only one reducer per app (other reducers live in non-optimized test classes; there, the specialization can be skipped), the number of methods that need to be generated by specialization is O(n), where n is the number of concrete actions sent to the reducer - this is totally ok, since you have at least n reducers anyway to handle all those actions.

As stated two posts ago, I'm no longer sure that anything is being specialized in my example app. I thought it was specialized, because the else block is no longer reached, but the disassembly shows a lot of dynamic things going on, not only the dynamic cast, but also protocol witness lookup, associatedtype lookup and importantly jne.

Hmm, the selling point of unidirectional data flow libraries is, that you are able to compose your business logic up to one single central side-effect free reducer and all side-effects that the app needs to handle are just decorators for that reducer. Not sure how I could achieve this clean architecture with a Reducer protocol that is aware of actions, if the point I want to make with this framework is static dispatch of actions. Maybe it's just not possible with Swift :confused: