Avoiding copying when casting a struct to protocol?

davidbaraff · June 8, 2023, 4:14am

public protocol Asset {
    var member1: Int { get }
    ...
    var memberN: Int { get }
}

internal protocol SerializableItem {
    var encodedJSON: String { get }
}

public struct VeryHeavyStruct : Asset, SerializableItem {
    // implements everything it needs to, and has enough data
    // that copying a bazillion of these wouldn't make me happy.
}

The idea is that we vend Assets, but behind the scenes we expect any struct of type Asset to also conform to SerializableItem. We check at runtime that this is true, and issue a fatal error if it's not. In practice this means we can expect our code will meet this requirement, though we can't enforce it at compile time.

public class Document {
    public var assets = [Asset]()

    public func encode() -> [String] {
        assets.map {
            guard let item = $0 as? SerializableItem else {
                fatalError("...")  // scold programmer for non-conformance
            }

            // Q: did we just copy the asset here?
            return item.encodedJSON
        }
    }
}

What I'm trying to do is hide the serialization aspect from anyone outside the library that implements this. So while I expect every Asset object to be serializable, I have to cast to get to my protocol at runtime. But is this copying my items, which would make me sad?

A similar concern comes up, where I might like to allow a mutation of certain private fields of the Asset object, but only by an internal protocol of the library. Now copying isn't just slow, it's inducing incorrectness, because if I cast and then call a mutating method in my internal protocol, I've mutated the copy.
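A minimal sketch of the correctness problem described above. The names `PublicThing`, `InternalMutable`, `Shape`, and `bump()` are hypothetical stand-ins, not part of the library in question. It shows that a mutation made through the cast value does not reach the array element unless the result is written back.

```swift
// Hypothetical names, for illustration only.
protocol PublicThing {
    var revision: Int { get }
}

protocol InternalMutable {
    mutating func bump()
}

struct Shape: PublicThing, InternalMutable {
    var revision = 0
    mutating func bump() { revision += 1 }
}

var things: [any PublicThing] = [Shape()]

// The cast yields a separate value; `var item` is independent of the array.
if var item = things[0] as? any InternalMutable {
    item.bump()  // mutates the local value only
}

let revisionAfterCast = things[0].revision  // the array element is unchanged

// To make the change stick, the mutated value has to be written back.
if var item = things[0] as? any InternalMutable {
    item.bump()
    if let updated = item as? any PublicThing {
        things[0] = updated
    }
}

let revisionAfterWriteBack = things[0].revision
```

Writing the value back restores correctness, but it does nothing about the cost of copying.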

I've wanted to be able to have non-public members that satisfy (public) protocol requirements for this reason, but haven't been able to. Any good solutions to these two problems?

David_Smith · June 8, 2023, 5:45am

Honestly this sounds like you might want to make your struct into a class

lukasa · June 8, 2023, 10:54am

Or, to keep the semantics the same, have your struct store a class that contains all the stored properties and then forward behaviours to it.
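A rough sketch of what that could look like, assuming a hypothetical `HeavyValue` struct backed by a hypothetical `HeavyStorage` class (neither name comes from the original post). Copying the struct then copies a single reference, and the storage is duplicated only when a shared value is mutated. `isKnownUniquelyReferenced` is the standard library check commonly used for this.

```swift
// Hypothetical names, for illustration only.
final class HeavyStorage {
    var payload: [Int]
    var label: String

    init(payload: [Int], label: String) {
        self.payload = payload
        self.label = label
    }

    func copy() -> HeavyStorage {
        HeavyStorage(payload: payload, label: label)
    }
}

struct HeavyValue {
    // The only stored property is a reference to the class instance.
    private var storage: HeavyStorage

    init(payload: [Int], label: String) {
        storage = HeavyStorage(payload: payload, label: label)
    }

    // Reads are forwarded to the storage object.
    var payload: [Int] { storage.payload }

    var label: String {
        get { storage.label }
        set {
            // Duplicate the storage only if another value shares it.
            if !isKnownUniquelyReferenced(&storage) {
                storage = storage.copy()
            }
            storage.label = newValue
        }
    }
}

let original = HeavyValue(payload: Array(0..<1_000), label: "a")
var modified = original   // copies one reference, not the payload
modified.label = "b"      // the storage is duplicated here, on first write
```

The struct keeps value semantics from the caller's point of view: `original` is unaffected by the change to `modified`.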

Joe_Groff · June 8, 2023, 3:21pm

Copying can't really be avoided easily with casting as it's currently expressed and implemented in the language, though I'm hoping as we build out runtime support for noncopyable types we'll have some solutions eventually. David and Cory's advice about indirecting some of your struct's storage is a good idea in general.

If that's not acceptable, beware that this is a bit circuitous, but I think you could manage casting without copying by testing whether the type itself conforms to the protocol rather than the value, and doing pointer cast trickery to turn a pointer to the original value into a pointer to the type with the added conformance:

// Open the type of `$0` as a generic `Asset`
func open1<A: Asset>(asset: A) -> String {
  // Cast the type to get its SerializableItem conformance
  guard let itemType = A.self as? SerializableItem.Type else { fatalError("...") }
  // Open the conformance
  func open2<I: SerializableItem>(itemType: I.Type) -> String {
    return withUnsafePointer(to: asset) {
      // We're really casting to the same pointer type here since A == I
      assert(A.self == I.self)
      let itemPointer = unsafeBitCast($0, to: UnsafePointer<I>.self)
      return itemPointer.pointee.encodedJSON
    }
  }
  return open2(itemType: itemType)
}
return open1(asset: $0)

Joe_Groff · June 8, 2023, 3:30pm

Another thing you could possibly do is statically require that Assets conform to SerializableItem indirectly, but hide that fact in an implementation struct.

public protocol Asset {
  // We can't hide the fact this requirement exists, but we can hide what it
  // is inside a struct with no public accessors or initializers.
  // The `<Self>` generic parameter prevents external code from
  // making invalid conformances by "stealing" the opaqueRequirements
  // from a conforming type
  static var opaqueRequirements: AssetOpaqueRequirements<Self> { get }
}

public struct AssetOpaqueRequirements<T: Asset> {
  internal var serializableItemType: any SerializableItem.Type
}

// Any Asset that's also a SerializableItem gets a default conformance
extension Asset where Self: SerializableItem {
  static var opaqueRequirements: AssetOpaqueRequirements<Self> {
    return AssetOpaqueRequirements(serializableItemType: Self.self)
  }
}

That might allow you to both get a structural guarantee that Assets are all also SerializableItems, and let you recover the SerializableItem conformance directly from the Asset conformance instead of doing an expensive dynamic lookup. I think you'd still need to do pointer juggling to use that conformance on assets without copying them currently, though.

davidbaraff · June 8, 2023, 3:43pm

Is there any way I can create a type that says “I’m an Asset AND I’m a SerializableItem”? Can I do

var assets = [Asset & SerializableItem]()

or

 public protocol Both : Asset, SerializableItem { }
 var assets = [Both]()

The latter would likely require that SerializableItem become public, which is annoying, but a reasonable price to pay for better performance/no copying…

Joe_Groff · June 8, 2023, 3:48pm

Yeah, either of those would work if it's acceptable to expose SerializableItem. Could the var assets = [Asset & SerializableItem]() in Document be made internal, and you expose an AnySequence<Asset> as the public interface? That could be a way to allow for efficient static access to the conformances while still hiding the presence of SerializableItem from the public.

davidbaraff · June 8, 2023, 4:54pm

You’re saying make a computed property which returns an AnySequence when clients want to iterate over the assets? But wouldn’t each call of that method copy the entire array of assets?

It’s highly desirable to access a container of the assets so you can map/reduce over it, e.g. computing the bounding box of all the items when assets are geometric entities, etc. Having to copy the entire array for purposes such as that would be a no-go.

I might simply expose SerializableItem to the public view, though it’s a bit distasteful. Still haven’t found a good way of handling the case when I need to mutate the asset slightly, and that for sure wants to be internal…

Not making these things classes has made the programming simple, so I don’t think I want to go there unfortunately.

Joe_Groff · June 8, 2023, 5:23pm

What I had in mind is that you'd implement it as a lazy map over the underlying array:

var assets: AnySequence<Asset> {
  return AnySequence(_assets.lazy.map { $0 })
}

The array is copy-on-write, so its contents wouldn't be eagerly copied this way. Client code would still copy each element that it does visit, to do the conversion from Asset & SerializableItem to Asset; I don't have a good idea in mind to avoid that yet unfortunately.
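Put together, the arrangement might look roughly like the sketch below. It is a single-file illustration under assumptions: `Box`, the `value` requirement, and the JSON format are hypothetical, and `_assets` is populated directly because the sketch does not address how clients would add assets. The point is only that `encode()` uses the conformance statically, while the public view exposes plain `Asset` values lazily.

```swift
public protocol Asset {
    var value: Int { get }  // hypothetical requirement
}

internal protocol SerializableItem {
    var encodedJSON: String { get }
}

// Hypothetical concrete asset, for illustration only.
struct Box: Asset, SerializableItem {
    var value: Int
    var encodedJSON: String { "{\"value\":\(value)}" }
}

public final class Document {
    // Internal storage keeps both conformances statically available.
    internal var _assets: [any Asset & SerializableItem] = []

    // Public view: a lazy wrapper, so the array is not eagerly copied.
    public var assets: AnySequence<any Asset> {
        AnySequence(_assets.lazy.map { $0 as any Asset })
    }

    public func encode() -> [String] {
        _assets.map { $0.encodedJSON }  // no dynamic cast needed
    }
}

let document = Document()
document._assets = [Box(value: 1), Box(value: 2)]
let encoded = document.encode()
let publicValues = document.assets.map { $0.value }
```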

davidbaraff · June 9, 2023, 3:11pm

var items = [Item]()
// populate items, assume Item is a struct

for item in items {
    // A. look at item
    // B. mutate item
}

Just to be clear: whenever I do the above, I’m already copying my item, right? That is, what I see at A is a copy of the data that was in the array, and B won’t affect the contents of the array itself, since it is mutating a local copy?

Does

let result = items.map { $0.someDataItem }

copy each item during the traversal, because $0 is a copy of the array item?
It occurs to me that maybe my worrying about extraneous copies is misplaced, because there’s a whole bunch of copying going on anyway…
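A small sketch of the semantics being asked about, using a hypothetical `Item` with a single `count` field. It only demonstrates observable behaviour (the loop variable is an independent value); whether the optimizer performs a physical copy in a given case is a separate question that this example does not settle.

```swift
// Hypothetical `Item` type, for illustration only.
struct Item {
    var count: Int
}

var items = [Item(count: 1), Item(count: 2)]

// `for var` is needed to mutate the loop variable at all.
for var item in items {
    item.count += 10  // changes the loop-local value only
}
let countsAfterLoop = items.map { $0.count }

// Mutating through the index writes to the array's own storage.
for index in items.indices {
    items[index].count += 10
}
let countsAfterIndexedLoop = items.map { $0.count }
```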

costinAndronache · June 9, 2023, 6:49pm

This is not the way to go.
Essentially, you're vending a public protocol that your users' types shouldn't conform to, because conforming will crash their app?

When you want to ship a bundle of [public_info(conforms to Asset), internal_info(can serialize itself)] in the same type, what you really want is a base class Asset.

Public protocols must expose all their requirements publicly, whereas public classes can keep some of their requirements internal to the framework. (I'm not sure whether there's ever a chance of implementing private dynamic dispatch info for protocols in the compiler.)

You can also make the base-class non-derivable so that you can vend all of your concrete sub-classes like VeryHeavyClass with the guarantee that the client code won't pass into Document any of their own implementations of Asset
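A minimal sketch of that base-class shape, with hypothetical names (`AssetBase`, `HeavyAsset`, `ClassDocument`) and shown as a single file for brevity. In a real framework the split relies on two Swift access rules: a `public` class that is not marked `open` cannot be subclassed outside its module, and an `internal` initializer or method is not visible to clients.

```swift
// Hypothetical names, for illustration only.
public class AssetBase {
    public let value: Int

    // Internal initializer: clients cannot create instances directly.
    internal init(value: Int) {
        self.value = value
    }

    // Internal requirement, invisible to clients of the module.
    internal func encodedJSON() -> String {
        fatalError("subclasses must override encodedJSON()")
    }
}

public final class HeavyAsset: AssetBase {
    public init(heavyValue: Int) {
        super.init(value: heavyValue)
    }

    override func encodedJSON() -> String {
        "{\"heavy\":\(value)}"
    }
}

public final class ClassDocument {
    public var assets: [AssetBase] = []

    public init() {}

    public func encode() -> [String] {
        // Ordinary dynamic dispatch; no cast and no runtime conformance check.
        assets.map { $0.encodedJSON() }
    }
}

let classDocument = ClassDocument()
classDocument.assets = [HeavyAsset(heavyValue: 7)]
let classEncoded = classDocument.encode()
```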

You leak no unnecessary type information this way, but you will have to convert your structs to classes.

If you still insist on value type semantics, then the other way would be to turn Serializable into an actual opaque type to the client code, i.e. a class with no public interface but internal implementation details that are customized by each concrete asset type. Then you must make it a requirement in Asset to return an instance of Serializable. (This is a similar but simpler version of @Joe_Groff's approach.)

This way you keep your initial type hierarchy, although you're back to leaking unnecessary information into client code. But at least now you can be sure that no one other than your asset types can ever implement Asset, because there will be no way to construct a Serializable outside of your framework.

// PublicPrivateFunctionality framework

import Foundation

public class SerializationWitnessTable {
    internal var encodedEndResult: String
    init(encodedEndResult: String) {
        self.encodedEndResult = encodedEndResult
    }
}

public protocol Asset {
    var value: Int { get }
    var serialization: SerializationWitnessTable { get }
}

public struct AssetVariantA: Asset {
    public var value: Int
    public init(value: Int) {
        self.value = value
    }
    
    public var serialization: SerializationWitnessTable {
        .init(encodedEndResult: "AssetVariantA-\(self.value)")
    }
}

public struct AssetVariantB: Asset {
    public var value: Int
    public init(value: Int) {
        self.value = value
    }
    public var serialization: SerializationWitnessTable {
        .init(encodedEndResult: "AssetVariantB-\(self.value)")
    }
}


public class Document {
    var assets: [Asset]
    
    public init(assets: [Asset]) {
        self.assets = assets
    }
    
    public func encode() -> [String] {
        assets.map({ $0.serialization.encodedEndResult })
    }
}

// Client code
import Foundation
import PublicPrivateFunctionality

func use() {
    let document = Document(assets: [
        AssetVariantA(value: 1),
        AssetVariantA(value: 2),
        AssetVariantB(value: 3)
    ])
    
    print(document.encode())
}

You can be sure now that no instance of your heavy structs is unnecessarily copied.