Add escape hatch to treat Substring as a String without copying

Working with Substring in Swift 4, I've found myself needing to convert to String frequently just to call APIs that would otherwise work fine with substrings. This is of course a performance regression, because converting to String copies the backing buffer. In Swift 2.x, String acted as its own substring type, so calling these APIs didn't require copying anything.

In an ideal world, every API that takes a string but doesn't hold onto it long-term would actually take a <S: StringProtocol> (or maybe even just take a Substring and have the compiler automatically cast String into Substring). But practically speaking that won't happen. Heck, even String.append(_:) takes a String as input, so I can't even append a substring to a string without copying it first!
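To make the "ideal world" shape concrete, here is a minimal sketch of an API written against StringProtocol instead of String. The function name `vowelCount(in:)` is made up for illustration; the point is only that the same generic function accepts both a String and a Substring without any conversion at the call site.

```swift
// Hypothetical helper (not a standard library API): counts vowels in any
// string-like value. Because it is generic over StringProtocol, callers can
// pass a String or a Substring directly.
func vowelCount<S: StringProtocol>(in text: S) -> Int {
    let vowels: Set<Character> = ["a", "e", "i", "o", "u"]
    return text.reduce(0) { count, character in
        vowels.contains(character) ? count + 1 : count
    }
}

let sentence = "hello world"
let word = sentence.prefix(5)               // a Substring of `sentence`

let sentenceVowels = vowelCount(in: sentence) // called with a String
let wordVowels = vowelCount(in: word)         // called with a Substring, no String(...) needed
```

The catch, as noted above, is that this only helps for APIs you control or that already happen to be generic.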

To that end, I'd love to see an escape hatch that lets me treat a Substring as a String without copying it. Basically a way to get limited-scope Swift 2.x behavior back. This might look something like:

substr.withString({ result.append($0) })

Obviously I could leak the String out of the block, but that's a problem with all of our existing block-scoped .withFoo methods already. In any case, this escape hatch, while not ideal, would allow me to call APIs that I know would work just fine with a substring without incurring a string copy.
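For anyone who wants to play with the call-site ergonomics, here is a rough sketch of the API shape as an extension. Big caveat: `withString` is the hypothetical name from this post, and this version still copies via `String(self)`, since a truly non-copying implementation would need support inside the standard library. It only shows the signature and how it would be used.

```swift
extension Substring {
    // Hypothetical API shape only. NOTE: this sketch still copies the
    // contents via String(self); the proposed escape hatch would hand the
    // closure a String that shares storage instead.
    func withString<R>(_ body: (String) throws -> R) rethrows -> R {
        return try body(String(self))
    }
}

var result = "key="
let line = "name: value"
let substr = line.suffix(5)          // Substring containing "value"

// The String passed to the closure should not be stored or returned.
substr.withString { result.append($0) }
```

Using `rethrows` and a generic return type mirrors the existing block-scoped `with…` methods, so the closure can throw or return a value.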

Incidentally, Substring's implementation of TextOutputStreamable copies the string even though the whole point of that protocol is to avoid unnecessary string construction (facepalm).
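As a reminder of what the protocol is for, here is a small sketch of a conforming type that streams its pieces straight into the target instead of building an intermediate String first. The `KeyValue` type is invented for this example; whether each `write(to:)` call on a Substring copies internally is exactly the implementation detail being complained about.

```swift
// Hypothetical type for illustration: writes its parts directly to the
// output stream rather than concatenating them into a temporary String.
struct KeyValue: TextOutputStreamable {
    let key: Substring
    let value: Substring

    func write<Target: TextOutputStream>(to target: inout Target) {
        key.write(to: &target)      // Substring conforms to TextOutputStreamable
        target.write("=")
        value.write(to: &target)
    }
}

let line = "name:value"
let parts = line.split(separator: ":")   // [Substring]

var output = ""                          // String conforms to TextOutputStream
KeyValue(key: parts[0], value: parts[1]).write(to: &output)
```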

I filed a ticket about this (SR-6949) because it's a performance regression, and am posting here because it's an API addition.


Another example I just ran into where the "ideal world" solution doesn't even work is subscripting a dictionary of type [String: Any]. This type is obviously correct, and the subscript getter really shouldn't require copying the key.
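A minimal illustration of the dictionary case, with made-up keys and values: the subscript is typed on String, so a Substring key has to be converted (and therefore copied) before the lookup.

```swift
let settings: [String: Any] = ["timeout": 30, "host": "example.com"]

let line = "timeout=30"
let key = line.prefix(7)                 // Substring containing "timeout"

// `settings[key]` does not compile, because the key type is String and
// `key` is a Substring. The lookup needs an explicit conversion, which
// copies the key's contents:
let timeout = settings[String(key)] as? Int
```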

In theory, if we had parameterized protocols, we could get around this with a protocol EquivalentHashable<T: Hashable>, where one type declares that it can be compared against another type and guarantees it has the same equality and hashValue. But besides the fact that we don't have parameterized protocols, this is also a bit hacky and would really only ever be used with String/Substring.

I'm actually working on carving out room in String's ABI right now that would allow this kind of thing. There are other tradeoffs and the ever-pesky pigeonhole principle that dictates some constraints, though. The ABI (if I can carve this out in time) would allow two ways of constructing a String that shares storage. Note that this is ABI, not API, so I'm just talking about what is physically possible without any particular spelling or API design in mind.

  1. An opaque-string-like entity. Forming this in order to share storage would require allocating a class instance, but would not involve copying the underlying String data, which may be very large. Additionally, reads would call through a witness table unless de-virtualized. This would retain and track the outer String to safely manage the lifetime of the storage.

  2. An unmanaged-string-like entity. This could manifest as a block-based API similar to your example, but with suitably scary language denoting that the block should not try to persist the String, similar to Array's withUnsafe methods. In the future, this could be prettified with the ownership model to give better and safer APIs. This does not retain or track the outer String, and thus shouldn't be used in a situation where it might outlive the outer String.
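For comparison, this is what the existing Array precedent for option 2 looks like. It is only an analogy for the scoping rule, not the String design itself: the buffer is valid only inside the closure, and the safe pattern is to compute a value and return that rather than the buffer.

```swift
let numbers = [1, 2, 3, 4]

// The buffer pointer is only valid for the duration of the closure, so the
// closure returns a computed value and never the buffer itself.
let sum = numbers.withUnsafeBufferPointer { buffer -> Int in
    buffer.reduce(0, +)
}
```

A block-based String API along these lines would carry the same contract: use the shared-storage String inside the block and don't let it escape.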

Yup, this is a separate issue that needs to be fixed.

edit: Could you file SRs for the Dictionary and the TextOutputStreamable issues?


Filed as SR-6951 and SR-6950.
