I'll draft a post and send it to you to make sure I've correctly summarized the issues with my Swift 4-era proposal.
So, let's kick this discussion off!
I've been working on a redesign of Swift's ExpressibleByStringInterpolation protocol. As a refresher, when you write code like this in Swift 4:
```swift
let foo: MyString = "foo \(bar) baz"
```
We currently generate code like this:
```swift
// Swift 4 lowering: one initializer call per segment, then one variadic call.
let foo = MyString(stringInterpolation:
    MyString(stringInterpolationSegment: "foo " as String),
    MyString(stringInterpolationSegment: bar),
    MyString(stringInterpolationSegment: " baz" as String)
)
```
Besides various shortcomings in its feature set, this design has a performance problem: it creates a lot of intermediate strings. The above code, for instance, generates two Strings (for the two literal segments), then three MyStrings (one for each segment), and finally concatenates them into one larger MyString. Depending on the implementation of MyString, this means it generates somewhere between two and six reference-counted heap objects which are purely temporary and are discarded as soon as the string interpolation is finished. (Plus there's the Array the varargs are passed in!) This may be a problem in performance-sensitive code which constructs interpolated strings; for example, you might want to avoid using interpolated strings for logging.
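To make the count of temporaries concrete, here is a sketch that reproduces the Swift 4 lowering by hand. MyString is hypothetical: it is assumed here to be a class (so every instance is a reference-counted heap object), and the instance counter exists purely for illustration. The initializers mimic the Swift 4 requirement names but are plain initializers, not a conformance to the current protocol.

```swift
// Hypothetical MyString: a class, so every instance is a heap object.
final class MyString {
    // Illustrative counter of how many MyString instances were created.
    static var instancesCreated = 0

    let storage: String

    init(_ storage: String) {
        self.storage = storage
        MyString.instancesCreated += 1
    }

    // Mirrors the Swift 4 per-segment initializer: one instance per segment.
    convenience init<T>(stringInterpolationSegment value: T) {
        self.init(String(describing: value))
    }

    // Mirrors the Swift 4 variadic initializer: the segments arrive in an Array.
    convenience init(stringInterpolation segments: MyString...) {
        self.init(segments.map { $0.storage }.joined())
    }
}

// Hand-written equivalent of `let foo: MyString = "foo \(bar) baz"` in Swift 4.
let bar = 42
let foo = MyString(stringInterpolation:
    MyString(stringInterpolationSegment: "foo " as String),
    MyString(stringInterpolationSegment: bar),
    MyString(stringInterpolationSegment: " baz" as String)
)
```

Here foo.storage is "foo 42 baz" and MyString.instancesCreated is 4: three temporary segment instances plus the final value, before counting the literal Strings or the varargs Array.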
I wrote a proposal in 2017 for a new ExpressibleByStringInterpolation design, but I wasn't aware of this performance issue at the time. The proposed design is a bit of a mixed bag: on the one hand, it replaces the instances created by MyString(stringInterpolationSegment:) with instances of a standard library enum, which never allocates new objects; on the other hand, it requires interpolated segments to pass through a String initializer for most conforming types. In the example above, we would probably generate three String instances (two for the literals, one for the interpolation), but only one MyString (the final value).
So, how can we redesign string interpolation to generate fewer temporary Strings? I don't have any bright ideas here; I'm hoping some of you will.
Could the variadic parameter be replaced by a custom Sequence/Collection type that stores a single String, indices indicating the interpolation points, and the interpolated objects? Then you could produce Substrings, etc., on demand when iterating. I don't know what the performance impact would be, though. What is the source compatibility requirement here?
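A rough sketch of that suggestion, assuming an Int index where even positions are literal segments and odd positions are interpolated values. InterpolationSegments, its stored properties, and flatten are hypothetical names, and values are held as Any only to keep the sketch short. Literal segments are sliced out of the one stored String as Substrings on demand, so no per-literal String is created.

```swift
// Hypothetical replacement for the variadic parameter.
// Assumes breaks.count == values.count.
struct InterpolationSegments: Collection {
    enum Segment {
        case literal(Substring)
        case value(Any)
    }

    let literals: String        // all literal text, stored contiguously
    let breaks: [String.Index]  // interpolation points within `literals`
    let values: [Any]           // one interpolated value per break

    var startIndex: Int { 0 }
    var endIndex: Int { 2 * values.count + 1 }
    func index(after i: Int) -> Int { i + 1 }

    subscript(position: Int) -> Segment {
        // Odd positions are interpolated values.
        if position % 2 == 1 { return .value(values[position / 2]) }
        // Even positions are literal runs between interpolation points.
        let n = position / 2
        let lower = n == 0 ? literals.startIndex : breaks[n - 1]
        let upper = n == breaks.count ? literals.endIndex : breaks[n]
        return .literal(literals[lower..<upper])  // a Substring view, no copy
    }
}

// A consumer that flattens the segments into one String.
func flatten(_ segments: InterpolationSegments) -> String {
    var result = ""
    for segment in segments {
        switch segment {
        case .literal(let text):
            result.append(contentsOf: text)
        case .value(let value):
            result += String(describing: value)
        }
    }
    return result
}

// "foo \(42) baz": the literal text is "foo " + " baz", split after offset 4.
let literals = "foo  baz"
let split = literals.index(literals.startIndex, offsetBy: 4)
let segments = InterpolationSegments(literals: literals, breaks: [split], values: [42])
```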
ExpressibleByStringInterpolation is currently deprecated with a message saying that it will be redesigned, so we can give it any design we want, as long as it doesn't break string interpolation use sites and it's locked down by Swift 5.
There's a chance for some generally useful additions to String as a part of this, as well as a chance to hand-roll something specifically for interpolation.
There's the general need to be able to efficiently format data directly into a String's excess capacity. E.g., if formatting a floating-point number, we want to reserve some excess capacity and write bytes directly into that capacity, updating the code-unit count when we're done. I think Array does this kind of dance in its use of _adoptStorage, but I don't know the details.
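Swift later gained public versions of this pattern: Array.init(unsafeUninitializedCapacity:initializingWith:) (SE-0245) and, after that, a similar String initializer (SE-0263). A minimal sketch using the Array one, formatting an unsigned integer (standing in for the floating-point case) by writing ASCII digits directly into reserved storage and then setting the initialized count; decimalUTF8 is a hypothetical helper name.

```swift
// Hypothetical helper: formats `value` in decimal by writing ASCII digits
// straight into an array's uninitialized storage, then setting its count.
func decimalUTF8(_ value: UInt64) -> [UInt8] {
    // Count the digits first so each one can be written in its final slot.
    var digitCount = 1
    var probe = value / 10
    while probe > 0 {
        digitCount += 1
        probe /= 10
    }

    return [UInt8](unsafeUninitializedCapacity: digitCount) { buffer, initializedCount in
        var remaining = value
        // Fill from the least significant digit (last slot) backward.
        // UInt8 is a trivial type, so plain assignment initializes the slot.
        for index in stride(from: digitCount - 1, through: 0, by: -1) {
            buffer[index] = UInt8(ascii: "0") + UInt8(remaining % 10)
            remaining /= 10
        }
        initializedCount = digitCount  // tell Array how many elements are valid
    }
}

let digits = decimalUTF8(1234567)
let text = String(decoding: digits, as: UTF8.self)  // this step copies once
```

With the String initializer from SE-0263 the final copy would disappear, since the digits would be written into the String's own storage.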
Alternatively, we could form a kind of twine on the stack composed of literals and to-be-interpolated expressions, which we pass to the consumer, and String's internals would then use an efficient means of constructing a String. If this can be made efficient with autoclosures, it could solve the lazy vs. eager debate in the proposal. You could think of this as replacing the current situation, which is a call tree, with a single call that's passed the tree as an argument.
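A rough sketch of the twine idea, assuming every segment is either a literal or a deferred interpolation captured with an autoclosure. An array stands in for the stack-allocated structure described above, and Twine, TwineSegment, and logMessage are hypothetical names. The consumer receives all segments in one call and can decide whether to evaluate them at all, which is the lazy side of the debate.

```swift
// One piece of an interpolated string: a literal or a deferred value.
enum TwineSegment {
    case literal(String)
    case interpolated(() -> String)

    // Wraps an expression in an autoclosure so it is evaluated lazily.
    static func value(_ produce: @autoclosure @escaping () -> String) -> TwineSegment {
        return .interpolated(produce)
    }
}

// The whole interpolation, handed to the consumer as a single argument.
struct Twine {
    var segments: [TwineSegment]

    // Evaluates each interpolation once and concatenates into one String.
    func render() -> String {
        var result = ""
        for segment in segments {
            switch segment {
            case .literal(let text):
                result += text
            case .interpolated(let produce):
                result += produce()
            }
        }
        return result
    }
}

// Illustrative counter to show when interpolations actually run.
var evaluations = 0
func expensiveDescription() -> String {
    evaluations += 1
    return "42"
}

// A consumer that skips all formatting work when disabled.
func logMessage(_ message: Twine, enabled: Bool) -> String? {
    guard enabled else { return nil }
    return message.render()
}

let message = Twine(segments: [.literal("answer: "), .value(expensiveDescription())])
```

Calling logMessage(message, enabled: false) never runs expensiveDescription(); enabling it renders "answer: 42".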
We could generalize this to a StringBuilder-like type, which would try to use excess capacity when possible, but avoid grow-and-copy until the end by linking in a new reference instead. That's probably future work, as such a builder type would be burdened with guaranteeing lifetimes.
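A minimal sketch of such a builder, assuming fixed-size chunks held in an array: appends go into the current chunk's spare capacity, and when a chunk would have to grow, the builder links a fresh chunk instead of reallocating and copying. Only build() copies everything, once. StringBuilder and chunkCapacity are hypothetical names, and the sketch sidesteps the lifetime question by owning every chunk.

```swift
// Hypothetical builder: a list of chunks instead of one growing buffer.
struct StringBuilder {
    private var chunks: [String] = []  // completed chunks, in order
    private var current = ""           // chunk currently being filled
    private let chunkCapacity: Int     // target size of each chunk, in UTF-8 bytes

    init(chunkCapacity: Int = 64) {
        self.chunkCapacity = chunkCapacity
        current.reserveCapacity(chunkCapacity)
    }

    var chunkCount: Int { chunks.count + (current.isEmpty ? 0 : 1) }

    mutating func append(_ text: String) {
        // If the text would overflow the current chunk, link a fresh chunk
        // rather than growing (and copying) the existing one.
        if !current.isEmpty && current.utf8.count + text.utf8.count > chunkCapacity {
            chunks.append(current)
            current = ""
            current.reserveCapacity(chunkCapacity)
        }
        current += text
    }

    // Single final copy into one contiguous String.
    func build() -> String {
        let total = chunks.reduce(current.utf8.count) { $0 + $1.utf8.count }
        var result = ""
        result.reserveCapacity(total)
        for chunk in chunks {
            result += chunk
        }
        result += current
        return result
    }
}

var builder = StringBuilder(chunkCapacity: 8)
for piece in ["hello", " ", "world", "!"] {
    builder.append(piece)
}
```

With an 8-byte chunk size, "hello " fills the first chunk and "world!" goes into a second one; build() then joins them into "hello world!".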
Update on the topic: In discussions last night, I realized that I had misunderstood what's needed here. Literal segments are not actually the main problem; the string internals redesign is already going to avoid new allocations for Strings backed by literals. The real problem is interpolated values: we don't want them to have to fill a buffer with characters and then pass it off to something that needs to copy them into a different buffer.
We already sort of have a version of this with the TextOutputStreamable protocol; it has a single method, write<Target: TextOutputStream>(to: inout Target), which tells a type to convert itself into characters. The problem with this is that TextOutputStream itself only has a write(_: String) method, so you still end up constructing Strings. You could maybe use repeated write(_:) calls with small strings and literals to carefully construct the full value without allocations, but this would be pretty finicky. Perhaps we should instead augment TextOutputStream with calls to reserve additional capacity, append Unicode scalars, etc. We could probably provide default implementations on TextOutputStream for existing/low-efficiency clients.
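One way the augmented protocol might look, as a sketch. CapacityAwareTextOutputStream and its two requirements are hypothetical (not a standard library API). The protocol extension supplies default implementations, so a stream that only implements write(_:) keeps working, while an efficient stream encodes scalars straight into its own buffer without forming a String.

```swift
// Hypothetical: TextOutputStream augmented with lower-level operations.
protocol CapacityAwareTextOutputStream: TextOutputStream {
    mutating func reserveAdditionalCapacity(_ count: Int)
    mutating func append(_ scalar: Unicode.Scalar)
}

// Default implementations for existing/low-efficiency streams.
extension CapacityAwareTextOutputStream {
    mutating func reserveAdditionalCapacity(_ count: Int) {}  // treated as a no-op hint
    mutating func append(_ scalar: Unicode.Scalar) {
        write(String(Character(scalar)))  // falls back to a small String
    }
}

// An efficient stream: encodes scalars directly into its byte buffer.
struct UTF8Buffer: CapacityAwareTextOutputStream {
    var bytes: [UInt8] = []

    mutating func write(_ string: String) {
        bytes.append(contentsOf: string.utf8)
    }

    mutating func reserveAdditionalCapacity(_ count: Int) {
        bytes.reserveCapacity(bytes.count + count)
    }

    mutating func append(_ scalar: Unicode.Scalar) {
        UTF8.encode(scalar) { bytes.append($0) }  // no intermediate String
    }
}

// A legacy stream that only implements write(_:) and relies on the defaults.
struct LegacyStream: CapacityAwareTextOutputStream {
    var text = ""
    mutating func write(_ string: String) { text += string }
}

// Maps 0...15 to "0"..."9", "a"..."f".
func hexScalar(_ nibble: UInt8) -> Unicode.Scalar {
    return Unicode.Scalar(nibble < 10 ? 48 + nibble : 87 + nibble)
}

// A value that formats itself scalar by scalar, reserving space first.
struct HexByte {
    var value: UInt8

    func write<Target: CapacityAwareTextOutputStream>(to target: inout Target) {
        target.reserveAdditionalCapacity(2)
        target.append(hexScalar(value >> 4))
        target.append(hexScalar(value & 0x0F))
    }
}

var fast = UTF8Buffer()
HexByte(value: 0xAB).write(to: &fast)

var slow = LegacyStream()
HexByte(value: 0x0F).write(to: &slow)
```

The same HexByte code serves both streams; only the efficient one avoids building Strings per scalar.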
Interpolated values would then (at least for "ordinary" conforming types; this could be controlled by an associated type) be boxed as TextOutputStreamable existentials before being passed to the conforming type. If I understand this boxing correctly, it would be efficient for String and many other small types, but might require a container to be allocated on the heap for larger types. This value would then be put into an enum to mark it as interpolated or literal. The current proposed formatting mechanism wouldn't work because we don't have factory initializers on protocol existential types, but I could imagine using, for instance, ad-hoc overloading of format(...) free functions to do the same thing.
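A sketch of that shape, assuming a hypothetical InterpolationSegment enum whose interpolated case holds a TextOutputStreamable existential. The format(_:width:) and format(_:radix:) free functions and the PaddedDecimal type are likewise illustrative, showing ad-hoc overloading standing in for factory initializers.

```swift
// Each piece is marked as literal or interpolated; interpolated values are
// boxed as TextOutputStreamable existentials.
enum InterpolationSegment {
    case literal(String)
    case interpolated(TextOutputStreamable)
}

// The conforming type lets each value write itself into its own storage.
func render(_ segments: [InterpolationSegment]) -> String {
    var result = ""
    for segment in segments {
        switch segment {
        case .literal(let text):
            result.write(text)
        case .interpolated(let value):
            value.write(to: &result)  // the value formats itself into `result`
        }
    }
    return result
}

// Hypothetical formatted value: right-aligns an integer in a fixed width.
struct PaddedDecimal: TextOutputStreamable {
    var value: Int
    var width: Int

    func write<Target: TextOutputStream>(to target: inout Target) {
        let digits = String(value)
        target.write(String(repeating: " ", count: max(0, width - digits.count)))
        target.write(digits)
    }
}

// Ad-hoc overloads of a free `format` function, distinguished by label.
func format(_ value: Int, width: Int) -> PaddedDecimal {
    return PaddedDecimal(value: value, width: width)
}

func format(_ value: Int, radix: Int) -> String {
    return String(value, radix: radix)
}

let line = render([
    .literal("["),
    .interpolated(format(42, width: 5)),
    .literal("] 0x"),
    .interpolated(format(255, radix: 16)),
])
```

Here line is "[   42] 0xff". Each overload can return whatever TextOutputStreamable type suits it, which is roughly what a factory initializer on the existential would have provided.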
Does this sound like it would improve performance enough? Does anyone have another suggestion?
This is mentioned in Unmanaged Strings, but is it definitely planned for Swift 5? Otherwise, the literal segments could be static strings instead. I was thinking of how to improve StaticString for Swift 5, but it would be great if String could fulfill these requirements.
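If literal segments were StaticStrings, a consumer could copy their UTF-8 bytes straight out of static storage without creating a String first. A small sketch, assuming a hypothetical ByteSegment enum and renderBytes function; StaticString.withUTF8Buffer(_:) is the existing standard library method used to read the bytes.

```swift
// Hypothetical segments: literals live in static storage.
enum ByteSegment {
    case literal(StaticString)
    case interpolated(String)
}

// Appends every segment's UTF-8 bytes into one buffer.
func renderBytes(_ segments: [ByteSegment]) -> [UInt8] {
    var bytes: [UInt8] = []
    for segment in segments {
        switch segment {
        case .literal(let text):
            // Reads the literal's bytes in place; no String is created.
            text.withUTF8Buffer { bytes.append(contentsOf: $0) }
        case .interpolated(let text):
            bytes.append(contentsOf: text.utf8)
        }
    }
    return bytes
}

let rendered = renderBytes([.literal("foo "), .interpolated("bar"), .literal(" baz")])
```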
The package manager has an OutputByteStream class and various other types for formatted output.