String interpolation revamp


(Brent Royal-Gordon) #1

I’ll draft a post and send it to you to make sure I’ve correctly summarized the issues with my Swift 4-era proposal.


String interpolation revamp: design decisions
(Brent Royal-Gordon) #2

So, let’s kick this discussion off!

I’ve been working on a redesign of Swift’s ExpressibleByStringInterpolation protocol. As a refresher, when you write code like this in Swift 4:

let foo: MyString = "foo \(bar) baz"

We currently generate code like this:

let foo = MyString(stringInterpolation: 
	MyString(stringInterpolationSegment: "foo " as String),
	MyString(stringInterpolationSegment: bar),
	MyString(stringInterpolationSegment: " baz" as String)
)

Besides various shortcomings in its feature set, this design has a performance problem: It creates a lot of intermediate strings. The above code, for instance, generates two Strings (for the two literal segments), then three MyStrings (one for each segment), and finally concatenates them into one larger MyString. Depending on the implementation of MyString, this means it generates somewhere between two and six reference-counted heap objects which are purely temporary and are discarded as soon as the string interpolation is finished. (Plus there’s the Array the varargs are passed in!) This may be a problem in performance-sensitive code which constructs interpolated strings; for example, you might want to avoid using interpolated strings for logging.
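To make the shape of that call tree concrete, here is a minimal sketch. MyString below is a hypothetical stand-in that only mirrors the two initializers named above; it does not conform to any standard library protocol, and it records how many per-segment values were built rather than measuring real heap allocations.

```swift
// Hypothetical stand-in type: mirrors the two initializers from the post,
// without declaring conformance to any protocol.
struct MyString {
    var storage: String
    var segmentCount = 0   // how many per-segment MyString values were passed in

    // Called once per segment, whether literal or interpolated.
    init<T>(stringInterpolationSegment segment: T) {
        storage = String(describing: segment)
    }

    // Receives the per-segment values as varargs and concatenates them.
    init(stringInterpolation segments: MyString...) {
        storage = segments.map { $0.storage }.joined()
        segmentCount = segments.count
    }
}

let bar = 42
let foo = MyString(stringInterpolation:
    MyString(stringInterpolationSegment: "foo " as String),
    MyString(stringInterpolationSegment: bar),
    MyString(stringInterpolationSegment: " baz" as String)
)
print(foo.storage)       // foo 42 baz
print(foo.segmentCount)  // 3
```

Three temporary MyString values exist only to be concatenated into the fourth, which is the cost described above.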

I wrote a proposal in 2017 for a new ExpressibleByStringInterpolation design, but I wasn’t aware of this performance issue at the time. The proposed design is sort of a mixed bag: On the one hand, it replaces the instances created by MyString(stringInterpolationSegment:) with instances of a standard library enum which never creates new object instances; on the other hand, it requires interpolated segments to pass through a String initializer for most conforming types. In the example above, we would probably generate three String instances (two for the literals, one for the interpolation), but only one MyString (the final value).

So, how can we redesign string interpolation to generate fewer temporary Strings? I don’t have any bright ideas here; I’m hoping some of you will.


#3

Could the variadic parameter be replaced by a custom Sequence/Collection type that stores a single String, indices indicating the interpolation points, and the interpolated objects? Then you can produce Substrings, etc. on demand when iterated. I don’t know what the performance impact would be, though. What is the source compatibility requirement here?
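A rough sketch of what such a type might look like, purely as an illustration. The names SegmentList, Segment, literalText, breaks, and values are all made up here, and the interpolated values are stored as Any for simplicity, which is an assumption and not part of the suggestion above.

```swift
// Hypothetical sequence: one String of literal text, the indices where
// interpolated values belong, and the values themselves.
struct SegmentList: Sequence {
    enum Segment {
        case literal(Substring)
        case interpolated(Any)
    }

    let literalText: String
    let breaks: [String.Index]   // ascending positions in literalText
    let values: [Any]            // one value per break

    struct Iterator: IteratorProtocol {
        let list: SegmentList
        var position: String.Index
        var nextBreak = 0
        var valuePending = false

        mutating func next() -> Segment? {
            if valuePending {
                // A literal slice was just returned; now hand out its value.
                valuePending = false
                nextBreak += 1
                return .interpolated(list.values[nextBreak - 1])
            }
            if nextBreak < list.breaks.count {
                // Slice the literal text up to the next break; no copy is made.
                let end = list.breaks[nextBreak]
                defer { position = end; valuePending = true }
                return .literal(list.literalText[position..<end])
            }
            guard position < list.literalText.endIndex else { return nil }
            defer { position = list.literalText.endIndex }
            return .literal(list.literalText[position...])
        }
    }

    func makeIterator() -> Iterator {
        return Iterator(list: self, position: literalText.startIndex)
    }
}

// "foo " and " baz" stored back to back, with one break after "foo ".
let text = "foo  baz"
let breakAt = text.index(text.startIndex, offsetBy: 4)
let segments = SegmentList(literalText: text, breaks: [breakAt], values: [42])

var rebuilt = ""
for segment in segments {
    switch segment {
    case .literal(let piece): rebuilt.append(contentsOf: piece)
    case .interpolated(let value): rebuilt.append(String(describing: value))
    }
}
print(rebuilt)  // foo 42 baz
```

The literal pieces are Substrings that share storage with literalText, so iterating does not copy the literal text until the consumer appends it somewhere.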


(Brent Royal-Gordon) #4

ExpressibleByStringInterpolation is currently deprecated with a message saying that it will be redesigned, so we can give it any design we want, as long as it doesn’t break string interpolation use sites and it’s locked down by Swift 5.


(Michael Ilseman) #5

There’s a chance for some generally useful additions to String as a part of this, as well as a chance to hand-roll something specifically for interpolation.

There’s the general need to be able to efficiently format data directly into a String’s excess capacity. E.g., if formatting a floating point number, we want to reserve some excess capacity and write bytes directly into that capacity, updating the code-unit count when we’re done. I think Array does this kind of dance in its usage of _adoptStorage but I don’t know the details.
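For illustration only: later versions of the standard library gained String(unsafeUninitializedCapacity:initializingUTF8With:), which has roughly this "reserve, write bytes, report the count" shape for a brand-new string. It is not the mechanism discussed in this thread and it does not append into an existing String’s spare capacity, so treat the sketch below as an analogy. The helper name decimalString is made up, and the 20-byte capacity assumes a 64-bit UInt.

```swift
// Hypothetical helper: writes the decimal digits of an unsigned integer
// straight into a String's uninitialized UTF-8 storage.
func decimalString(_ value: UInt) -> String {
    // Assumption: UInt is 64 bits wide, so it has at most 20 decimal digits.
    let capacity = 20
    return String(unsafeUninitializedCapacity: capacity) { buffer in
        // First count the digits so we know where the last one goes.
        var digits = 1
        var probe = value
        while probe >= 10 {
            probe /= 10
            digits += 1
        }
        // Then fill the buffer from the least significant digit backwards.
        var remaining = value
        var index = digits
        repeat {
            index -= 1
            buffer[index] = UInt8(ascii: "0") + UInt8(remaining % 10)
            remaining /= 10
        } while remaining > 0
        // Report how many code units were actually written.
        return digits
    }
}

print(decimalString(1234))  // 1234
```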

Alternatively, we could form a kind of twine on the stack composed of literals and to-be-interpolated expressions and pass it to the consumer; String’s internals would then use an efficient means to construct a String from it. If this can be efficient with autoclosures, it could solve the lazy vs eager debate in the proposal. You could think of this as swapping the current situation, which is a call tree, with a single call that’s passed the tree as an argument.
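A small sketch of the twine idea with autoclosures, under heavy assumptions: TwinePiece, lit, interp, and render are invented names, and the pieces live in an ordinary Array (which is heap-allocated), so this only models the laziness and the "one call gets the whole tree" shape, not the stack allocation.

```swift
// Hypothetical twine node: either literal text or an expression that has
// not been evaluated yet.
enum TwinePiece {
    case literal(String)
    case expression(() -> String)
}

func lit(_ text: String) -> TwinePiece {
    return TwinePiece.literal(text)
}

// The autoclosure defers evaluation until the consumer renders the twine.
func interp<T>(_ value: @autoclosure @escaping () -> T) -> TwinePiece {
    return TwinePiece.expression({ String(describing: value()) })
}

// The consumer receives the whole twine in a single call.
func render(_ twine: [TwinePiece]) -> String {
    var result = ""
    for piece in twine {
        switch piece {
        case .literal(let text): result += text
        case .expression(let produce): result += produce()
        }
    }
    return result
}

func demo() -> (before: Int, text: String, after: Int) {
    var evaluations = 0
    func expensive() -> Int {
        evaluations += 1
        return 42
    }
    let twine = [lit("foo "), interp(expensive()), lit(" baz")]
    let before = evaluations        // still 0: nothing has been evaluated
    let text = render(twine)        // evaluation happens here
    return (before, text, evaluations)
}

let outcome = demo()
print(outcome.before, outcome.text, outcome.after)  // 0 foo 42 baz 1
```

A consumer such as a logger could decide not to call render at all, in which case the interpolated expressions would never run.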

We could generalize this to a StringBuilder-like type, which will try to use excess capacity when possible, but avert grow-and-copy until the end by linking a new reference. That’s probably future work as such a builder type would be burdened with guaranteeing lifetimes.


(Brent Royal-Gordon) #6

Update on the topic: In discussions last night, I realized that I had misunderstood what’s needed here. Literal segments are not actually the main problem; the string internals redesign is already going to avoid new allocations for Strings backed by literals. The real problem is interpolated values—we don’t want them to have to fill a buffer with characters and then pass it off to something that needs to copy them into a different buffer.


We already sort of have a version of this with the TextOutputStreamable protocol; it has a single method, write<Target: TextOutputStream>(to: inout Target), which tells a type to convert itself into characters. The problem with this is that TextOutputStream itself only has a write(_: String) method, so you still end up constructing Strings. You could maybe use repeated write(_:) calls with small strings and literals to carefully construct the full value without allocations, but this would be pretty finicky. Perhaps we should instead augment TextOutputStream with calls to reserve additional capacity, append Unicode scalars, etc. We could probably provide default implementations on TextOutputStream for existing/low-efficiency clients.
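Here is one way the augmentation could be sketched. Since this example cannot add requirements to TextOutputStream itself, it uses a refining protocol instead; ScalarOutputStream, reserveAdditionalCapacity, writeScalar, ByteSink, LegacySink, and writeDigits are all hypothetical names. Only TextOutputStream and its write(_:) requirement are real.

```swift
// Hypothetical refinement standing in for an augmented TextOutputStream.
protocol ScalarOutputStream: TextOutputStream {
    mutating func reserveAdditionalCapacity(_ utf8Count: Int)
    mutating func writeScalar(_ scalar: Unicode.Scalar)
}

// Defaults for existing/low-efficiency clients: fall back to write(_:).
extension ScalarOutputStream {
    mutating func reserveAdditionalCapacity(_ utf8Count: Int) {}
    mutating func writeScalar(_ scalar: Unicode.Scalar) {
        write(String(scalar))
    }
}

// A stream that takes advantage of the new calls.
struct ByteSink: ScalarOutputStream {
    var bytes: [UInt8] = []

    mutating func write(_ string: String) {
        bytes.append(contentsOf: string.utf8)
    }
    mutating func reserveAdditionalCapacity(_ utf8Count: Int) {
        bytes.reserveCapacity(bytes.count + utf8Count)
    }
    mutating func writeScalar(_ scalar: Unicode.Scalar) {
        // Encode directly into the byte array; no temporary String.
        UTF8.encode(scalar, into: { bytes.append($0) })
    }
}

// A stream that only knows write(_:) and relies on the defaults.
struct LegacySink: ScalarOutputStream {
    var text = ""
    mutating func write(_ string: String) { text += string }
}

// A value formatting itself scalar by scalar.
func writeDigits<Target: ScalarOutputStream>(of value: UInt, to target: inout Target) {
    var divisor: UInt = 1
    var count = 1
    while value / divisor >= 10 {
        divisor *= 10
        count += 1
    }
    target.reserveAdditionalCapacity(count)
    while divisor > 0 {
        let digit = UInt8((value / divisor) % 10)
        target.writeScalar(Unicode.Scalar(UInt8(ascii: "0") + digit))
        divisor /= 10
    }
}

var sink = ByteSink()
writeDigits(of: 1234, to: &sink)
print(String(decoding: sink.bytes, as: UTF8.self))  // 1234
```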

Interpolated values would then—at least for “ordinary” conforming types; this could be controlled by an associated type—be boxed as TextOutputStreamable existentials before being passed to the conforming type. If I understand this boxing correctly, it would be efficient for String and many other small types, but might require a container to be allocated on the heap for larger types. This value would then be put into an enum to mark it as interpolated or literal. The current proposed formatting mechanism wouldn’t work because we don’t have factory initializers on protocol existential types, but I could imagine using, for instance, ad-hoc overloading of format(...) free functions to do the same thing.
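To picture the enum-of-existentials idea, a minimal sketch follows. The Segment enum, the build function, and both format overloads are hypothetical; String’s conformance to both TextOutputStreamable and TextOutputStream is real, and it is what lets the boxed values write themselves into the result.

```swift
// Hypothetical segment type: a literal, or a value boxed as an existential.
enum Segment {
    case literal(String)
    case interpolated(TextOutputStreamable)
}

// Ad-hoc overloaded free functions standing in for formatting initializers.
func format(_ value: Int, radix: Int = 10) -> TextOutputStreamable {
    return String(value, radix: radix)
}

func format(_ value: Bool) -> TextOutputStreamable {
    let text: String = value ? "yes" : "no"
    return text
}

// What a conforming type might do with the segments it receives.
func build(_ segments: [Segment]) -> String {
    var result = ""
    for segment in segments {
        switch segment {
        case .literal(let text):
            result.write(text)
        case .interpolated(let value):
            value.write(to: &result)   // the boxed value streams itself
        }
    }
    return result
}

let message = build([
    .literal("flag: "),
    .interpolated(format(true)),
    .literal(", mask: 0x"),
    .interpolated(format(255, radix: 16)),
])
print(message)  // flag: yes, mask: 0xff
```

Note that these format functions still build a String before boxing it, so the sketch shows only the calling convention and does nothing for the allocation concern.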

Does this sound like it would improve performance enough? Does anyone have another suggestion?


(Ben Rimmington) #7

This is mentioned in Unmanaged Strings, but is it definitely planned for Swift 5? Otherwise, the literal segments could be static strings instead. I was thinking of how to improve StaticString for Swift 5, but it would be great if String could fulfill these requirements.

The package manager has an OutputByteStream class, and various other types for formatted output.