I've been continuing work on redesigning string interpolation with four goals in mind:

- Freezing for ABI stability: String interpolation is definitely ABI, and the current design has been known to be deficient at least since Swift 3. We really want to lock something down soon.
- Efficiency: The current design generates a lot of temporaries and manages buffers very naïvely. The new one pre-allocates storage based on an estimate of the literal's size and encourages types to write themselves into the buffer using `TextOutputStreamable` instead of generating temporary strings.
- Preserving information: The current design provides no (supported) way for the constructed instance to know which parts are literal and which are interpolated. Although `String` doesn't care, many types want to treat them differently. The new design calls different methods for each one.
- New features: Ideally, I'd like to permit parameters on interpolations (for instance, allowing `\(myInt, radix: 16)`) and support compile-time type safety and efficiency through overloads (so that a type can declare that only certain types are valid as interpolations), but these are stretch goals.
Currently, I'm testing various standard library prototypes and hand-writing the code that would eventually be generated. There's an important design decision we need to make, and I need some input on it and possibly a final decision from the folks who make final decisions on the standard library. I've also called out several other questions where I could use some input.
Quick primer on string interpolation
The Swift compiler divides each interpolated string into segments, and each segment is processed separately. For example, this string has three segments:
```
"foo \(bar) baz"
 1~~~2~~~~~3~~~
```
Each segment is either literal (meaning the exact text is in the source code) or interpolated (meaning it's an expression which needs to be evaluated). In the previous example, segments 1 and 3 are literal, while segment 2 is interpolated.
An interpolated string is built up by appending each segment to it in turn. This is faster than, say, inserting the interpolated segments into a string containing all the literal content, which would require you to move some of the literal content each time.
The standard library currently provides an `_ExpressibleByStringInterpolation` protocol for string interpolation, but it is inefficient and feature-poor, and has been deprecated since Swift 3.
Proposed design, in a nutshell
Suppose you're writing an expression like this:
```swift
let myStr = "foo \(bar) baz" as MyString
```
I'm proposing that the compiler convert that into code along the lines of this:
```swift
let myStr = MyString(stringInterpolationWithEstimatedSize: 9, builtBy: { buffer in
    buffer.appendLiteralSegment("foo ")
    buffer.appendInterpolatedSegment(bar)
    buffer.appendLiteralSegment(" baz")
})
```
(Note: The names are not final, and we're considering a couple different ways to do the size estimate. The important parts are that there is a size estimate and that there's a closure containing code which appends each segment to the buffer.)
Types like `MyString` opt into string interpolation by conforming to an `ExpressibleByStringInterpolation` protocol which looks like this:
```swift
protocol ExpressibleByStringInterpolation: ExpressibleByStringLiteral {
    /// A type used to collect the segments of a string interpolation.
    associatedtype StringInterpolationBuffer: StringInterpolationBufferProtocol
        where StringInterpolationBuffer.StringLiteralType == StringLiteralType

    /// Creates an instance from an interpolated string literal.
    ///
    /// - Parameter estimatedCodeUnits: A conservative estimate of the number
    ///   of code units in the final string. This will be at least as large as the
    ///   number of code units in all literal segments, but may be larger.
    /// - Parameter appendSegments: A compiler-generated function which appends
    ///   the segments of this string literal to the `StringInterpolationBuffer`.
    ///   Can throw if any of the interpolated segments can throw.
    init(
        stringInterpolationWithEstimatedSize estimatedCodeUnits: Int,
        builtBy appendSegments: (inout StringInterpolationBuffer) throws -> Void
    ) rethrows
}
```
`StringInterpolationBufferProtocol` looks something like this:
```swift
/// Conforming types are used to collect the segments of a string literal, helping to
/// build up an instance of an `ExpressibleByStringInterpolation`-conforming type.
///
/// The mechanism used to communicate the segments to the
/// `ExpressibleByStringInterpolation` initializer is not part of this protocol.
/// Typically, the methods in this protocol will add the segments to some property
/// known by the initializer.
protocol StringInterpolationBufferProtocol {
    associatedtype StringLiteralType: _BuiltinExpressibleByStringLiteral

    /// Called to provide a literal segment to the string interpolation. For instance, in
    /// `"foo \(bar) baz"`, this method would be called with `"foo "` and `" baz"`.
    mutating func appendLiteralSegment(_ literal: StringLiteralType)

    /// Called to provide an interpolated segment to the string interpolation. For
    /// instance, in `"foo \(bar) baz"`, this method would be called with `bar`.
    mutating func appendInterpolatedSegment<T>(_ value: T)
}
```
The buffer type is separate so that it can cheat for speed, and also so that it can represent partially-initialized states of types without allowing all instances everywhere to have those states. It also partitions off the `append…Segment(_:)` methods so people don't accidentally use them directly.
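To make the shape concrete, here is a hand-written sketch of a minimal conforming type under this design. It is simplified in a couple of ways: `StringLiteralType` is fixed to plain `String` (sidestepping `_BuiltinExpressibleByStringLiteral`), and the buffer simply concatenates into a `String` property. The names follow the pitch, but none of them are final.

```swift
// Simplified version of the pitched protocol: literal type fixed to String.
protocol StringInterpolationBufferProtocol {
    mutating func appendLiteralSegment(_ literal: String)
    mutating func appendInterpolatedSegment<T>(_ value: T)
}

struct MyString {
    var contents: String

    struct Buffer: StringInterpolationBufferProtocol {
        var storage = ""
        mutating func appendLiteralSegment(_ literal: String) {
            storage += literal
        }
        mutating func appendInterpolatedSegment<T>(_ value: T) {
            storage += String(describing: value)
        }
    }

    // The initializer the compiler-generated code would call.
    init(stringInterpolationWithEstimatedSize estimatedCodeUnits: Int,
         builtBy appendSegments: (inout Buffer) throws -> Void) rethrows {
        var buffer = Buffer()
        buffer.storage.reserveCapacity(estimatedCodeUnits)
        try appendSegments(&buffer)
        contents = buffer.storage
    }
}

// Hand-written stand-in for what the compiler would generate for
// `"foo \(bar) baz" as MyString`:
let bar = 42
let myStr = MyString(stringInterpolationWithEstimatedSize: 9, builtBy: { buffer in
    buffer.appendLiteralSegment("foo ")
    buffer.appendInterpolatedSegment(bar)
    buffer.appendLiteralSegment(" baz")
})
print(myStr.contents) // prints "foo 42 baz"
```

Note that because the closure here doesn't throw, `rethrows` lets the call site omit `try` entirely.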
Benchmark details
I'll be talking about some benchmark results below. In the benchmarks, I use three examples adapted from the Swift benchmark suite. I didn't write them, but from what I can tell:
- "StringInterpolation" has two long interpolated segments and two long literal segments. It's meant to be a "typical" case.
- "StringInterpolationSmall" has two short interpolated segments and two short literal segments. It may be especially sensitive to small string optimizations.
- "StringInterpolationManySmallSegments" has eight interpolated segments with one- or two-character literal segments between them. It may be especially sensitive to poor size estimates and conservative buffer resizing.
All of them use only ASCII characters. (Note: The benchmark harness actually includes a fourth example called "Basic", but I've omitted its results. It's designed to check the correctness of the prototypes, not their speed.)
In all of the result tables, 100 is the time taken by the current string interpolation implementation. Smaller is faster than the current design; larger is slower. Note that 100 is the string interpolation speed on the master branch; previous optimization work on `String` operations in general has already made interpolation significantly faster than in the stable release.
Remaining issues
The big question: How should we interpolate?
The main decision we still need to make is about the `appendInterpolatedSegment(_:)` method. The code above shows it formally declared in `StringInterpolationBufferProtocol` as taking an unconstrained generic parameter, but we might want to make it informal instead so that it can be overloaded to support multiple implementations with different types. Here are the options as I see them:
1. Formal requirement for an unconstrained generic implementation which is always used
As shown above: There's an `appendInterpolatedSegment<T>(_: T)` method in the protocol, and you have to implement it. Clients of this type will always use the generic method, even if you try to overload it. If you want to interpolate different types in different ways, either for efficiency or formatting, you need to branch based on the type at runtime.
The first benefit from this is cleanliness: the protocol formally states all of its requirements in a way the compiler can verify. Where possible, we prefer doing this for a variety of good reasons—it keeps the overload set small, it helps with tooling, etc.
The second benefit is generic use: If a generic type is constrained to be `ExpressibleByStringInterpolation`, that is all that's necessary to construct instances of it from string interpolations. If `appendInterpolatedSegment(_:)` were informal, you would need to refine that conformance in some way to make any actual interpolations available in a generic context.
The third benefit is non-extensibility: There is only one implementation of string interpolation for a given type, and there is no means to override it. This means that constructing an instance through string interpolation will always work the same, everywhere. Its behavior will never depend on extensions in other files in your project, or in libraries you've imported.
I may be making a poor case for this approach. If you think I'm underselling this option or explaining it poorly, please chime in.
2. Use overloads instead of a formal requirement
In this design, `StringInterpolationBufferProtocol` does not contain any interpolated segment requirement at all. Instead, the compiler generates a call to a method named `appendInterpolatedSegment` and then selects an implementation from the overloads available at that point in the source code.
This design is a little weird, but we did it for dynamic member lookup, which faced similar problems. It works poorly in generic contexts, where you would need to constrain to a refined version of `StringInterpolationBufferProtocol` with specific requirements for the types you want to support, but where the type is statically known, it has several benefits.
The first benefit from this is that a type can limit which types are allowed to interpolate into it, and can enforce that limit at compile time. For example, you could write a `SQLStatement` type which only allowed `Int`, `Double`, `String`, `Data`, and `SQLStatement` to be interpolated, and which escaped each one appropriately (or bound them to placeholders so no escaping was necessary). With an unconstrained generic approach, you could only reject invalid types at runtime; with the overload-based approach, these rejections would happen at compile time instead.
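Here's a rough sketch of what that compile-time enforcement might look like. `SQLFragmentBuffer`, its properties, and the `userID` stand-in are all hypothetical names of my own, not part of the pitch; the point is the pattern: because only specific overloads exist and there is no generic fallback, interpolating any unsupported type is a compile error.

```swift
// Hypothetical buffer for a SQL statement type, using the overload-based design.
struct SQLFragmentBuffer {
    var sql = ""
    var bindings: [String] = []

    mutating func appendLiteralSegment(_ literal: String) {
        sql += literal
    }

    // Overloads, not a generic requirement: only these types may be interpolated.
    mutating func appendInterpolatedSegment(_ value: Int) {
        sql += String(value)   // Integers are safe to splice in directly.
    }
    mutating func appendInterpolatedSegment(_ value: String) {
        sql += "?"             // Strings become bound placeholders; no escaping needed.
        bindings.append(value)
    }
}

// Hand-written stand-in for `"SELECT * FROM users WHERE id = \(userID)" as SQLStatement`:
var buffer = SQLFragmentBuffer()
buffer.appendLiteralSegment("SELECT * FROM users WHERE id = ")
buffer.appendInterpolatedSegment(7)
print(buffer.sql) // prints "SELECT * FROM users WHERE id = 7"

// buffer.appendInterpolatedSegment(someBlobType)  // would not compile: no overload
```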
The second benefit is that we could allow additional parameters or parameter labels; the parentheses after the backslash would just be treated as a parameter list instead of a tuple. For instance, the standard library `String` type could allow a `radix:` parameter to be placed after an interpolated integer. On the server, an `HTML` type which ampersand-escaped ordinary `String`s interpolated into it (but not other `HTML` instances) could allow you to say `\(raw: codeFragment)` to avoid escaping, or `\(url: parameterValue)` to percent-encode a particular string. This ability opens up vast design spaces, and it's hard to predict exactly what people would build with it.
(However, it's not the end of the world if the parameter feature isn't included; it's just a syntax, and there are other syntaxes we could use instead, perhaps involving operators or free functions. A design for "formatters" would be a good addition to the language.)
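To illustrate how the parameter-label feature might desugar under the overload-based design: the interpolation `\(255, radix: 16)` would simply resolve, at compile time, to whichever `appendInterpolatedSegment` overload accepts that label. `HexBuffer` below is a hypothetical stand-in, not `String`'s real buffer.

```swift
// Hypothetical buffer demonstrating an extra `radix:` parameter on interpolations.
struct HexBuffer {
    var storage = ""

    mutating func appendLiteralSegment(_ literal: String) {
        storage += literal
    }
    mutating func appendInterpolatedSegment(_ value: Int) {
        storage += String(value)
    }
    // `\(value, radix: 16)` would resolve to this overload at compile time.
    mutating func appendInterpolatedSegment(_ value: Int, radix: Int) {
        storage += String(value, radix: radix)
    }
}

// Hand-written stand-in for `"0x\(255, radix: 16)"`:
var buffer = HexBuffer()
buffer.appendLiteralSegment("0x")
buffer.appendInterpolatedSegment(255, radix: 16)
print(buffer.storage) // prints "0xff"
```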
The third benefit is speed. Because overloads are selected at compile time, you can make sure each type is appended in the fastest way possible—for instance, by using `TextOutputStreamable` to make the value write itself into the stream buffer piecemeal—without any runtime overhead. When I benchmarked prototypes of both, option 1 was always about 30–40 points slower than option 2, while option 2 was either a little slower or faster than the current design (benchmark harness, prototype branch):
| Sample/Design | Option 2 | Option 1 |
|---|---|---|
| StringInterpolation | 110 | 139 |
| StringInterpolationSmall | 125 | 166 |
| StringInterpolationManySmallSegments | 66 | 102 |
As far as I can tell, option 1 is significantly slower than the current design because it gives up some of the optimizations already used in stable Swift, like using `TextOutputStreamable` and `CustomStringConvertible` directly when we statically know they're available. Future optimizer work might be able to speed up the unconstrained design, but I have my doubts about this, because evaluating option 1's type branches at compile time would bake in facts about conformances that could change as libraries evolve. And besides, why adopt a slower design and hope it can be made fast when a faster alternative is available already?
3. Have a formal requirement, but use overloads
The third option is to split the difference: Include a formal requirement for an unconstrained generic implementation, but consider all overloads visible at the site of a given interpolation, not just the one that fulfills the protocol requirement. Types would be required to provide an unconstrained generic implementation, but types could overload it with faster implementations as long as they had the same semantics. Conforming types which wanted to control what was interpolated into them could deprecate the unconstrained generic implementation (or possibly even mark it `unavailable`) so that you'd get a compiler diagnostic if you used it outside a generic context.
This hybrid design effectively combines benefits 1 and 2 (cleanliness and generic use) from the first option with benefits 1 and 3 (interpolation type safety and speed) from the second option. We might even be able to fit in parameter labels and lists, too. That's not too bad for splitting the baby.
It is also binary-compatible with option 1: If Swift 5 ships with a formal requirement for an unconstrained generic which is always used, Swift 6 can loosen the "always used" part and start looking for overloads. There shouldn't be any existing overloads for the compiler to find, so this would usually be source-compatible too.
The main downside of this approach is that it doesn't forbid extensibility, and that makes the unconstrained generic implementation's job impossible in theory, since it can't know which overloads might have been added by other modules. In types which don't want to actually use it, the unconstrained generic implementation is also rarely-used boilerplate, and I could easily imagine lazy developers implementing it with `fatalError()`.
The current design works like this—the protocol formally requires an `init<T>(stringInterpolationSegment: T)` initializer, but the compiler will use any visible overloads of that initializer, and you can deprecate it or mark it unavailable if you want to limit the types you support.
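A sketch of the deprecation trick under option 3, with a hypothetical buffer type of my own devising: the generic requirement exists to satisfy the protocol and to support generic contexts, but is deprecated so statically typed uses get a diagnostic and resolve to the concrete overloads, which Swift's overload resolution prefers over the generic.

```swift
// Hypothetical "formal requirement plus overloads" buffer (option 3).
struct MessageBuffer {
    var storage = ""

    mutating func appendLiteralSegment(_ literal: String) {
        storage += literal
    }

    // The formal protocol requirement: usable from generic contexts, but
    // deprecated so direct uses are steered toward the overloads below.
    @available(*, deprecated, message: "only String and Int may be interpolated")
    mutating func appendInterpolatedSegment<T>(_ value: T) {
        storage += String(describing: value)
    }

    // Fast, semantically identical overloads chosen at compile time.
    mutating func appendInterpolatedSegment(_ value: String) {
        storage += value
    }
    mutating func appendInterpolatedSegment(_ value: Int) {
        storage += String(value)
    }
}

var buffer = MessageBuffer()
buffer.appendLiteralSegment("count = ")
buffer.appendInterpolatedSegment(3)  // resolves to the Int overload; no warning
print(buffer.storage) // prints "count = 3"
```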
3a. Have a formal requirement, but use the overloads from the original module
The same as 3, but the only overloads considered are the ones declared in the module which created the `StringInterpolationBufferProtocol` conformance. Overloads from extensions in other modules would be ignored.
This means that the author of the `appendInterpolatedSegment<T>(_: T)` method could actually know all of the available overloads and emulate them correctly. However, I don't know how difficult it'd be to convince the compiler's internals to limit overload resolution like this, and it's kind of a weird behavior. Like option 3, 3a is binary- and mostly source-compatible with option 1.
Other interpolation options
I previously explored using an associated type for interpolated segments, but forcing every interpolated segment to convert to a common type was pretty slow. There may be other options I haven't considered.
Open questions:
- Are there other plausible designs?
- Can the non-overloaded design be optimized enough to be as fast as overloading?
- Which design should we select?
- If we don't adopt a design which permits extra parameters, what sorts of alternatives should we consider for formatting in strings?
Calculating and communicating size estimates
Preallocating storage for interpolated strings is a huge, important optimization. However, the size of interpolated values can't be known ahead of time, so we have to estimate the final size. The simplest estimate is probably of the form `literalCodeUnits + numberOfInterpolations * codeUnitsPerInterpolation`, where `codeUnitsPerInterpolation` is a constant we will need to select. To ensure my benchmark results were conservative, I used `codeUnitsPerInterpolation = 1`, which is a huge underestimate in all three examples.
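Worked through on the running `"foo \(bar) baz"` example, the formula reproduces the estimate of 9 seen in the generated code earlier (the function wrapper is just for illustration; the formula itself is from this pitch):

```swift
// The simple size-estimate formula: literal code units plus a fixed
// per-interpolation constant.
func estimatedSize(literalCodeUnits: Int,
                   numberOfInterpolations: Int,
                   codeUnitsPerInterpolation: Int) -> Int {
    return literalCodeUnits + numberOfInterpolations * codeUnitsPerInterpolation
}

// "foo \(bar) baz" has 8 literal code units ("foo " plus " baz") and one
// interpolation. With the conservative constant of 1 used in the benchmarks:
let estimate = estimatedSize(literalCodeUnits: 8,
                             numberOfInterpolations: 1,
                             codeUnitsPerInterpolation: 1)
print(estimate) // prints "9"
```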
To test the impact a better estimate would have on performance, I manually modified the "StringInterpolationManySmallSegments" example. This example normally uses an estimate of 17, but the final size is actually 66, so the buffer needs to be grown a couple of times. For this test, I manually tried various estimates:
| Estimated size | Option 2 |
|---|---|
| 0 | 101 |
| 8 | 113 |
| 17 | 66 |
| 32 | 65 |
| 64 | 53 |
| 128 | 43 |
| 256 | 44 |
As you can see, a large-enough estimate can change performance from similar to the status quo to more than twice as fast. But if the estimate is too large, you'll start to see slightly degraded performance—and more importantly, you'll see wasted memory. We'll need to evaluate many different examples and select an appropriate value for `codeUnitsPerInterpolation`, balancing speed and wasted memory.

Fortunately, the value of `codeUnitsPerInterpolation` is not ABI and can change whenever we want, so we can tune it during the beta and in future versions of Swift. In theory, we could use different values for different platforms or optimization options, or even do some PGO-style tuning for each application.
So far, all of the designs in this document and prototypes I've benchmarked have used a single "estimated size" parameter. That would mean that `codeUnitsPerInterpolation` would be a constant in the compiler. However, in the course of writing this document, I've actually decided this is the wrong design—the compiler should instead pass separate `literalCodeUnits` and `numberOfInterpolations` values to the `ExpressibleByStringInterpolation` initializer and allow it to do the calculation. This means that `String`'s estimation algorithm will be part of the standard library, and other types can use their own estimation algorithms or even preallocate more complicated internal data structures.
I haven't yet tested the performance impact of adding an extra parameter, but I don't expect it to be large.
Open questions:
- Should `codeUnitsPerInterpolation`'s value be inlined into clients (allowing constant folding to compute the final value at compile time) or made into a `@usableFromInline` constant (allowing existing applications to be re-tuned by newer standard library versions)?
Implementing String.StringInterpolationBuffer
The other important factor in efficiency is how `String.StringInterpolationBuffer` stores the code units it accumulates. We would like to freeze an implementation if we can—in my testing, even making the type partially resilient makes every benchmark 5–20 points slower.
I've been exploring three different implementations:
- "String": Uses a plain `String` with a `reserveCapacity(_:)` call at the beginning to grow it to the estimated size. In the previous benchmarks, "Option 2" was a slightly less optimized version of this implementation.
- "SS": Uses the internal `_UTF16StringStorage` type to try to avoid a few layers of overhead and indirection.
- "BP": Uses an `UnsafeMutableBufferPointer<UTF16.CodeUnit>` directly, which forces an extra copy but might really save on overhead.
For the "SS" and "BP" designs, I'm also evaluating a few different growth strategies:
- "Pow": When the buffer is too small, moves the code units to a new buffer as large as the current code units plus the new code units, rounded up to the nearest power of two.
- "Est": Stores the initial estimate in a property, subtracting from it as content is appended. Resizes buffers in the same way as "Pow", except the remaining estimate is added to the current code units. This could sometimes avoid an unnecessary reallocation, at the cost of remembering and updating the estimate.
- "Agg": The same as "Pow", except that after rounding up, the capacity is doubled again. This is used only with the "BP" design, which always requires an extra copy anyway, so the large amount of memory it may waste is only temporary.
(I also did some informal testing with growth patterns similar to `Array`'s "doubled or minimum necessary", but each example did slightly worse that way. I've omitted those numbers to avoid clutter.)
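The "Pow" and "Est" strategies above can be sketched as capacity calculations. The helper names here are mine, not from the prototypes; the strategies themselves are as described:

```swift
/// Smallest power of two that is >= n (for n >= 1).
func nextPowerOfTwo(_ n: Int) -> Int {
    var capacity = 1
    while capacity < n { capacity *= 2 }
    return capacity
}

/// "Pow": grow to current + new code units, rounded up to a power of two.
func powCapacity(current: Int, appending new: Int) -> Int {
    return nextPowerOfTwo(current + new)
}

/// "Est": like "Pow", but also reserve room for the not-yet-appended
/// remainder of the original estimate.
func estCapacity(current: Int, appending new: Int, remainingEstimate: Int) -> Int {
    return nextPowerOfTwo(current + new + remainingEstimate)
}

print(powCapacity(current: 17, appending: 10))                        // prints "32"
print(estCapacity(current: 17, appending: 10, remainingEstimate: 30)) // prints "64"
```

As the last two lines show, "Est" can reserve a larger buffer up front and thereby skip a reallocation that "Pow" would later need, at the cost of tracking the remaining estimate.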
Here are the results of benchmarking those six implementations (benchmark harness, prototype implementations):

| Sample/Implementation | String | SS/Pow | SS/Est | BP/Agg | BP/Pow | BP/Est |
|---|---|---|---|---|---|---|
| StringInterpolation | 106 | 67 | 68 | 206 | 227 | 228 |
| StringInterpolationSmall | 115 | 128 | 129 | 593 | 593 | 605 |
| StringInterpolationManySmallSegments | 59 | 88 | 90 | 120 | 125 | 128 |
Observations:
- "String" gets excellent marks on "StringInterpolationManySmallSegments" (not sure why) but is otherwise slightly worse than the status quo. None of these examples include non-ASCII characters, but if they did, I would expect "String" to perform worse than the others—it's the only implementation which switches between 8-bit and 16-bit storage.
- "SS/Pow" performs poorly on "StringInterpolationSmall" (not surprising since it doesn't get small-string optimizations), but noticeably outperforms the current design on the other two benchmarks.
- "SS/Est" has a small but measurable performance penalty compared to "SS/Pow". I'm also concerned that it may cause weird inlining bugs unless `codeUnitsPerInterpolation` is made `@usableFromInline`.
- "BP" in all forms varies from "pretty bad" to "atrocious". The extra copy seems to overwhelm any possible advantage in these examples; it might work better in very large strings, but we're not seeing that here.
Based on these results, I think we should adopt either the "SS/Pow" or the "String" implementation, depending on how common we think "StringInterpolationSmall"-like cases will be.
Note that whatever implementation we adopt, we can probably squeeze more speed out of it—my prototypes are pretty straightforward.
Open questions:
- Why does "String" handle "StringInterpolationManySmallSegments" so well?
- How common are small interpolated strings which might perform poorly with the "SS" implementation?
- Are there any boneheaded mistakes in the implementations which might radically change their performance?
- Which implementation should we adopt?
Compiler implementation
I haven't started working on the code to generate these calls.
I think the code for these calls will be relatively easy to generate; it avoids a major pitfall which made a previous attempt difficult. However, I may need to generate the `semanticExpr` for the interpolation much earlier, which could break anything between parsing and constraint application; I don't know what the ramifications of that will be.
One problem I anticipate is implicit closure parameters being used in interpolated segments; since we're planning to wrap these expressions in a generated closure, they won't see the outer closure's `$n` variables. (And because the generated closure takes a parameter, I don't think it can be an autoclosure.) I'm not sure how to solve this—maybe some kind of flag on the generated closure to say it's transparent to the search for `$n` variables?
Open questions:
- How much will we break if we generate `semanticExpr`s early?
- How can we solve the `$n` variable problem?
- Are there any other obvious pitfalls that I haven't noticed?