I've been continuing work on redesigning string interpolation with four goals in mind:

- Freezing for ABI stability: String interpolation is definitely ABI, and the current design has been known to be deficient at least since Swift 3. We really want to lock something down soon.
- Efficiency: The current design generates a lot of temporaries and manages buffers very naïvely. The new one pre-allocates storage based on an estimate of the literal's size and encourages types to write themselves into the buffer using `TextOutputStreamable` instead of generating temporary strings.
- Preserving information: The current design provides no (supported) way for the constructed instance to know which parts are literal and which are interpolated. Although `String` doesn't care, many types want to treat them differently. The new design calls different methods for each one.
- New features: Ideally, I'd like to permit parameters on interpolations (for instance, allowing `\(myInt, radix: 16)`) and support compile-time type safety and efficiency through overloads (so that a type can declare that only certain types are valid as interpolations), but these are stretch goals.
Currently, I'm testing various standard library prototypes and hand-writing the code that would eventually be generated. There's an important design decision we need to make, and I need some input on it and possibly a final decision from the folks who make final decisions on the standard library. I've also called out several other questions where I could use some input.
Quick primer on string interpolation
The Swift compiler divides each interpolated string into segments, and each segment is processed separately. For example, this string has three segments:
```
"foo \(bar) baz"
 1~~~2~~~~~3~~~
```
Each segment is either literal (meaning the exact text is in the source code) or interpolated (meaning it's an expression which needs to be evaluated). In the previous example, segments 1 and 3 are literal, while segment 2 is interpolated.
An interpolated string is built up by appending each segment to it in turn. This is faster than, say, inserting the interpolated segments into a string containing all the literal content, which would require you to move some of the literal content each time.
The standard library currently provides an `_ExpressibleByStringInterpolation` protocol for string interpolation, but it is inefficient and feature-poor, and has been deprecated since Swift 3.
Proposed design, in a nutshell
Suppose you're writing an expression like this:
```swift
let myStr = "foo \(bar) baz" as MyString
```
I'm proposing that the compiler convert that into code along the lines of this:
```swift
let myStr = MyString(stringInterpolationWithEstimatedSize: 9, builtBy: { buffer in
    buffer.appendLiteralSegment("foo ")
    buffer.appendInterpolatedSegment(bar)
    buffer.appendLiteralSegment(" baz")
})
```
(Note: The names are not final, and we're considering a couple different ways to do the size estimate. The important parts are that there is a size estimate and that there's a closure containing code which appends each segment to the buffer.)
Types like `MyString` opt into string interpolation by conforming to an `ExpressibleByStringInterpolation` protocol which looks like this:
```swift
protocol ExpressibleByStringInterpolation: ExpressibleByStringLiteral {
    /// A type used to collect the segments of a string interpolation.
    associatedtype StringInterpolationBuffer: StringInterpolationBufferProtocol
        where StringInterpolationBuffer.StringLiteralType == StringLiteralType

    /// Creates an instance from an interpolated string literal.
    ///
    /// - Parameter estimatedCodeUnits: A conservative estimate of the number
    ///   of code units in the final string. This will be at least as large as the
    ///   number of code units in all literal segments, but may be larger.
    /// - Parameter appendSegments: A compiler-generated function which appends
    ///   the segments of this string literal to the `StringInterpolationBuffer`.
    ///   Can throw if any of the interpolated segments can throw.
    init(
        stringInterpolationWithEstimatedSize estimatedCodeUnits: Int,
        builtBy appendSegments: (inout StringInterpolationBuffer) throws -> Void
    ) rethrows
}
```
`StringInterpolationBufferProtocol` looks something like this:
```swift
/// Conforming types are used to collect the segments of a string literal, helping to
/// build up an instance of an `ExpressibleByStringInterpolation`-conforming type.
///
/// The mechanism used to communicate the segments to the
/// `ExpressibleByStringInterpolation` initializer is not part of this protocol.
/// Typically, the methods in this protocol will add the segments to some property
/// known by the initializer.
protocol StringInterpolationBufferProtocol {
    associatedtype StringLiteralType: _BuiltinExpressibleByStringLiteral

    /// Called to provide a literal segment to the string interpolation. For instance, in
    /// `"foo \(bar) baz"`, this method would be called with `"foo "` and `" baz"`.
    mutating func appendLiteralSegment(_ literal: StringLiteralType)

    /// Called to provide an interpolated segment to the string interpolation. For
    /// instance, in `"foo \(bar) baz"`, this method would be called with `bar`.
    mutating func appendInterpolatedSegment<T>(_ value: T)
}
```
The buffer type is separate so that it can cheat for speed, and also so that it can represent partially-initialized states of types without allowing all instances everywhere to have those states. It also partitions off the `append…Segment(_:)` methods so people don't accidentally use them directly.
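To make the shape concrete, here is a hand-written sketch of a minimal conforming type under this design. It is simplified in a couple of ways: `StringLiteralType` is fixed to plain `String` (sidestepping `_BuiltinExpressibleByStringLiteral`), and the buffer simply concatenates into a `String` property. The names follow the pitch, but none of them are final.

```swift
// Simplified version of the pitched protocol: literal type fixed to String.
protocol StringInterpolationBufferProtocol {
    mutating func appendLiteralSegment(_ literal: String)
    mutating func appendInterpolatedSegment<T>(_ value: T)
}

struct MyString {
    var contents: String

    struct Buffer: StringInterpolationBufferProtocol {
        var storage = ""
        mutating func appendLiteralSegment(_ literal: String) {
            storage += literal
        }
        mutating func appendInterpolatedSegment<T>(_ value: T) {
            storage += String(describing: value)
        }
    }

    // The initializer the compiler-generated code would call.
    init(stringInterpolationWithEstimatedSize estimatedCodeUnits: Int,
         builtBy appendSegments: (inout Buffer) throws -> Void) rethrows {
        var buffer = Buffer()
        buffer.storage.reserveCapacity(estimatedCodeUnits)
        try appendSegments(&buffer)
        contents = buffer.storage
    }
}

// Hand-written stand-in for what the compiler would generate for
// `"foo \(bar) baz" as MyString`:
let bar = 42
let myStr = MyString(stringInterpolationWithEstimatedSize: 9, builtBy: { buffer in
    buffer.appendLiteralSegment("foo ")
    buffer.appendInterpolatedSegment(bar)
    buffer.appendLiteralSegment(" baz")
})
print(myStr.contents) // prints "foo 42 baz"
```

Note that because the closure here doesn't throw, `rethrows` lets the call site omit `try` entirely.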
Benchmark details
I'll be talking about some benchmark results below. In the benchmarks, I use three examples adapted from the Swift benchmark suite. I didn't write them, but from what I can tell:
- "StringInterpolation" has two long interpolated segments and two long literal segments. It's meant to be a "typical" case.
- "StringInterpolationSmall" has two short interpolated segments and two short literal segments. It may be especially sensitive to small string optimizations.
- "StringInterpolationManySmallSegments" has eight interpolated segments with one- or two-character literal segments between them. It may be especially sensitive to poor size estimates and conservative buffer resizing.
All of them use only ASCII characters. (Note: The benchmark harness actually includes a fourth example called "Basic", but I've omitted its results. It's designed to check the correctness of the prototypes, not their speed.)
In all of the result tables, 100 is the time taken by the current string interpolation implementation. Smaller is faster than the current design; larger is slower. Note that 100 is the string interpolation speed on the master branch; previous optimization work on `String` operations in general has already made interpolation significantly faster than in the stable release.
Remaining issues
The big question: How should we interpolate?
The main decision we still need to make is about the `appendInterpolatedSegment(_:)` method. The code above shows it formally declared in `StringInterpolationBufferProtocol` as taking an unconstrained generic parameter, but we might want to make it informal instead so that it can be overloaded to support multiple implementations with different types. Here are the options as I see them:
1. Formal requirement for an unconstrained generic implementation which is always used
As shown above: There's an `appendInterpolatedSegment<T>(_: T)` method in the protocol, and you have to implement it. Clients of this type will always use the generic method, even if you try to overload it. If you want to interpolate different types in different ways, either for efficiency or formatting, you need to branch based on the type at runtime.
The first benefit from this is cleanliness: the protocol formally states all of its requirements in a way the compiler can verify. Where possible, we prefer doing this for a variety of good reasons—it keeps the overload set small, it helps with tooling, etc.
The second benefit is generic use: If a generic type is constrained to be `ExpressibleByStringInterpolation`, that is all that's necessary to construct instances of it from string interpolations. If `appendInterpolatedSegment(_:)` were informal, you would need to refine that conformance in some way to make any actual interpolations available in a generic context.
The third benefit is non-extensibility: There is only one implementation of string interpolation for a given type, and there is no means to override it. This means that constructing an instance through string interpolation will always work the same, everywhere. Its behavior will never depend on extensions in other files in your project, or in libraries you've imported.
I may be making a poor case for this approach. If you think I'm underselling this option or explaining it poorly, please chime in.
2. Use overloads instead of a formal requirement
In this design, `StringInterpolationBufferProtocol` does not contain any interpolated segment requirement at all. Instead, the compiler generates a call to a method named `appendInterpolatedSegment` and then selects an implementation from the overloads available at that point in the source code.
This design is a little weird, but we did it for dynamic member lookup, which faced similar problems. It works poorly in generic contexts, where you would need to constrain to a refined version of `StringInterpolationBufferProtocol` with specific requirements for the types you want to support, but where the type is statically known, it has several benefits.
The first benefit from this is that a type can limit which types are allowed to interpolate into it, and can enforce that limit at compile time. For example, you could write a `SQLStatement` type which only allowed `Int`, `Double`, `String`, `Data`, and `SQLStatement` to be interpolated, and which escaped each one appropriately (or bound them to placeholders so no escaping was necessary). With an unconstrained generic approach, you could only reject invalid types at runtime; with the overload-based approach, these rejections would happen at compile time instead.
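Here's a rough sketch of what that compile-time enforcement might look like. `SQLFragmentBuffer`, its properties, and the `userID` stand-in are all hypothetical names of my own, not part of the pitch; the point is the pattern: because only specific overloads exist and there is no generic fallback, interpolating any unsupported type is a compile error.

```swift
// Hypothetical buffer for a SQL statement type, using the overload-based design.
struct SQLFragmentBuffer {
    var sql = ""
    var bindings: [String] = []

    mutating func appendLiteralSegment(_ literal: String) {
        sql += literal
    }

    // Overloads, not a generic requirement: only these types may be interpolated.
    mutating func appendInterpolatedSegment(_ value: Int) {
        sql += String(value)   // Integers are safe to splice in directly.
    }
    mutating func appendInterpolatedSegment(_ value: String) {
        sql += "?"             // Strings become bound placeholders; no escaping needed.
        bindings.append(value)
    }
}

// Hand-written stand-in for `"SELECT * FROM users WHERE id = \(userID)" as SQLStatement`:
var buffer = SQLFragmentBuffer()
buffer.appendLiteralSegment("SELECT * FROM users WHERE id = ")
buffer.appendInterpolatedSegment(7)
print(buffer.sql) // prints "SELECT * FROM users WHERE id = 7"

// buffer.appendInterpolatedSegment(someBlobType)  // would not compile: no overload
```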
The second benefit is that we could allow additional parameters or parameter labels; the parentheses after the backslash would just be treated as a parameter list instead of a tuple. For instance, the standard library `String` type could allow a `radix:` parameter to be placed after an interpolated integer. On the server, an `HTML` type which ampersand-escaped ordinary `String`s interpolated into it (but not other `HTML` instances) could allow you to say `\(raw: codeFragment)` to avoid escaping, or `\(url: parameterValue)` to percent-encode a particular string. This ability opens up vast design spaces, and it's hard to predict exactly what people would build with it.
(However, it's not the end of the world if the parameter feature isn't included; it's just a syntax, and there are other syntaxes we could use instead, perhaps involving operators or free functions. A design for "formatters" would be a good addition to the language.)
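To illustrate how the parameter-label feature might desugar under the overload-based design: the interpolation `\(255, radix: 16)` would simply resolve, at compile time, to whichever `appendInterpolatedSegment` overload accepts that label. `HexBuffer` below is a hypothetical stand-in, not `String`'s real buffer.

```swift
// Hypothetical buffer demonstrating an extra `radix:` parameter on interpolations.
struct HexBuffer {
    var storage = ""

    mutating func appendLiteralSegment(_ literal: String) {
        storage += literal
    }
    mutating func appendInterpolatedSegment(_ value: Int) {
        storage += String(value)
    }
    // `\(value, radix: 16)` would resolve to this overload at compile time.
    mutating func appendInterpolatedSegment(_ value: Int, radix: Int) {
        storage += String(value, radix: radix)
    }
}

// Hand-written stand-in for `"0x\(255, radix: 16)"`:
var buffer = HexBuffer()
buffer.appendLiteralSegment("0x")
buffer.appendInterpolatedSegment(255, radix: 16)
print(buffer.storage) // prints "0xff"
```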
The third benefit is speed. Because overloads are selected at compile time, you can make sure each type is appended in the fastest way possible—for instance, by using `TextOutputStreamable` to make the value write itself into the stream buffer piecemeal—without any runtime overhead. When I benchmarked prototypes of both, option 1 was always about 30–40 points slower than option 2, while option 2 was either a little slower or faster than the current design (benchmark harness, prototype branch):
| Sample/Design | Option 2 | Option 1 |
|---|---|---|
| StringInterpolation | 110 | 139 |
| StringInterpolationSmall | 125 | 166 |
| StringInterpolationManySmallSegments | 66 | 102 |
As far as I can tell, option 1 is significantly slower than the current design because it gives up some of the optimizations already used in stable Swift, like using `TextOutputStreamable` and `CustomStringConvertible` directly when we statically know they're available. Future optimizer work might be able to speed up the unconstrained design, but I have my doubts about this, because evaluating option 1's type branches at compile time would bake in facts about conformances that could change as libraries evolve. And besides, why adopt a slower design and hope it can be made fast when a faster alternative is available already?
3. Have a formal requirement, but use overloads
The third option is to split the difference: Include a formal requirement for an unconstrained generic implementation, but consider all overloads visible at the site of a given interpolation, not just the one that fulfills the protocol requirement. Types would be required to provide an unconstrained generic implementation, but types could overload it with faster implementations as long as they had the same semantics. Conforming types which wanted to control what was interpolated into them could deprecate the unconstrained generic implementation (or possibly even mark it `unavailable`) so that you'd get a compiler diagnostic if you used it outside a generic context.
This hybrid design effectively combines benefits 1 and 2 (cleanliness and generic use) from the first option with benefits 1 and 3 (interpolation type safety and speed) from the second option. We might even be able to fit in parameter labels and lists, too. That's not too bad for splitting the baby.
It is also binary-compatible with option 1: If Swift 5 ships with a formal requirement for an unconstrained generic which is always used, Swift 6 can loosen the "always used" part and start looking for overloads. There shouldn't be any existing overloads for the compiler to find, so this would usually be source-compatible too.
The main downside of this approach is that it doesn't forbid extensibility, and that makes the unconstrained generic implementation's job impossible in theory, since it can't know which overloads might have been added by other modules. In types which don't want to actually use it, the unconstrained generic implementation is also rarely-used boilerplate, and I could easily imagine lazy developers implementing it with `fatalError()`.
The current design works like this—the protocol formally requires an `init<T>(stringInterpolationSegment: T)` initializer, but the compiler will use any visible overloads of that initializer, and you can deprecate it or mark it unavailable if you want to limit the types you support.
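A sketch of the deprecation trick under option 3, with a hypothetical buffer type of my own devising: the generic requirement exists to satisfy the protocol and to support generic contexts, but is deprecated so statically typed uses get a diagnostic and resolve to the concrete overloads, which Swift's overload resolution prefers over the generic.

```swift
// Hypothetical "formal requirement plus overloads" buffer (option 3).
struct MessageBuffer {
    var storage = ""

    mutating func appendLiteralSegment(_ literal: String) {
        storage += literal
    }

    // The formal protocol requirement: usable from generic contexts, but
    // deprecated so direct uses are steered toward the overloads below.
    @available(*, deprecated, message: "only String and Int may be interpolated")
    mutating func appendInterpolatedSegment<T>(_ value: T) {
        storage += String(describing: value)
    }

    // Fast, semantically identical overloads chosen at compile time.
    mutating func appendInterpolatedSegment(_ value: String) {
        storage += value
    }
    mutating func appendInterpolatedSegment(_ value: Int) {
        storage += String(value)
    }
}

var buffer = MessageBuffer()
buffer.appendLiteralSegment("count = ")
buffer.appendInterpolatedSegment(3)  // resolves to the Int overload; no warning
print(buffer.storage) // prints "count = 3"
```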
3a. Have a formal requirement, but use the overloads from the original module
The same as 3, but the only overloads considered are the ones declared in the module which created the `StringInterpolationBufferProtocol` conformance. Overloads from extensions in other modules would be ignored.
This means that the author of the `appendInterpolatedSegment<T>(_: T)` method could actually know all of the available overloads and emulate them correctly. However, I don't know how difficult it'd be to convince the compiler's internals to limit overload resolution like this, and it's kind of a weird behavior. Like option 3, 3a is binary- and mostly source-compatible with option 1.
Other interpolation options
I previously explored using an associated type for interpolated segments, but forcing every interpolated segment to convert to a common type was pretty slow. There may be other options I haven't considered.
Open questions:
- Are there other plausible designs?
- Can the non-overloaded design be optimized enough to be as fast as overloading?
- Which design should we select?
- If we don't adopt a design which permits extra parameters, what sorts of alternatives should we consider for formatting in strings?
Calculating and communicating size estimates
Preallocating storage for interpolated strings is a huge, important optimization. However, the size of interpolated values can't be known ahead of time, so we have to estimate the final size. The simplest estimate is probably of the form `literalCodeUnits + numberOfInterpolations * codeUnitsPerInterpolation`, where `codeUnitsPerInterpolation` is a constant we will need to select. To ensure my benchmark results were conservative, I used `codeUnitsPerInterpolation = 1`, which is a huge underestimate in all three examples.
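Worked through on the running `"foo \(bar) baz"` example, the formula reproduces the estimate of 9 seen in the generated code earlier (the function wrapper is just for illustration; the formula itself is from this pitch):

```swift
// The simple size-estimate formula: literal code units plus a fixed
// per-interpolation constant.
func estimatedSize(literalCodeUnits: Int,
                   numberOfInterpolations: Int,
                   codeUnitsPerInterpolation: Int) -> Int {
    return literalCodeUnits + numberOfInterpolations * codeUnitsPerInterpolation
}

// "foo \(bar) baz" has 8 literal code units ("foo " plus " baz") and one
// interpolation. With the conservative constant of 1 used in the benchmarks:
let estimate = estimatedSize(literalCodeUnits: 8,
                             numberOfInterpolations: 1,
                             codeUnitsPerInterpolation: 1)
print(estimate) // prints "9"
```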
To test the impact a better estimate would have on performance, I manually modified the "StringInterpolationManySmallSegments" example. This example normally uses an estimate of 17, but the final size is actually 66, so the buffer needs to be grown a couple of times. For this test, I manually tried various estimates:
| Estimated size | Option 2 |
|---|---|
| 0 | 101 |
| 8 | 113 |
| 17 | 66 |
| 32 | 65 |
| 64 | 53 |
| 128 | 43 |
| 256 | 44 |
As you can see, a large-enough estimate can change performance from similar to the status quo to more than twice as fast. But if the estimate is too large, you'll start to see slightly degraded performance—and more importantly, you'll see wasted memory. We'll need to evaluate many different examples and select an appropriate value for `codeUnitsPerInterpolation`, balancing speed and wasted memory.

Fortunately, the value of `codeUnitsPerInterpolation` is not ABI and can change whenever we want, so we can tune it during the beta and in future versions of Swift. In theory, we could use different values for different platforms or optimization options, or even do some PGO-style tuning for each application.
So far, all of the designs in this document and prototypes I've benchmarked have used a single "estimated size" parameter. That would mean that `codeUnitsPerInterpolation` would be a constant in the compiler. However, in the course of writing this document, I've actually decided this is the wrong design—the compiler should instead pass separate `literalCodeUnits` and `numberOfInterpolations` values to the `ExpressibleByStringInterpolation` initializer and allow it to do the calculation. This means that `String`'s estimation algorithm will be part of the standard library, and other types can use their own estimation algorithms or even preallocate more complicated internal data structures.
I haven't yet tested the performance impact of adding an extra parameter, but I don't expect it to be large.
Open questions:
- Should `codeUnitsPerInterpolation`'s value be inlined into clients (allowing constant folding to compute the final value at compile time) or made into a `@usableFromInline` constant (allowing existing applications to be re-tuned by newer standard library versions)?
Implementing String.StringInterpolationBuffer
The other important factor in efficiency is how `String.StringInterpolationBuffer` stores the code units it accumulates. We would like to freeze an implementation if we can—in my testing, even making the type partially resilient makes every benchmark 5–20 points slower.
I've been exploring three different implementations:
- "String": Uses a plain `String` with a `reserveCapacity(_:)` call at the beginning to grow it to the estimated size. In the previous benchmarks, "Option 2" was a slightly less optimized version of this implementation.
- "SS": Uses the internal `_UTF16StringStorage` type to try to avoid a few layers of overhead and indirection.
- "BP": Uses an `UnsafeMutableBufferPointer<UTF16.CodeUnit>` directly, which forces an extra copy but might really save on overhead.
For the "SS" and "BP" designs, I'm also evaluating a few different growth strategies:
- "Pow": When the buffer is too small, moves the code units to a new buffer as large as the current code units plus the new code units, rounded up to the nearest power of two.
- "Est": Stores the initial estimate in a property, subtracting from it as content is appended. Resizes buffers in the same way as "Pow", except the remaining estimate is added to the current code units. This could sometimes avoid an unnecessary reallocation, at the cost of remembering and updating the estimate.
- "Agg": The same as "Pow", except that after rounding up, the capacity is doubled again. This is used only with the "BP" design, which always requires an extra copy anyway, so the large amount of memory it may waste is only temporary.
(I also did some informal testing with growth patterns similar to `Array`'s "doubled or minimum necessary", but each example did slightly worse that way. I've omitted those numbers to avoid clutter.)
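The "Pow" and "Est" strategies above can be sketched as capacity calculations. The helper names here are mine, not from the prototypes; the strategies themselves are as described:

```swift
/// Smallest power of two that is >= n (for n >= 1).
func nextPowerOfTwo(_ n: Int) -> Int {
    var capacity = 1
    while capacity < n { capacity *= 2 }
    return capacity
}

/// "Pow": grow to current + new code units, rounded up to a power of two.
func powCapacity(current: Int, appending new: Int) -> Int {
    return nextPowerOfTwo(current + new)
}

/// "Est": like "Pow", but also reserve room for the not-yet-appended
/// remainder of the original estimate.
func estCapacity(current: Int, appending new: Int, remainingEstimate: Int) -> Int {
    return nextPowerOfTwo(current + new + remainingEstimate)
}

print(powCapacity(current: 17, appending: 10))                        // prints "32"
print(estCapacity(current: 17, appending: 10, remainingEstimate: 30)) // prints "64"
```

As the last two lines show, "Est" can reserve a larger buffer up front and thereby skip a reallocation that "Pow" would later need, at the cost of tracking the remaining estimate.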
Here are the results of benchmarking those six implementations (benchmark harness, prototype implementations):

| Sample/Implementation | String | SS/Pow | SS/Est | BP/Agg | BP/Pow | BP/Est |
|---|---|---|---|---|---|---|
| StringInterpolation | 106 | 67 | 68 | 206 | 227 | 228 |
| StringInterpolationSmall | 115 | 128 | 129 | 593 | 593 | 605 |
| StringInterpolationManySmallSegments | 59 | 88 | 90 | 120 | 125 | 128 |
Observations:
- "String" gets excellent marks on "StringInterpolationManySmallSegments" (not sure why) but is otherwise slightly worse than the status quo. None of these examples include non-ASCII characters, but if they did, I would expect "String" to perform worse than the others—it's the only implementation which switches between 8-bit and 16-bit storage.
- "SS/Pow" performs poorly on "StringInterpolationSmall" (not surprising since it doesn't get small-string optimizations), but noticeably outperforms the current design on the other two benchmarks.
- "SS/Est" has a small but measurable performance penalty compared to "SS/Pow". I'm also concerned that it may cause weird inlining bugs unless `codeUnitsPerInterpolation` is made `@usableFromInline`.
- "BP" in all forms varies from "pretty bad" to "atrocious". The extra copy seems to overwhelm any possible advantage in these examples; it might work better in very large strings, but we're not seeing that here.
Based on these results, I think we should adopt either the "SS/Pow" or the "String" implementation, depending on how common we think "StringInterpolationSmall"-like cases will be.
Note that whatever implementation we adopt, we can probably squeeze more speed out of it—my prototypes are pretty straightforward.
Open questions:
- Why does "String" handle "StringInterpolationManySmallSegments" so well?
- How common are small interpolated strings which might perform poorly with the "SS" implementation?
- Are there any boneheaded mistakes in the implementations which might radically change their performance?
- Which implementation should we adopt?
Compiler implementation
I haven't started working on the code to generate these calls.
I think the code for these calls will be relatively easy to generate; it avoids a major pitfall which made a previous attempt difficult. However, I may need to generate the `semanticExpr` for the interpolation much earlier, which could break anything between parsing and constraint application; I don't know what the ramifications of that will be.
One problem I anticipate is implicit closure parameters being used in interpolated segments; since we're planning to wrap these expressions in a generated closure, they won't see the outer closure's `$n` variables. (And because the generated closure takes a parameter, I don't think it can be an autoclosure.) I'm not sure how to solve this—maybe some kind of flag on the generated closure to say it's transparent to the search for `$n` variables?
Open questions:
- How much will we break if we generate `semanticExpr`s early?
- How can we solve the `$n` variable problem?
- Are there any other obvious pitfalls that I haven't noticed?