String interpolation revamp: design decisions

Right, exactly, I was hoping it might be possible to eliminate the sub-constraint system.

So we'd have something like this:

protocol StringInterpolationBufferProtocol {
  associatedtype StringLiteralType: _ExpressibleByBuiltinStringLiteral
  mutating func appendLiteralSegment(_ literal: StringLiteralType)
  
  mutating func appendInterpolatedSegment<T>(_ value: T)
  mutating func appendInterpolatedSegment<T: CustomStringConvertible>(_ value: T)
  mutating func appendInterpolatedSegment<T: TextOutputStream>(_ value: T)
}

And probably provide default implementations for the constrained ones so you could define only the unconstrained generic to satisfy all of them? It's inelegant, but it would get the job done.
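Something like this sketch, using the protocol above (the private funnel method is invented here to keep the defaults from recursing back into themselves via overload resolution):

extension StringInterpolationBufferProtocol {
  // Funnel through an unconstrained generic parameter so that only the
  // unconstrained requirement is visible at this call site.
  private mutating func appendViaUnconstrained<T>(_ value: T) {
    appendInterpolatedSegment(value)
  }

  // Defaults: a conformer that implements only the unconstrained
  // overload still satisfies all three requirements.
  mutating func appendInterpolatedSegment<T: CustomStringConvertible>(_ value: T) {
    appendViaUnconstrained(value)
  }
  mutating func appendInterpolatedSegment<T: TextOutputStream>(_ value: T) {
    appendViaUnconstrained(value)
  }
}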

That's a fair point. As a starting point, here's a design we could use with interpolation-based overloading:

// Syntaxes:
//     String(myInt, with: intFormatter)
//     "\(myInt, with: intFormatter)"
protocol TextOutputFormatter {
  associatedtype FormatterInput
  func write<Stream: TextOutputStream>(_ input: FormatterInput, to stream: inout Stream)
}
extension String {
  init<Formatter: TextOutputFormatter>(_ input: Formatter.FormatterInput, with formatter: Formatter) {
    self.init()
    formatter.write(input, to: &self)
  }
}
extension String.StringInterpolationBuffer {
  mutating func appendInterpolatedSegment<Formatter: TextOutputFormatter>(_ input: Formatter.FormatterInput, with formatter: Formatter) {
    formatter.write(input, to: &self)
  }
}
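To make the shape concrete, here's a hypothetical conformer (HexFormatter is invented for illustration; the last comment uses the proposed interpolation syntax):

struct HexFormatter: TextOutputFormatter {
  // FormatterInput is inferred as Int from this witness.
  func write<Stream: TextOutputStream>(_ input: Int, to stream: inout Stream) {
    stream.write("0x")
    stream.write(String(input, radix: 16, uppercase: true))
  }
}

let hex = String(255, with: HexFormatter())  // "0xFF"
// Proposed: "value: \(255, with: HexFormatter())"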

If we couldn't use extra parameters in interpolations to do this, people would either need to use the String initializer at an efficiency and verbosity cost, or construct some sort of auxiliary value. For instance, we might borrow a convention from Python:

// Syntax:
//     String(myInt, with: intFormatter)
//     "\(intFormatter % myInt)"
// Instead of `String(myInt, with: intFormatter)`, we could support 
// `String(intFormatter % myInt)` for consistency.
struct FormattedValue<Formatter: TextOutputFormatter>: TextOutputStreamable {
  let input: Formatter.FormatterInput
  let formatter: Formatter
  func write<Stream: TextOutputStream>(to stream: inout Stream) {
    formatter.write(input, to: &stream)
  }
}
func % <Formatter: TextOutputFormatter>(lhs: Formatter, rhs: Formatter.FormatterInput) -> FormattedValue<Formatter> {
  return FormattedValue(input: rhs, formatter: lhs)
}

That's what I meant when I said the multiple-parameter feature is mostly just a syntax—there will usually be another way to get the same features.

What I meant here is, what should String.StringInterpolationBuffer do? Should it allow its constant's value to be inlined, or should it keep it tunable? It's a tiny decision, but one we have to make.

I haven't yet; I'll take a look at this and maybe contact you privately if I need help.

All of these individual values are small, but I think the more important question is, how frequently will the final value be small enough to fit in a small string? I assume that, if the final value will need storage allocated, there's no advantage in using small strings to begin with and then switching to a buffer partway through.

As long as I have you, quick question: If you ask for a string storage object with capacity n and it ends up allocating a block of size n+m to satisfy it, does it come back with capacity n+m? Or is the excess completely wasted?

Oof, you're right—String.reserveCapacity(_:) is sometimes counterproductive. I built two variants of the "String" implementation—"String/0" omits the reserveCapacity(_:) call entirely, while "String/C" calls it only when the estimate is larger than _SmallUTF8String.capacity. Here's how they perform (100 = current string interpolation performance; smaller is better):

Example/Design                           String/0  String/C  String  SS/Pow
StringInterpolation †                          80       102     102      69
StringInterpolationSmall                       50        50     115     131
StringInterpolationManySmallSegments †        70        63      61      95

† "String/C" will call reserveCapacity(_:) in this example.

Observations:

  • The main cause of "String"'s poor performance in some examples is that reserveCapacity(_:) is sometimes detrimental.

  • However, the size of the estimate is not the only factor which can make reserveCapacity(_:) a bad idea—if it were, "String/0" would not show improvements on any examples where "String/C" performed the same as "String".

  • "String" outperforms "String/0" in one example, so reserveCapacity(_:) is not a detriment across the board.

  • "SS/Pow" still outperforms any variant of "String" on the "StringInterpolation" example, but it's an 11-point benefit vs. an 81-point detriment on the worst test. I suspect this is because SS/Pow always uses UTF-16, so the fact that none of the benchmarks include non-ASCII characters is slightly unfair to it. Should I put together UTF-16 examples so we can measure that?

That's fair. If we find that using string internals directly is a big benefit, we should probably design a StringBuilder type or something to encapsulate them.


Going back to something I forgot earlier:

This part of the design is a bit ugly, I agree, but we don't want to give up the pre-allocation. Some tweaks that might make it more palatable:

  1. Provide the number of Unicode scalars, not code units, in the literal segments. Types can then estimate a code unit size from that.

  2. Provide a string literal containing all the literal segments concatenated together. This would then hide all the messiness of what different sizes mean behind our existing _ExpressibleByBuiltinXStringLiteral protocols. Types could introspect the literal segments to discover anything they needed to know—whether it's length in any encoding, ASCII-ness, or whatever else. In theory, the individual appendLiteralSegment(_:) calls could be backed by the same memory, but I don't know how difficult it would be for the compiler to actually generate code like that.

I'm hoping to change the constraint solver's entire approach to string interpolations. Basically, currently it does this:

  1. During constraint solving:

    1. Solve the type of the InterpolatedStringLiteralExpr.
    2. Solve the types of each Expr in its Segments.
  2. During constraint application:

    1. Create a semanticExpr for the InterpolatedStringLiteralExpr referencing each of the Segments.
    2. Solve the types of the new nodes in the semanticExpr using typeCheckExpressionShallow().

Instead, I hope to do this:

  1. Before constraint solving:

    1. Create a semanticExpr for the InterpolatedStringLiteralExpr referencing each of the Segments.
  2. During constraint solving:

    1. Solve the types of all the nodes in the semanticExpr, essentially ignoring the InterpolatedStringLiteralExpr.

This simpler approach would not have worked with the old ExpressibleByStringInterpolation design because the segments would all belong to one expression, and that expression would be much too complicated for the constraint solver to handle. But now that each segment is in its own statement, we shouldn't need to do anything special to keep the constraint solver from trying to solve them all at once.

At least, this is what I hope we can do. I don't know if it will actually work.

Right. My very limited understanding of the constraint solver is that the presence of multiple overloads alone is not what causes exponential behavior -- it's the presence of multiple overloads in the same strongly connected component of the constraint graph.

If each interpolation segment's type is its own independent type variable that is part of its own connected component, it should be possible to solve the constraint system in linear time, but perhaps @rudkx or @Douglas_Gregor can clarify things here.

Sounds like a reasonable approach that also eliminates the need to have something like typeCheckExpressionShallow(). In the meantime, we can either keep this entry point, or I can hack up string interpolation to hard-code the current set of overloads, depending on how much effort I decide to put into some ongoing CSApply cleanups I'm working on.

Yes, that's basically true.

One way to think about it is that we have constraint variables with domains. Those domains are either overload choices or types. If you have multiple constraint variables in a connected component that each have more than one element in their domain, you can potentially end up with exponential behavior. You won't necessarily end up with exponential behavior, though. If you can fail quickly while trying elements of the domain (e.g. you immediately know that a particular overload or particular type cannot work), you don't end up with exponential behavior.
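To illustrate the worst case with a contrived, non-interpolation example:

// Every `+` here has many overloads, and all of the operands share one
// connected component, so the solver may have to try combinations of
// overload choices before this checks as Double:
let mixed = 1 + 2.0 + 3 + 4.0 + 5 + 6.0

// Splitting the work into separate statements gives each expression its
// own small constraint system, solved independently:
var total = 1.0
total += 2.0
total += 3.0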

I haven't had a chance to read through the proposal and the rest of this thread but I'll try to get back to it.

To follow up on @benrimmington's mention of my post in the earlier thread about this: I think this part of your post is basically what I was saying at the time. That is, not that there would be reserved space for the non-literal segments in a single String, just that you could (theoretically) avoid the overhead of having several separate Strings for the appendLiteralSegment(_:) calls by having a single String containing all the appended literal sections, plus some way of indicating where the boundaries are (e.g. indices, Substrings, whatever). I have no idea how that would work out in practice, or how it would work with the StringLiteralType in your new proposal (edit: looking into the details, the only valid types for _ExpressibleByBuiltinStringLiteral are String and StaticString, so it might be possible), though.

Great work, and really interesting writeup.

What I'm talking about is more of a low-level implementation detail that wouldn't be visible to the caller. Here's the deal:

Firstly, thanks to some excellent work I believe by @Michael_Ilseman, there really isn't much overhead to a String backed by a string literal anymore. It's pretty much just a pointer to constant memory, a length, and some flags packed away somewhere. Pretty cool, actually.

So, imagine you have the expression "foo \(bar) baz" as MyString in your code, and MyString uses String as its StringLiteralType. If we pass a full StringLiteralType with all the literal segments to the initializer for ExpressibleByStringInterpolation, that would translate to code like:

MyString(stringLiteral: "foo  baz", withInterpolations: 2, builtBy: { buffer in
  buffer.appendLiteralSegment("foo ")
  buffer.appendInterpolatedSegment(bar)
  buffer.appendLiteralSegment(" baz")
})

But that's only the beginning of the story. Later on in compilation, we break down those string literals into _ExpressibleByBuiltinStringLiteral calls. I don't know precisely when or how this is done, but the compiler creates a table of string data and includes it in the binary. For this code, the string table might include (␣ represents a space):

Offset  +0  +1  +2  +3  +4  +5  +6  +7
0xBEE0   f   o   o   ␣   ␣   b   a   z
0xBEE8   f   o   o   ␣   ␣   b   a   z

The above code sample is then translated into something like:

MyString(stringLiteral: String(_builtinStringLiteral: stringsTable + 0xBEE0, count: 8, isASCII: true), withInterpolations: 1, builtBy: { buffer in
  buffer.appendLiteralSegment(String(_builtinStringLiteral: stringsTable + 0xBEE8, count: 4, isASCII: true))
  buffer.appendInterpolatedSegment(bar)
  buffer.appendLiteralSegment(String(_builtinStringLiteral: stringsTable + 0xBEEC, count: 4, isASCII: true))
})

Notice, though, that this is redundant: The characters in 0xBEE0–0xBEE7 are the same as the ones in 0xBEE8–0xBEEF. So there's no reason we couldn't delete the second:

Offset  +0  +1  +2  +3  +4  +5  +6  +7
0xBEE0   f   o   o   ␣   ␣   b   a   z

And write our code to use the first for both:

MyString(stringLiteral: String(_builtinStringLiteral: stringsTable + 0xBEE0, count: 8, isASCII: true), withInterpolations: 1, builtBy: { buffer in
  buffer.appendLiteralSegment(String(_builtinStringLiteral: stringsTable + 0xBEE0, count: 4, isASCII: true))
  buffer.appendInterpolatedSegment(bar)
  buffer.appendLiteralSegment(String(_builtinStringLiteral: stringsTable + 0xBEE4, count: 4, isASCII: true))
})

Each literal is a fully-formed String instance, completely unaware of any of the others, and yet they're all backed by the same memory, so they don't bloat the binary. What's different from what you're talking about is that there are no indices being communicated or Substrings being created; indeed, there's no formal relationship between the literals at all. It's just that they share memory behind the scenes in such a way that giving the initializer a sneak peek doesn't really cost us anything.

I did a little research and I'm pretty sure the LLVM ConstantMergePass implements this behavior, but I'm not sure if we're using it. (Can someone confirm?) If we are, this would probably work already.


I wasn't suggesting that the string be edited in place and compacted. Instead, the initial string would stand in for the String.reserveCapacity(_:) call, which is documented as taking a "minimum number of ASCII character's worth of storage to allocate." The storage would then be cleared with removeAll(keepingCapacity: true), and segments would be appended as in your proposed design.

The number of embedded nulls for each placeholder would only need to be the average size of typical interpolated values. They could also be beneficial to types such as LocalizableString and SQLiteStatement, which only need a few extra bytes for each %@ or ? segment.
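A rough sketch of the idea (the SQL literal and padding are invented, and this glosses over whether a literal-backed String's storage is actually reusable this way):

// The literal ships with placeholder bytes for the `?` segment, so its
// length doubles as a capacity estimate.
var statement = "SELECT * FROM users WHERE id = \0\0\0\0;"
statement.removeAll(keepingCapacity: true)  // clear, but keep the storage
statement += "SELECT * FROM users WHERE id = "
statement += String(42)
statement += ";"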

nit: s/TextOutputStream/TextOutputStreamable/, as the buffer itself is the stream.

Oh, you mean for the standard library's conformance? I don't see much of a downside either way. If it's inlinable and that's beneficial, we'll do it. All it means is that a future improved tuning would only benefit apps after they are recompiled, but presumably the prior tuning is already acceptable.

Right, the interpolated segments are pretty small. The final value's length will be the sum of the literal segments' code unit count and the interpolated segments' counts. E.g., if the literal segments add up to 10 and we interpolate one Int that represents a count, I still expect the final result to be small.

We store the actual allocation size.

If something's slow, we should fix it. So, I opened a fix :-)

It's a little over 2x speedup when the requested capacity is small. So, all that's left is the (relatively) modest overhead of the function call and early-exit branch.

UTF-16 examples would be a good idea!

ASCII is still very important, not because US-English is, but because most data is formatted into ASCII, and much interpolation usage is for logging and other situations that happen to be ASCII regardless of the user's (or even developer's) language. E.g., your example conformer for SQL.

That, or just make it fast and/or expose a more direct API. If you are able to extract a micro benchmark to measure just this effect, I can poke at it some more. Otherwise, we can try to optimize it later.

I'm not as violently opposed to a StringBuilder as some; I think ropes and twines are neat and could be valuable. But, as presented here it looks more like a TextOutputStream, which String already is.

FWIW, String's reserveCapacity is already in terms of "ASCII characters" (meaning code units... CR-LF makes everything painful). The Swift compiler can easily give ASCII/UTF-8 code unit underestimates, and even underestimates based on segment type (e.g. Int is at least 1, Double at least 3, Array at least 2, String may be 0, ...).
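For instance, here's a sketch of those per-type underestimates as a lookup the compiler could bake in (illustrative only; the real logic would live in the compiler and would handle generic types like Array properly):

func minimumCodeUnits(for segment: Any.Type) -> Int {
  switch segment {
  case is Int.Type:    return 1  // at least one digit
  case is Double.Type: return 3  // at least "0.0"
  case is String.Type: return 0  // may be empty
  default:             return 0  // no information; assume nothing
  }
}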

We can also have a default implementation that calls a non-reserving requirement, but I think having a parameter that conformers are free to ignore is simpler.
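That alternative would look something like this sketch (protocol shape and names hypothetical):

protocol InterpolationBuffer {
  init()  // the non-reserving requirement
  init(estimatedCodeUnits: Int)
}

extension InterpolationBuffer {
  // Default: conformers that don't care about capacity hints implement
  // only init() and get this for free.
  init(estimatedCodeUnits: Int) {
    self.init()
  }
}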

Excellent. I'll rebase on the current master and re-run my benchmarks; in the meantime, the results below do not reflect this fix.

I made variants of each of the three benchmarks we currently have. "StringInterpolation16" replaces an "n" with an "ñ" in a literal segment in the middle of the string; "StringInterpolationSmall16" replaces an "n" with an "ñ" in the first interpolated segment; and "StringInterpolationManySmallSegments16" adds an emoji at the very beginning. (Benchmark harness)

Here are the results. To keep the table smaller, I'm dropping the "BP" implementations; suffice it to say that the "16" variants didn't redeem them. Because the "current" value changes radically between "StringInterpolationSmall" and its "16" variant, 100 = current time on the non-"16" variant, and I'm showing "current" values for the "16" variants so you can see the current implementation's 16-bit penalty.

Example                                  String/0  String/C  String  SS/Pow  SS/Est  current
StringInterpolation                            81       102     102      68      69
StringInterpolation16                          76       102     102      67      68      100
StringInterpolationSmall                       50        50     116     131     132
StringInterpolationSmall16                    117       117     120     126     127      181
StringInterpolationManySmallSegments           68        61      63      91      92
StringInterpolationManySmallSegments16         86        64      64      95      96      118

I think the bottom line is that the Unicode string efficiency gains from always using a 16-bit buffer are modest. On "StringInterpolationSmall16" and "StringInterpolationManySmallSegments16", the "String" variants see large drops in performance, but still do better than "SS".

Curiously, the "StringInterpolation" test—where "SS" beat "String" before—doesn't seem to be sensitive to 16-bit-ness at all. After investigating, what I think we're seeing here is the difference in their growth algorithms. I instrumented the "StringInterpolation" test to measure the actual capacity of its buffer before and after each segment is appended. It looks like on this particular test, the slightly more aggressive buffer growth algorithm used by "SS" saves us a reallocation and makes it faster than what String would do on its own:

Segment          String/0  String/C  String  SS/Pow  SS/Est
0 (35 estimate)         0        40      40      64      64
1 (19)                 32        40      40      64      64
2 (+16=35)             64        40      40      64      64
3 (+50=85)            128        88      88     128     128
4 (+14=99)            128       176     176     128     128
Allocations             3         3       3       2       2

I'm thinking we should chalk this up to a quirk of the benchmark and conclude that the "String" implementation, with a corrected _StringGuts.reserveCapacity(_:), is our best bet; however, if we think this actually reflects something real, we can either look at String's growth algorithm or have String.StringInterpolationBuffer manually resize the string more aggressively. Merely using a less conservative interpolation size estimate might improve this, though.

That would be cool, but we've been discussing separating the estimate into a literal code unit count and an interpolated segment count. Would this mean that the interpolated segment count would become an estimate of interpolated code units? Or would you formally pass the interpolated segment count, but somehow detect where it was used and "correct" the estimate by rewriting the expression? Or would we do this by going back to passing a single, opaque estimate of the total code unit size?


Results after rebasing on last night's master:

Example/Implementation                   String/0  String/C  String  SS/Pow  SS/Est  current
StringInterpolation                            87       103     103      77      77      100
StringInterpolation16                          82       109     109      79      79      100
StringInterpolationSmall                       36        37      39     108     110      100
StringInterpolationSmall16                    134       134     135     118     120      216
StringInterpolationManySmallSegments           68        59      60      75      76      100
StringInterpolationManySmallSegments16         89        67      71      82      84      124

Unsurprisingly, "String" still performs poorly on "StringInterpolation" and "StringInterpolation16", since the growth algorithm quirk hasn't changed. On "StringInterpolationSmall", "String" vastly outperforms "SS" on ASCII strings and does moderately worse on 16-bit strings. On "StringInterpolationManySmallSegments", "String" edges out "SS" in both tests (!). On all examples except the seemingly coincidentally bad "StringInterpolation", "String" beats the current design.

I think "String" is our winner.


So, let's review the open questions I started the thread with:

I don't think any were presented.

I didn't see much to imply that it could.

I think we've settled on overloads. We believe that the one-statement-per-interpolation design will avoid creating complex expressions that are hard on the type-checker, and will allow for the internal simplifications that @Slava_Pestov is hoping for.

We didn't talk too much about this, except to note that this is mostly just a syntax issue.

I think we've mostly settled on the latter, but I'm not sure how that meshes with @Michael_Ilseman's suggestion that we could use different estimates for different types of interpolations, and there's the dark horse "pass a literal containing all the literal segments together, and hope that the optimizer combines all the strings later" option.

I'm leaning towards inlining, but it sounds like this doesn't matter that much.

The real question turned out to be why "String" was so slow on the other tests, and the real answer turned out to be "reserveCapacity(_:) was disabling small string optimizations, and one of the tests happened to grow the buffer in a suboptimal pattern".

Michael thinks they're very common and we should care a lot about them.

There was a small issue in String.reserveCapacity(_:) which has been fixed, and which improved the performance of the "String" implementations.

"String". Now that the reserveCapacity(_:) issue has been fixed, it does better on most of the examples, except for one where it happens to grow its buffer less aggressively than "SS". The "SS" designs otherwise don't perform nearly as well with ASCII strings, get only a tiny gain on non-ASCII strings, and touch all sorts of internal stuff that we don't really want them to.

We don't know yet. Find out in our next episode—same Swift-Time, same Swift-Network!


I now have a basic implementation of code generation for this in the compiler and I'm working on fixing broken tests. Most of the test breakage seems to be a simple matter of programming, but there are two matters I will need some input on.

1. Partially initialized captures

There are a couple tests which expose a limitation of the proposed design: Because the string interpolation is a closure, it has to capture the variables it accesses, and if those variables aren't fully initialized, it can't. Here are two examples from the test suite that it can't handle:

// test/Serialization/typealias.swift:25
var named : ThreeNamedInts    // this is a typealias for (a: Int, b: Int, c: Int)
named.b = 64
print("\(named.b)\n", terminator: "")
// test/SILGen/witness-init-requirement-with-base-class-init.swift:13
  required convenience init() { self.init(species: "\(type(of: self))") }

I'm not sure what to do about this. If left alone, it's source-breaking, albeit in relatively rare cases. It seems like capturing only the parts of a partially-initialized instance that are actually used might work, but I don't know if that's realistic.

Is there an easy fix I'm not aware of (for example, something autoclosures do that we could borrow)? Is this issue serious enough that we'll need to delay adoption, or redesign to avoid closures? Or are we willing to accept a regression in functionality, either temporarily or not?

2. AST decisions

Setting a SemanticExpr is working well for generating code, but it's not great for diagnostics, IDE features, etc. The problem is that although the InterpolatedStringLiteralExpr still has an array of segment expressions, subsequent steps like constraint solving operate on the SemanticExpr and ignore that array. When something wants to access the segments, much of the information it's looking for can only be found by digging through the SemanticExpr.

I'm not sure what the best way to address this would be. One hacky solution is to update the Segments array at various points with expressions collected from the SemanticExpr. This might make a good stopgap.

A cleaner solution is to replace the segments array with a pointer to the closure node; people could access the segments through the statements in the closure's body. The full SemanticExpr would only be generated later, after type-checking was finished. That means changing clients of the AST to handle the segments through the closure, though.

A third, limited possibility: Outside of tooling, the only problem in the Swift compiler itself is diagnoseUnintendedOptionalBehavior(), which tries to inspect the types of interpolated segments after inference. We could change its approach: add an overload like func appendInterpolation<T>(_ value: T?) and have that pass look for calls to the overload instead of inspecting the segment list. We could almost do this simply by adding the overload and deprecating it, but that wouldn't allow fix-its.
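Roughly like this sketch (the overload's home and the deprecation message are placeholders):

extension String.StringInterpolation {
  @available(*, deprecated, message: "interpolating an Optional produces 'Optional(...)'")
  mutating func appendInterpolation<T>(_ value: T?) {
    // Same output as today, but optional interpolations now resolve to
    // a distinct overload that diagnostics can search for.
    appendInterpolation(String(describing: value))
  }
}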

Any thoughts on either of these issues?

Our experience using closures behind the scenes to implement features has been somewhat spotty, particularly with defer, where there's a long tail of issues that have leaked the closure-ness of the defer block. Since string interpolation is an expression, there's hopefully less surface area for bad interactions than there is with a control flow statement like defer, but I'm still a bit nervous about it, especially because you've already discovered source breakages as fallout. Maybe we could have an internal SequentiallyEvaluateExpr behind the scenes to evaluate a series of expressions in order, C comma operator or Rust semicolon style, for this. SILGen for such an expr should be straightforward.

That'd be handy, especially if it behaved as a boundary for type inference the way closures do (or only did type inference into the last expression). We could then do a design like:

protocol ExpressibleByStringInterpolation: ExpressibleByStringLiteral {
  associatedtype StringInterpolation: StringInterpolationProtocol where StringInterpolation.StringLiteralType == StringLiteralType
  init(stringInterpolation: StringInterpolation)
}
protocol StringInterpolationProtocol {
  associatedtype StringLiteralType: _ExpressibleByBuiltinStringLiteral
  
  init(totalLiteralSize: Int, interpolationCount: Int)
  mutating func appendLiteral(_ literal: StringLiteralType)
  // Informal: mutating func appendInterpolation(…)
}

// Generated code looks something like:
//   
//   #_doSeriesOfStatements {
//     var $interpolation = String.StringInterpolation(totalLiteralSize: 8, interpolationCount: 1)
//     $interpolation.appendLiteral("foo ")
//     $interpolation.appendInterpolation(bar)
//     $interpolation.appendLiteral(" baz")
//     String(stringInterpolation: $interpolation)
//   }

It'd be nice to have a solution that didn't involve digging into SIL-land, though.

Fun little discovery I made a couple days ago: What do you think this prints?

let x = 1, y = 2
print("\(x, y)")

Turns out that string interpolations are parsed as lists, so this implicitly forms a tuple. Oops.
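Concretely, the extra parentheses the fix-it would suggest make the tuple explicit:

print("\((x, y))")  // unambiguously a tuple; prints "(1, 2)"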

Are we okay to put a deprecation warning and fix-it into Swift 4.2 telling the user to add an extra set of parentheses? That would allow Swift 5 to parse this as an argument list instead.


That seems like a good idea. Does it affect any code in the compatibility suite?

While attempting to answer that question, I ran across a source break in SwiftPM! The code in question is:

// `required` on the RHS is of type `[SwiftLanguageVersion]`;
// `SwiftLanguageVersion` conforms to `CustomStringConvertible`.
let required = required.map(String.init).joined(separator: ", ")

This worked before because one of the String.init(stringInterpolationSegment:) overloads was more specific than either String.init(describing:) or String.init(reflecting:). Now that the String.init(stringInterpolationSegment:)s are gone, it's ambiguous. (Adding init(stringInterpolationSegment:) back in a private extension makes the file compile correctly.)

This is totally busted, of course—there's no way the author thought they were using init(stringInterpolationSegment:)—but I bet other people have done this. I haven't looked at what it would take to warn here.

We could also leave these initializers in place, possibly with new underscored names. If we gave CustomStringConvertible and TextOutputStreamable a common underscored base protocol, we could reduce the overload set from three to one.

Yeah, that behavior's been a thorn in our side. The user almost certainly intends String.init(describing:) when they write String.init; I wonder if there's a principled way we can make that the favored choice. (Horrible idea: make init(reflecting:) take Any and init(describing:) generic for all T…)

I think it’d actually be the other way around—the type checker prefers concrete functions over generic ones.

We could fix this specific case by providing overloads of init(describing:) like we use for string interpolation. This would also have the benefit of giving us fast paths when you have those conformances. It wouldn’t make String.init select describing over reflecting for types that don’t conform to those protocols, though.
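Those overloads might look like this sketch (mirroring the interpolation set; bodies are the obvious forwarding):

extension String {
  init<T: TextOutputStreamable>(describing value: T) {
    self.init()
    value.write(to: &self)
  }
  init<T: CustomStringConvertible>(describing value: T) {
    self = value.description
  }
  init<T: TextOutputStreamable & CustomStringConvertible>(describing value: T) {
    self.init()
    value.write(to: &self)
  }
}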

The part of the compatibility suite that will build at all on my machine doesn't seem to include any of these implicit tuple interpolations, but that excludes anything involving SwiftPM, which I couldn't easily get working with my development copy. What should I do here? Submit a PR to apple/swift and have the CI bot run the suite there?