String interpolation revamp: design decisions

Right. My very limited understanding of the constraint solver is that the presence of multiple overloads alone is not what causes exponential behavior -- it's the presence of multiple overloads in the same strongly connected component of the constraint graph.

If each interpolation segment's type is its own independent type variable that is part of its own connected component, it should be possible to solve the constraint system in linear time, but perhaps @rudkx or @Douglas_Gregor can clarify things here.

Sounds like a reasonable approach that also eliminates the need to have something like typeCheckExpressionShallow(). In the meantime, we can either keep this entry point, or I can hack up string interpolation to hard-code the current set of overloads, depending on how much effort I decide to put into some ongoing CSApply cleanups I'm working on.

Yes, that's basically true.

One way to think about it is that we have constraint variables with domains. Those domains are either overload choices or types. If you have multiple constraint variables in a connected component that each have more than one element in their domain, you can potentially end up with exponential behavior. You won't necessarily end up with exponential behavior, though. If you can fail quickly while trying elements of the domain (e.g. you immediately know that a particular overload or particular type cannot work), you don't end up with exponential behavior.
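To make that concrete, here's the classic shape of the problem (my own illustration, not code from the proposal). Every + below is heavily overloaded, and each literal can take on several types, so all of those choices land in one connected component; splitting the work into separate statements gives each overload choice its own tiny component:

// Historically slow to type-check: all the `+` overload choices and
// literal types end up in one connected component.
let slow = 1 + 2.0 + 3 + 4.0 + 5 + 6.0 + 7 + 8.0

// Each statement below is its own small constraint problem, solved
// independently of the others:
let a = 1 + 2.0
let b = a + 3
let c = b + 4.0

This is why the one-statement-per-interpolation design discussed later in the thread helps the solver.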

I haven't had a chance to read through the proposal and the rest of this thread but I'll try to get back to it.

To follow up on @benrimmington's mention of my post in the earlier thread about this: I think this part of your post is basically what I was saying at the time. That is, not that there would be reserved space for the non-literal segments in a single String, just that you could (theoretically) avoid the overhead of having several separate Strings for the appendLiteralSegment(_:) calls by having a single String containing all the appended literal sections, plus some way of indicating where the boundaries are (e.g. indices, Substrings, whatever). I have no idea how that would work out in practice, though, or how it would work with the StringLiteralType in your new proposal (edit: looking into the details, the only valid types for _ExpressibleByBuiltinStringLiteral are String and StaticString, so it might be possible).

Great work, and really interesting writeup.

What I'm talking about is more of a low-level implementation detail that wouldn't be visible to the caller. Here's the deal:

Firstly, thanks to some excellent work I believe by @Michael_Ilseman, there really isn't much overhead to a String backed by a string literal anymore. It's pretty much just a pointer to constant memory, a length, and some flags packed away somewhere. Pretty cool, actually.

So, imagine you have the expression "foo \(bar) baz" as MyString in your code, and MyString uses String as its StringLiteralType. If we pass a full StringLiteralType with all the literal segments to the initializer for ExpressibleByStringInterpolation, that would translate to code like:

MyString(stringLiteral: "foo  baz", withInterpolations: 2, builtBy: { buffer in
  buffer.appendLiteralSegment("foo ")
  buffer.appendInterpolatedSegment(bar)
  buffer.appendLiteralSegment(" baz")
})

But that's only the beginning of the story. Later on in compilation, we break down those string literals into _ExpressibleByBuiltinStringLiteral calls. I don't know precisely when or how this is done, but the compiler creates a table of string data and includes it in the binary. For this code, the string table might include (␠ represents space):

Offset   +0  +1  +2  +3  +4  +5  +6  +7
0xBEE0    f   o   o   ␠   ␠   b   a   z
0xBEE8    f   o   o   ␠   ␠   b   a   z

The above code sample is then translated into something like:

MyString(stringLiteral: String(_builtinStringLiteral: stringsTable + 0xBEE0, count: 8, isASCII: true), withInterpolations: 1, builtBy: { buffer in
  buffer.appendLiteralSegment(String(_builtinStringLiteral: stringsTable + 0xBEE8, count: 4, isASCII: true))
  buffer.appendInterpolatedSegment(bar)
  buffer.appendLiteralSegment(String(_builtinStringLiteral: stringsTable + 0xBEEC, count: 4, isASCII: true))
})

Notice, though, that this is redundant: The characters in 0xBEE0–0xBEE7 are the same as the ones in 0xBEE8–0xBEEF. So there's no reason we couldn't delete the second:

Offset   +0  +1  +2  +3  +4  +5  +6  +7
0xBEE0    f   o   o   ␠   ␠   b   a   z

And write our code to use the first for both:

MyString(stringLiteral: String(_builtinStringLiteral: stringsTable + 0xBEE0, count: 8, isASCII: true), withInterpolations: 1, builtBy: { buffer in
  buffer.appendLiteralSegment(String(_builtinStringLiteral: stringsTable + 0xBEE0, count: 4, isASCII: true))
  buffer.appendInterpolatedSegment(bar)
  buffer.appendLiteralSegment(String(_builtinStringLiteral: stringsTable + 0xBEE4, count: 4, isASCII: true))
})

Each literal is a fully-formed String instance, completely unaware of any of the others, and yet they're all backed by the same memory, so they don't bloat the binary. What's different from what you're talking about is that there are no indices being communicated or Substrings being created; indeed, there's no formal relationship between the literals at all. It's just that they share memory behind the scenes in such a way that giving the initializer a sneak peek doesn't really cost us anything.

I did a little research and I'm pretty sure the LLVM ConstantMergePass implements this behavior, but I'm not sure if we're using it. (Can someone confirm?) If we are, this would probably work already.


I wasn't suggesting that the string be edited in place and compacted. Instead, the initial string would be used to avoid calling String.reserveCapacity(_:), which is documented as taking a "minimum number of ASCII character's worth of storage to allocate." The storage would then be cleared with removeAll(keepingCapacity: true), and segments would be appended as per your proposed design.

The number of embedded nulls for each placeholder would only need to match the average size of typical interpolated values. They could also be beneficial to types such as LocalizableString and SQLiteStatement, which only need a few extra bytes for each %@ or ? segment.
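A minimal sketch of the flow I have in mind, with hypothetical names (paddedLiteral and bar aren't real API, and whether a literal-backed String would actually report usable capacity here is part of what would need verifying):

let bar = 42  // stand-in for an interpolated value

// Hypothetical: the compiler passes one literal containing all the literal
// segments, with a run of nulls reserving space for the interpolation:
let paddedLiteral = "foo \0\0\0\0\0\0\0\0 baz"

var buffer = paddedLiteral
buffer.removeAll(keepingCapacity: true)  // empty it, keeping the allocation
buffer.append("foo ")                    // then append segments as proposed
buffer.append(String(describing: bar))
buffer.append(" baz")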

nit: s/TextOutputStream/TextOutputStreamable/, as the buffer itself is the stream.

Oh, you mean for the standard library's conformance? I don't see much of a downside either way. If it's inlinable and that's beneficial we'll do it. All it means is that a future improved tuning would benefit apps after they are re-compiled, but presumably the prior tuning is already acceptable.

Right, the interpolated segments are pretty small. The final value's length will be the sum of the literal segments' code unit count and the interpolated segments' code unit counts. E.g. if the literal segments add up to 10 and we interpolate one Int that represents a count, I still expect the final result to be small.

We store the actual allocation size

If something's slow, we should fix it. So, I opened a fix :-)

It's a little over a 2x speedup when the requested capacity is small. So all that's left is the (relatively) modest overhead of the function call and the early-exit branch.

UTF-16 examples would be a good idea!

ASCII is still very important, not because US English is, but because most data is formatted into ASCII, and much interpolation usage is for logging and other situations that happen to be ASCII regardless of the user's (or even the developer's) language. E.g., your example conformer for SQL.

That, or just make it fast and/or expose a more direct API. If you are able to extract a micro benchmark to measure just this effect, I can poke at it some more. Otherwise, we can try to optimize it later.

I'm not as violently opposed to a StringBuilder as some; I think ropes and twines are neat and could be valuable. But, as presented here it looks more like a TextOutputStream, which String already is.

FWIW, String's reserveCapacity is already in terms of "ASCII characters" (meaning code units... CR-LF makes everything painful). The Swift compiler can easily give ASCII/UTF-8 code unit underestimates, and even underestimates based on segment type (e.g. Int is at least 1, Double at least 3, Array at least 2, String may be 0, ...).

We can also have a default implementation that calls a non-reserving requirement, but I think having a parameter that conformers are free to ignore is simpler.
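As a sketch of what those per-type underestimates could look like (an illustrative helper, not an actual compiler API):

// Lower bounds on code units for an interpolated segment, keyed off its
// static type. The compiler could sum these with the literal segments'
// code unit count to form the capacity estimate.
func minimumCodeUnits(for type: Any.Type) -> Int {
  switch type {
  case is Int.Type:    return 1  // at least one digit
  case is Double.Type: return 3  // at least "0.0"
  case is Bool.Type:   return 4  // "true" or "false"
  case is String.Type: return 0  // may be empty
  default:             return 0  // unknown types contribute nothing
  }
}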

Excellent. I'll rebase on the current master and re-run my benchmarks; in the meantime, the results below do not reflect this fix.

I made variants of each of the three benchmarks we currently have. "StringInterpolation16" replaces an "n" with an "ñ" in a literal segment in the middle of the string; "StringInterpolationSmall16" replaces an "n" with an "ñ" in the first interpolated segment; and "StringInterpolationManySmallSegments16" adds an emoji at the very beginning. (Benchmark harness)

Here are the results. To keep the table smaller, I'm dropping the "BP" implementations; suffice it to say that the "16" variants didn't redeem them. Because the "current" value changes radically between "StringInterpolationSmall" and its "16" variant, 100 = current time on the non-"16" variant, and I'm showing "current" values for the "16" variants so you can see the current implementation's 16-bit penalty.

Example                                 String/0  String/C  String  SS/Pow  SS/Est  current
StringInterpolation                           81       102     102      68      69        —
StringInterpolation16                         76       102     102      67      68      100
StringInterpolationSmall                      50        50     116     131     132        —
StringInterpolationSmall16                   117       117     120     126     127      181
StringInterpolationManySmallSegments          68        61      63      91      92        —
StringInterpolationManySmallSegments16        86        64      64      95      96      118

I think the bottom line is that the Unicode string efficiency gains from always using a 16-bit buffer are modest. On "StringInterpolationSmall16" and "StringInterpolationManySmallSegments16", the "String" variants see large drops in performance, but still do better than "SS".

Curiously, the "StringInterpolation" test—where "SS" beat "String" before—doesn't seem to be sensitive to 16-bit-ness at all. After investigating, what I think we're seeing here is the difference in their growth algorithms. I instrumented the "StringInterpolation" test to measure the actual capacity of its buffer before and after each segment is appended. It looks like on this particular test, the slightly more aggressive buffer growth algorithm used by "SS" saves us a reallocation and makes it faster than what String would do on its own:

Segment          String/0  String/C  String  SS/Pow  SS/Est
0 (35 estimate)         0        40      40      64      64
1 (19)                 32        40      40      64      64
2 (+16=35)             64        40      40      64      64
3 (+50=85)            128        88      88     128     128
4 (+14=99)            128       176     176     128     128
Allocations             3         3       3       2       2

I'm thinking we should chalk this up to a quirk of the benchmark and conclude that the "String" implementation, with a corrected _StringGuts.reserveCapacity(_:), is our best bet; however, if we think this actually reflects something real, we can either look at String's growth algorithm or have String.StringInterpolationBuffer manually resize the string more aggressively. Merely using a less conservative interpolation size estimate might improve this, though.
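For reference, here's roughly what the more aggressive growth policy amounts to (my paraphrase of the "SS/Pow" column above, not the actual implementation):

// Paraphrased "SS/Pow"-style growth: double until the needed size fits.
func grownCapacity(current: Int, needed: Int) -> Int {
  var capacity = max(current, 1)
  while capacity < needed {
    capacity *= 2
  }
  return capacity
}

// e.g. grownCapacity(current: 64, needed: 99) == 128

Reading the table that way: starting from a 64-unit initial buffer (the estimate of 35 rounded up to a power of two), "SS" makes two allocations total (64, then 128), while "String" makes three (40, 88, 176).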

That would be cool, but we've been discussing separating the estimate into a literal code unit count and an interpolated segment count. Would this mean that the interpolated segment count would become an estimate of interpolated code units? Or would you formally pass the interpolated segment count, but somehow detect where it was used and "correct" the estimate by rewriting the expression? Or would we do this by going back to passing a single, opaque estimate of the total code unit size?


Results after rebasing on last night's master:

Example/Implementation                  String/0  String/C  String  SS/Pow  SS/Est  current
StringInterpolation                           87       103     103      77      77      100
StringInterpolation16                         82       109     109      79      79      100
StringInterpolationSmall                      36        37      39     108     110      100
StringInterpolationSmall16                   134       134     135     118     120      216
StringInterpolationManySmallSegments          68        59      60      75      76      100
StringInterpolationManySmallSegments16        89        67      71      82      84      124

Unsurprisingly, "String" still performs poorly on "StringInterpolation" and "StringInterpolation16", since the growth algorithm quirk hasn't changed. On "StringInterpolationSmall", "String" vastly outperforms "SS" on ASCII strings and does moderately worse on 16-bit strings. On "StringInterpolationManySmallSegments", "String" edges out "SS" in both tests (!). On all examples except the seemingly coincidentally bad "StringInterpolation", "String" beats the current design.

I think "String" is our winner.


So, let's review the open questions I started the thread with:

I don't think any were presented.

I didn't see much to imply that it could.

I think we've settled on overloads. We believe that the one-statement-per-interpolation design will avoid creating complex expressions that are hard on the type-checker, and will allow for the internal simplifications that @Slava_Pestov is hoping for.

We didn't talk too much about this, except to note that this is mostly just a syntax issue.

I think we've mostly settled on the latter, but I'm not sure how that meshes with @Michael_Ilseman's suggestion that we could use different estimates for different types of interpolations, and there's the dark horse "pass a literal containing all the literal segments together, and hope that the optimizer combines all the strings later" option.

I'm leaning towards inlining, but it sounds like this doesn't matter that much.

The real question turned out to be why "String" was so slow on the other tests, and the real answer turned out to be "reserveCapacity(_:) was disabling small string optimizations, and one of the tests happened to grow the buffer in a suboptimal pattern".

Michael thinks they're very common and we should care a lot about them.

There was a small issue in String.reserveCapacity(_:) which has been fixed, and which improved the performance of the "String" implementations.

"String". Now that the reserveCapacity(_:) issue has been fixed, it does better on most of the examples, except for one where it happens to grow its buffer less aggressively than "SS". The "SS" designs otherwise don't perform nearly as well with ASCII strings, get only a tiny gain on non-ASCII strings, and touch all sorts of internal stuff that we don't really want them to.

We don't know yet. Find out in our next episode—same Swift-Time, same Swift-Network!


I now have a basic implementation of code generation for this in the compiler and I'm working on fixing broken tests. Most of the test breakage seems to be a simple matter of programming, but there are two matters I will need some input on.

1. Partially initialized captures

There are a couple tests which expose a limitation of the proposed design: Because the string interpolation is a closure, it has to capture the variables it accesses, and if those variables aren't fully initialized, it can't. Here are two examples from the test suite that it can't handle:

// test/Serialization/typealias.swift:25
var named : ThreeNamedInts    // this is a typealias for (a: Int, b: Int, c: Int)
named.b = 64
print("\(named.b)\n", terminator: "")
// test/SILGen/witness-init-requirement-with-base-class-init.swift:13
  required convenience init() { self.init(species: "\(type(of: self))") }
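Reduced to a standalone snippet, the underlying limitation looks like this (the exact diagnostic wording may vary):

var named: (a: Int, b: Int, c: Int)
named.b = 64
// Reading named.b directly is fine, because definite initialization tracks
// the tuple element-by-element. A closure, however, has to capture the
// whole variable, and `named` is not fully initialized yet:
let closure = { print(named.b) }  // error: 'named' captured before full initialization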

I'm not sure what to do about this. If left alone, it's source-breaking, albeit in relatively rare cases. It seems like capturing only the parts of a partially-initialized instance that are actually used might work, but I don't know if that's realistic.

Is there an easy fix I'm not aware of (for example, something autoclosures do that we could borrow)? Is this issue serious enough that we'll need to delay adoption, or redesign to avoid closures? Or are we willing to accept a regression in functionality, either temporarily or not?

2. AST decisions

Setting a SemanticExpr is working well for generating code, but it's not great for diagnostics, IDE features, etc. The problem is that although the InterpolatedStringLiteralExpr still has an array of segment expressions, subsequent steps like constraint solving operate on the SemanticExpr and ignore the array of segment expressions. When something wants to access the segments, much of the information it's looking for can only be found by digging through the SemanticExpr.

I'm not sure what the best way to address this would be. One hacky solution is to update the Segments array at various points with expressions collected from the SemanticExpr. This might make a good stopgap.

A cleaner solution is to replace the segments array with a pointer to the closure node; people could access the segments through the statements in the closure's body. The full SemanticExpr would only be generated later, after type-checking was finished. That means changing clients of the AST to handle the segments through the closure, though.

A third, limited possibility: Outside of tooling, the only problem in the Swift compiler itself is diagnoseUnintendedOptionalBehavior(), which tries to inspect the types of interpolated segments after inference. We could change its implementation: we could add an overload like func appendInterpolation<T>(_ value: T?) and then have that pass look for calls to the overload instead of looking at the segment list. We could almost do this simply by adding the overload and deprecating it, but that wouldn't allow fix-its.
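For concreteness, the overload-plus-deprecation version might look like this on a conforming buffer type (MyInterpolationBuffer is hypothetical, and as noted, a proper fix-it would need more than this):

struct MyInterpolationBuffer {
  var output = ""

  mutating func appendInterpolation<T>(_ value: T) {
    output += String(describing: value)
  }

  // The overload diagnoseUnintendedOptionalBehavior() could look for; the
  // deprecation alone already warns at each optional interpolation.
  @available(*, deprecated, message: "optional value interpolated without being unwrapped")
  mutating func appendInterpolation<T>(_ value: T?) {
    appendInterpolation(value as Any)  // `as Any` erases optionality, so this doesn't recurse
  }
}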

Any thoughts on either of these issues?

Our experience using closures behind the scenes to implement features has been somewhat spotty, particularly with defer, where there's a long tail of issues that have leaked the closure-ness of the defer block. Since string interpolation is an expression, there's hopefully less surface area for bad interactions than there is with a control flow statement like defer, but I'm still a bit nervous about it, especially because you've already discovered source breakages as fallout. Maybe we could have an internal SequentiallyEvaluateExpr behind the scenes to evaluate a series of expressions in order, C comma operator or Rust semicolon style, for this. SILGen for such an expr should be straightforward.

That'd be handy, especially if it behaved as a boundary for type inference the way closures do (or only did type inference into the last expression). We could then do a design like:

protocol ExpressibleByStringInterpolation: ExpressibleByStringLiteral {
  associatedtype StringInterpolation: StringInterpolationProtocol where StringInterpolation.StringLiteralType == StringLiteralType
  init(stringInterpolation: StringInterpolation)
}
protocol StringInterpolationProtocol {
  associatedtype StringLiteralType: _ExpressibleByBuiltinStringLiteral

  init(totalLiteralSize: Int, interpolationCount: Int)
  mutating func appendLiteral(_ literal: StringLiteralType)
  // Informal: mutating func appendInterpolation(...)
}

// Generated code looks something like:
//   
//   #_doSeriesOfStatements {
//     var $interpolation = String.StringInterpolation(totalLiteralSize: 8, interpolationCount: 1)
//     $interpolation.appendLiteral("foo ")
//     $interpolation.appendInterpolation(bar)
//     $interpolation.appendLiteral(" baz")
//     String(stringInterpolation: $interpolation)
//   }

It'd be nice to have a solution that didn't involve digging into SIL-land, though.

Fun little discovery I made a couple days ago: What do you think this prints?

let x = 1, y = 2
print("\(x, y)")

Turns out that string interpolations are parsed as lists, so this implicitly forms a tuple. Oops.
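(For the record, it prints "(1, 2)": the tuple's default description.)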

Are we okay to put a deprecation warning and fix-it into Swift 4.2 telling the user to add an extra set of parentheses? That would allow Swift 5 to parse this as an argument list instead.


That seems like a good idea. Does it affect any code in the compatibility suite?

While attempting to answer that question, I ran across a source break in SwiftPM! The code in question is:

// `required` on the RHS is of type `[SwiftLanguageVersion]`;
// `SwiftLanguageVersion` conforms to `CustomStringConvertible`.
let required = required.map(String.init).joined(separator: ", ")

This worked before because one of the String.init(stringInterpolationSegment:) overloads was more specific than either String.init(describing:) or String.init(reflecting:). Now that the String.init(stringInterpolationSegment:)s are gone, it's ambiguous. (Adding init(stringInterpolationSegment:) back in a private extension makes the file compile correctly.)

This is totally busted, of course—there's no way the author thought they were using init(stringInterpolationSegment:)—but I bet other people have done this. I haven't looked at what it would take to warn here.

We could also leave these initializers in place, possibly with new underscored names. If we gave CustomStringConvertible and TextOutputStreamable a common underscored base protocol, we could reduce the overload set from three to one.

Yeah, that behavior's been a thorn in our side. The user almost certainly intends String.init(describing:) when they write String.init; I wonder if there's a principled way we can make that the favored choice. (Horrible idea: make init(reflecting:) take Any and init(describing:) generic for all T.)

I think it’d actually be the other way around—the type checker prefers concrete functions over generic ones.

We could fix this specific case by providing overloads of init(describing:) like we use for string interpolation. This would also have the benefit of giving us fast paths when you have those conformances. It wouldn’t make String.init select describing over reflecting for types that don’t conform to those protocols, though.
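Concretely, the overloads would mirror the old interpolation trio. A sketch (these aren't in the stdlib today):

extension String {
  // Fast path: let the value write itself into the string.
  init<T: TextOutputStreamable>(describing value: T) {
    self = ""
    value.write(to: &self)
  }
  // Fast path for types that provide a description.
  init<T: CustomStringConvertible>(describing value: T) {
    self = value.description
  }
  // Breaks the tie for types that conform to both.
  init<T: TextOutputStreamable & CustomStringConvertible>(describing value: T) {
    self = ""
    value.write(to: &self)
  }
}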

So the part of the compatibility suite that will build at all on my machine doesn’t seem to include any of these implicit tuple interpolations, but that doesn’t include anything with SwiftPM, because I couldn’t easily get it to work with my development copy. What should I do here? Submit a PR to apple/swift and have the CI bot run the suite there?

This seems like another situation in the long path towards labels being part of names, where it would probably be better to require String.init(describing:) here than trying to resolve this ambiguity through overload ranking. Maybe that is too onerous, though.

What if we treat:

  foo(String.init)

as

  foo({ String.init($0) })

and require the more explicit

  foo(String.init(describing:))

if they really want that particular overload.

In addition to being completely consistent with what a user would get if they manually expand the body of foo replacing the function parameter with "String.init", we would get the added benefit that we should be able to make default arguments work.

You get similar issues in other contexts so I would prefer if it could be solved in a way that isn't a special case.

struct S: CustomStringConvertible {
  public var description: String = "S"
}

let f1: (S) -> String = String.init // picks init<T>(stringInterpolationSegment expr: T) where T : CustomStringConvertible
let f2: (S) -> String = String.init(_:) // error: ambiguous reference to member 'init'

But this is only really an issue with heavily overloaded members like initialisers; in cases with no overloading, the shorthand is very convenient and clear. It would be nice if there were a way to limit inference (e.g. something about only picking between options with the same full name, including labels, so you're not mixing init(_:), init(describing:), init(stringInterpolationSegment:), etc.), but I can't think of a clear set of rules that is reasonable and retains most of the convenience.
