String interpolation revamp: design decisions

Right. My very limited understanding of the constraint solver is that the presence of multiple overloads alone is not what causes exponential behavior -- it's the presence of multiple overloads in the same strongly connected component of the constraint graph.

If each interpolation segment's type is its own independent type variable that is part of its own connected component, it should be possible to solve the constraint system in linear time, but perhaps @rudkx or @Douglas_Gregor can clarify things here.

Sounds like a reasonable approach that also eliminates the need to have something like typeCheckExpressionShallow(). In the meantime, we can either keep this entry point, or I can hack up string interpolation to hard-code the current set of overloads, depending on how much effort I decide to put into some ongoing CSApply cleanups I'm working on.

Yes, that's basically true.

One way to think about it is that we have constraint variables with domains. Those domains are either overload choices or types. If you have multiple constraint variables in a connected component that each have more than one element in their domain, you can potentially end up with exponential behavior. You won't necessarily end up with exponential behavior, though. If you can fail quickly while trying elements of the domain (e.g. you immediately know that a particular overload or particular type cannot work), you don't end up with exponential behavior.
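To make that concrete, here's the classic shape of the problem (my own illustration, not code from the proposal). Every + below is heavily overloaded, and each literal can take on several types, so all of those choices land in one connected component; splitting the work into separate statements gives each overload choice its own tiny component:

// Historically slow to type-check: all the `+` overload choices and
// literal types end up in one connected component.
let slow = 1 + 2.0 + 3 + 4.0 + 5 + 6.0 + 7 + 8.0

// Each statement below is its own small constraint problem, solved
// independently of the others:
let a = 1 + 2.0
let b = a + 3
let c = b + 4.0

This is why the one-statement-per-interpolation design discussed later in the thread helps the solver.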

I haven't had a chance to read through the proposal and the rest of this thread but I'll try to get back to it.

To follow up on @benrimmington's mention of my post in the earlier thread about this: I think this part of your post is basically what I was saying at the time. That is, not that there would be reserved space for the non-literal segments in a single String, just that you could (theoretically) avoid the overhead of having several separate Strings for the appendLiteralSegment(_:) calls by having a single String containing all the appended literal sections, plus some way of indicating where the boundaries are (e.g. indices, Substrings, whatever). I have no idea how that would work out in practice, though, or how it would work with the StringLiteralType in your new proposal (edit: looking into the details, the only valid types for _ExpressibleByBuiltinStringLiteral are String and StaticString, so it might be possible).

Great work, and really interesting writeup.

What I'm talking about is more of a low-level implementation detail that wouldn't be visible to the caller. Here's the deal:

Firstly, thanks to some excellent work I believe by @Michael_Ilseman, there really isn't much overhead to a String backed by a string literal anymore. It's pretty much just a pointer to constant memory, a length, and some flags packed away somewhere. Pretty cool, actually.

So, imagine you have the expression "foo \(bar) baz" as MyString in your code, and MyString uses String as its StringLiteralType. If we pass a full StringLiteralType with all the literal segments to the initializer for ExpressibleByStringInterpolation, that would translate to code like:

MyString(stringLiteral: "foo  baz", withInterpolations: 2, builtBy: { buffer in
  buffer.appendLiteralSegment("foo ")
  buffer.appendInterpolatedSegment(bar)
  buffer.appendLiteralSegment(" baz")
})

But that's only the beginning of the story. Later on in compilation, we break down those string literals into _ExpressibleByBuiltinStringLiteral calls. I don't know precisely when or how this is done, but the compiler creates a table of string data and includes it in the binary. For this code, the string table might include (␠ represents space):

Offset   +0  +1  +2  +3  +4  +5  +6  +7
0xBEE0    f   o   o   ␠   ␠   b   a   z
0xBEE8    f   o   o   ␠   ␠   b   a   z

The above code sample is then translated into something like:

MyString(stringLiteral: String(_builtinStringLiteral: stringsTable + 0xBEE0, count: 8, isASCII: true), withInterpolations: 1, builtBy: { buffer in
  buffer.appendLiteralSegment(String(_builtinStringLiteral: stringsTable + 0xBEE8, count: 4, isASCII: true))
  buffer.appendInterpolatedSegment(bar)
  buffer.appendLiteralSegment(String(_builtinStringLiteral: stringsTable + 0xBEEC, count: 4, isASCII: true))
})

Notice, though, that this is redundant: The characters in 0xBEE0–0xBEE7 are the same as the ones in 0xBEE8–0xBEEF. So there's no reason we couldn't delete the second:

Offset   +0  +1  +2  +3  +4  +5  +6  +7
0xBEE0    f   o   o   ␠   ␠   b   a   z

And write our code to use the first for both:

MyString(stringLiteral: String(_builtinStringLiteral: stringsTable + 0xBEE0, count: 8, isASCII: true), withInterpolations: 1, builtBy: { buffer in
  buffer.appendLiteralSegment(String(_builtinStringLiteral: stringsTable + 0xBEE0, count: 4, isASCII: true))
  buffer.appendInterpolatedSegment(bar)
  buffer.appendLiteralSegment(String(_builtinStringLiteral: stringsTable + 0xBEE4, count: 4, isASCII: true))
})

Each literal is a fully-formed String instance, completely unaware of any of the others, and yet they're all backed by the same memory, so they don't bloat the binary. What's different from what you're talking about is that there are no indices being communicated or Substrings being created; indeed, there's no formal relationship between the literals at all. It's just that they share memory behind the scenes in such a way that giving the initializer a sneak peek doesn't really cost us anything.

I did a little research and I'm pretty sure the LLVM ConstantMergePass implements this behavior, but I'm not sure if we're using it. (Can someone confirm?) If we are, this would probably work already.


I wasn't suggesting that the string be edited in place and compacted. Instead, the initial string would be used to avoid calling String.reserveCapacity(_:), which is documented as taking a "minimum number of ASCII character's worth of storage to allocate." The storage would then be cleared with removeAll(keepingCapacity: true), and segments would be appended as per your proposed design.

The number of embedded nulls for each placeholder would only need to match the average size of typical interpolated values. They could also be beneficial to types such as LocalizableString and SQLiteStatement, which only need a few extra bytes for each %@ or ? segment.
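A minimal sketch of the flow I have in mind, with hypothetical names (paddedLiteral and bar aren't real API, and whether a literal-backed String would actually report usable capacity here is part of what would need verifying):

let bar = 42  // stand-in for an interpolated value

// Hypothetical: the compiler passes one literal containing all the literal
// segments, with a run of nulls reserving space for the interpolation:
let paddedLiteral = "foo \0\0\0\0\0\0\0\0 baz"

var buffer = paddedLiteral
buffer.removeAll(keepingCapacity: true)  // empty it, keeping the allocation
buffer.append("foo ")                    // then append segments as proposed
buffer.append(String(describing: bar))
buffer.append(" baz")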

nit: s/TextOutputStream/TextOutputStreamable/, as the buffer itself is the stream.

Oh, you mean for the standard library's conformance? I don't see much of a downside either way. If it's inlinable and that's beneficial we'll do it. All it means is that a future improved tuning would benefit apps after they are re-compiled, but presumably the prior tuning is already acceptable.

Right, the interpolated segments are pretty small. The final value's length will be the sum of the literal segments' code unit count and the interpolated segments' code unit counts. E.g. if the literal segments add up to 10 and we interpolate one Int that represents a count, I still expect the final result to be small.

We store the actual allocation size

If something's slow, we should fix it. So, I opened a fix :-)

It's a little over a 2x speedup when the requested capacity is small. So all that's left is the (relatively) modest overhead of the function call and the early-exit branch.

UTF-16 examples would be a good idea!

ASCII is still very important, not because US English is, but because most data is formatted into ASCII, and much interpolation usage is for logging and other situations that happen to be ASCII regardless of the user's (or even the developer's) language. E.g., your example conformer for SQL.

That, or just make it fast and/or expose a more direct API. If you are able to extract a micro benchmark to measure just this effect, I can poke at it some more. Otherwise, we can try to optimize it later.

I'm not as violently opposed to a StringBuilder as some; I think ropes and twines are neat and could be valuable. But, as presented here it looks more like a TextOutputStream, which String already is.

FWIW, String's reserveCapacity is already in terms of "ASCII characters" (meaning code units... CR-LF makes everything painful). The Swift compiler can easily give ASCII/UTF-8 code unit underestimates, and even underestimates based on segment type (e.g. Int is at least 1, Double at least 3, Array at least 2, String may be 0, ...).

We can also have a default implementation that calls a non-reserving requirement, but I think having a parameter that conformers are free to ignore is simpler.
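As a sketch of what those per-type underestimates could look like (an illustrative helper, not an actual compiler API):

// Lower bounds on code units for an interpolated segment, keyed off its
// static type. The compiler could sum these with the literal segments'
// code unit count to form the capacity estimate.
func minimumCodeUnits(for type: Any.Type) -> Int {
  switch type {
  case is Int.Type:    return 1  // at least one digit
  case is Double.Type: return 3  // at least "0.0"
  case is Bool.Type:   return 4  // "true" or "false"
  case is String.Type: return 0  // may be empty
  default:             return 0  // unknown types contribute nothing
  }
}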

Excellent. I'll rebase on the current master and re-run my benchmarks; in the meantime, the results below do not reflect this fix.

I made variants of each of the three benchmarks we currently have. "StringInterpolation16" replaces an "n" with an "ñ" in a literal segment in the middle of the string; "StringInterpolationSmall16" replaces an "n" with an "ñ" in the first interpolated segment; and "StringInterpolationManySmallSegments16" adds an emoji at the very beginning. (Benchmark harness)

Here are the results. To keep the table smaller, I'm dropping the "BP" implementations; suffice it to say that the "16" variants didn't redeem them. Because the "current" value changes radically between "StringInterpolationSmall" and its "16" variant, 100 = current time on the non-"16" variant, and I'm showing "current" values for the "16" variants so you can see the current implementation's 16-bit penalty.

Example                                 String/0  String/C  String  SS/Pow  SS/Est  current
StringInterpolation                           81       102     102      68      69        —
StringInterpolation16                         76       102     102      67      68      100
StringInterpolationSmall                      50        50     116     131     132        —
StringInterpolationSmall16                   117       117     120     126     127      181
StringInterpolationManySmallSegments          68        61      63      91      92        —
StringInterpolationManySmallSegments16        86        64      64      95      96      118

I think the bottom line is that the Unicode string efficiency gains from always using a 16-bit buffer are modest. On "StringInterpolationSmall16" and "StringInterpolationManySmallSegments16", the "String" variants see large drops in performance, but still do better than "SS".

Curiously, the "StringInterpolation" test—where "SS" beat "String" before—doesn't seem to be sensitive to 16-bit-ness at all. After investigating, what I think we're seeing here is the difference in their growth algorithms. I instrumented the "StringInterpolation" test to measure the actual capacity of its buffer before and after each segment is appended. It looks like on this particular test, the slightly more aggressive buffer growth algorithm used by "SS" saves us a reallocation and makes it faster than what String would do on its own:

Segment          String/0  String/C  String  SS/Pow  SS/Est
0 (35 estimate)         0        40      40      64      64
1 (19)                 32        40      40      64      64
2 (+16=35)             64        40      40      64      64
3 (+50=85)            128        88      88     128     128
4 (+14=99)            128       176     176     128     128
Allocations             3         3       3       2       2

I'm thinking we should chalk this up to a quirk of the benchmark and conclude that the "String" implementation, with a corrected _StringGuts.reserveCapacity(_:), is our best bet; however, if we think this actually reflects something real, we can either look at String's growth algorithm or have String.StringInterpolationBuffer manually resize the string more aggressively. Merely using a less conservative interpolation size estimate might improve this, though.
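For reference, here's roughly what the more aggressive growth policy amounts to (my paraphrase of the "SS/Pow" column above, not the actual implementation):

// Paraphrased "SS/Pow"-style growth: double until the needed size fits.
func grownCapacity(current: Int, needed: Int) -> Int {
  var capacity = max(current, 1)
  while capacity < needed {
    capacity *= 2
  }
  return capacity
}

// e.g. grownCapacity(current: 64, needed: 99) == 128

Reading the table that way: starting from a 64-unit initial buffer (the estimate of 35 rounded up to a power of two), "SS" makes two allocations total (64, then 128), while "String" makes three (40, 88, 176).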

That would be cool, but we've been discussing separating the estimate into a literal code unit count and an interpolated segment count. Would this mean that the interpolated segment count would become an estimate of interpolated code units? Or would you formally pass the interpolated segment count, but somehow detect where it was used and "correct" the estimate by rewriting the expression? Or would we do this by going back to passing a single, opaque estimate of the total code unit size?


Results after rebasing on last night's master:

Example/Implementation                  String/0  String/C  String  SS/Pow  SS/Est  current
StringInterpolation                           87       103     103      77      77      100
StringInterpolation16                         82       109     109      79      79      100
StringInterpolationSmall                      36        37      39     108     110      100
StringInterpolationSmall16                   134       134     135     118     120      216
StringInterpolationManySmallSegments          68        59      60      75      76      100
StringInterpolationManySmallSegments16        89        67      71      82      84      124

Unsurprisingly, "String" still performs poorly on "StringInterpolation" and "StringInterpolation16", since the growth algorithm quirk hasn't changed. On "StringInterpolationSmall", "String" vastly outperforms "SS" on ASCII strings and does moderately worse on 16-bit strings. On "StringInterpolationManySmallSegments", "String" edges out "SS" in both tests (!). On all examples except the seemingly coincidentally bad "StringInterpolation", "String" beats the current design.

I think "String" is our winner.


So, let's review the open questions I started the thread with:

I don't think any were presented.

I didn't see much to imply that it could.

I think we've settled on overloads. We believe that the one-statement-per-interpolation design will avoid creating complex expressions that are hard on the type-checker, and will allow for the internal simplifications that @Slava_Pestov is hoping for.

We didn't talk too much about this, except to note that this is mostly just a syntax issue.

I think we've mostly settled on the latter, but I'm not sure how that meshes with @Michael_Ilseman's suggestion that we could use different estimates for different types of interpolations, and there's the dark horse "pass a literal containing all the literal segments together, and hope that the optimizer combines all the strings later" option.

I'm leaning towards inlining, but it sounds like this doesn't matter that much.

The real question turned out to be why "String" was so slow on the other tests, and the real answer turned out to be "reserveCapacity(_:) was disabling small string optimizations, and one of the tests happened to grow the buffer in a suboptimal pattern".

Michael thinks they're very common and we should care a lot about them.

There was a small issue in String.reserveCapacity(_:) which has been fixed, and which improved the performance of the "String" implementations.

"String". Now that the reserveCapacity(_:) issue has been fixed, it does better on most of the examples, except for one where it happens to grow its buffer less aggressively than "SS". The "SS" designs otherwise don't perform nearly as well with ASCII strings, get only a tiny gain on non-ASCII strings, and touch all sorts of internal stuff that we don't really want them to.

We don't know yet. Find out in our next episode—same Swift-Time, same Swift-Network!


I now have a basic implementation of code generation for this in the compiler and I'm working on fixing broken tests. Most of the test breakage seems to be a simple matter of programming, but there are two matters I will need some input on.

1. Partially initialized captures

There are a couple tests which expose a limitation of the proposed design: Because the string interpolation is a closure, it has to capture the variables it accesses, and if those variables aren't fully initialized, it can't. Here are two examples from the test suite that it can't handle:

// test/Serialization/typealias.swift:25
var named : ThreeNamedInts    // this is a typealias for (a: Int, b: Int, c: Int)
named.b = 64
print("\(named.b)\n", terminator: "")
// test/SILGen/witness-init-requirement-with-base-class-init.swift:13
  required convenience init() { self.init(species: "\(type(of: self))") }
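Reduced to a standalone snippet, the underlying limitation looks like this (the exact diagnostic wording may vary):

var named: (a: Int, b: Int, c: Int)
named.b = 64
// Reading named.b directly is fine, because definite initialization tracks
// the tuple element-by-element. A closure, however, has to capture the
// whole variable, and `named` is not fully initialized yet:
let closure = { print(named.b) }  // error: 'named' captured before full initialization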

I'm not sure what to do about this. If left alone, it's source-breaking, albeit in relatively rare cases. It seems like capturing only the parts of a partially-initialized instance that are actually used might work, but I don't know if that's realistic.

Is there an easy fix I'm not aware of (for example, something autoclosures do that we could borrow)? Is this issue serious enough that we'll need to delay adoption, or redesign to avoid closures? Or are we willing to accept a regression in functionality, either temporarily or not?

2. AST decisions

Setting a SemanticExpr is working well for generating code, but it's not great for diagnostics, IDE features, etc. The problem is that although the InterpolatedStringLiteralExpr still has an array of segment expressions, subsequent steps like constraint solving operate on the SemanticExpr and ignore the array of segment expressions. When something wants to access the segments, much of the information it's looking for can only be found by digging through the SemanticExpr.

I'm not sure what the best way to address this would be. One hacky solution is to update the Segments array at various points with expressions collected from the SemanticExpr. This might make a good stopgap.

A cleaner solution is to replace the segments array with a pointer to the closure node; people could access the segments through the statements in the closure's body. The full SemanticExpr would only be generated later, after type-checking was finished. That means changing clients of the AST to handle the segments through the closure, though.

A third, limited possibility: Outside of tooling, the only problem in the Swift compiler itself is diagnoseUnintendedOptionalBehavior(), which tries to inspect the types of interpolated segments after inference. We could change its implementation: we could add an overload like func appendInterpolation<T>(_ value: T?) and then have that pass look for calls to the overload instead of looking at the segment list. We could almost do this simply by adding the overload and deprecating it, but that wouldn't allow fix-its.
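For concreteness, the overload-plus-deprecation version might look like this on a conforming buffer type (MyInterpolationBuffer is hypothetical, and as noted, a proper fix-it would need more than this):

struct MyInterpolationBuffer {
  var output = ""

  mutating func appendInterpolation<T>(_ value: T) {
    output += String(describing: value)
  }

  // The overload diagnoseUnintendedOptionalBehavior() could look for; the
  // deprecation alone already warns at each optional interpolation.
  @available(*, deprecated, message: "optional value interpolated without being unwrapped")
  mutating func appendInterpolation<T>(_ value: T?) {
    appendInterpolation(value as Any)  // `as Any` erases optionality, so this doesn't recurse
  }
}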

Any thoughts on either of these issues?

Our experience using closures behind the scenes to implement features has been somewhat spotty, particularly with defer, where there's a long tail of issues that have leaked the closure-ness of the defer block. Since string interpolation is an expression, there's hopefully less surface area for bad interactions than there is with a control flow statement like defer, but I'm still a bit nervous about it, especially because you've already discovered source breakages as fallout. Maybe we could have an internal SequentiallyEvaluateExpr behind the scenes to evaluate a series of expressions in order, C comma operator or Rust semicolon style, for this. SILGen for such an expr should be straightforward.

That'd be handy, especially if it behaved as a boundary for type inference the way closures do (or only did type inference into the last expression). We could then do a design like:

protocol ExpressibleByStringInterpolation: ExpressibleByStringLiteral {
  associatedtype StringInterpolation: StringInterpolationProtocol where StringInterpolation.StringLiteralType == StringLiteralType
  init(stringInterpolation: StringInterpolation)
}
protocol StringInterpolationProtocol {
  associatedtype StringLiteralType: _ExpressibleByBuiltinStringLiteral

  init(totalLiteralSize: Int, interpolationCount: Int)
  mutating func appendLiteral(_ literal: StringLiteralType)
  // Informal: mutating func appendInterpolation(...)
}

// Generated code looks something like:
//   
//   #_doSeriesOfStatements {
//     var $interpolation = String.StringInterpolation(totalLiteralSize: 8, interpolationCount: 1)
//     $interpolation.appendLiteral("foo ")
//     $interpolation.appendInterpolation(bar)
//     $interpolation.appendLiteral(" baz")
//     String(stringInterpolation: $interpolation)
//   }

It'd be nice to have a solution that didn't involve digging into SIL-land, though.

Fun little discovery I made a couple days ago: What do you think this prints?

let x = 1, y = 2
print("\(x, y)")

Turns out that string interpolations are parsed as lists, so this implicitly forms a tuple. Oops.
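(For the record, it prints "(1, 2)": the tuple's default description.)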

Are we okay to put a deprecation warning and fix-it into Swift 4.2 telling the user to add an extra set of parentheses? That would allow Swift 5 to parse this as an argument list instead.


That seems like a good idea. Does it affect any code in the compatibility suite?

While attempting to answer that question, I ran across a source break in SwiftPM! The code in question is:

// `required` on the RHS is of type `[SwiftLanguageVersion]`;
// `SwiftLanguageVersion` conforms to `CustomStringConvertible`.
let required = required.map(String.init).joined(separator: ", ")

This worked before because one of the String.init(stringInterpolationSegment:) overloads was more specific than either String.init(describing:) or String.init(reflecting:). Now that the String.init(stringInterpolationSegment:)s are gone, it's ambiguous. (Adding init(stringInterpolationSegment:) back in a private extension makes the file compile correctly.)

This is totally busted, of course—there's no way the author thought they were using init(stringInterpolationSegment:)—but I bet other people have done this. I haven't looked at what it would take to warn here.

We could also leave these initializers in place, possibly with new underscored names. If we gave CustomStringConvertible and TextOutputStreamable a common underscored base protocol, we could reduce the overload set from three to one.

Yeah, that behavior's been a thorn in our side. The user almost certainly intends String.init(describing:) when they write String.init; I wonder if there's a principled way we can make that the favored choice. (Horrible idea: make init(reflecting:) take Any and init(describing:) generic for all T.)

I think it’d actually be the other way around—the type checker prefers concrete functions over generic ones.

We could fix this specific case by providing overloads of init(describing:) like we use for string interpolation. This would also have the benefit of giving us fast paths when you have those conformances. It wouldn’t make String.init select describing over reflecting for types that don’t conform to those protocols, though.
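Concretely, the overloads would mirror the old interpolation trio. A sketch (these aren't in the stdlib today):

extension String {
  // Fast path: let the value write itself into the string.
  init<T: TextOutputStreamable>(describing value: T) {
    self = ""
    value.write(to: &self)
  }
  // Fast path for types that provide a description.
  init<T: CustomStringConvertible>(describing value: T) {
    self = value.description
  }
  // Breaks the tie for types that conform to both.
  init<T: TextOutputStreamable & CustomStringConvertible>(describing value: T) {
    self = ""
    value.write(to: &self)
  }
}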

So the part of the compatibility suite that will build at all on my machine doesn’t seem to include any of these implicit tuple interpolations, but that doesn’t include anything with SwiftPM, because I couldn’t easily get it to work with my development copy. What should I do here? Submit a PR to apple/swift and have the CI bot run the suite there?

This seems like another situation in the long path towards labels being part of names, where it would probably be better to require String.init(describing:) here than trying to resolve this ambiguity through overload ranking. Maybe that is too onerous, though.

What if we treat:

  foo(String.init)

as

  foo({ String.init($0) })

and require the more explicit

  foo(String.init(describing:))

if they really want that particular overload.

In addition to being completely consistent with what a user would get if they manually expand the body of foo replacing the function parameter with "String.init", we would get the added benefit that we should be able to make default arguments work.

You get similar issues in other contexts so I would prefer if it could be solved in a way that isn't a special case.

struct S: CustomStringConvertible {
  public var description: String = "S"
}

let f1: (S) -> String = String.init // picks init<T>(stringInterpolationSegment expr: T) where T : CustomStringConvertible
let f2: (S) -> String = String.init(_:) // error: ambiguous reference to member 'init'

But this is only really an issue with heavily overloaded members like initialisers; in cases with no overloading, the shorthand is very convenient and clear. It would be nice if there were a way to limit inference (e.g. something about only picking between options with the same full name, including labels, so you're not mixing init(_:), init(describing:), init(stringInterpolationSegment:), etc.), but I can't think of a clear set of rules that is reasonable and retains most of the convenience.
