[Draft] Fix ExpressibleByStringInterpolation


(Brent Royal-Gordon) #1

...is the title of a proposal I wrote today. The full proposal is available at:

  <https://github.com/brentdax/swift-evolution/blob/bdce720e9d0f7015f8741716035198c41fa117a0/proposals/NNNN-fix-expressible-by-string-interpolation.md>

But the tl;dr is that, if you write this:

  "Hello, \(name)!"

Instead of generating this:

  .init(stringInterpolation:
    .init(stringInterpolationSegment: .init(stringLiteral: "Hello, ")),
    .init(stringInterpolationSegment: name),
    .init(stringInterpolationSegment: .init(stringLiteral: "!"))
  )

We should generate this:

  .init(stringInterpolation:
    .init(stringLiteral: "Hello, "),
    .init(stringInterpolationSegment: name),
    .init(stringLiteral: "!")
  )
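
To make the difference concrete, here is a minimal sketch that calls the proposed initializers by hand. `Message` is a hypothetical type invented for illustration, and it deliberately does not conform to the real protocol, so nothing here depends on compiler support for the new code generation.

```swift
// Hypothetical type used only to illustrate the proposed expansion.
struct Message {
    var text: String

    // Literal segments arrive through the ordinary string-literal initializer.
    init(stringLiteral value: String) {
        text = value
    }

    // Interpolated values are converted one at a time.
    init<T>(stringInterpolationSegment value: T) {
        text = String(describing: value)
    }

    // The already-converted segments are combined at the end.
    init(stringInterpolation segments: Message...) {
        text = segments.map { $0.text }.joined()
    }
}

let name = "World"

// Hand-written equivalent of what "Hello, \(name)!" would expand to.
let greeting = Message(stringInterpolation:
    Message(stringLiteral: "Hello, "),
    Message(stringInterpolationSegment: name),
    Message(stringLiteral: "!")
)
```

The point of the change is visible in the call: literal text no longer passes through `init(stringInterpolationSegment:)`, so a type can treat literals and interpolated values differently.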

I actually have an implementation of it, available here:

  <https://github.com/apple/swift/compare/master...brentdax:new-interpolation>

But I'm literally messing with the constraint generator the very first time I modify the compiler, so I'm assuming it'll need some serious code review before it actually gets pulled in. (All the regular tests *do* at least pass.)

Despite the word "fix" in the title, this really only does half the job; I think we also need to tackle string formatting, at least by adding a mechanism, if not by settling on exact APIs. The proposal includes a sketch of my thoughts about that as well.

Comments extremely welcome.

···

--
Brent Royal-Gordon
Architechies


(Brent Royal-Gordon) #2

I've already made a couple of small updates. This URL will reflect them as I make them:

  <https://github.com/brentdax/swift-evolution/blob/new-interpolation/proposals/NNNN-fix-expressible-by-string-interpolation.md>

···

On Mar 9, 2017, at 6:18 PM, Brent Royal-Gordon <brent@architechies.com> wrote:

--
Brent Royal-Gordon
Architechies


(Joe Groff) #3

Having ExpressibleByStringInterpolation refine ExpressibleByStringLiteral makes sense. I think there's a more powerful alternative design you should also consider. If the protocol looked like this:

protocol ExpressibleByStringInterpolation: ExpressibleByStringLiteral {
  associatedtype LiteralSegment: ExpressibleByStringLiteral
  associatedtype InterpolatedSegment
  init(forStringInterpolation: Void)

  mutating func append(literalSegment: LiteralSegment)
  mutating func append(interpolatedSegment: InterpolatedSegment)
}

Then an interpolation expression like this in `Thingy` type context:

"foo \(bar) bas \(zim: 1, zang: 2)\n"

could desugar to something like:

{
  var x = Thingy(forStringInterpolation: ())
  // Literal segments get appended using append(literalSegment: "literal")
  x.append(literalSegment: "foo ")
  // \(...) segments are arguments to an InterpolatedSegment constructor
  x.append(interpolatedSegment: Thingy.InterpolatedSegment(bar))
  x.append(literalSegment: " bas ")
  x.append(interpolatedSegment: Thingy.InterpolatedSegment(zim: 1, zang: 2))
  x.append(literalSegment: "\n")

  return x
}()

This design should be more efficient, since there's no temporary array of segments that needs to be formed for a variadic argument, you don't need to homogenize everything to Self type up front, and the string can be built up in-place. It also provides means to address problems 3 and 4, since the InterpolatedSegment associated type can control what types it's initializable from, and can provide initializers with additional arguments for formatting or other purposes.
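
A runnable sketch of this append-based design might look like the following. The protocol is renamed to the hypothetical `AppendingInterpolation` so it does not collide with the standard library, and the `rendered` property and the `zim=…,zang=…` output format are assumptions made purely for illustration.

```swift
// Hypothetical stand-in for the protocol above, renamed so it does not
// collide with the standard library's ExpressibleByStringInterpolation.
protocol AppendingInterpolation {
    associatedtype LiteralSegment: ExpressibleByStringLiteral
    associatedtype InterpolatedSegment
    init(forStringInterpolation: Void)
    mutating func append(literalSegment: LiteralSegment)
    mutating func append(interpolatedSegment: InterpolatedSegment)
}

struct Thingy: AppendingInterpolation {
    typealias LiteralSegment = String

    // The segment type decides which argument lists \(...) may contain.
    struct InterpolatedSegment {
        let rendered: String
        init<T>(_ value: T) { rendered = String(describing: value) }
        init(zim: Int, zang: Int) { rendered = "zim=\(zim),zang=\(zang)" }
    }

    var text = ""

    init(forStringInterpolation: Void) {}

    mutating func append(literalSegment: String) {
        text += literalSegment
    }

    mutating func append(interpolatedSegment: InterpolatedSegment) {
        text += interpolatedSegment.rendered
    }
}

// Hand-written version of the desugaring; no array of segments is built.
let bar = 42
var x = Thingy(forStringInterpolation: ())
x.append(literalSegment: "foo ")
x.append(interpolatedSegment: Thingy.InterpolatedSegment(bar))
x.append(literalSegment: " bas ")
x.append(interpolatedSegment: Thingy.InterpolatedSegment(zim: 1, zang: 2))
x.append(literalSegment: "\n")
```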

-Joe


(David Waite) #4

Hi Joe,

The trade-offs for this approach would be:
- each append would need to leave the object in a valid state w.r.t. the type’s invariants.
- an implementation could treat init(stringInterpolation:) as a final building step, while a series of append calls would not indicate that the object construction was complete.

One example where this could be a problem would be if someone used the segments to build up a localized representation of the interpolated string.

-DW

···

On Mar 10, 2017, at 9:49 AM, Joe Groff via swift-evolution <swift-evolution@swift.org> wrote:


(Brent Royal-Gordon) #5

On the other hand, you end up with an `init(forStringInterpolation: ())` initializer which is explicitly intended to return an incompletely initialized instance. I don't enjoy imagining this. For instance, you might find yourself having to change certain properties from `let` to `var` so that the `append` methods can operate.

If we *do* go this direction, though, I might suggest a slightly different design which uses fewer calls and makes the finalization explicit:

  protocol ExpressibleByStringLiteral {
    associatedtype StringLiteralSegment: ExpressibleByStringLiteral
    
    init(startingStringLiteral: ())
    mutating func endStringLiteral(with segment: StringLiteralSegment)
  }
  protocol ExpressibleByStringInterpolation: ExpressibleByStringLiteral {
    associatedtype StringInterpolationSegment
    
    mutating func continueStringLiteral(with literal: StringLiteralSegment, followedBy interpolation: StringInterpolationSegment)
  }

Your `"foo \(bar) bas \(zim: 1, zang: 2)\n"` example would then become:

  {
    var x = Thingy(startingStringLiteral: ())
    x.continueStringLiteral(with: "foo ", followedBy: .init(bar))
    x.continueStringLiteral(with: " bas ", followedBy: .init(zim: 1, zang: 2))
    x.endStringLiteral(with: "\n")
    return x
  }()

A plain old string literal would have a more complicated pattern than it does currently, but one whose semantics would be completely compatible with an interpolated string:

  {
    var x = Thingy(startingStringLiteral: ())
    x.endStringLiteral(with: "Hello, world!")
    return x
  }()
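
As a rough, self-contained sketch of how a type might implement this start/continue/end pattern: the `isComplete` flag and the `rendered` property are assumptions added for illustration, and `Thingy` here is a plain struct rather than a conformance to the protocols above.

```swift
struct Thingy {
    // Hypothetical segment type; its initializers define what \(...) accepts.
    struct StringInterpolationSegment {
        let rendered: String
        init<T>(_ value: T) { rendered = String(describing: value) }
        init(zim: Int, zang: Int) { rendered = "zim=\(zim),zang=\(zang)" }
    }

    private(set) var text = ""
    // Set only by endStringLiteral(with:), making finalization explicit.
    private(set) var isComplete = false

    init(startingStringLiteral: ()) {}

    // One call covers a literal segment plus the interpolation after it.
    mutating func continueStringLiteral(with literal: String,
                                        followedBy interpolation: StringInterpolationSegment) {
        text += literal
        text += interpolation.rendered
    }

    // The trailing literal segment doubles as the "done" signal.
    mutating func endStringLiteral(with segment: String) {
        text += segment
        isComplete = true
    }
}

let bar = 42
var x = Thingy(startingStringLiteral: ())
x.continueStringLiteral(with: "foo ", followedBy: .init(bar))
x.continueStringLiteral(with: " bas ", followedBy: .init(zim: 1, zang: 2))
x.endStringLiteral(with: "\n")
```

Compared with separate literal and interpolation appends, this halves the number of calls and gives the type a clear point at which to validate or finalize.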

* * *

Another possible design would separate the intermediate type from the final one. For instance, suppose we had:

  protocol ExpressibleByStringInterpolation: ExpressibleByStringLiteral {
    associatedtype StringInterpolationBuffer = Self
    associatedtype StringInterpolationType
    
    static func makeStringLiteralBuffer(startingWith firstLiteralSegment: StringLiteralType) -> StringInterpolationBuffer
    static func appendInterpolationSegment(_ expr: StringInterpolationType, to stringLiteralBuffer: inout StringInterpolationBuffer)
    static func appendLiteralSegment(_ string: StringLiteralType, to stringLiteralBuffer: inout StringInterpolationBuffer)
  
    init(stringInterpolation buffer: StringInterpolationBuffer)
  }
  // Automatically provide a parent protocol conformance
  extension ExpressibleByStringInterpolation {
    init(stringLiteral: StringLiteralType) {
      let buffer = Self.makeStringLiteralBuffer(startingWith: stringLiteral)
      self.init(stringInterpolation: buffer)
    }
  }

Then your example would be:

  {
    var buffer = Thingy.makeStringLiteralBuffer(startingWith: "foo ")
    Thingy.appendInterpolationSegment(Thingy.StringInterpolationType(bar), to: &buffer)
    Thingy.appendLiteralSegment(" bas ", to: &buffer)
    Thingy.appendInterpolationSegment(Thingy.StringInterpolationType(zim: 1, zang: 2), to: &buffer)
    Thingy.appendLiteralSegment("\n", to: &buffer)

    return Thingy(stringInterpolation: buffer)
  }()

For arbitrary string types, `StringInterpolationBuffer` would probably be `Self`, but if you had something which could only create an instance of itself once the entire literal was gathered together, it could use `String` or `Array` or whatever else it wanted.
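
A minimal sketch of the buffer-based variant, assuming an `Array` of strings as the intermediate buffer so the final type can keep its stored property as a `let`. The `Interpolation` struct and its `rendered` property are hypothetical names, and `Thingy` is a plain struct rather than a protocol conformance.

```swift
struct Thingy {
    // Hypothetical interpolation type for this sketch.
    struct Interpolation {
        let rendered: String
        init<T>(_ value: T) { rendered = String(describing: value) }
        init(zim: Int, zang: Int) { rendered = "zim=\(zim),zang=\(zang)" }
    }

    // Immutable once built, because all mutation happens in the buffer.
    let text: String

    static func makeStringLiteralBuffer(startingWith first: String) -> [String] {
        return [first]
    }

    static func appendInterpolationSegment(_ expr: Interpolation, to buffer: inout [String]) {
        buffer.append(expr.rendered)
    }

    static func appendLiteralSegment(_ string: String, to buffer: inout [String]) {
        buffer.append(string)
    }

    // The only place an instance is created, once every segment is known.
    init(stringInterpolation buffer: [String]) {
        text = buffer.joined()
    }
}

let bar = 42
var buffer = Thingy.makeStringLiteralBuffer(startingWith: "foo ")
Thingy.appendInterpolationSegment(Thingy.Interpolation(bar), to: &buffer)
Thingy.appendLiteralSegment(" bas ", to: &buffer)
Thingy.appendInterpolationSegment(Thingy.Interpolation(zim: 1, zang: 2), to: &buffer)
Thingy.appendLiteralSegment("\n", to: &buffer)
let thingy = Thingy(stringInterpolation: buffer)
```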

* * *

One more design possibility. Would it make sense to handle all the segments in a single initializer call, instead of having one call for each segment, plus a big call at the end? Suppose we did this:

  protocol ExpressibleByStringInterpolation: ExpressibleByStringLiteral {
    associatedtype StringInterpolationType
    
    init(stringInterpolation segments: StringInterpolationSegment...)
  }
  @fixed_layout enum StringInterpolationSegment<StringType: ExpressibleByStringInterpolation> {
    case literal(StringType.StringLiteralType)
    case interpolation(StringType.StringInterpolationType)
  }
  extension ExpressibleByStringInterpolation {
    typealias StringInterpolationSegment = Swift.StringInterpolationSegment<Self>
    
    init(stringLiteral: StringLiteralType) {
      self.init(stringInterpolation: .literal(stringLiteral))
    }
  }

Code pattern would look like this:

  Thingy(stringInterpolation:
    .literal("foo "),
    .interpolation(.init(bar)),
    .literal(" bas "),
    .interpolation(.init(zim: 1, zang: 2)),
    .literal("\n")
  )

I suppose it just depends on whether the array or the extra calls are more costly. (Well, it also depends on whether we want to be expanding single expressions into big, complex, multi-statement messes like we discussed before.)

(Actually, I realized after writing this that you mentioned a similar design downthread. Oops.)
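
For comparison, a sketch of the single-call, enum-based variant. To keep it self-contained, the `Segment` enum is nested in a concrete `Thingy` instead of being a generic standard-library type, and the `Interpolation` struct and its `rendered` property are assumptions for illustration.

```swift
struct Thingy {
    // Hypothetical interpolation type for this sketch.
    struct Interpolation {
        let rendered: String
        init<T>(_ value: T) { rendered = String(describing: value) }
        init(zim: Int, zang: Int) { rendered = "zim=\(zim),zang=\(zang)" }
    }

    // Each argument says whether it is literal text or an interpolated value.
    enum Segment {
        case literal(String)
        case interpolation(Interpolation)
    }

    let text: String

    // All segments arrive at once, so the type can inspect them in any order.
    init(stringInterpolation segments: Segment...) {
        text = segments.map { segment -> String in
            switch segment {
            case .literal(let string): return string
            case .interpolation(let value): return value.rendered
            }
        }.joined()
    }
}

let bar = 42
let thingy = Thingy(stringInterpolation:
    .literal("foo "),
    .interpolation(.init(bar)),
    .literal(" bas "),
    .interpolation(.init(zim: 1, zang: 2)),
    .literal("\n")
)
```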

* * *

As for validation, which is mentioned downthread: I think we will really want plain old string literal validation to happen at compile time. Doing that in a general way means macros, so that's just not in the cards yet.

However, once we *do* have that, I think we can actually handle runtime-failable interpolated literals pretty easily. For this example, I'll assume we adopt the `StringInterpolationSegment`-enum-based option, but any of them could be adapted in the same way:

  protocol ExpressibleByFailableStringInterpolation: ExpressibleByStringLiteral {
    associatedtype StringInterpolationType
    
    init(stringInterpolation: StringInterpolationSegment...)
  }
  extension ExpressibleByFailableStringInterpolation {
    typealias StringInterpolationSegment = Swift.StringInterpolationSegment<Self?>
    
    init(stringLiteral: StringLiteralType) {
      self.init(stringInterpolation: .literal(stringLiteral))
    }
  }
  extension Optional: ExpressibleByStringInterpolation where Wrapped: ExpressibleByFailableStringInterpolation {
    typealias StringLiteralType = Wrapped.StringLiteralType
    typealias StringInterpolationType = Wrapped.StringInterpolationType
    
    init(stringInterpolation segments: StringInterpolationSegment...) {
      self = Wrapped(stringInterpolation: segments)
    }
  }

If we think we'd rather support throwing inits instead of failable inits, that could be supported directly by `ExpressibleByStringInterpolation` if we get throwing types and support `Never` as a "doesn't throw" type.

* * *

Related question: Can the construction of the variadic parameter array be optimized? For instance, the arity of any given call site is known at compile time; can the array buffer be allocated on the stack and somehow marked so that attempting to retain it (while copying the `Array` instance) will copy it into the heap? (Are we doing that already?) I suspect that would make variadic calls a lot cheaper, perhaps enough so that we just don't need to worry about this problem at all.

···

On Mar 10, 2017, at 8:49 AM, Joe Groff <jgroff@apple.com> wrote:

--
Brent Royal-Gordon
Architechies


(Joe Groff) #6

Validation is a general problem with the literal protocols, since none of them allow for failed initialization. And if you can write "foo \(bar) bas", you can also write "foo " or "foo \(bar)", so you already need a representation for those intermediate states. I think allowing the literal and interpolated types to be different is important. You could achieve that with an initializer that took a variadic list of enums, perhaps:

protocol ExpressibleByStringInterpolation: ExpressibleByStringLiteral {
  associatedtype LiteralSegment: ExpressibleByStringLiteral
  associatedtype InterpolatedSegment

  enum Segment { case literal(LiteralSegment), interpolated(InterpolatedSegment) }

  init(stringInterpolation: Segment...)
}

That still requires the argument array to be constructed up front, though.

-Joe

···

On Mar 10, 2017, at 11:27 AM, David Waite <david@alkaline-solutions.com> wrote:


(Jacob Bandes-Storch) #7

> One more design possibility. Would it make sense to handle all the
> segments in a single initializer call, instead of having one call for each
> segment, plus a big call at the end? Suppose we did this:
>
>   protocol ExpressibleByStringInterpolation: ExpressibleByStringLiteral {
>     associatedtype StringInterpolationType
>
>     init(stringInterpolation segments: StringInterpolationSegment...)
>   }
>   @fixed_layout enum StringInterpolationSegment<StringType: ExpressibleByStringInterpolation> {
>     case literal(StringType.StringLiteralType)
>     case interpolation(StringType.StringInterpolationType)
>   }
>   extension ExpressibleByStringInterpolation {
>     typealias StringInterpolationSegment = Swift.StringInterpolationSegment<Self>
>
>     init(stringLiteral: StringLiteralType) {
>       self.init(stringInterpolation: .literal(stringLiteral))
>     }
>   }
>
> Code pattern would look like this:
>
>   Thingy(stringInterpolation:
>     .literal("foo "),
>     .interpolation(.init(bar)),
>     .literal(" bas "),
>     .interpolation(.init(zim: 1, zang: 2)),
>     .literal("\n")
>   )
>
> I suppose it just depends on whether the array or the extra calls are more
> costly. (Well, it also depends on whether we want to be expanding single
> expressions into big, complex, multi-statement messes like we discussed
> before.)
>
> (Actually, I realized after writing this that you mentioned a similar
> design downthread. Oops.)

This seems friendlier to me than the other designs. Not that it's a huge
deal, but passing the segments one at a time introduces an unnecessary
requirement that the construction happen left-to-right.

> * * *
>
> As for validation, which is mentioned downthread: I think we will really
> want plain old string literal validation to happen at compile time. Doing
> that in a general way means macros, so that's just not in the cards yet.
>
> However, once we *do* have that, I think we can actually handle
> runtime-failable interpolated literals pretty easily. For this example,
> I'll assume we adopt the `StringInterpolationSegment`-enum-based option,
> but any of them could be adapted in the same way:
>
>   protocol ExpressibleByFailableStringInterpolation: ExpressibleByStringLiteral {
>     associatedtype StringInterpolationType
>
>     init(stringInterpolation: StringInterpolationSegment...)
>   }
>   extension ExpressibleByFailableStringInterpolation {
>     typealias StringInterpolationSegment = Swift.StringInterpolationSegment<Self?>
>
>     init(stringLiteral: StringLiteralType) {
>       self.init(stringInterpolation: .literal(stringLiteral))
>     }
>   }
>   extension Optional: ExpressibleByStringInterpolation where Wrapped: ExpressibleByFailableStringInterpolation {
>     typealias StringLiteralType = Wrapped.StringLiteralType
>     typealias StringInterpolationType = Wrapped.StringInterpolationType
>
>     init(stringInterpolation segments: StringInterpolationSegment...) {
>       self = Wrapped(stringInterpolation: segments)
>     }
>   }
>
> If we think we'd rather support throwing inits instead of failable inits,
> that could be supported directly by `ExpressibleByStringInterpolation` if
> we get throwing types and support `Never` as a "doesn't throw" type.

I'm confused by this example — was
ExpressibleByFailableStringInterpolation's init() supposed to be failable
here?

Just to throw out another idea, what about keeping the entirety of the
string in one contiguous block and providing String.Indexes to the
initializer?

protocol ExpressibleByStringInterpolation {
    associatedtype Interpolation
    init(_ string: String, with interpolations: (String.Index, Interpolation)...)
}

On the other hand, unless I've missed something, it seems like most of the
suggestions so far are assuming that for any
ExpressibleByStringInterpolation type, the interpolated values' types will
be homogeneous. In the hypothetical printf-replacement case, you'd really
want the value types to depend on the format specifiers, so that a Float
couldn't be passed to %d without explicitly converting to an integer type.

Although I suppose that could simply be achieved with a typealias
Interpolation = enum { case f(Float), d(Int), ... }

···

On Fri, Mar 10, 2017 at 4:44 PM, Brent Royal-Gordon via swift-evolution < swift-evolution@swift.org> wrote:

> On Mar 10, 2017, at 8:49 AM, Joe Groff <jgroff@apple.com> wrote:


(Brent Royal-Gordon) #8

> I'm confused by this example: was ExpressibleByFailableStringInterpolation's init() supposed to be failable here?

Ugh, yes, I'm sorry. That should have been:

  protocol ExpressibleByFailableStringInterpolation: ExpressibleByStringLiteral {
    associatedtype StringInterpolationType
    
    init?(stringInterpolation: StringInterpolationSegment...)
  }

  ...
  
  extension Optional: ExpressibleByStringInterpolation where Wrapped: ExpressibleByFailableStringInterpolation {
    typealias StringLiteralType = Wrapped.StringLiteralType
    typealias StringInterpolationType = Wrapped.StringInterpolationType
    
    init(stringInterpolation segments: StringInterpolationSegment...) {
      self = Wrapped(stringInterpolation: segments)
    }
  }
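
To illustrate what a failable, segment-based initializer buys you, here is a small sketch. `Slug` and its validation rule (letters, digits, and hyphens only) are hypothetical, the `Segment` enum is nested in the type for self-containment, and the initializer is called by hand rather than through literal syntax.

```swift
// Hypothetical type whose interpolated values must pass validation.
struct Slug {
    enum Segment {
        case literal(String)
        case interpolation(String)
    }

    let value: String

    // Returns nil as soon as any interpolated value fails validation.
    init?(stringInterpolation segments: Segment...) {
        var result = ""
        for segment in segments {
            switch segment {
            case .literal(let string):
                // Literal text is trusted as written.
                result += string
            case .interpolation(let string):
                // Runtime values are checked before being accepted.
                guard string.allSatisfy({ $0.isLetter || $0.isNumber || $0 == "-" }) else {
                    return nil
                }
                result += string
            }
        }
        value = result
    }
}

let good = Slug(stringInterpolation: .literal("post-"), .interpolation("42"))
let bad = Slug(stringInterpolation: .literal("post-"), .interpolation("a b"))
```

Because literal and interpolated segments are distinguishable, only the runtime-supplied parts need checking, which is the distinction the compile-time versus runtime validation discussion relies on.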

> Just to throw out another idea, what about keeping the entirety of the string in one contiguous block and providing String.Indexes to the initializer?
>
> protocol ExpressibleByStringInterpolation {
>     associatedtype Interpolation
>     init(_ string: String, with interpolations: (String.Index, Interpolation)...)
> }

I've thought about that too. It's a little bit limiting—you have no choice except to use `String` as your input type. Also, the obvious way to use these parameters:

  init(_ string: String, with interpolations: (String.Index, Interpolation)...) {
    var copy = string
    for (i, expr) in interpolations {
      let exprString = doSomething(with: expr)
      copy.insert(contentsOf: exprString, at: i)
    }
    self.string = copy
  }

Will probably be slow, since you're inserting into the middle instead of appending to the end. Obviously a clever programmer can avoid doing that, but why create the attractive nuisance in the first place?
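
One way a careful implementation could avoid the mid-string insertions is to walk the literal once and append slices between the indices. The sketch below assumes the interpolations arrive sorted by ascending index and are already rendered to `String`; `Thingy` and its `string` property are hypothetical.

```swift
struct Thingy {
    let string: String

    // Assumes `interpolations` is sorted by ascending index.
    init(_ literal: String, with interpolations: (String.Index, String)...) {
        var result = ""
        var cursor = literal.startIndex
        for (index, rendered) in interpolations {
            // Copy the literal text up to the interpolation point, then the value.
            result.append(contentsOf: literal[cursor..<index])
            result.append(contentsOf: rendered)
            cursor = index
        }
        // Copy whatever literal text follows the last interpolation.
        result.append(contentsOf: literal[cursor...])
        string = result
    }
}

let literal = "Hello, !"
let gap = literal.firstIndex(of: "!")!
let thingy = Thingy(literal, with: (gap, "World"))
```

Every piece is appended to the end of `result`, and the indices are only ever used against the original, unmodified literal.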

> On the other hand, unless I've missed something, it seems like most of the suggestions so far are assuming that for any ExpressibleByStringInterpolation type, the interpolated values' types will be homogeneous. In the hypothetical printf-replacement case, you'd really want the value types to depend on the format specifiers, so that a Float couldn't be passed to %d without explicitly converting to an integer type.
>
> Although I suppose that could simply be achieved with a typealias Interpolation = enum { case f(Float), d(Int), ... }

Yup. Given Swift's current feature set, the only way to design this is to have a single type which all interpolations funnel through. That type could be `Any`, of course, but that doesn't really help anybody.

If we had variadic generics, you could of course have a variadic initializer with heterogeneous types. And I've often given thought to a "multiple associated types" feature where a protocol could be conformed to multiple times by specifying more than one concrete type for specific associated types. But these are both exotic features. In their absence, an enum (or something like that) is probably the best choice.
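
A small sketch of the enum approach, using the case names from the quoted suggestion. The `render` function is a hypothetical helper; the point is only that each case fixes the payload type, so a `Float` cannot reach `.d` without an explicit conversion.

```swift
// Single funnel type for all interpolated values, as suggested above.
enum Interpolation {
    case f(Float)
    case d(Int)
}

// Hypothetical helper that turns each kind of value into text.
func render(_ value: Interpolation) -> String {
    switch value {
    case .f(let number): return String(number)
    case .d(let number): return String(number)
    }
}

let ratio: Float = 2.9

// `.d(ratio)` would not compile; the conversion has to be written out.
let pieces = [Interpolation.d(3), .f(1.5), .d(Int(ratio))].map(render)
```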

* * *

I'm going to try to explore some of these other designs, but they all seem to assume the new formatting system I sketched out in "future directions", so I implemented that first:

  https://github.com/brentdax/swift/compare/new-interpolation...brentdax:new-interpolation-formatting

The switch to `\(describing: foo)` has more impact than I expected; just the code that's built by `utils/build-script`—not including tests—has over a hundred lines with changes like this:

- expectationFailure("\(lhs) < \(rhs)", trace: ${trace})
+ expectationFailure("\(describing: lhs) < \(describing: rhs)", trace: ${trace})

On the other hand, I like what it does to other formatting (I've only applied this kind of change in a few places):

- return "CollectionOfOne(\(String(reflecting: _element)))"
+ return "CollectionOfOne(\(reflecting: _element))"

And it *does* make you think about whether you want to use `describing:` or `reflecting:`:

- expectEqual(expected, actual, "where the argument is: \(a)")
+ expectEqual(expected, actual, "where the argument is: \(describing: a)")

And, thanks to LosslessStringConvertible, it also does a pretty good job of calling out the difference between interpolations that will probably look good to a user and ones that will look a little funny:

- return "watchOS(\(major).\(minor).[\(bugFixRange)], reason: \(reason))"
+ return "watchOS(\(major).\(minor).[\(describing: bugFixRange)], reason: \(reason))"

All in all, it's a bit of a mixed bag:

- return "<\(type(of: x)): 0x\(String(asNumericValue(x), radix: 16, uppercase: false))>"
+ return "<\(describing: type(of: x)): 0x\(asNumericValue(x), radix: 16, uppercase: false)>"
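For readers on Swift 5 or later, labeled interpolations of this shape can be approximated with `appendInterpolation` methods on `DefaultStringInterpolation`. This is not the implementation in the branch linked above, which forwards the arguments to `String` initializers; the three extension methods below are assumptions made purely for illustration, and unlike the branch they leave plain `\(x)` working.

```swift
extension DefaultStringInterpolation {
    // Illustrative only: mirrors String(describing:).
    mutating func appendInterpolation<T>(describing value: T) {
        appendLiteral(String(describing: value))
    }

    // Illustrative only: mirrors String(reflecting:).
    mutating func appendInterpolation<T>(reflecting value: T) {
        appendLiteral(String(reflecting: value))
    }

    // Illustrative only: mirrors String(_:radix:uppercase:).
    mutating func appendInterpolation<T: BinaryInteger>(
        _ value: T, radix: Int, uppercase: Bool = false
    ) {
        appendLiteral(String(value, radix: radix, uppercase: uppercase))
    }
}

let element = "a"
let described = "CollectionOfOne(\(describing: element))"
let reflected = "CollectionOfOne(\(reflecting: element))"
let hex = "0x\(255, radix: 16, uppercase: false)"

print(described)  // CollectionOfOne(a)
print(reflected)  // CollectionOfOne("a")
print(hex)        // 0xff
```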

We could probably improve this situation with a few targeted `String.init(_:)`s for things like type names, `Error` instances, and `FloatingPoint` types. (Actually, I think that `FloatingPoint` should probably conform to `LosslessStringConvertible`, but that's a different story.) Possibly `Array`s of `LosslessStringConvertible` types as well.

But ultimately, this might just be too source-breaking. If it is, we'll need to think about changing the design.

The simplest fix is to add a leading parameter label if there isn't one—that is, `\(foo)` becomes `.init(formatting: foo)`—but then you lose the ability to use full-width initializers which are already present and work well outside of interpolation. Perhaps we could hack overload checking so that, if a particular flag is set on a call, it will consider both methods with *and* without the first parameter label? But that's kind of bizarre, and probably above my pay grade to implement.
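To make the trade-off concrete, here is a hand-desugared sketch of what the "add a leading label" rule might produce. `Segment` is a hypothetical type, and the rewrite shown in the comments is my reading of the idea, not compiler behavior that exists anywhere:

```swift
// Hypothetical segment type; `formatting:` is the label the compiler would add.
struct Segment {
    let text: String

    // Would receive "\(foo)" after it is rewritten to .init(formatting: foo).
    init<T>(formatting value: T) {
        text = String(describing: value)
    }

    // The existing String.init(_:radix:uppercase:) has no `formatting:` label,
    // so an interpolation-only twin like this one would have to be written.
    init<T: BinaryInteger>(formatting value: T, radix: Int, uppercase: Bool = false) {
        text = String(value, radix: radix, uppercase: uppercase)
    }
}

let plain = Segment(formatting: 42)            // what "\(42)" would become
let hex = Segment(formatting: 255, radix: 16)  // what "\(255, radix: 16)" would become
print(plain.text, hex.text)
```

The second initializer is the cost being described: every unlabeled initializer that already works outside interpolation needs a relabeled duplicate.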

In any case, I really think this is in the right general direction, and with it done, I can start exploring some of the alternatives we've discussed here. I'm hoping to build several and run the string interpolation benchmark against them—we'll see how that goes.

···

On Mar 10, 2017, at 11:17 PM, Jacob Bandes-Storch <jtbandes@gmail.com> wrote:

--
Brent Royal-Gordon
Architechies


(Jacob Bandes-Storch) #9

> I'm confused by this example — was ExpressibleByFailableStringInterpolation's init() supposed to be failable here?

Ugh, yes, I'm sorry. That should have been:

        protocol ExpressibleByFailableStringInterpolation: ExpressibleByStringLiteral {
                associatedtype StringInterpolationType

                init?(stringInterpolation: StringInterpolationSegment...)
        }

        ...

        extension Optional: ExpressibleByStringInterpolation where Wrapped: ExpressibleByFailableStringInterpolation {
                typealias StringLiteralType = Wrapped.StringLiteralType
                typealias StringInterpolationType = Wrapped.StringInterpolationType

                init(stringInterpolation segments: StringInterpolationSegment...) {
                        self = Wrapped(stringInterpolation: segments)
                }
        }

> Just to throw out another idea, what about keeping the entirety of the string in one contiguous block and providing String.Indexes to the initializer?
>
> protocol ExpressibleByStringInterpolation {
>     associatedtype Interpolation
>     init(_ string: String, with interpolations: (String.Index, Interpolation)...)
> }

I've thought about that too. It's a little bit limiting—you have no choice
except to use `String` as your input type. Also, the obvious way to use
these parameters:

        init(stringLiteral string: String, with interpolations: (String.Index, Interpolation)...) {
                var copy = string
                for (i, expr) in interpolations {
                        let exprString = doSomething(with: expr)
                        copy.insert(exprString, at: i)
                }
                self.string = copy
        }

Will probably be slow, since you're inserting into the middle instead of
appending to the end. Obviously a clever programmer can avoid doing that,
but why create the attractive nuisance in the first place?

It's also easy to get wrong :wink: Docs for insert(_:at:) say "Calling this
method invalidates any existing indices for use with this string." And even
if they weren't invalidated, but simple numerical indices as into an Array,
you'd need to use them offsetBy however much content you'd inserted so far.
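One way to sidestep both the invalidation and the offset bookkeeping is never to mutate the original string: walk it once and append slices to a fresh result. A minimal sketch, assuming the interpolations arrive sorted by index and have already been converted to `String` (the `assemble` function is a made-up name):

```swift
// Hypothetical helper: builds the result by appending, so every index is only
// ever used against the untouched `literal` and stays valid throughout.
func assemble(_ literal: String, with interpolations: [(String.Index, String)]) -> String {
    var result = ""
    var cursor = literal.startIndex
    for (index, piece) in interpolations {
        // Copy the literal text between the previous insertion point and this one.
        result.append(contentsOf: literal[cursor..<index])
        result.append(piece)
        cursor = index
    }
    // Copy whatever literal text follows the last insertion point.
    result.append(contentsOf: literal[cursor..<literal.endIndex])
    return result
}

let literal = "Hello, !"
let bang = literal.firstIndex(of: "!")!
let greeting = assemble(literal, with: [(bang, "world")])
print(greeting)  // Hello, world!
```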

I still have the feeling that using full-on argument labels is too wordy
for string interpolation. The grammar is a bit awkward: "a String
describing x" makes sense when reading String(describing: x), but
"\(describing: x)" has no subject. And, at least for number formatting,
it's nowhere near the concision of printf-style format specifiers. Although
\() into T.init() does seem like a very obvious way of allowing more than
just a single argument per interpolation — maybe what I long for is not a
better core syntax, but a great DSL built on top of it. I'm struggling to
come up with anything that would be particularly ergonomic.

enum FloatSpec { case pad(Character), width(Int), ... }
init<F: FloatingPoint>(_ value: F, _ specs: FloatSpec...)
"\(value, .width(10), .pad("0"))"

However, your proposal as written is a great step!
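The `FloatSpec` idea can be tried out today as a free function, which avoids any dependence on how interpolation is desugared. This is only a sketch: the two cases are the ones named above, while the function name `format` and the left-padding behavior are my own assumptions.

```swift
enum FloatSpec {
    case pad(Character)
    case width(Int)
}

// Hypothetical helper: applies the specs in order, then left-pads to `width`.
func format<F: FloatingPoint>(_ value: F, _ specs: FloatSpec...) -> String {
    var width = 0
    var pad: Character = " "
    for spec in specs {
        switch spec {
        case .pad(let character): pad = character
        case .width(let columns): width = columns
        }
    }
    let digits = String(describing: value)
    let missing = max(0, width - digits.count)
    return String(repeating: pad, count: missing) + digits
}

let padded = format(3.5, .width(6), .pad("0"))
print("value: \(padded)")  // value: 0003.5
```

It works, but it is still far wordier than `%06.1f`, which is the ergonomic gap being described.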

···

On Sat, Mar 11, 2017 at 5:34 AM, Brent Royal-Gordon <brent@architechies.com> wrote:

> On Mar 10, 2017, at 11:17 PM, Jacob Bandes-Storch <jtbandes@gmail.com> wrote: