[Pitch] String revision proposal #1

When you have a pointer and a length, you can create a fully functional Collection using UnsafeBufferPointer. This means you no longer need something that’s C interop-specific – just the ability to create a String from a Collection of code units of some encoding.

We’ll add something to the proposal making it clear this will be possible.
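As a rough illustration of the pointer-plus-length idea, the sketch below wraps a buffer in an UnsafeBufferPointer and decodes it as UTF-8. It assumes the `String(decoding:as:)` initializer that later shipped in the standard library, which is not part of the text above; the byte array is just a stand-in for memory handed back by a C API.

```swift
// Stand-in for memory returned by a C API: UTF-8 bytes with no NUL terminator.
let bytes: [UInt8] = Array("segment name".utf8)

let text: String = bytes.withUnsafeBufferPointer { buffer -> String in
    // `buffer.baseAddress` and `buffer.count` play the role of the C pointer and length.
    let view = UnsafeBufferPointer(start: buffer.baseAddress, count: buffer.count)
    return String(decoding: view, as: UTF8.self)
}

let firstWord: String = bytes.withUnsafeBufferPointer { buffer -> String in
    // A shorter length simply yields a shorter Collection of code units.
    let view = UnsafeBufferPointer(start: buffer.baseAddress, count: 7)
    return String(decoding: view, as: UTF8.self)
}
```

Nothing here is specific to C strings: any Collection of UTF-8 code units could be passed in place of `view`.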

···

On Mar 31, 2017, at 4:01 AM, Jean-Daniel via swift-evolution <swift-evolution@swift.org> wrote:

I’m with you on a C interop API that supports taking a non-null-terminated string. I often work with APIs that support efficient parsing by providing a pointer to a global buffer plus a length to report parsed strings.

Without a way to create a Swift string from buffer + length, interop with such APIs will be difficult for no good reason, as Swift strings don’t even have to be null-terminated.

On Mar 30, 2017, at 6:35 PM, Félix Cloutier via swift-evolution <swift-evolution@swift.org> wrote:

I don't have many non-nitpick issues that I greatly care about; I'm in favor of this.

My only request: it's currently painful to create a String from a fixed-size C array. For instance, if I have a pointer to a `struct foo { char name[16]; }` in Swift where the last character doesn't have to be a NUL, it's hard to create a String from it. Real-world examples of this are Mach-O LC_SEGMENT and LC_SEGMENT_64 commands.

The generally accepted wisdom (see the Stack Overflow question "Converting a C char array to a String") is that you take a pointer to the CChar tuple that represents the fixed-size array, but this still requires the string to be NUL-terminated. What do we think of an additional init(cString:) overload that takes an UnsafeBufferPointer and reads up to the first NUL or the end of the buffer, whichever comes first?
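To make the request concrete, here is a minimal sketch of "read up to the first NUL or the end of the buffer" for a fixed-size array imported as a tuple. The helper name `string(fromFixedSizeCArray:)` is hypothetical, and the sketch assumes UTF-8 content plus the later-added `withUnsafeBytes(of:)` and `String(decoding:as:)` APIs; it is not the overload being proposed.

```swift
// Hypothetical helper: decodes the raw bytes of a fixed-size C array (imported
// into Swift as a homogeneous tuple), stopping at the first NUL if there is one.
func string<T>(fromFixedSizeCArray array: T) -> String {
    return withUnsafeBytes(of: array) { raw in
        // `raw` covers exactly MemoryLayout<T>.size bytes, so decoding can
        // never run past the end of the array even when no NUL is present.
        String(decoding: raw.prefix(while: { $0 != 0 }), as: UTF8.self)
    }
}

// A 4-byte field that is completely full (no terminator).
let full: (CChar, CChar, CChar, CChar) = (0x5F, 0x54, 0x45, 0x58)
// The same field holding a shorter, NUL-padded name.
let padded: (CChar, CChar, CChar, CChar) = (0x5F, 0x54, 0, 0)

let fullName = string(fromFixedSizeCArray: full)
let paddedName = string(fromFixedSizeCArray: padded)
```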

On Mar 30, 2017, at 2:48 AM, Brent Royal-Gordon via swift-evolution <swift-evolution@swift.org> wrote:

On Mar 29, 2017, at 5:32 PM, Ben Cohen via swift-evolution <swift-evolution@swift.org> wrote:

Hi Swift Evolution,

Below is a pitch for the first part of the String revision. This covers a number of changes that would allow the basic internals to be overhauled.

Online version here: https://github.com/airspeedswift/swift-evolution/blob/3a822c799011ace682712532cfabfe32e9203fbb/proposals/0161-StringRevision1.md

Really great stuff, guys. Thanks for your work on this!

In order to be able to write extensions across both String and Substring, a new Unicode protocol to which the two types will conform will be introduced. For the purposes of this proposal, Unicode will be defined as a protocol to be used whenever you would previously extend String. It should be possible to substitute extension Unicode { ... } in Swift 4 wherever extension String { ... } was written in Swift 3, with one exception: any passing of self into an API that takes a concrete String will need to be rewritten as String(self). If Self is a String then this should effectively optimize to a no-op, whereas if Self is a Substring then this will force a copy, helping to avoid the “memory leak” problems described above.

I continue to feel that `Unicode` is the wrong name for this protocol, essentially because it sounds like a protocol for, say, a version of Unicode or some kind of encoding machinery instead of a Unicode string. I won't rehash that argument since I made it already in the manifesto thread, but I would like to make a couple new suggestions in this area.

Later on, you note that it would be nice to namespace many of these types:

Several of the types related to String, such as the encodings, would ideally reside inside a namespace rather than live at the top level of the standard library. The best namespace for this is probably Unicode, but this is also the name of the protocol. At some point if we gain the ability to nest enums and types inside protocols, they should be moved there. Putting them inside String or some other enum namespace is probably not worthwhile in the mean-time.

Perhaps we should use an empty enum to create a `Unicode` namespace and then nest the protocol within it via typealias. If we do that, we can consider names like `Unicode.Collection` or even `Unicode.String` which would shadow existing types if they were top-level.
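A small sketch of that namespacing idea follows. All names in it (`_StringLike`, `TextNamespace`, `characterCount`) are made up for illustration; the real protocol and namespace names are exactly what is still under discussion here.

```swift
// Hypothetical top-level protocol standing in for the proposal's string protocol.
protocol _StringLike: BidirectionalCollection where Element == Character {}
extension String: _StringLike {}
extension Substring: _StringLike {}

// An empty (caseless) enum cannot be instantiated, so it acts as a pure namespace.
enum TextNamespace {
    // Protocols cannot be declared inside an enum, but a typealias can expose
    // one under the namespaced name.
    typealias StringLike = _StringLike
}

// The namespaced name can then be used as a generic constraint.
func characterCount<S: TextNamespace.StringLike>(_ s: S) -> Int {
    return s.count
}

let wholeCount = characterCount("héllo")
let sliceCount = characterCount("héllo".dropFirst())
```

Because the alias lives inside the enum, a name such as `TextNamespace.String` would not collide with the top-level `String`, which is the property the suggestion relies on.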

If not, then given this:

The exact nature of the protocol – such as which methods should be protocol requirements vs which can be implemented as protocol extensions, are considered implementation details and so not covered in this proposal.

We may simply want to wait to choose a name. As the protocol develops, we may discover a theme in its requirements which would suggest a good name. For instance, we may realize that the core of what the protocol abstracts is grouping code units into characters, which might suggest a name like `Characters`, or `Unicode.Characters`, or `CharacterCollection`, or what-have-you.

(By the way, I hope that the eventual protocol requirements will be put through the review process, if only as an amendment, once they're determined.)

Unicode will conform to BidirectionalCollection. RangeReplaceableCollection conformance will be added directly onto the String and Substring types, as it is possible future Unicode-conforming types might not be range-replaceable (e.g. an immutable type that wraps a const char *).

I'm a little worried about this because it seems to imply that the protocol cannot include any mutation operations that aren't in `RangeReplaceableCollection`. For instance, it won't be possible to include an in-place `applyTransform` method in the protocol. Do you anticipate that being an issue? Might it be a good idea to define a parallel `Mutable` or `RangeReplaceable` protocol?

The C string interop methods will be updated to those described here: a single withCString operation and two init(cString:) constructors, one for UTF8 and one for arbitrary encodings.

Sorry if I'm repeating something that was already discussed, but is there a reason you don't include a `withCString` variant for arbitrary encodings? It seems like an odd asymmetry.

The standard library currently lacks a Latin1 codec, so an enum Latin1: UnicodeEncoding type will be added.

Nice. I wrote one of those once; I'll enjoy deleting it.

A new protocol, UnicodeEncoding, will be added to replace the current UnicodeCodec protocol:

public enum UnicodeParseResult<T, Index> {

Either `T` should be given a more specific name, or the enum should be given a less specific one, becoming `ParseResult` and being oriented towards incremental parsing of anything from any kind of collection.

/// Indicates valid input was recognized.
///
/// `resumptionPoint` is the end of the parsed region
case valid(T, resumptionPoint: Index) // FIXME: should these be reordered?

No, I think this is the right order. The thing that's valid is the code point.

/// Indicates invalid input was recognized.
///
/// `resumptionPoint` is the next position at which to continue parsing after
/// the invalid input is repaired.
case error(resumptionPoint: Index)

I know this is abbreviated documentation, but I hope the full version includes a good usage example demonstrating, among other things, how to detect partial characters and defer processing of them instead of rejecting them as erroneous.

/// An encoding for text with UnicodeScalar as a common currency type
public protocol UnicodeEncoding {
/// The maximum number of code units in an encoded unicode scalar value
static var maxLengthOfEncodedScalar: Int { get }

/// A type that can represent a single UnicodeScalar as it is encoded in this
/// encoding.
associatedtype EncodedScalar : EncodedScalarProtocol

There's an `EncodedScalarProtocol`-shaped hole in this proposal. What does it do? What are its semantics? How does `EncodedScalar` relate to the old `CodeUnit`?

@discardableResult
public static func parseForward<C: Collection>(
   _ input: C,
   repairingIllFormedSequences makeRepairs: Bool = true,
   into output: (EncodedScalar) throws->Void
) rethrows -> (remainder: C.SubSequence, errorCount: Int)

@discardableResult
public static func parseReverse<C: BidirectionalCollection>(
   _ input: C,
   repairingIllFormedSequences makeRepairs: Bool = true,
   into output: (EncodedScalar) throws->Void
) rethrows -> (remainder: C.SubSequence, errorCount: Int)
where C.SubSequence : BidirectionalCollection,
       C.SubSequence.SubSequence == C.SubSequence,
       C.SubSequence.Iterator.Element == EncodedScalar.Iterator.Element
}

Are there constraints missing on `parseForward`?

What do these do if `makeRepairs` is false? Would it be clearer if we made an enum that described the behaviors and changed the label to something like `ifIllFormed:`?

Due to the change in internal implementation, this means that these operations will be O(n) rather than O(1). This is not expected to be a major concern, based on experiences from a similar change made to Java, but projects will be able to work around performance issues without upgrading to Swift 4 by explicitly typing slices as Substring, which will call the Swift 4 variant, and which will be available but not invoked by default in Swift 3 mode.

Will there be a way to make this also work with a real Swift 3 compiler? For instance, can you define `typealias Substring = String` in such a way that real Swift 3 will parse and use it, but Swift 4 in Swift 3 mode will ignore it?

This proposal does not yet introduce an implicit conversion from Substring to String. The decision on whether to add this will be deferred pending feedback on the initial implementation. The intention is to make a preview toolchain available for feedback, including on whether this implicit conversion is necessary, prior to the release of Swift 4.

This is a sensible approach.

Thank you for developing this into a full proposal. I discussed the plans for Swift 4 with a local group of programmers recently, and everyone was pleased to hear that `String` would get an overhaul, that the `characters` view would be integrated into the string, etc. We even talked a little about `Substring` and people thought it was a good idea. This proposal is shaping up to impact a lot of people, but in a good way!

--
Brent Royal-Gordon
Architechies

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution

You could even argue that what we need is a Collection wrapper that turns a pointer + a terminating sigil into a Collection… but from-C-string-creation is such a common operation that it deserves a dedicated shorthand. Non-null-terminated creation probably doesn’t.

···

On Mar 31, 2017, at 8:03 AM, Ben Cohen <ben_cohen@apple.com> wrote:

When you have a pointer and a length, you can create a fully functional Collection using UnsafeBufferPointer. This means you no longer need something that’s C interop-specific – just the ability to create a String from a Collection of code units of some encoding.

We’ll add something to the proposal making it clear this will be possible.

Hi Brent,

Sorry, I realized I failed to reply to these at the time. See below.

The big win for Unicode is it is short. We want to encourage people to write their extensions on this protocol. We want people who previously extended String to feel very comfortable extending Unicode. It also helps emphasize how important the Unicode-ness of Swift.String is. I like the idea of Unicode.Collection, but it is a little intimidating, and making it even a tiny bit intimidating is worrying to me from an adoption perspective.

Yeah, I understand why "Collection" might be intimidating. But I think "Unicode" would be too—it's opaque enough that people wouldn't be entirely sure whether they were extending the right thing.

I did a quick run-through of different languages and the protocols/interfaces/whatever their string types conform to, but most don't seem to have anything that abstracts string types. The only similar things I could find were `CharSequence` in Java, `StringLike` in Scala...and `Stringy` in Perl 6. And I'm sure you thought you were joking!

Ha!

Honestly, I'd recommend just going with `StringProtocol` unless you can come up with an adjective form you like (`Stringlike`? `Textual`?). It's a bit clumsy, but it's crystal clear. Stupid name, but you'll never forget it.

I think it’s kind of evenly balanced between Unicode and StringProtocol. Neither is perfect.
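For reference, this is roughly what the "extend the protocol instead of String" pattern looks like under the `StringProtocol` spelling, which is the name Swift 4 eventually shipped with. The `shout` function and `shouted` property are hypothetical examples, not anything from the proposal.

```swift
// A hypothetical API that insists on a concrete String.
func shout(_ s: String) -> String {
    return s.uppercased() + "!"
}

extension StringProtocol {
    // Available on both String and Substring.
    var shouted: String {
        // Passing `self` to a String-taking API needs an explicit conversion;
        // for a Substring this is where the copy happens.
        return shout(String(self))
    }
}

let fromString = "hi".shouted
let fromSubstring = "hi there".prefix(2).shouted
```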

I'm a little worried about this because it seems to imply that the protocol cannot include any mutation operations that aren't in `RangeReplaceableCollection`. For instance, it won't be possible to include an in-place `applyTransform` method in the protocol. Do you anticipate that being an issue? Might it be a good idea to define a parallel `Mutable` or `RangeReplaceable` protocol?

You can always assign to self, then provide more efficient implementations where the type is also a RangeReplaceableCollection. We do this elsewhere in the std lib with collections, e.g. https://github.com/apple/swift/blob/master/stdlib/public/core/Collection.swift#L1277.

Proliferating protocol combinations is problematic (looking at you, BidirectionalMutableRandomAccessSlice).

Nobody likes proliferation, but in this case it'd be because there genuinely were additional semantics that were only available on mutable strings.

(Once upon a time, I think I requested the ability to write `func index(of elem: Iterator.Element) -> Index? where Iterator.Element: Equatable`. Could such a feature be used for this? `func apply(_ transform: StringTransform, reverse: Bool) where Self: RangeReplaceableCollection`?)
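One way to approximate this without a per-method `where` clause is a constrained protocol extension, sketched below. It uses the shipped `StringProtocol` name and a made-up `stripVowels()` method in place of `applyTransform`; it is only meant to show that mutation can be offered to exactly those conforming types that are also range-replaceable.

```swift
// The mutating method exists only for conforming types that are also
// RangeReplaceableCollection (String and Substring both qualify).
extension StringProtocol where Self: RangeReplaceableCollection {
    // Hypothetical in-place operation standing in for `applyTransform`.
    mutating func stripVowels() {
        removeAll(where: { "aeiouAEIOU".contains($0) })
    }
}

var word = "hello"
word.stripVowels()

var slice = "hello world".dropFirst(6)   // Substring "world"
slice.stripVowels()
```

An immutable conforming type, such as the `const char *` wrapper mentioned earlier, would simply not see `stripVowels()`.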

The C string interop methods will be updated to those described here: a single withCString operation and two init(cString:) constructors, one for UTF8 and one for arbitrary encodings.

Sorry if I'm repeating something that was already discussed, but is there a reason you don't include a `withCString` variant for arbitrary encodings? It seems like an odd asymmetry.

Hmm. Is this a common use-case people have? Symmetry for the sake of it doesn’t seem enough. If uncommon, you can do it via an Array that you nul-terminate manually.

Is `init(cString:encoding:)` a common use case? If it is, I'm not sure why the opposite wouldn't be.

This + another use case has convinced me that yes, we should have a matching withCString version.
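The manual workaround mentioned above (an Array that you nul-terminate yourself) might look like the sketch below, here with UTF-16. Reading the buffer back uses `String(decodingCString:as:)`, which is assumed from the shipped standard library rather than taken from this thread.

```swift
let original = "héllo"

// Encode to UTF-16 code units and append the terminator by hand.
var utf16Units = Array(original.utf16)
utf16Units.append(0)

let roundTripped: String = utf16Units.withUnsafeBufferPointer { buffer in
    // `baseAddress` is the NUL-terminated pointer a C API expecting
    // UTF-16 would receive; it is valid only inside this closure.
    String(decodingCString: buffer.baseAddress!, as: UTF16.self)
}
```

The force-unwrap is safe here because the array always contains at least the terminator.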

Yeah, it’s tempting to make ParseResult general. The only reason we held off is that we don’t want the work of making it generally useful to become a distraction.

Understandable.

I wonder if some part of the parsing algorithm could somehow be generalized so it was suitable for many purposes and then put on `Collection`, with the `UnicodeEncoding` then being passed as a parameter to it. If so, that would justify making `ParseResult` a top-level type.

Ah, yes. Here it is:

public protocol EncodedScalarProtocol : RandomAccessCollection {
init?(_ scalarValue: UnicodeScalar)
var utf8: UTF8.EncodedScalar { get }
var utf16: UTF16.EncodedScalar { get }
var utf32: UTF32.EncodedScalar { get }
}

What is the `Element` type expected to be here?

I think what's missing is a holistic overview of the encoding system. So, please help me write this function:

  func unicodeScalars<Encoding: UnicodeEncoding>(in data: Data, using encoding: Encoding.Type) -> [UnicodeScalar] {
    var scalars: [UnicodeScalar] = []
    
    data.withUnsafeBytes { (bytes: UnsafePointer<$ParseInputElement>) in
      let buffer = UnsafeBufferPointer(start: bytes, count: data.count / MemoryLayout<$ParseInputElement>.size)
      encoding.parseForward(buffer) { encodedScalar in
        let unicodeScalar: UnicodeScalar = $doSomething(encodedScalar)
        scalars.append(unicodeScalar)
      }
    }
    
    return scalars
  }

What type would I put for $ParseInputElement? What function or initializer do I call for $doSomething?

Will come back on this.

@discardableResult
public static func parseForward<C: Collection>(
  _ input: C,
  repairingIllFormedSequences makeRepairs: Bool = true,
  into output: (EncodedScalar) throws->Void
) rethrows -> (remainder: C.SubSequence, errorCount: Int)

Are there constraints missing on `parseForward`?

Yep – see the note that appears a little later. They’re really implementation details – so not something to capture in the proposal – which may or may not be needed depending on whether this lands before or after the generics features that make them redundant.

No, I mean because this says nothing about `C`'s element type. Presumably you can't parse a bunch of `UIView`s into Unicode scalars, so there must be some kind of constraint on the collection's elements. What is it?

...oh, I notice that `parseScalarForward(_:knownCount:)` has the clause `where C.Iterator.Element == EncodedScalar.Iterator.Element` attached. Should that also be attached to `parseForward(_:repairingIllFormedSequences:into:)`?

What do these do if `makeRepairs` is false? Would it be clearer if we made an enum that described the behaviors and changed the label to something like `ifIllFormed:`?

The Unicode standard specifies values to substitute when making repairs.

I'm asking what happens if you *don't* want to make repairs. Does it, say, stop immediately, returning an `errorCount` of `1` and a `remainder` that starts at the site of the error? If so, would we better off having that parameter be something like `ifIllFormed: .stop` or `ifIllFormed: .repair`, rather than `repairingIllFormedSequences: false` or `repairingIllFormedSequences: true`?

The idea is, if you don’t want to make repairs, you use the transcoding primitives instead. The belief is that the old non-repairing versions (return nil if repairs needed) weren’t useful.

Due to the change in internal implementation, this means that these operations will be O(n) rather than O(1). This is not expected to be a major concern, based on experiences from a similar change made to Java, but projects will be able to work around performance issues without upgrading to Swift 4 by explicitly typing slices as Substring, which will call the Swift 4 variant, and which will be available but not invoked by default in Swift 3 mode.

Will there be a way to make this also work with a real Swift 3 compiler? For instance, can you define `typealias Substring = String` in such a way that real Swift 3 will parse and use it, but Swift 4 in Swift 3 mode will ignore it?

Are you talking about this as a way for people to change their code, while still being able to compile their code with the old compiler? Yes, that might be a good strategy, will think about that.

Yes, that's what I'm talking about.

I guess the actual question is, does `#if swift(>=4)` come out as `true` for Swift 4 in Swift 3 mode? If not, is there some way to detect that you're using Swift 4 in Swift 3 mode? (I suppose one answer is "yes, Swift 4 in Swift 3 mode is called Swift 3.2"; I just haven't heard anyone mention anything like that yet.) In either case, if there's some way to distinguish, you could say:

  #if thisIsRealSwift3NotSwift4PretendingToBeSwift3()
  typealias Substring = String
  #endif

And then you could write the rest of your code using `Substring` and it would compile using both Swift 3 and Swift 4 toolchains, never forcing an implicit copy.

Ah right. Unfortunately as things are currently envisioned, this won’t work – you won’t be able to distinguish “true” Swift 3 from Swift 3 compatibility mode.

···

On Mar 30, 2017, at 6:52 PM, Brent Royal-Gordon <brent@architechies.com> wrote:

On Mar 30, 2017, at 2:36 PM, Ben Cohen <ben_cohen@apple.com> wrote:

--
Brent Royal-Gordon
Architechies

There is a difference between subsequence, which is one word, and the others, which are noun phrases (i.e. “any sequence”, “lazy sequence”). The issue is whether it’s "sub-sequence" (capitalization reasonable) or subsequence (no reason for caps).

···

On Mar 30, 2017, at 8:59 AM, Adrian Zubarev via swift-evolution <swift-evolution@swift.org> wrote:

We cannot rename SubSequence to Subsequence, because that would be odd compared to all other types containing Sequence.

Yes—I was asking about the transcoding primitives here. Currently the call looks like one of these:

  let (remainder, errorCount) = UTF8.parseForward(bytes, repairingIllFormedSequences: false) { … }
  let (remainder, errorCount) = UTF8.parseForward(bytes, repairingIllFormedSequences: true) { … }

I'm saying, would it be clearer if it looked like this instead?

  let (remainder, errorCount) = UTF8.parseForward(bytes, ifIllFormed: .stop) { … }
  let (remainder, errorCount) = UTF8.parseForward(bytes, ifIllFormed: .repair) { … }
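To show what the enum-labelled form could mean in practice, here is a toy sketch. `IllFormedPolicy` and `parseUTF8(_:ifIllFormed:)` are hypothetical names, the implementation leans on the existing `UnicodeCodec`-style `UTF8.decode` rather than the proposed `parseForward`, and for brevity it returns the decoded scalars instead of a `remainder`.

```swift
// Hypothetical policy enum replacing the Bool parameter.
enum IllFormedPolicy {
    case stop     // give up at the first ill-formed sequence
    case repair   // substitute U+FFFD and keep going
}

func parseUTF8(_ bytes: [UInt8],
               ifIllFormed policy: IllFormedPolicy) -> (scalars: [Unicode.Scalar], errorCount: Int) {
    var decoder = UTF8()
    var iterator = bytes.makeIterator()
    var scalars: [Unicode.Scalar] = []
    var errorCount = 0
    while true {
        switch decoder.decode(&iterator) {
        case .scalarValue(let scalar):
            scalars.append(scalar)
        case .emptyInput:
            return (scalars, errorCount)
        case .error:
            errorCount += 1
            switch policy {
            case .stop:
                return (scalars, errorCount)
            case .repair:
                scalars.append("\u{FFFD}")
            }
        }
    }
}

// "a", one invalid byte, "b"
let input: [UInt8] = [0x61, 0xFF, 0x62]
let repaired = parseUTF8(input, ifIllFormed: .repair)
let stopped = parseUTF8(input, ifIllFormed: .stop)
```

At the call site, `.stop` and `.repair` say what happens on bad input, whereas `true` and `false` only make sense alongside the parameter label.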

···

On Apr 5, 2017, at 12:16 PM, Ben Cohen <ben_cohen@apple.com> wrote:

The idea is, if you don’t want to make repairs, you use the transcoding primitives instead. The belief is that the old non-repairing versions (return nil if repairs needed) weren’t useful.

--
Brent Royal-Gordon
Architechies