Pitch: String Index Overhaul


(Dave Abrahams) #1

Pretty version: https://github.com/dabrahams/swift-evolution/blob/string-index-overhaul/proposals/NNNN-string-index-overhaul.md


----

# String Index Overhaul

* Proposal: [SE-NNNN](NNNN-string-index-overhaul.md)
* Authors: [Dave Abrahams](https://github.com/dabrahams)
* Review Manager: TBD
* Status: **Awaiting review**
* Pull Request Implementing This Proposal: https://github.com/apple/swift/pull/9806


## Introduction

Today `String` shares an `Index` type with its `CharacterView` but not
with its `UTF8View`, `UTF16View`, or `UnicodeScalarView`. This
proposal redefines `String.UTF8View.Index`, `String.UTF16View.Index`,
and `String.UnicodeScalarView.Index` as typealiases for `String.Index`,
and exposes a public `encodedOffset` property and initializer that can
be used to serialize and deserialize positions in a `String` or
`Substring`.

Swift-evolution thread: [Discussion thread topic for that proposal](https://lists.swift.org/pipermail/swift-evolution/)

## Motivation

The different index types are supported by a set of `Index`
initializers, which are failable whenever the source index might not
correspond to a position in the target view:

if let j = String.UnicodeScalarView.Index(
  someUTF16Position, within: s.unicodeScalars) {
  ... 
}

The current API is as follows:

public extension String.Index {
  init?(_: String.UnicodeScalarIndex, within: String)
  init?(_: String.UTF16Index, within: String)
  init?(_: String.UTF8Index, within: String)
}

public extension String.UTF16View.Index {
  init?(_: String.UTF8Index, within: String.UTF16View)
  init(_: String.UnicodeScalarIndex, within: String.UTF16View)
  init(_: String.Index, within: String.UTF16View)
}

public extension String.UTF8View.Index {
  init?(_: String.UTF16Index, within: String.UTF8View)
  init(_: String.UnicodeScalarIndex, within: String.UTF8View)
  init(_: String.Index, within: String.UTF8View)
}

public extension String.UnicodeScalarView.Index {
  init?(_: String.UTF16Index, within: String.UnicodeScalarView)
  init?(_: String.UTF8Index, within: String.UnicodeScalarView)
  init(_: String.Index, within: String.UnicodeScalarView)
}

These initializers are supplemented by a corresponding set of
convenience conversion methods:

if let j = someUTF16Position.samePosition(in: s.unicodeScalars) {
  ... 
}

with the following API:

public extension String.Index {
  func samePosition(in: String.UTF8View) -> String.UTF8View.Index
  func samePosition(in: String.UTF16View) -> String.UTF16View.Index
  func samePosition(
    in: String.UnicodeScalarView) -> String.UnicodeScalarView.Index
}

public extension String.UTF16View.Index {
  func samePosition(in: String) -> String.Index?
  func samePosition(in: String.UTF8View) -> String.UTF8View.Index?
  func samePosition(
    in: String.UnicodeScalarView) -> String.UnicodeScalarView.Index?
}

public extension String.UTF8View.Index {
  func samePosition(in: String) -> String.Index?
  func samePosition(in: String.UTF16View) -> String.UTF16View.Index?
  func samePosition(
    in: String.UnicodeScalarView) -> String.UnicodeScalarView.Index?
}

public extension String.UnicodeScalarView.Index {
  func samePosition(in: String) -> String.Index?
  func samePosition(in: String.UTF8View) -> String.UTF8View.Index
  func samePosition(in: String.UTF16View) -> String.UTF16View.Index
}

The result is a great deal of API surface area for apparently little
gain in ordinary code, which normally only interchanges indices among
views when the positions match up exactly (i.e. when the conversion is
going to succeed). The resulting code is also needlessly awkward.

Finally, the opacity of these index types makes it difficult to record
`String` or `Substring` positions in files or other archival forms,
and reconstruct the original positions with respect to a deserialized
`String` or `Substring`.

## Proposed solution

All `String` views will use a single index type (`String.Index`), so
that positions can be interchanged without awkward explicit
conversions:

let html: String = "See <a href=\"http://swift.org\">swift.org</a>"

// Search the UTF16, instead of characters, for performance reasons:
let open = "<".utf16.first!, close = ">".utf16.first!
let tagStart = html.utf16.index(of: open)!
let tagEnd = html.utf16[tagStart...].index(of: close)!

// Slice the String with the UTF-16 indices to retrieve the tag.
let tag = html[tagStart...tagEnd]

A property and an initializer will be added to `String.Index`, exposing
the offset of the index in code units (currently only UTF-16) from the
beginning of the string:

let n: Int = html.endIndex.encodedOffset
let end = String.Index(encodedOffset: n)
assert(end == html.endIndex)
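As a rough illustration of the serialization use case, the sketch below round-trips a position through a plain `Int`. It assumes a Swift 5 or later toolchain, where the offset is spelled `utf16Offset(in:)` and `String.Index(utf16Offset:in:)` rather than the `encodedOffset` spelling pitched here; the names `savedOffset` and `restored` are illustrative only.

```swift
let html = "See <a href=\"http://swift.org\">swift.org</a>"

// Locate the first "<" and record its position as a plain integer.
// Assumption: Swift 5+ spelling of the offset API, not the pitched `encodedOffset`.
let tagStart = html.firstIndex(of: "<")!
let savedOffset: Int = tagStart.utf16Offset(in: html)

// Later (for example after reading `savedOffset` back from a file),
// rebuild an index relative to an equal string.
let restored = String.Index(utf16Offset: savedOffset, in: html)
print(savedOffset, html[restored...].prefix(2))
```

The integer is only meaningful relative to a string with the same contents, so an archive would need to store the string (or a way to recover it) alongside the offset.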

### Comparison and Slicing Semantics

When two indices being compared correspond to positions that are valid
in any single `String` view, comparison semantics are already fully
specified by the `Collection` requirements. Where no single `String`
view contains both index values, the indices compare unequal and
ordering is determined by comparison of their `encodedOffset`s. These index
values are not totally ordered but do satisfy strict weak ordering
requirements, which is sufficient for algorithms such as `sort` to
exhibit sensible behavior. We might consider loosening the specified
requirements on these algorithms and on `Comparable` to support strict
weak ordering, but for now we can treat such index pairs as being
outside the domain of comparison, like any other indices from
completely distinct collections.
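To make the cross-view case concrete, here is a small sketch that compares a position valid only in the unicode scalar view against two `Character` boundaries. It assumes a toolchain in which the views already share `String.Index` (Swift 4 or later); the variable names are illustrative.

```swift
// "e" followed by U+0301 COMBINING ACUTE ACCENT forms one Character, then "x".
let s = "e\u{301}x"

let charStart = s.startIndex                 // valid in every view
let charNext = s.index(after: charStart)     // Character boundary before "x"

// A position between "e" and the combining accent: a valid
// unicodeScalars index, but not a Character boundary.
let scalarMid = s.unicodeScalars.index(
  after: s.unicodeScalars.startIndex)

// Ordering follows the underlying code-unit offsets.
print(charStart < scalarMid, scalarMid < charNext, scalarMid == charNext)
```

Here `scalarMid` sorts strictly between the two `Character` boundaries and is equal to neither, which matches the offset-based ordering described above.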

An index that does not fall on an exact boundary in a given `String`
or `Substring` view will be “rounded down” to the nearest boundary
when used for slicing or range replacement. So, for example,

let s = "e\u{301}galite\u{301}"                          // "égalité"
print(s[s.unicodeScalars.indices.dropFirst().first!...]) // "égalité"
print(s[..<s.unicodeScalars.indices.last!])              // "égalit"

Replacing the failable APIs listed [above](#motivation), which detect
whether an index represents a valid position in a given view, and
adding enhancements that explicitly round index positions to nearby
boundaries in a given view, are left to a later proposal. For now, we
do not propose to remove the existing index conversion APIs.

## Detailed design

`String.Index` acquires an `encodedOffset` property and initializer:

public extension String.Index {
  /// Creates a position corresponding to the given offset in a
  /// `String`'s underlying (UTF-16) code units.
  init(encodedOffset: Int)

  /// The position of this index expressed as an offset from the
  /// beginning of the `String`'s underlying (UTF-16) code units.
  var encodedOffset: Int
}

`Index` types of `String.UTF8View`, `String.UTF16View`, and
`String.UnicodeScalarView` are replaced by `String.Index`:

public extension String.UTF8View {
  typealias Index = String.Index
}
public extension String.UTF16View {
  typealias Index = String.Index
}
public extension String.UnicodeScalarView {
  typealias Index = String.Index
}

Because the index types are collapsing, index conversion methods and
initializers are reduced to the following:

public extension String.Index {
  init?(_: String.Index, within: String)
  init?(_: String.Index, within: String.UTF8View)
  init?(_: String.Index, within: String.UTF16View)
  init?(_: String.Index, within: String.UnicodeScalarView)

  func samePosition(in: String) -> String.Index?
  func samePosition(in: String.UTF8View) -> String.Index?
  func samePosition(in: String.UTF16View) -> String.Index?
  func samePosition(in: String.UnicodeScalarView) -> String.Index?
}
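A short sketch of how the reduced, uniformly failable conversions might be used to test whether a position is an exact boundary in a given view. It assumes the `samePosition(in:)` and `init?(_:within:)` members shipped in the Swift 4+ standard library behave as listed above; the variable names are illustrative.

```swift
let s = "e\u{301}galite\u{301}"   // "égalité" with decomposed accents

// A unicodeScalars position between "e" and its combining accent.
let scalarMid = s.unicodeScalars.index(
  after: s.unicodeScalars.startIndex)

// Not a Character boundary, so conversion to a String position fails...
let inCharacters = scalarMid.samePosition(in: s)
// ...but it is an exact position in the UTF-8 and UTF-16 views.
let inUTF8 = scalarMid.samePosition(in: s.utf8)
let inUTF16 = String.Index(scalarMid, within: s.utf16)

print(inCharacters == nil, inUTF8 != nil, inUTF16 != nil)
```

A `nil` result is the signal that the index does not fall on a boundary of the target view, so callers can branch on it instead of relying on implicit rounding.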

## Source compatibility

Because of the collapse of index
types, [existing non-failable APIs](#motivation) become failable. To
avoid breaking Swift 3 code, the following overloads of existing
functions are added, allowing the resulting optional indices to be
used where previously non-optional indices were used. These overloads
were driven by making the new APIs work with existing code, including
the Swift source compatibility test suite, and should be viewed as
migration aids only, rather than additions to the Swift 3 API.

extension Optional where Wrapped == String.Index {
  @available(
    swift, deprecated: 3.2, obsoleted: 4.0,
    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional indices")
  public static func ..<(
    lhs: String.Index?, rhs: String.Index?
  ) -> Range<String.Index> {
    return lhs! ..< rhs!
  }

  @available(
    swift, deprecated: 3.2, obsoleted: 4.0,
    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional indices")
  public static func ...(
    lhs: String.Index?, rhs: String.Index?
  ) -> ClosedRange<String.Index> {
    return lhs! ... rhs!
  }
}

// backward compatibility for index interchange.  
extension String.UTF16View {
  @available(
    swift, deprecated: 3.2, obsoleted: 4.0,
    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
  public func index(after i: Index?) -> Index {
    return index(after: i!)
  }
  @available(
    swift, deprecated: 3.2, obsoleted: 4.0,
    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
  public func index(
    _ i: Index?, offsetBy n: IndexDistance) -> Index {
    return index(i!, offsetBy: n)
  }
  @available(
    swift, deprecated: 3.2, obsoleted: 4.0,
    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional indices")
  public func distance(from i: Index?, to j: Index?) -> IndexDistance {
    return distance(from: i!, to: j!)
  }
  @available(
    swift, deprecated: 3.2, obsoleted: 4.0,
    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
  public subscript(i: Index?) -> Unicode.UTF16.CodeUnit {
    return self[i!]
  }
}

extension String.UTF8View {
  @available(
    swift, deprecated: 3.2, obsoleted: 4.0,
    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
  public func index(after i: Index?) -> Index {
    return index(after: i!)
  }
  @available(
    swift, deprecated: 3.2, obsoleted: 4.0,
    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
  public func index(_ i: Index?, offsetBy n: IndexDistance) -> Index {
    return index(i!, offsetBy: n)
  }
  @available(
    swift, deprecated: 3.2, obsoleted: 4.0,
    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional indices")
  public func distance(
    from i: Index?, to j: Index?) -> IndexDistance {
    return distance(from: i!, to: j!)
  }
  @available(
    swift, deprecated: 3.2, obsoleted: 4.0,
    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
  public subscript(i: Index?) -> Unicode.UTF8.CodeUnit {
    return self[i!]
  }
}

// backward compatibility for index interchange.  
extension String.UnicodeScalarView {
  @available(
    swift, deprecated: 3.2, obsoleted: 4.0,
    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
  public func index(after i: Index?) -> Index {
    return index(after: i!)
  }
  @available(
    swift, deprecated: 3.2, obsoleted: 4.0,
    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
  public func index(_ i: Index?, offsetBy n: IndexDistance) -> Index {
    return index(i!, offsetBy: n)
  }
  @available(
    swift, deprecated: 3.2, obsoleted: 4.0,
    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional indices")
  public func distance(from i: Index?, to j: Index?) -> IndexDistance {
    return distance(from: i!, to: j!)
  }
  @available(
    swift, deprecated: 3.2, obsoleted: 4.0,
    message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
  public subscript(i: Index?) -> Unicode.Scalar {
    return self[i!]
  }
}
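The deprecation messages above ask callers to unwrap the optional indices themselves. One way that migration might look is sketched below; `firstTag(in:)` is a hypothetical helper, and the code assumes Swift 5 naming (`firstIndex(of:)` in place of `index(of:)`).

```swift
// Hypothetical helper: returns the first "<...>" tag in `text`, or nil
// if a delimiter is missing or does not fall on a Character boundary.
func firstTag(in text: String) -> Substring? {
  guard
    // Search the UTF-16 view for the delimiters.
    let open = text.utf16.firstIndex(of: 0x3C),            // "<"
    let close = text.utf16[open...].firstIndex(of: 0x3E),  // ">"
    // Unwrap each conversion explicitly rather than relying on the
    // deprecated overloads that force-unwrap optional indices.
    let lower = open.samePosition(in: text),
    let upper = close.samePosition(in: text)
  else { return nil }
  return text[lower...upper]
}

print(firstTag(in: "See <a href=\"http://swift.org\">swift.org</a>") ?? "none")
```

Compared with the force-unwrapping overloads, a failed conversion here becomes a `nil` result the caller can handle instead of a runtime trap.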

- **Q**: Will existing correct Swift 3 applications stop compiling due
  to this change?

  **A**: It is possible but unlikely. The existing index conversion
  APIs are relatively rarely used, and the overloads listed above
  handle the common cases in Swift 3 compatibility mode.
  
- **Q**: Will applications still compile but produce
  different behavior than they used to?

  **A**: No.
  
- **Q**: Is it possible to automatically migrate from the old syntax
  to the new syntax?

  **A**: Yes, although usages of these APIs may be rare enough that it
  isn't worth the trouble.

- **Q**: Can Swift applications be written in a common subset that works
   both with Swift 3 and Swift 4 to aid in migration?

  **A**: Yes, the Swift 4 APIs will all be available in Swift 3 mode.

## Effect on ABI stability

This proposal changes the ABI of the standard library.

## Effect on API resilience

This proposal makes no changes to the resilience of any APIs.

## Alternatives considered

The only alternative considered was no action.

--
-Dave


(Michael Ilseman) #2

An index that does not fall on an exact boundary in a given `String`
or `Substring` view will be “rounded down” to the nearest boundary
when used for slicing or range replacement. So, for example,

What about normal subscript? I.e. what would the following print?

print(s[s.unicodeScalars.indices.dropFirst().first!]) // “é”, or just the combining scalar?

Would unifying under the same type require that indices be less stateful than they currently are?



(Jordan Rose) #3

My knee-jerk reaction is to say it's too late in Swift 4 for this kind of change, but with that out of the way, I'm most concerned about what it means to have, say, a UTF-8 index that's not on a UTF-16 boundary.

let str = "言"
let oneUnitIn = str.utf8.index(after: str.utf8.startIndex)
let trailingBytes = str.utf8[oneUnitIn...]

What can I do with 'oneUnitIn'? How do I test to see if it's on a Character boundary or a UnicodeScalar boundary?

Jordan



(Philippe Hausler) #4

I would presume that the index type will still be shared between String and Substring. Will this mean that we will now be able to express index manipulation in StringProtocol?

I find StringProtocol a bit hard to deal with when attempting to make range conversions; it would be really nice if we could make this possible (or perhaps more intuitive... since, for the life of me I can’t figure out a way to generically convert indexes for StringProtocol adoption)

So let's say you have a function such as:

func foo<S: StringProtocol>(_ str: S, range: Range<S.Index>) {
    range.lowerBound.samePosition(in: str.utf16)
}

This results in the error: value of type 'S.Index' has no member 'samePosition'

This of course is an intended target of something that deals with strings and wants to deal with both strings and substrings uniformly since it is reasonable to pass either.

In short: are StringProtocol accessors a consideration for conversion in this change?

···

On May 27, 2017, at 10:40 AM, Dave Abrahams via swift-evolution <swift-evolution@swift.org> wrote:

Pretty version: https://github.com/dabrahams/swift-evolution/blob/string-index-overhaul/proposals/NNNN-string-index-overhaul.md

----

# String Index Overhaul

* Proposal: [SE-NNNN](NNNN-string-index-overhaul.md)
* Authors: [Dave Abrahams](https://github.com/dabrahams)
* Review Manager: TBD
* Status: **Awaiting review**
* Pull Request Implementing This Proposal: https://github.com/apple/swift/pull/9806

*During the review process, add the following fields as needed:*

## Introduction

Today `String` shares an `Index` type with its `CharacterView` but not
with its `UTF8View`, `UTF16View`, or `UnicodeScalarView`. This
proposal redefines `String.UTF8View.Index`, `String.UTF16View.Index`,
and `String.CharacterView.Index` as typealiases for `String.Index`,
and exposes a public `encodedOffset` property and initializer that can
be used to serialize and deserialize positions in a `String` or
`Substring`.

Swift-evolution thread: [Discussion thread topic for that proposal](https://lists.swift.org/pipermail/swift-evolution/)

## Motivation

The different index types are supported by a set of `Index`
initializers, which are failable whenever the source index might not
correspond to a position in the target view:

if let j = String.UnicodeScalarView.Index(
 someUTF16Position, within: s.unicodeScalars) {
 ... 
}

The current API is as follows:

public extension String.Index {
 init?(_: String.UnicodeScalarIndex, within: String)
 init?(_: String.UTF16Index, within: String)
 init?(_: String.UTF8Index, within: String)
}

public extension String.UTF16View.Index {
 init?(_: String.UTF8Index, within: String.UTF16View)
 init(_: String.UnicodeScalarIndex, within: String.UTF16View)
 init(_: String.Index, within: String.UTF16View)
}

public extension String.UTF8View.Index {
 init?(_: String.UTF16Index, within: String.UTF8View)
 init(_: String.UnicodeScalarIndex, within: String.UTF8View)
 init(_: String.Index, within: String.UTF8View)
}

public extension String.UnicodeScalarView.Index {
 init?(_: String.UTF16Index, within: String.UnicodeScalarView)
 init?(_: String.UTF8Index, within: String.UnicodeScalarView)
 init(_: String.Index, within: String.UnicodeScalarView)
}

These initializers are supplemented by a corresponding set of
convenience conversion methods:

if let j = someUTF16Position.samePosition(in: s.unicodeScalars) {
 ... 
}

with the following API:

public extension String.Index {
 func samePosition(in: String.UTF8View) -> String.UTF8View.Index
 func samePosition(in: String.UTF16View) -> String.UTF16View.Index
 func samePosition(
   in: String.UnicodeScalarView) -> String.UnicodeScalarView.Index
}

public extension String.UTF16View.Index {
 func samePosition(in: String) -> String.Index?
 func samePosition(in: String.UTF8View) -> String.UTF8View.Index?
 func samePosition(
   in: String.UnicodeScalarView) -> String.UnicodeScalarView.Index?
}

public extension String.UTF8View.Index {
 func samePosition(in: String) -> String.Index?
 func samePosition(in: String.UTF16View) -> String.UTF16View.Index?
 func samePosition(
   in: String.UnicodeScalarView) -> String.UnicodeScalarView.Index?
}

public extension String.UnicodeScalarView.Index {
 func samePosition(in: String) -> String.Index?
 func samePosition(in: String.UTF8View) -> String.UTF8View.Index
 func samePosition(in: String.UTF16View) -> String.UTF16View.Index
}

The result is a great deal of API surface area for apparently little
gain in ordinary code, which normally only interchanges indices among
views when the positions match up exactly (i.e. when the conversion is
going to succeed). Also, the resulting code is needlessly awkward.

Finally, the opacity of these index types makes it difficult to record
`String` or `Substring` positions in files or other archival forms,
and reconstruct the original positions with respect to a deserialized
`String` or `Substring`.

## Proposed solution

All `String` views will use a single index type (`String.Index`), so
that positions can be interchanged without awkward explicit
conversions:

let html: String = "See <a href=\"http://swift.org\">swift.org</a>"

// Search the UTF16, instead of characters, for performance reasons:
let open = "<".utf16.first!, close = ">".utf16.first!
let tagStart = html.utf16.index(of: open)!
let tagEnd = html.utf16[tagStart...].index(of: close)!

// Slice the String with the UTF-16 indices to retrieve the tag.
let tag = html[tagStart...tagEnd]

A property and an initializer will be added to `String.Index`, exposing
the offset of the index in code units (currently only UTF-16) from the
beginning of the string:

let n: Int = html.endIndex.encodedOffset
let end = String.Index(encodedOffset: n)
assert(end == html.endIndex)
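As a slightly fuller sketch of the same round trip, the snippet below records both ends of the tag as plain integers and rebuilds the slice from them. It assumes a toolchain where the proposed `encodedOffset` property and initializer are available, and it uses `firstIndex(of:)`, the later spelling of `index(of:)`. The string is ASCII on purpose, so the offsets do not depend on how the string is stored.

```swift
let html = "See <a href=\"http://swift.org\">swift.org</a>"
let open = "<".utf16.first!, close = ">".utf16.first!

// Locate the tag through the UTF-16 view.
let tagStart = html.utf16.firstIndex(of: open)!
let tagEnd = html.utf16[tagStart...].firstIndex(of: close)!

// Flatten the two positions to integers that could be written to a file.
let saved = (tagStart.encodedOffset, tagEnd.encodedOffset)

// Later: rebuild indices from the integers and slice the same string.
let lower = String.Index(encodedOffset: saved.0)
let upper = String.Index(encodedOffset: saved.1)
let tag = html[lower...upper]
print(tag)
```

Later Swift releases still ship these two members but mark them deprecated, so expect warnings there.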

### Comparison and Slicing Semantics

When two indices being compared correspond to positions that are valid
in any single `String` view, comparison semantics are already fully
specified by the `Collection` requirements. Where no single `String`
view contains both index values, the indices compare unequal and
ordering is determined by comparison of `encodedOffset`s. These index
values are not totally ordered but do satisfy strict weak ordering
requirements, which is sufficient for algorithms such as `sort` to
exhibit sensible behavior. We might consider loosening the specified
requirements on these algorithms and on `Comparable` to support strict
weak ordering, but for now we can treat such index pairs as being
outside the domain of comparison, like any other indices from
completely distinct collections.

An index that does not fall on an exact boundary in a given `String`
or `Substring` view will be “rounded down” to the nearest boundary
when used for slicing or range replacement. So, for example,

let s = "e\u{301}galite\u{301}"                          // "égalité"
print(s[s.unicodeScalars.indices.dropFirst().first!...]) // "égalité"
print(s[..<s.unicodeScalars.indices.last!])              // "égalit"

Replacing the failable APIs listed [above](#motivation) that detect
whether an index represents a valid position in a given view, and
enhancements that explicitly round index positions to nearby boundaries
in a given view, are left to a later proposal. For now, we do not
propose to remove the existing index conversion APIs.

## Detailed design

`String.Index` acquires an `encodedOffset` property and initializer:

public extension String.Index {
 /// Creates a position corresponding to the given offset in a
 /// `String`'s underlying (UTF-16) code units.
 init(encodedOffset: Int)

 /// The position of this index expressed as an offset from the
 /// beginning of the `String`'s underlying (UTF-16) code units.
 var encodedOffset: Int
}

`Index` types of `String.UTF8View`, `String.UTF16View`, and
`String.UnicodeScalarView` are replaced by `String.Index`:

public extension String.UTF8View {
 typealias Index = String.Index
}
public extension String.UTF16View {
 typealias Index = String.Index
}
public extension String.UnicodeScalarView {
 typealias Index = String.Index
}

Because the index types are collapsing, index conversion methods and
initializers are reduced to the following:

public extension String.Index {
 init?(_: String.Index, within: String)
 init?(_: String.Index, within: String.UTF8View)
 init?(_: String.Index, within: String.UTF16View)
 init?(_: String.Index, within: String.UnicodeScalarView)

 func samePosition(in: String) -> String.Index?
 func samePosition(in: String.UTF8View) -> String.Index?
 func samePosition(in: String.UTF16View) -> String.Index?
 func samePosition(in: String.UnicodeScalarView) -> String.Index?
}
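A minimal sketch of what the collapsed index types allow in practice, assuming a toolchain that implements this design (and using `firstIndex(of:)`, the later spelling of `index(of:)`). The names `greeting` and `comma` are illustrative only.

```swift
let greeting = "Hello, world"

// Find a position by searching the UTF-8 view.
let comma = greeting.utf8.firstIndex(of: UInt8(ascii: ","))!

// The same index value slices the String and subscripts the other views,
// with no explicit conversion step.
let head = greeting[..<comma]
let unit = greeting.utf16[comma]
let scalar = greeting.unicodeScalars[comma]

// All of the view index types are one and the same type.
let sameType = String.UTF8View.Index.self == String.Index.self
print(head, unit, scalar, sameType)
```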

## Source compatibility

Because of the collapse of index
types, [existing non-failable APIs](#motivation) become failable. To
avoid breaking Swift 3 code, the following overloads of existing
functions are added, allowing the resulting optional indices to be
used where previously non-optional indices were used. These overloads
were driven by making the new APIs work with existing code, including
the Swift source compatibility test suite, and should be viewed as
migration aids only, rather than additions to the Swift 3 API.

extension Optional where Wrapped == String.Index {
 @available(
   swift, deprecated: 3.2, obsoleted: 4.0,
   message: "Any String view index conversion can fail in Swift 4; please unwrap the optional indices")
 public static func ..<(
   lhs: String.Index?, rhs: String.Index?
 ) -> Range<String.Index> {
   return lhs! ..< rhs!
 }

 @available(
   swift, deprecated: 3.2, obsoleted: 4.0,
   message: "Any String view index conversion can fail in Swift 4; please unwrap the optional indices")
 public static func ...(
   lhs: String.Index?, rhs: String.Index?
 ) -> ClosedRange<String.Index> {
   return lhs! ... rhs!
 }
}

// backward compatibility for index interchange.  
extension String.UTF16View {
 @available(
   swift, deprecated: 3.2, obsoleted: 4.0,
   message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
 public func index(after i: Index?) -> Index {
   return index(after: i!)
 }
 @available(
   swift, deprecated: 3.2, obsoleted: 4.0,
   message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
 public func index(
   _ i: Index?, offsetBy n: IndexDistance) -> Index {
   return index(i!, offsetBy: n)
 }
 @available(
   swift, deprecated: 3.2, obsoleted: 4.0,
   message: "Any String view index conversion can fail in Swift 4; please unwrap the optional indices")
 public func distance(from i: Index?, to j: Index?) -> IndexDistance {
   return distance(from: i!, to: j!)
 }
 @available(
   swift, deprecated: 3.2, obsoleted: 4.0,
   message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
 public subscript(i: Index?) -> Unicode.UTF16.CodeUnit {
   return self[i!]
 }
}

extension String.UTF8View {
 @available(
   swift, deprecated: 3.2, obsoleted: 4.0,
   message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
 public func index(after i: Index?) -> Index {
   return index(after: i!)
 }
 @available(
   swift, deprecated: 3.2, obsoleted: 4.0,
   message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
 public func index(_ i: Index?, offsetBy n: IndexDistance) -> Index {
   return index(i!, offsetBy: n)
 }
 @available(
   swift, deprecated: 3.2, obsoleted: 4.0,
   message: "Any String view index conversion can fail in Swift 4; please unwrap the optional indices")
 public func distance(
   from i: Index?, to j: Index?) -> IndexDistance {
   return distance(from: i!, to: j!)
 }
 @available(
   swift, deprecated: 3.2, obsoleted: 4.0,
   message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
 public subscript(i: Index?) -> Unicode.UTF8.CodeUnit {
   return self[i!]
 }
}

// backward compatibility for index interchange.  
extension String.UnicodeScalarView {
 @available(
   swift, deprecated: 3.2, obsoleted: 4.0,
   message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
 public func index(after i: Index?) -> Index {
   return index(after: i!)
 }
 @available(
   swift, deprecated: 3.2, obsoleted: 4.0,
   message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
 public func index(_ i: Index?,  offsetBy n: IndexDistance) -> Index {
   return index(i!, offsetBy: n)
 }
 @available(
   swift, deprecated: 3.2, obsoleted: 4.0,
   message: "Any String view index conversion can fail in Swift 4; please unwrap the optional indices")
 public func distance(from i: Index?, to j: Index?) -> IndexDistance {
   return distance(from: i!, to: j!)
 }
 @available(
   swift, deprecated: 3.2, obsoleted: 4.0,
   message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
 public subscript(i: Index?) -> Unicode.Scalar {
   return self[i!]
 }
}

- **Q**: Will existing correct Swift 3 applications stop compiling due
to this change?

**A**: It is possible but unlikely. The existing index conversion
APIs are relatively rarely used, and the overloads listed above
handle the common cases in Swift 3 compatibility mode.

- **Q**: Will applications still compile but produce
different behavior than they used to?

**A**: No.

- **Q**: Is it possible to automatically migrate from the old syntax
to the new syntax?

**A**: Yes, although usages of these APIs may be rare enough that it
isn't worth the trouble.

- **Q**: Can Swift applications be written in a common subset that works
  both with Swift 3 and Swift 4 to aid in migration?

**A**: Yes, the Swift 4 APIs will all be available in Swift 3 mode.
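For code that relied on the formerly non-failable conversions, the migration suggested by the deprecation messages above amounts to unwrapping the optional results before forming a range. The helper below is a hypothetical illustration of that pattern, not part of the proposal; `utf16Range(of:in:)` is an invented name.

```swift
/// Hypothetical helper: converts both bounds of `range` to positions in
/// `s.utf16`, returning nil if either bound is not a valid UTF-16 position.
func utf16Range(
  of range: Range<String.Index>, in s: String
) -> Range<String.Index>? {
  guard let lower = range.lowerBound.samePosition(in: s.utf16),
        let upper = range.upperBound.samePosition(in: s.utf16)
  else { return nil }
  return lower ..< upper
}

let word = "caf\u{E9}"  // "café"; the last scalar takes two UTF-8 code units

// Bounds on Character boundaries convert successfully.
let whole = utf16Range(of: word.startIndex ..< word.endIndex, in: word)

// A bound between the two UTF-8 code units of the last scalar does not.
let insideE = word.utf8.index(before: word.utf8.endIndex)
let partial = utf16Range(of: insideE ..< word.endIndex, in: word)
print(whole != nil, partial == nil)
```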

## Effect on ABI stability

This proposal changes the ABI of the standard library.

## Effect on API resilience

This proposal makes no changes to the resilience of any APIs.

## Alternatives considered

The only alternative considered was no action.

--
-Dave

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Brent Royal-Gordon) #5

Do you intend to allow users to serialize an `encodedOffset` and deserialize it later, perhaps in a later version of Swift, to represent the same position? If so, I'm not sure how you intend to maintain compatibility once the "currently only UTF-16" notation is no longer true.

···

On May 27, 2017, at 10:40 AM, Dave Abrahams via swift-evolution <swift-evolution@swift.org> wrote:

A property and an initializer will be added to `String.Index`, exposing
the offset of the index in code units (currently only UTF-16) from the
beginning of the string:

let n: Int = html.endIndex.encodedOffset
let end = String.Index(encodedOffset: n)
assert(end == String.endIndex)

--
Brent Royal-Gordon
Architechies


(Ben Rimmington) #6

## Introduction

Today `String` shares an `Index` type with its `CharacterView` but not
with its `UTF8View`, `UTF16View`, or `UnicodeScalarView`. This
proposal redefines `String.UTF8View.Index`, `String.UTF16View.Index`,
and `String.CharacterView.Index` as typealiases for `String.Index`,
and exposes a public `encodedOffset` property and initializer that can
be used to serialize and deserialize positions in a `String` or
`Substring`.

If `encodedOffset` is only needed for serialization of indices,
could `String.Index` conform to `Codable` from SE-0166 instead?

## Proposed solution

All `String` views will use a single index type (`String.Index`), so
that positions can be interchanged without awkward explicit
conversions:

```swift
let html: String = "See <a href=\"http://swift.org\">swift.org</a>"

// Search the UTF16, instead of characters, for performance reasons:
let open = "<".utf16.first!, close = ">".utf16.first!
let tagStart = s.utf16.index(of: open)
let tagEnd = s.utf16[tagStart...].index(of: close)
```

I think `s` should be `html` in the previous two lines.

-- Ben

···

On 27 May 2017, at 18:40, Dave Abrahams wrote:


(Dave Abrahams) #7

An index that does not fall on an exact boundary in a given `String`
or `Substring` view will be “rounded down” to the nearest boundary
when used for slicing or range replacement. So, for example,

What about normal subscript? I.e. what would the following print?

print(s[s.unicodeScalars.indices.dropFirst().first!]) // “é”, or just
the combining scalar?

I am proposing that it would be “é”

Would unifying under the same type require that indices be less
stateful than they currently are?

No; it's just a matter of unifying the states (in an enum). You can
look at the implementation in https://github.com/apple/swift/pull/9806
for details.

···

on Tue May 30 2017, Michael Ilseman <milseman-AT-apple.com> wrote:

On May 27, 2017, at 10:40 AM, Dave Abrahams via swift-evolution <swift-evolution@swift.org> wrote:

let s = "e\u{301}galite\u{301}"                          // "égalité"
print(s[s.unicodeScalars.indices.dropFirst().first!...]) // "égalité"
print(s[..<s.unicodeScalars.indices.last!])              // "égalit"

--
-Dave


(Dave Abrahams) #8

My knee-jerk reaction is to say it's too late in Swift 4 for this kind
of change, but with that out of the way, I'm most concerned about what
it means to have, say, a UTF-8 index that's not on a UTF-16 boundary.

let str = "言"
let oneUnitIn = str.utf8.index(after: str.utf8.startIndex)
let trailingBytes = str.utf8[oneUnitIn...]

This is not new; it exists today.

What can I do with 'oneUnitIn'?

All the usual stuff; we're not proposing to change what you can do with
it.

How do I test to see if it's on a Character boundary or a
UnicodeScalar boundary?

as noted,

  Replacing the failable APIs listed [above](#motivation) that detect
  whether an index represents a valid position in a given view, and
  enhancement that explicitly round index positions to nearby boundaries
  in a given view, are left to a later proposal. For now, we do not
  propose to remove the existing index conversion APIs.

That means you can use oneUnitIn.samePosition(in: str) or
oneUnitIn.samePosition(in: str.unicodeScalars) to find out if it's on a
character or unicode scalar boundary.
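Spelled out for the `oneUnitIn` example from the question, that check might look like the sketch below. It uses only the existing `samePosition(in:)` conversions; the variable names for the results are illustrative.

```swift
let str = "言"  // one Character, one scalar, three UTF-8 code units
let oneUnitIn = str.utf8.index(after: str.utf8.startIndex)

// A nil result means "not a valid position in that view".
let asScalarIndex = oneUnitIn.samePosition(in: str.unicodeScalars)
let asCharacterIndex = oneUnitIn.samePosition(in: str)

// A position that every view agrees on converts successfully.
let endAsCharacterIndex = str.utf8.endIndex.samePosition(in: str)
print(asScalarIndex == nil, asCharacterIndex == nil, endAsCharacterIndex != nil)
```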

···

on Tue May 30 2017, Jordan Rose <jordan_rose-AT-apple.com> wrote:

--
-Dave


(Dave Abrahams) #9

Strings will offer access to their underlying code units, and can be
serialized and deserialized however they are represented. A String
stored as UTF-8 will be serialized and deserialized as UTF-8, so
encodedOffsets will maintain their meanings.
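To make that concrete, here is a rough sketch of storing a string as its code units alongside an integer offset and recovering the position afterwards. It is an assumption-laden stand-in: it measures the offset through the UTF-16 view rather than through `encodedOffset`, which matches the proposal only while the underlying code units are UTF-16. All names are illustrative.

```swift
let original = "na\u{EF}ve caf\u{E9}"  // "naïve café"
let position = original.firstIndex(of: "c")!

// "Serialize": the string as UTF-16 code units, and the position as an
// integer count of code units from the start.
let storedUnits: [UInt16] = Array(original.utf16)
let storedOffset: Int = original.utf16.distance(
  from: original.utf16.startIndex, to: position)

// "Deserialize": rebuild the string from the same code units, then walk the
// same number of code units to recover the position.
let restored = String(decoding: storedUnits, as: UTF16.self)
let restoredPosition = restored.utf16.index(
  restored.utf16.startIndex, offsetBy: storedOffset)
print(restored[restoredPosition...])
```

The offset keeps its meaning only because the string and the offset are stored in terms of the same code units, which is the point made above.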

Cheers,

···

on Fri Jun 02 2017, Brent Royal-Gordon <swift-evolution@swift.org> wrote:

On May 27, 2017, at 10:40 AM, Dave Abrahams via swift-evolution <swift-evolution@swift.org> wrote:

A property and an initializer will be added to `String.Index`, exposing
the offset of the index in code units (currently only UTF-16) from the
beginning of the string:

let n: Int = html.endIndex.encodedOffset
let end = String.Index(encodedOffset: n)
assert(end == String.endIndex)

Do you intend to allow users to serialize an `encodedOffset` and
deserialize it later, perhaps in a later version of Swift, to
represent the same position? If so, I'm not sure how you intend to
maintain compatibility once the "currently only UTF-16" notation is no
longer true.

--
-Dave


(Dave Abrahams) #10

## Introduction

Today `String` shares an `Index` type with its `CharacterView` but not
with its `UTF8View`, `UTF16View`, or `UnicodeScalarView`. This
proposal redefines `String.UTF8View.Index`, `String.UTF16View.Index`,
and `String.CharacterView.Index` as typealiases for `String.Index`,
and exposes a public `encodedOffset` property and initializer that can
be used to serialize and deserialize positions in a `String` or
`Substring`.

If `encodedOffset` is only needed for serialization of indices,

It's not only needed for that. Once StringProtocol provides access to
its underlying code units, the encodedOffset will be useful information
in other ways.

could `String.Index` conform to `Codable` from SE-0166 instead?

Not instead, definitely. Additionally, maybe; that's a separate
proposal.

## Proposed solution

All `String` views will use a single index type (`String.Index`), so
that positions can be interchanged without awkward explicit
conversions:

```swift
let html: String = "See <a href=\"http://swift.org\">swift.org</a>"

// Search the UTF16, instead of characters, for performance reasons:
let open = "<".utf16.first!, close = ">".utf16.first!
let tagStart = s.utf16.index(of: open)
let tagEnd = s.utf16[tagStart...].index(of: close)
```

I think `s` should be `html` in the previous two lines.

Ah, yeah, thanks; good catch!

···

on Sat Jun 03 2017, Ben Rimmington <me-AT-benrimmington.com> wrote:

On 27 May 2017, at 18:40, Dave Abrahams wrote:

--
-Dave


(Dave Abrahams) #11

Philippe Hausler via swift-evolution

I would presume that the index type will still be shared between String and SubString,

Yes; that's a requirement of Collection conformance.

will this mean that we will now be able to express index manipulation in StringProtocol?

Depends what you mean by that I guess :wink:

I find StringProtocol a bit hard to deal with when attempting to make range conversions;

StringProtocol is pretty far from being in its intended final form, if
that's any consolation…

it would be really nice if we could make this possible (or perhaps more
intuitive... since, for the life of me I can’t figure out a way to
generically convert indexes for StringProtocol adoption)

So let's say you have a function as such:

func foo<S: StringProtocol>(_ str: S, range: Range<S.Index>) {
    range.lowerBound.samePosition(in: str.utf16)
}

results in the error: value of type 'S.Index' has no member 'samePosition'

This of course is an intended target of something that deals with strings
and wants to deal with both strings and substrings uniformly since it is
reasonable to pass either.

In short: are StringProtocol accessors a consideration for conversion in this change?

I'm sorry but I don't understand the question. If you're asking whether
this particular proposal is designed to address index translation on models
of StringProtocol, the answer is no. It will appear to do so for the moment
but only because StringProtocol is currently over-constrained.

The intention is that eventually StringProtocol does not constrain its
views to all have the same index type, but does constrain them to
Unicode.ViewIndex, which will require the encodedOffset initializer and
property. Also the index type of a StringProtocol's SubSequence's XXXView
will be constrained to be the same as that of the StringProtocol's own
XXXView.
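As a purely hypothetical sketch of that direction, the protocol below stands in for the `Unicode.ViewIndex` constraint described here. `ViewIndexSketch`, `ToyIndex`, and `translate` are invented names, and nothing in this snippet is an actual standard library declaration.

```swift
// Hypothetical stand-in for the constraint: an index type that can be
// built from, and reduced to, an offset in the underlying code units.
protocol ViewIndexSketch: Comparable {
  init(encodedOffset: Int)
  var encodedOffset: Int { get }
}

// A toy index type, used only to exercise the protocol.
struct ToyIndex: ViewIndexSketch {
  var encodedOffset: Int
  init(encodedOffset: Int) { self.encodedOffset = encodedOffset }
  static func < (lhs: ToyIndex, rhs: ToyIndex) -> Bool {
    return lhs.encodedOffset < rhs.encodedOffset
  }
}

// Generic code can carry a position from one index type to another through
// the shared integer offset, without knowing either concrete type.
func translate<From: ViewIndexSketch, To: ViewIndexSketch>(
  _ i: From, to _: To.Type
) -> To {
  return To(encodedOffset: i.encodedOffset)
}

let moved = translate(ToyIndex(encodedOffset: 3), to: ToyIndex.self)
print(moved.encodedOffset)
```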

Hope this helps,
Dave

···

<swift-evolution@swift.org> wrote:


(Jordan Rose) #12

My knee-jerk reaction is to say it's too late in Swift 4 for this kind
of change, but with that out of the way, I'm most concerned about what
it means to have, say, a UTF-8 index that's not on a UTF-16 boundary.

let str = "言"
let oneUnitIn = str.utf8.index(after: str.utf8.startIndex)
let trailingBytes = str.utf8[oneUnitIn...]

This is not new; it exists today.

Yes, I think that’s valuable. What’s different is that it’s not a String.Index.

What can I do with 'oneUnitIn'?

All the usual stuff; we're not proposing to change what you can do with
it.

By changing the type, you have increased the scope of where an index can be used. What happens when I use it in one of the other views and it’s not on a boundary?

(I suspect the answer is “it traps” but the proposal should spell that out explicitly.)

How do I test to see if it's on a Character boundary or a
UnicodeScalar boundary?

as noted,

Replacing the failable APIs listed [above](#motivation) that detect
whether an index represents a valid position in a given view, and
enhancement that explicitly round index positions to nearby boundaries
in a given view, are left to a later proposal. For now, we do not
propose to remove the existing index conversion APIs.

That means you can use oneUnitIn.samePosition(in: str) or
oneUnitIn.samePosition(in: str.unicodeScalars) to find out if it's on a
character or unicode scalar boundary.

I’m sorry, I completely missed that. This part of the question is withdrawn.

I’m also concerned about putting “UTF-16” in the documentation for encodedOffset. Either it’s a ‘utf16Offset’ or it isn’t; if it’s an opaque value then it should be treated as such. (It’s also a little disturbing that round-tripping through encodedOffset isn’t guaranteed to give you the same index back.)

Jordan

···

On May 30, 2017, at 14:53, Dave Abrahams <dabrahams@apple.com> wrote:
on Tue May 30 2017, Jordan Rose <jordan_rose-AT-apple.com> wrote:


(Dave Abrahams) #13

My knee-jerk reaction is to say it's too late in Swift 4 for this kind
of change, but with that out of the way, I'm most concerned about what
it means to have, say, a UTF-8 index that's not on a UTF-16 boundary.

let str = "言"
let oneUnitIn = str.utf8.index(after: str.utf8.startIndex)
let trailingBytes = str.utf8[oneUnitIn...]

This is not new; it exists today.

Yes, I think that’s valuable. What’s different is that it’s not a String.Index.

What can I do with 'oneUnitIn'?

All the usual stuff; we're not proposing to change what you can do with
it.

By changing the type, you have increased the scope of where an index
can be used. What happens when I use it in one of the other views and
it’s not on a boundary?

(I suspect the answer is “it traps” but the proposal should spell that
out explicitly.)

Sorry, I mistakenly limited the “rounding down” behavior to slicing and
range replacement. The index would be rounded down to the previous
boundary, and then used as ever.

How do I test to see if it's on a Character boundary or a
UnicodeScalar boundary?

as noted,

Replacing the failable APIs listed [above](#motivation) that detect
whether an index represents a valid position in a given view, and
enhancement that explicitly round index positions to nearby boundaries
in a given view, are left to a later proposal. For now, we do not
propose to remove the existing index conversion APIs.

That means you can use oneUnitIn.samePosition(in: str) or
oneUnitIn.samePosition(in: str.unicodeScalars) to find out if it's on a
character or unicode scalar boundary.

I’m sorry, I completely missed that. This part of the question is withdrawn.

I’m also concerned about putting “UTF-16” in the documentation for
encodedOffset. Either it’s a ‘utf16Offset’ or it isn’t

It is today; hopefully it won't be someday

; if it’s an opaque value then it should be treated as such.

Today a String has underlying UTF-16-compatible storage and that's
documented as such, but we intend to lift that restriction and don't
want the names to lock us into semantics.

(It’s also a little disturbing that round-tripping through
encodedOffset isn’t guaranteed to give you the same index back.)

Define “same.”

The encodedOffset is not the full value of an *arbitrary* index, and
doesn't claim to be. The indices that can be serialized and
reconstructed exactly using encodedOffset are those that fall on code
unit boundaries. Today, that means everything but UTF-8 indices. We
could consider exposing the transcodedOffset (offset within the UTF8
encoding of the scalar) as well, but I want to be conservative.

···

on Tue May 30 2017, Jordan Rose <jordan_rose-AT-apple.com> wrote:

On May 30, 2017, at 14:53, Dave Abrahams <dabrahams@apple.com> wrote:
on Tue May 30 2017, Jordan Rose <jordan_rose-AT-apple.com> wrote:

--
-Dave


(Jordan Rose) #14

My knee-jerk reaction is to say it's too late in Swift 4 for this kind
of change, but with that out of the way, I'm most concerned about what
it means to have, say, a UTF-8 index that's not on a UTF-16 boundary.

let str = "言"
let oneUnitIn = str.utf8.index(after: str.utf8.startIndex)
let trailingBytes = str.utf8[oneUnitIn...]

This is not new; it exists today.

Yes, I think that’s valuable. What’s different is that it’s not a String.Index.

What can I do with 'oneUnitIn'?

All the usual stuff; we're not proposing to change what you can do with
it.

By changing the type, you have increased the scope of where an index
can be used. What happens when I use it in one of the other views and
it’s not on a boundary?

(I suspect the answer is “it traps” but the proposal should spell that
out explicitly.)

Sorry, I mistakenly limited the “rounding down” behavior to slicing and
range replacement. The index would be rounded down to the previous
boundary, and then used as ever.

Makes sense!

How do I test to see if it's on a Character boundary or a
UnicodeScalar boundary?

as noted,

Replacing the failable APIs listed [above](#motivation) that detect
whether an index represents a valid position in a given view, and
enhancement that explicitly round index positions to nearby boundaries
in a given view, are left to a later proposal. For now, we do not
propose to remove the existing index conversion APIs.

That means you can use oneUnitIn.samePosition(in: str) or
oneUnitIn.samePosition(in: str.unicodeScalars) to find out if it's on a
character or unicode scalar boundary.

I’m sorry, I completely missed that. This part of the question is withdrawn.

I’m also concerned about putting “UTF-16” in the documentation for
encodedOffset. Either it’s a ‘utf16Offset’ or it isn’t

It is today; hopefully it won't be someday

; if it’s an opaque value then it should be treated as such.

Today a String has underlying UTF-16-compatible storage and that's
documented as such, but we intend to lift that restriction and don't
want the names to lock us into semantics.

I don’t think you should promise that about new APIs, then, or someone will start relying on it.

(It’s also a little disturbing that round-tripping through
encodedOffset isn’t guaranteed to give you the same index back.)

Define “same.”

The encodedOffset is not the full value of an *arbitrary* index, and
doesn't claim to be. The indices that can be serialized and
reconstructed exactly using encodedOffset are those that fall on code
unit boundaries. Today, that means everything but UTF-8 indices. We
could consider exposing the transcodedOffset (offset within the UTF8
encoding of the scalar) as well, but I want to be conservative.

I’m not sure it’s clear from the name “encodedOffset” that this is a lossy conversion. I’d say it should be an optional property, but that’s probably too annoying in the invalid case. Maybe it should trap.

Jordan

···

On May 30, 2017, at 16:13, Dave Abrahams <dabrahams@apple.com> wrote:
on Tue May 30 2017, Jordan Rose <jordan_rose-AT-apple.com> wrote:

On May 30, 2017, at 14:53, Dave Abrahams <dabrahams@apple.com> wrote:
on Tue May 30 2017, Jordan Rose <jordan_rose-AT-apple.com> wrote:


(Dave Abrahams) #15

My knee-jerk reaction is to say it's too late in Swift 4 for this kind
of change, but with that out of the way, I'm most concerned about what
it means to have, say, a UTF-8 index that's not on a UTF-16 boundary.

let str = "言"
let oneUnitIn = str.utf8.index(after: str.utf8.startIndex)
let trailingBytes = str.utf8[oneUnitIn...]

This is not new; it exists today.

Yes, I think that’s valuable. What’s different is that it’s not a String.Index.

What can I do with 'oneUnitIn'?

All the usual stuff; we're not proposing to change what you can do with
it.

By changing the type, you have increased the scope of where an index
can be used. What happens when I use it in one of the other views and
it’s not on a boundary?

(I suspect the answer is “it traps” but the proposal should spell that
out explicitly.)

Sorry, I mistakenly limited the “rounding down” behavior to slicing and
range replacement. The index would be rounded down to the previous
boundary, and then used as ever.

Makes sense!

How do I test to see if it's on a Character boundary or a
UnicodeScalar boundary?

as noted,

Replacing the failable APIs listed [above](#motivation) that detect
whether an index represents a valid position in a given view, and
enhancement that explicitly round index positions to nearby boundaries
in a given view, are left to a later proposal. For now, we do not
propose to remove the existing index conversion APIs.

That means you can use oneUnitIn.samePosition(in: str) or
oneUnitIn.samePosition(in: str.unicodeScalars) to find out if it's on a
character or unicode scalar boundary.

I’m sorry, I completely missed that. This part of the question is withdrawn.

I’m also concerned about putting “UTF-16” in the documentation for
encodedOffset. Either it’s a ‘utf16Offset’ or it isn’t

It is today; hopefully it won't be someday

; if it’s an opaque value then it should be treated as such.

Today a String has underlying UTF-16-compatible storage and that's
documented as such, but we intend to lift that restriction and don't
want the names to lock us into semantics.

I don’t think you should promise that about new APIs, then, or someone
will start relying on it.

Okay, we could leave it out of this doc comment. But as long as
something documents that Strings are stored as UTF-16 (e.g. we say you
get random-access performance for the utf16 view when Foundation is
loaded), the implication is there.

(It’s also a little disturbing that round-tripping through
encodedOffset isn’t guaranteed to give you the same index back.)

Define “same.”

The encodedOffset is not the full value of an *arbitrary* index, and
doesn't claim to be. The indices that can be serialized and
reconstructed exactly using encodedOffset are those that fall on code
unit boundaries. Today, that means everything but UTF-8 indices. We
could consider exposing the transcodedOffset (offset within the UTF8
encoding of the scalar) as well, but I want to be conservative.

I’m not sure it’s clear from the name “encodedOffset” that this is a
lossy conversion.

It's not a conversion :slight_smile:

I’d say it should be an optional property, but that’s probably too
annoying in the invalid case. Maybe it should trap.

I really don't think so; IMO that would be inconsistent with the
“rounding down” behavior proposed. I think either all misaligned
accesses should trap or they should do something lenient. I proposed
lenience, but trapping is still an option.

···

on Tue May 30 2017, Jordan Rose <swift-evolution@swift.org> wrote:

On May 30, 2017, at 16:13, Dave Abrahams <dabrahams@apple.com> wrote:
on Tue May 30 2017, Jordan Rose <jordan_rose-AT-apple.com> wrote:

On May 30, 2017, at 14:53, Dave Abrahams <dabrahams@apple.com> wrote:
on Tue May 30 2017, Jordan Rose <jordan_rose-AT-apple.com> wrote:

--
-Dave