Omitting Returns in String: Case Study of SE-0255

(Michael Ilseman) #1

I’m a fan of @nate_chandler’s SE-0255, which allows us to omit the return keyword in single-expression functions and computed variables. Here is a case study of SE-0255 as applied to String’s internal implementation.
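As a minimal illustration of the rule (the `Celsius` type is hypothetical and not part of the standard library): when the body of a function or getter is a single expression, that expression is the result.

```swift
// Hypothetical type for illustration; not part of the standard library.
struct Celsius {
  var degrees: Double

  // Without SE-0255: the single expression needs an explicit `return`.
  var fahrenheitExplicit: Double { return degrees * 9 / 5 + 32 }

  // With SE-0255: the lone expression in the body is the result.
  var fahrenheit: Double { degrees * 9 / 5 + 32 }

  // The same applies to single-expression functions.
  func warmed(by delta: Double) -> Celsius {
    Celsius(degrees: degrees + delta)
  }
}
```

Both spellings of the getter behave identically; only the keyword differs.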

This applies only to String and related Unicode functionality; other areas such as SIMDVector.swift (@scanon) could similarly benefit.

Omission Wins

SE-0255 allows us to remove 393 single-expression returns. Here is the PR. Below are a few common themes in return omission.

Forwarding declarations

-  internal var count: Int { return _object.count }
+  internal var count: Int { _object.count }

-  internal var isSmall: Bool { return _object.isSmall }
+  internal var isSmall: Bool { _object.isSmall }

-  internal var isASCII: Bool  {
-    return _object.isASCII
-  }
+  internal var isASCII: Bool  { _object.isASCII }

-  internal var isNFC: Bool { return _object.isNFC }
+  internal var isNFC: Bool { _object.isNFC }

   public var isBidiControl: Bool {
-    return _hasBinaryProperty(__swift_stdlib_UCHAR_BIDI_CONTROL)
+    _hasBinaryProperty(__swift_stdlib_UCHAR_BIDI_CONTROL)
   }

Collection boilerplate

-  public var startIndex: Int {
-    return 0
-  }
+  public var startIndex: Int { 0 }
 
-  public var endIndex: Int {
-    return 0 + UTF8.width(value)
-  }
+  public var endIndex: Int { 0 + UTF8.width(value) }

-  public __consuming func makeIterator() -> Iterator {
-    return Iterator(_guts)
-  }
+  public __consuming func makeIterator() -> Iterator { Iterator(_guts) }

Standard boilerplate for conformances

-  public var customMirror: Mirror {
-    return Mirror(reflecting: description)
-  }
+  public var customMirror: Mirror { Mirror(reflecting: description) }

-  public var description: String { return String(self) }
+  public var description: String { String(self) }

-  public var debugDescription: String { return String(self).debugDescription }
+  public var debugDescription: String { String(self).debugDescription }


Constant-wrapping vars and simple bit-level abstractions


-  internal static var countMask: UInt64 { return 0x0000_FFFF_FFFF_FFFF }
+  internal static var countMask: UInt64 { 0x0000_FFFF_FFFF_FFFF }
 
-  internal static var flagsMask: UInt64 { return ~countMask }
+  internal static var flagsMask: UInt64 { ~countMask }
 
-  internal static var isASCIIMask: UInt64 { return 0x8000_0000_0000_0000 }
+  internal static var isASCIIMask: UInt64 { 0x8000_0000_0000_0000 }
 
-  internal static var isNFCMask: UInt64 { return 0x4000_0000_0000_0000 }
+  internal static var isNFCMask: UInt64 { 0x4000_0000_0000_0000 }
 
   internal static func small(isASCII: Bool) -> UInt64 {
-    return isASCII ? 0xE000_0000_0000_0000 : 0xA000_0000_0000_0000
+    isASCII ? 0xE000_0000_0000_0000 : 0xA000_0000_0000_0000
   }

   internal var isImmortal: Bool {
-    return (discriminatedObjectRawBits & 0x8000_0000_0000_0000) != 0
+    (discriminatedObjectRawBits & 0x8000_0000_0000_0000) != 0
   }

   public static func isLeadSurrogate(_ x: CodeUnit) -> Bool {
-    return (x & 0xFC00) == 0xD800
+    (x & 0xFC00) == 0xD800
   }

Operators

   public static func != <RHS: StringProtocol>(lhs: Self, rhs: RHS) -> Bool {
-    return !(lhs == rhs)
+    !(lhs == rhs)
   }

   public static func > <RHS: StringProtocol>(lhs: Self, rhs: RHS) -> Bool {
-    return rhs < lhs
+    rhs < lhs
   }

   public static func <= <RHS: StringProtocol>(lhs: Self, rhs: RHS) -> Bool {
-    return !(rhs < lhs)
+    !(rhs < lhs)
   }

   public static func >= <RHS: StringProtocol>(lhs: Self, rhs: RHS) -> Bool {
-    return !(lhs < rhs)
+    !(lhs < rhs)
   }

Cannot Omit

These are areas where one could argue that the return could be omitted, but the current language lacks a mechanism to allow this without a loss of clarity. Below are 111 declarations grouped by theme. I’m not arguing that these must be solved, nor for any specific mechanism to solve them. The purpose is to provide empirical data to feature authors, irrespective of the other merits and drawbacks of their particular feature.

Precondition/Postcondition/Assertions

The standard library has 3 levels of invariant checking, mentioned in the programmer’s manual.

_precondition is checked in all build configurations of client code. For example, it is used to enforce important invariants at runtime such as memory safety.

_debugPrecondition is checked only in testing/debug builds of client code. For example, it can be used to check memory safety of “unsafe” APIs.

_internalInvariant is only checked in testing builds of the library itself, and never when run by client code with a shipping toolchain. These are commonly referred to as internal “assertions”.

Invariant checking can block return omission because each check requires a separate statement.

A potential mechanism is to support pre/post conditions on declarations. Preconditions would be checked with access to the function’s parameters, and postconditions would additionally have access to its return value. This feature should be explored regardless of whether it enables return omission (a minor benefit compared to the feature itself), but the examples below can provide some fodder for a proposer.
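To make the trade-off concrete, here is a rough sketch of what is possible in today's language without any new feature. The `requiring` helper and the `Bytes` type are hypothetical, introduced only for illustration; this is not the declaration-level mechanism described above, and whether it preserves clarity is debatable.

```swift
// Hypothetical helper, not a standard library API: checks a condition,
// then evaluates the body and returns its result.
@inline(__always)
func requiring<T>(
  _ condition: @autoclosure () -> Bool,
  _ message: String,
  _ body: () -> T
) -> T {
  precondition(condition(), message)
  return body()
}

// Hypothetical collection-like type for illustration.
struct Bytes {
  var storage: [UInt8]

  // Today's shape: the check is its own statement, so `return` is required.
  subscript(position: Int) -> UInt8 {
    precondition(storage.indices.contains(position), "index is out of bounds")
    return storage[position]
  }

  // With the helper, the body is a single expression and `return` can go,
  // at the cost of an extra closure and a level of nesting.
  func element(at position: Int) -> UInt8 {
    requiring(storage.indices.contains(position), "index is out of bounds") {
      storage[position]
    }
  }
}
```

The nesting is arguably noisier than the statement it replaces, which is the loss of clarity a declaration-level feature would avoid.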

Preconditions

  public subscript(position: Int) -> UTF8.CodeUnit {
    _precondition(position >= startIndex && position < endIndex,
      "Unicode.Scalar.UTF8View index is out of bounds")
    return value.withUTF8CodeUnits { $0[position] }
  }

  public static func leadSurrogate(_ x: Unicode.Scalar) -> UTF16.CodeUnit {
    _precondition(width(x) == 2)
    return 0xD800 + UTF16.CodeUnit(truncatingIfNeeded:
      (x.value - 0x1_0000) &>> (10 as UInt32))
  }

  public static func trailSurrogate(_ x: Unicode.Scalar) -> UTF16.CodeUnit {
    _precondition(width(x) == 2)
    return 0xDC00 + UTF16.CodeUnit(truncatingIfNeeded:
      (x.value - 0x1_0000) & (((1 as UInt32) &<< 10) - 1))
  }

  public func index(after i: Index) -> Index {
    _debugPrecondition(i._biasedBits != 0)
    return Index(_biasedBits: i._biasedBits >> 8)
  }

  public func index(after i: Index) -> Index {
    _precondition(i < endIndex, "Cannot increment beyond endIndex")
    _precondition(i >= startIndex, "Cannot increment an invalid index")
    return _slice.index(after: i)
  }

  public func index(before i: Index) -> Index {
    _precondition(i <= endIndex, "Cannot decrement an invalid index")
    _precondition(i > startIndex, "Cannot decrement beyond startIndex")
    return _slice.index(before: i)
  }

  public subscript(r: Range<Index>) -> Substring.UTF8View {
    _precondition(r.lowerBound >= startIndex && r.upperBound <= endIndex,
      "UTF8View index range out of bounds")
    return Substring.UTF8View(_slice.base, _bounds: r)
  }

  public subscript(r: Range<Index>) -> Substring {
    _boundsCheck(r)
    return Substring(Slice(base: self, bounds: r))
  }

  public var utf8Start: UnsafePointer<UInt8> {
    _precondition(
      hasPointerRepresentation,
      "StaticString should have pointer representation")
    return UnsafePointer(bitPattern: UInt(_startPtrOrData))!
  }

  public var unicodeScalar: Unicode.Scalar {
    _precondition(
      !hasPointerRepresentation,
      "StaticString should have Unicode scalar representation")
    return Unicode.Scalar(UInt32(UInt(_startPtrOrData)))!
  }
  
  public var utf8CodeUnitCount: Int {
    _precondition(
      hasPointerRepresentation,
      "StaticString should have pointer representation")
    return Int(_utf8CodeUnitCount)
  }

Assertions

String’s implementation adopts LLVM’s doctrine of “assert liberally”, using internal assertions as an engineering tool for writing low-level fiddly pieces. These pieces have assumptions about the way they are called or the kinds of values they are passed, but they cannot pay the performance cost of checking that at runtime. Assertions let us catch bugs earlier and catch subtle bugs that don’t manifest in behavior differences until some other future change or unanticipated use case happens. With a rich suite of assertions in place, refactoring and other changes are less scary.

  internal func _decodeUTF8(_ x: UInt8) -> Unicode.Scalar {
    _internalInvariant(UTF8.isASCII(x))
    return Unicode.Scalar(_unchecked: UInt32(x))
  }

  internal func _foreignIndex(_ i: Index, offsetBy n: Int) -> Index {
    _internalInvariant(_guts.isForeign)
    return _index(i, offsetBy: n)
  }

  internal func _foreignIndex(
    _ i: Index, offsetBy n: Int, limitedBy limit: Index
  ) -> Index? {
    _internalInvariant(_guts.isForeign)
    return _index(i, offsetBy: n, limitedBy: limit)
  }

  internal func _foreignDistance(from i: Index, to j: Index) -> Int {
    _internalInvariant(_guts.isForeign)
    return _distance(from: i, to: j)
  }

  internal func _foreignCount() -> Int {
    _internalInvariant(_guts.isForeign)
    return _distance(from: startIndex, to: endIndex)
  }

  internal func _foreignIndex(after i: Index) -> Index {
    _internalInvariant(_guts.isForeign)
    return i.nextEncoded
  }

  internal func _foreignIndex(before i: Index) -> Index {
    _internalInvariant(_guts.isForeign)
    return i.priorEncoded
  }

  internal func _foreignSubscript(position i: Index) -> UTF16.CodeUnit {
    _internalInvariant(_guts.isForeign)
    return _guts.foreignErrorCorrectedUTF16CodeUnit(at: i)
  }

  internal func _foreignDistance(from start: Index, to end: Index) -> Int {
    _internalInvariant(_guts.isForeign)
    return end._encodedOffset - start._encodedOffset
  }

  internal func _foreignIndex(_ i: Index, offsetBy n: Int) -> Index {
    _internalInvariant(_guts.isForeign)
    return i.encoded(offsetBy: n)
  }

  internal func _foreignCount() -> Int {
    _internalInvariant(_guts.isForeign)
    return endIndex._encodedOffset - startIndex._encodedOffset
  }

  internal func _foreignIsWithin(_ target: String.UTF16View) -> Bool {
    _internalInvariant(target._guts.isForeign)

    // If we're transcoding, we're a UTF-8 view index, not UTF-16.
    return self.transcodedOffset == 0
  }

  internal static func _decodeSurrogates(
    _ lead: CodeUnit,
    _ trail: CodeUnit
  ) -> Unicode.Scalar {
    _internalInvariant(isLeadSurrogate(lead))
    _internalInvariant(isTrailSurrogate(trail))
    return Unicode.Scalar(
      _unchecked: 0x10000 +
        (UInt32(lead & 0x03ff) &<< 10 | UInt32(trail & 0x03ff)))
  }

  internal var nextEncoded: String.Index {
    _internalInvariant(self.transcodedOffset == 0)
    return String.Index(_encodedOffset: self._encodedOffset &+ 1)
  }

  internal var priorEncoded: String.Index {
    _internalInvariant(self.transcodedOffset == 0)
    return String.Index(_encodedOffset: self._encodedOffset &- 1)
  }

  internal func transcoded(withOffset n: Int) -> String.Index {
    _internalInvariant(self.transcodedOffset == 0)
    return String.Index(encodedOffset: self._encodedOffset, transcodedOffset: n)
  }

  internal var _countAndFlags: CountAndFlags {
    _internalInvariant(!isSmall)
    return CountAndFlags(rawUnchecked: _countAndFlagsBits)
  }

  internal static func small(withCount count: Int, isASCII: Bool) -> UInt64 {
    _internalInvariant(count <= _SmallString.capacity)
    return small(isASCII: isASCII) | UInt64(truncatingIfNeeded: count) &<< 56
  }

  internal var largeFastIsTailAllocated: Bool {
    _internalInvariant(isLarge && providesFastUTF8)
    return _countAndFlags.isTailAllocated
  }

  internal var largeIsCocoa: Bool {
    _internalInvariant(isLarge)
    return (discriminatedObjectRawBits & 0x4000_0000_0000_0000) != 0
  }

  internal var smallCount: Int {
    _internalInvariant(isSmall)
    return _StringObject.getSmallCount(fromRaw: discriminatedObjectRawBits)
  }

  internal var smallIsASCII: Bool {
    _internalInvariant(isSmall)
    return _StringObject.getSmallIsASCII(fromRaw: discriminatedObjectRawBits)
  }

  internal var largeCount: Int {
    _internalInvariant(isLarge)
    return _countAndFlags.count
  }

  internal var largeAddressBits: UInt {
    _internalInvariant(isLarge)
    return UInt(truncatingIfNeeded:
      discriminatedObjectRawBits & Nibbles.largeAddressMask)
  }

  internal var nativeUTF8Start: UnsafePointer<UInt8> {
    _internalInvariant(largeFastIsTailAllocated)
    return UnsafePointer(
      bitPattern: largeAddressBits &+ _StringObject.nativeBias
    )._unsafelyUnwrappedUnchecked
  }

  internal var nativeUTF8: UnsafeBufferPointer<UInt8> {
    _internalInvariant(largeFastIsTailAllocated)
    return UnsafeBufferPointer(start: nativeUTF8Start, count: largeCount)
  }

  internal var objCBridgeableObject: AnyObject {
    _internalInvariant(hasObjCBridgeableObject)
    return Builtin.reinterpretCast(largeAddressBits)
  }

  static func _toUTF16CodeUnit(_ x: UTF8.CodeUnit) -> UTF16.CodeUnit {
    _internalInvariant(x <= 0x7f, "should only be doing this with ASCII")
    return UTF16.CodeUnit(truncatingIfNeeded: x)
  }

  static func _fromUTF16CodeUnit(
    _ utf16: UTF16.CodeUnit
  ) -> UTF8.CodeUnit {
    _internalInvariant(utf16 <= 0x7f, "should only be doing this with ASCII")
    return UTF8.CodeUnit(truncatingIfNeeded: utf16)
  }

Subexpression refactoring

Refactoring subexpressions out into their own declarations can aid readability, debuggability, and maintainability, and presents the opportunity to provide a useful name. While it does prohibit return elision, the benefits far outweigh this.

Below are instances where I found an argument could be made for return elision with some future mechanism, though I have nothing specific in mind. These may serve as fodder for expression simplification and API enhancements unrelated to return elision.
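One option that exists today, sketched here under assumptions, is to hoist the named subexpression into its own computed property so that each body is a single expression. `PackedIndex`, `strideBits`, and `hoistedCharacterStride` are hypothetical names; the bit layout mirrors the `characterStride` example below and is otherwise an assumption.

```swift
// Hypothetical type for illustration only.
struct PackedIndex {
  var rawBits: UInt64

  // Local `let`: the subexpression gets a name, but `return` is required.
  var characterStride: Int? {
    let value = (rawBits & 0x3F00) &>> 8
    return value > 0 ? Int(truncatingIfNeeded: value) : nil
  }

  // Hoisted alternative: the subexpression becomes its own declaration...
  var strideBits: UInt64 { (rawBits & 0x3F00) &>> 8 }

  // ...so this body is a single expression too.
  var hoistedCharacterStride: Int? {
    strideBits > 0 ? Int(truncatingIfNeeded: strideBits) : nil
  }
}
```

This widens the type's surface with a helper that only one caller uses, so it is not obviously better than a local `let` plus an explicit return.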

Subexpressions

  public var generalCategory: Unicode.GeneralCategory {
    let rawValue = __swift_stdlib_UCharCategory(
      __swift_stdlib_UCharCategory.RawValue(
      __swift_stdlib_u_getIntPropertyValue(
        icuValue, __swift_stdlib_UCHAR_GENERAL_CATEGORY)))
    return Unicode.GeneralCategory(rawValue: rawValue)
  }

  public var canonicalCombiningClass: Unicode.CanonicalCombiningClass {
    let rawValue = UInt8(__swift_stdlib_u_getIntPropertyValue(
      icuValue, __swift_stdlib_UCHAR_CANONICAL_COMBINING_CLASS))
    return Unicode.CanonicalCombiningClass(rawValue: rawValue)
  }

  public var numericType: Unicode.NumericType? {
    let rawValue = __swift_stdlib_UNumericType(
      __swift_stdlib_UNumericType.RawValue(
      __swift_stdlib_u_getIntPropertyValue(
        icuValue, __swift_stdlib_UCHAR_NUMERIC_TYPE)))
    return Unicode.NumericType(rawValue: rawValue)
  }

  public var numericValue: Double? {
    let icuNoNumericValue: Double = -123456789
    let result = __swift_stdlib_u_getNumericValue(icuValue)
    return result != icuNoNumericValue ? result : nil
  }

  public func _bufferedScalar(bitCount: UInt8) -> Encoding.EncodedScalar {
    let x = UInt32(_buffer._storage) &+ 0x01010101
    return _ValidUTF8Buffer(_biasedBits: x & ._lowBits(bitCount))
  }

  internal func withFastCChar<R>(
    _ f: (UnsafeBufferPointer<CChar>) throws -> R
  ) rethrows -> R {
    try self.withFastUTF8 { utf8 in
      let ptr = utf8.baseAddress._unsafelyUnwrappedUnchecked._asCChar
      return try f(UnsafeBufferPointer(start: ptr, count: utf8.count))
    }
  }

  internal func _slowWithCString<Result>(
    _ body: (UnsafePointer<Int8>) throws -> Result
  ) rethrows -> Result {
    _internalInvariant(!_object.isFastZeroTerminated)
    try String(self).utf8CString.withUnsafeBufferPointer {
      let ptr = $0.baseAddress._unsafelyUnwrappedUnchecked
      return try body(ptr)
    }
  }

  internal var characterStride: Int? {
    let value = (_rawBits & 0x3F00) &>> 8
    return value > 0 ? Int(truncatingIfNeeded: value) : nil
  }

  internal var _countAndFlagsBits: UInt64 {
    let rawBits = UInt64(truncatingIfNeeded: _flags) &<< 48
                | UInt64(truncatingIfNeeded: _count)
    return rawBits
  }

  public func index(before i: Index) -> Index {
    let offset = _ValidUTF8Buffer(_biasedBits: i._biasedBits).count
    _debugPrecondition(offset != 0)
    return Index(_biasedBits: _biasedBits &>> (offset &<< 3 - 8))
  }

  internal var zeroTerminatedRawCodeUnits: RawBitPattern {
    let smallStringCodeUnitMask = ~UInt64(0xFF).bigEndian // zero last byte
    return (self._storage.0, self._storage.1 & smallStringCodeUnitMask)
  }

  internal func computeIsASCII() -> Bool {
    let asciiMask: UInt64 = 0x8080_8080_8080_8080
    let raw = zeroTerminatedRawCodeUnits
    return (raw.0 | raw.1) & asciiMask == 0
  }

  internal subscript(_ bounds: Range<Index>) -> SubSequence {
    self.withUTF8 { utf8 in
      let rebased = UnsafeBufferPointer(rebasing: utf8[bounds])
      return _SmallString(rebased)._unsafelyUnwrappedUnchecked
    }
  }

  internal func withUTF8<Result>(
    _ f: (UnsafeBufferPointer<UInt8>) throws -> Result
  ) rethrows -> Result {
    var raw = self.zeroTerminatedRawCodeUnits
    return try Swift.withUnsafeBytes(of: &raw) { rawBufPtr in
      let ptr = rawBufPtr.baseAddress._unsafelyUnwrappedUnchecked
        .assumingMemoryBound(to: UInt8.self)
      return try f(UnsafeBufferPointer(start: ptr, count: self.count))
    }
  }

  internal func _stdlib_binary_CFStringCreateCopy(
    _ source: _CocoaString
  ) -> _CocoaString {
    let result = _swift_stdlib_CFStringCreateCopy(nil, source) as AnyObject
    return result
  }

  internal func _cocoaStringSubscript(
    _ target: _CocoaString, _ position: Int
  ) -> UTF16.CodeUnit {
    let cfSelf: _swift_shims_CFStringRef = target
    return _swift_stdlib_CFStringGetCharacterAtIndex(cfSelf, position)
  }

  internal func _cocoaStringCompare(
    _ string: _CocoaString, _ other: _CocoaString
  ) -> Int {
    let cfSelf: _swift_shims_CFStringRef = string
    let cfOther: _swift_shims_CFStringRef = other
    return _swift_stdlib_CFStringCompare(cfSelf, cfOther)
  }

  func _toUTF16Indices(_ range: Range<Int>) -> Range<Index> {
    let lowerbound = _toUTF16Index(range.lowerBound)
    let upperbound = _toUTF16Index(range.lowerBound + range.count)
    return Range(uncheckedBounds: (lower: lowerbound, upper: upperbound))
  }

  private func _stringCompareSlow(
    _ leftUTF8: UnsafeBufferPointer<UInt8>,
    _ rightUTF8: UnsafeBufferPointer<UInt8>,
    expecting: _StringComparisonResult
  ) -> Bool {
    let left = _StringGutsSlice(_StringGuts(leftUTF8, isASCII: false))
    let right = _StringGutsSlice(_StringGuts(rightUTF8, isASCII: false))
    return left.compare(with: right, expecting: expecting)
  }

  internal var _offsetRange: Range<Int> {
    let (start, end) = (startIndex, endIndex)
    _internalInvariant(
      start.transcodedOffset == 0 && end.transcodedOffset == 0)
    return Range(uncheckedBounds: (start._encodedOffset, end._encodedOffset))
  }

  final internal func character(at offset: Int) -> UInt16 {
    let str = asString
    return str.utf16[str._toUTF16Index(offset)]
  }

  public func index(_ i: Index, offsetBy n: Int) -> Index {
    let result = _slice.index(i, offsetBy: n)
    _precondition(
      (_slice._startIndex ... _slice.endIndex).contains(result),
      "Operation results in an invalid index")
    return result
  }

Control Flow

Control flow constructs are not expressions, though there are pitches out there to revisit this. The below are cases that could be simplified and would enable return omission. Again, return omission is a relatively minor benefit compared to the feature, but this could be useful fodder.

CC @dabrahams @anandabits
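For simple two-way branches, the conditional operator is the expression form available today. The sketch below uses a hypothetical `FlagsSketch` type (its stored properties are stand-ins, not the real string object layout) to contrast the statement form with the single-expression form.

```swift
// Hypothetical stand-in for a string object with small and large forms.
struct FlagsSketch {
  var isSmall: Bool
  var smallIsASCII: Bool
  var largeIsASCII: Bool

  // Statement form: `if` is a statement here, so both paths need `return`.
  var isASCIIWithIf: Bool {
    if isSmall { return smallIsASCII }
    return largeIsASCII
  }

  // Ternary form: a single expression, so `return` can be omitted.
  var isASCII: Bool { isSmall ? smallIsASCII : largeIsASCII }
}
```

The ternary reads fine at this size but scales poorly to the `guard`, pattern-matching, and multi-branch cases listed below.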

If-then-else and guard

  internal func _utf8ScalarLength(_ x: UInt8) -> Int {
    _internalInvariant(!UTF8.isContinuation(x))
    if UTF8.isASCII(x) { return 1 }
    return (~x).leadingZeroBitCount
  }

  internal func errorCorrectedScalar(
    startingAt i: Int
  ) -> (Unicode.Scalar, scalarLength: Int) {
    if _fastPath(isFastUTF8) {
      return withFastUTF8 { _decodeScalar($0, startingAt: i) }
    }
    return foreignErrorCorrectedScalar(
      startingAt: String.Index(_encodedOffset: i))
  }

  internal func errorCorrectedCharacter(
    startingAt start: Int, endingAt end: Int
  ) -> Character {
    if _fastPath(isFastUTF8) {
      return withFastUTF8(range: start..<end) { utf8 in
        return Character(unchecked: String._uncheckedFromUTF8(utf8))
      }
    }

    return foreignErrorCorrectedGrapheme(startingAt: start, endingAt: end)
  }

  public func index(after i: Index) -> Index {
    if _fastPath(_guts.isFastUTF8) {
      return i.nextEncoded
    }

    return _foreignIndex(after: i)
  }

  public func index(before i: Index) -> Index {
    precondition(!i.isZeroPosition)
    if _fastPath(_guts.isFastUTF8) {
      return i.priorEncoded
    }

    return _foreignIndex(before: i)
  }

  public func index(_ i: Index, offsetBy n: Int) -> Index {
    if _fastPath(_guts.isFastUTF8) {
      _precondition(n + i._encodedOffset <= _guts.count)
      return i.encoded(offsetBy: n)
    }

    return _foreignIndex(i, offsetBy: n)
  }

  public func distance(from i: Index, to j: Index) -> Int {
    if _fastPath(_guts.isFastUTF8) {
      return j._encodedOffset &- i._encodedOffset
    }
    return _foreignDistance(from: i, to: j)
  }

  public var count: Int {
    if _fastPath(_guts.isFastUTF8) {
      return _guts.count
    }
    return _foreignCount()
  }

  public var count: Int {
    if _slowPath(_guts.isForeign) {
      return _foreignCount()
    }
    return _nativeGetOffset(for: endIndex)
  }

  public func _parseMultipleCodeUnits() -> (isValid: Bool, bitCount: UInt8) {
    _internalInvariant(
      !Encoding._isScalar(UInt16(truncatingIfNeeded: _buffer._storage)))
    if _fastPath(_buffer._storage & 0xFC00_FC00 == 0xD800_DC00) {
      return (true, 2*16)
    }
    return (false, 1*16)
  }

  public func _parseMultipleCodeUnits() -> (isValid: Bool, bitCount: UInt8) {
    _internalInvariant(
      !Encoding._isScalar(UInt16(truncatingIfNeeded: _buffer._storage)))
    if _fastPath(_buffer._storage & 0xFC00_FC00 == 0xDC00_D800) {
      return (true, 2*16)
    }
    return (false, 1*16)
  }

  public func withContiguousStorageIfAvailable<R>(
    _ body: (UnsafeBufferPointer<Element>) throws -> R
  ) rethrows -> R? {
    guard _guts.isFastUTF8 else { return nil }
    return try _guts.withFastUTF8(body)
  }

  internal func withFastUTF8<R>(
    _ f: (UnsafeBufferPointer<UInt8>) throws -> R
  ) rethrows -> R {
    _internalInvariant(isFastUTF8)

    if self.isSmall { return try _SmallString(_object).withUTF8(f) }

    defer { _fixLifetime(self) }
    return try f(_object.fastUTF8)
  }

  internal func withCString<Result>(
    _ body: (UnsafePointer<Int8>) throws -> Result
  ) rethrows -> Result {
    if _slowPath(!_object.isFastZeroTerminated) {
      return try _slowWithCString(body)
    }

    return try self.withFastCChar {
      return try body($0.baseAddress._unsafelyUnwrappedUnchecked)
    }
  }

  internal var utf8Count: Int {
    if _fastPath(self.isFastUTF8) { return count }
    return String(self).utf8.count
  }

  internal func _characterStride(startingAt i: Index) -> Int {
    if let d = i.characterStride { return d }

    if i == endIndex { return 0 }

    return _guts._opaqueCharacterStride(startingAt: i._encodedOffset)
  }

  internal func _characterStride(endingAt i: Index) -> Int {
    if i == startIndex { return 0 }

    return _guts._opaqueCharacterStride(endingAt: i._encodedOffset)
  }

  internal var isImmortal: Bool {
    if case .immortal = self { return true }
    return false
  }

  internal var isASCII: Bool {
    if isSmall { return smallIsASCII }
    return _countAndFlags.isASCII
  }

  internal var isNFC: Bool {
    if isSmall {
      return smallIsASCII
    }
    return _countAndFlags.isNFC
  }

  internal var fastUTF8: UnsafeBufferPointer<UInt8> {
    _internalInvariant(self.isLarge && self.providesFastUTF8)
    guard _fastPath(self.largeFastIsTailAllocated) else {
      return sharedUTF8
    }
    return UnsafeBufferPointer(
      start: self.nativeUTF8Start, count: self.largeCount)
  }

  internal var isFastZeroTerminated: Bool {
    if _slowPath(!providesFastUTF8) { return false }

    if isSmall { return true }

    return largeFastIsTailAllocated
  }

  internal func isOnUnicodeScalarBoundary(_ index: Int) -> Bool {
    guard index < count else {
      _internalInvariant(index == count)
      return true
    }
    return !UTF8.isContinuation(self[index])
  }

  internal var nativeCapacity: Int? {
      guard hasNativeStorage else { return nil }
      return _object.nativeStorage.capacity
  }

  internal var nativeUnusedCapacity: Int? {
      guard hasNativeStorage else { return nil }
      return _object.nativeStorage.unusedCapacity
  }

  internal var uniqueNativeCapacity: Int? {
    @inline(__always) mutating get {
      guard isUniqueNative else { return nil }
      return _object.nativeStorage.capacity
    }
  }

  internal var uniqueNativeUnusedCapacity: Int? {
    guard isUniqueNative else { return nil }
    return _object.nativeStorage.unusedCapacity
  }

  internal var utf8Count: Int {
    if _fastPath(self.isFastUTF8) {
      return _offsetRange.count
    }
    return Substring(self).utf8.count
  }

  internal func foreignHasNormalizationBoundary(
    before index: String.Index
  ) -> Bool {
    if index == range.lowerBound || index == range.upperBound {
      return true
    }
    return _guts.foreignHasNormalizationBoundary(before: index)
  }

  private func _getCocoaStringPointer(
    _ cfImmutableValue: _CocoaString
  ) -> CocoaStringPointer {
    if let utf8Ptr = _cocoaUTF8Pointer(cfImmutableValue) {
      return .ascii(utf8Ptr)
    }
    if let utf16Ptr = _swift_stdlib_CFStringGetCharactersPtr(cfImmutableValue) {
      return .utf16(utf16Ptr)
    }
    return .none
  }

  internal func _stringCompare(
    _ lhs: _StringGuts, _ rhs: _StringGuts, expecting: _StringComparisonResult
  ) -> Bool {
    if lhs.rawBits == rhs.rawBits { return expecting == .equal }
    return _stringCompareWithSmolCheck(lhs, rhs, expecting: expecting)
  }

  internal func _stringCompareInternal(
    _ lhs: _StringGuts, _ rhs: _StringGuts, expecting: _StringComparisonResult
  ) -> Bool {
    guard _fastPath(lhs.isFastUTF8 && rhs.isFastUTF8) else {
      return _stringCompareSlow(lhs, rhs, expecting: expecting)
    }

    let isNFC = lhs.isNFC && rhs.isNFC
    return lhs.withFastUTF8 { lhsUTF8 in
      return rhs.withFastUTF8 { rhsUTF8 in
        return _stringCompareFastUTF8(
          lhsUTF8, rhsUTF8, expecting: expecting, bothNFC: isNFC)
      }
    }
  }

  internal func _stringCompare(
    _ lhs: _StringGuts, _ lhsRange: Range<Int>,
    _ rhs: _StringGuts, _ rhsRange: Range<Int>,
    expecting: _StringComparisonResult
  ) -> Bool {
    if lhs.rawBits == rhs.rawBits && lhsRange == rhsRange {
      return expecting == .equal
    }
    return _stringCompareInternal(
      lhs, lhsRange, rhs, rhsRange, expecting: expecting)
  }

  static func _tryFromUTF8(_ input: UnsafeBufferPointer<UInt8>) -> String? {
    guard case .success(let extraInfo) = validateUTF8(input) else {
      return nil
    }

    return String._uncheckedFromUTF8(input, isASCII: extraInfo.isASCII)
  }

  internal static func _fromSubstring(
    _ substring: __shared Substring
  ) -> String {
    if substring._offsetRange == substring.base._offsetRange {
      return substring.base
    }

    return String._copying(substring)
  }

  internal static func _copying(_ str: Substring) -> String {
    if _fastPath(str._wholeGuts.isFastUTF8) {
      return str._wholeGuts.withFastUTF8(range: str._offsetRange) {
        String._uncheckedFromUTF8($0)
      }
    }
    return Array(str.utf8).withUnsafeBufferPointer {
      String._uncheckedFromUTF8($0)
    }
  }

  internal var _wholeGuts: _StringGuts {
    if let str = self as? String {
      return str._guts
    }
    if let subStr = self as? Substring {
      return subStr._wholeGuts
    }
    return String(self)._guts
  }

  internal func _nativeIsEqual<T:_AbstractStringStorage>(
    _ nativeOther: T
  ) -> Int8 {
    if count != nativeOther.count {
      return 0
    }
    return (start == nativeOther.start ||
      (memcmp(start, nativeOther.start, count) == 0)) ? 1 : 0
  }

  final internal var hash: UInt {
    if isASCII {
      return _cocoaHashASCIIBytes(start, length: count)
    }
    return _cocoaHashString(self)
  }

  final internal func _fastCStringContents(
    _ requiresNulTermination: Int8
  ) -> UnsafePointer<CChar>? {
    if isASCII {
      return start._asCChar
    }
    return nil
  }

  public static func encode(
    _ source: Unicode.Scalar
  ) -> EncodedScalar? {
    guard source.value < (1&<<7) else { return nil }
    return EncodedScalar(UInt8(truncatingIfNeeded: source.value))
  }

Switch

  internal var _isSymbol: Bool {
    switch self {
      case .mathSymbol, .currencySymbol, .modifierSymbol, .otherSymbol:
        return true
      default: return false
    }
  }

  internal var _isPunctuation: Bool {
    switch self {
      case .connectorPunctuation, .dashPunctuation, .openPunctuation,
           .closePunctuation, .initialPunctuation, .finalPunctuation,
           .otherPunctuation:
        return true
      default: return false
    }
  }

  public static func width(_ x: Unicode.Scalar) -> Int {
    switch x.value {
      case 0..<0x80: return 1
      case 0x80..<0x0800: return 2
      case 0x0800..<0x1_0000: return 3
      default: return 4
    }
  }

  static func ==(
    _ lhs: _StringComparisonResult, _ rhs: _StringComparisonResult
  ) -> Bool {
    switch (lhs, rhs) {
      case (.equal, .equal): return true
      case (.less, .less): return true
      default: return false
    }
  }

  internal static func _fromUTF8Repairing(
    _ input: UnsafeBufferPointer<UInt8>
  ) -> (result: String, repairsMade: Bool) {
    switch validateUTF8(input) {
    case .success(let extraInfo):
        return (String._uncheckedFromUTF8(
          input, asciiPreScanResult: extraInfo.isASCII
        ), false)
    case .error(let initialRange):
        return (repairUTF8(input, firstKnownBrokenRange: initialRange), true)
    }
  }

  func hasBreakWhenPaired(_ x: Unicode.Scalar) -> Bool {
    switch x.value {
    case 0x3400...0xa4cf: return true
    case 0x0000...0x02ff: return true
    case 0x3041...0x3096: return true
    case 0x30a1...0x30fc: return true
    case 0x0400...0x0482: return true
    case 0x061d...0x064a: return true
    case 0xac00...0xd7af: return true
    case 0x2010...0x2029: return true
    case 0x3000...0x3029: return true
    case 0xFF01...0xFF9D: return true

    default: return false
    }
  }

  internal func _cString(encoding: UInt) -> UnsafePointer<UInt8>? {
    switch (encoding, isASCII) {
    case (_cocoaASCIIEncoding, true):
      fallthrough
    case (_cocoaUTF8Encoding, _):
      return start
    default:
      return _cocoaCStringUsingEncodingTrampoline(self, encoding)
    }
  }

  public var _objectIdentifier: ObjectIdentifier? {
    switch _form {
      case ._cocoa(let object): return ObjectIdentifier(object)
      case ._native(let object): return ObjectIdentifier(object)
      default: return nil
    }
  }
26 Likes
(Morten Bek Ditlevsen) #2

Really cool case study!
I know that this is completely beside the point, but did UTF16 change to UTF8 on purpose in the second ‘Collection boilerplate’ diff entry?

(Morten Bek Ditlevsen) #3

Ah, in the PR I can see that the ‘before’ and ‘after’ in your post are probably from two different places in the PR...

#4

Thanks for putting together and sharing this, @Michael_Ilseman.

My initial observation upon looking through the new version is that return-elision looks great when the entire declaration is on a single line. In particular, when the closing brace is on the same line as the keyword var, func, subscript, or get, it looks sleek and Swifty to simply state the expression in the block with no return.

However, when the declaration covers multiple lines—especially when the closing brace is on a different line from the opening brace—I keep expecting a return. It seems out of place to have a bare expression on a line by itself.

So as a purely stylistic matter, I would be in favor of adopting the convention that return should be omitted when both braces bounding the block are on the same line as the aforementioned keyword, and should be included when they are not.
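As a rough illustration of that convention (the Temperature type and its members are hypothetical, made up only for this sketch), the first property keeps both braces on the line with var and omits return, while the second spans several lines and keeps it:

```swift
// Hypothetical type, used only to illustrate the proposed style convention.
struct Temperature {
  var celsius: Double

  // Both braces sit on the same line as `var`: return is omitted.
  var fahrenheit: Double { celsius * 9 / 5 + 32 }

  // The body spans multiple lines: return is written out.
  var summary: String {
    return "\(celsius) C"
  }
}
```

Both forms compile under SE-0255; the convention is purely about how the code reads.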

2 Likes
(Martin R) #5

Just curious: Why is this not a stored property?

internal static let countMask: UInt64 = 0x0000_FFFF_FFFF_FFFF
1 Like
(Tony Allevato) #6

AFAIK, a stored static property would be a lazily initialized value that would need to be allocated and reside in memory for the remainder of the program, whereas a computed property doesn’t require any allocation—it just returns the constant each time.

Similarly, for instance properties, stored properties add storage (even if they are initialized with constants) to the type.
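A small sketch of the instance-property point, assuming two hypothetical structs that differ only in how the mask is declared. MemoryLayout reports the in-memory size of each type:

```swift
// Hypothetical types for illustration; only the declaration of `mask` differs.
struct WithStoredMask {
  var bits: UInt64
  // Stored property: occupies 8 bytes in every instance.
  let mask: UInt64 = 0x0000_FFFF_FFFF_FFFF
}

struct WithComputedMask {
  var bits: UInt64
  // Computed property: no per-instance storage, the constant is produced on each access.
  var mask: UInt64 { 0x0000_FFFF_FFFF_FFFF }
}

let storedSize = MemoryLayout<WithStoredMask>.size
let computedSize = MemoryLayout<WithComputedMask>.size
```

Here storedSize should be 16 and computedSize 8, since the stored constant is part of the type's layout.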

1 Like
(Matthew Johnson) #7

This is a fantastic case study.

Thanks for putting it together @Michael_Ilseman! Thank you especially for including a bunch of examples that can be used as motivation for if / switch expressions. I would love to see that move forward and would be happy to help contribute to a proposal. Unfortunately that topic will be stalled until somebody decides to work on implementation.

I believe storage of let instance properties is optimized away in some cases. I don’t know the exact details of when the optimization is possible though.

1 Like
(👑🦆) #8

Most of these are great. Not sure they are all readability wins, though:

public static func isLeadSurrogate(_ x: CodeUnit) -> Bool {
  (x & 0xFC00) == 0xD800
}

Maybe it’s a familiarity thing, but when I see that I need an extra second to parse it over the version with an explicit return statement.

6 Likes
(Oliver Jones) #9

Other languages get rid of the need for this return using “expression functions”, e.g., in C#:

public static bool EqualsIgnoreCase(this string a, string b)
    => string.Equals(a, b, StringComparison.OrdinalIgnoreCase);

This is an extension of C#’s lambda expression syntax.

Why not something like this in Swift (as Swift already supports eliding return in simple closures)?

var isNotEmpty: Bool = { !isEmpty }

func isMultiple(of n: Int) -> Bool = { self % n == 0 }

The hard part is really just coming up with a syntax that isn’t horrible.

(Michael Ilseman) #10

Ah, that's my mistake. I was picking out a handful of representative samples and mixed Unicode.Scalar.UTF16View and Unicode.Scalar.UTF8View there. I wanted to go with UTF8.width() in the listing because it's less trivial than for UTF16.

Yeah, I can understand that. I figure that omitting returns is one of those things that looks a little odd at first, but will grow on me. I went with a more aggressive omission to test this, and since I'm the maintainer of this code anyways, I can report back in a release or two on how it went :slight_smile:.

I've spent many, many late nights diagnosing tiny regressions and missed optimizations. In performance-sensitive contexts, I favor code that is more closely translatable to the assembly that I want. Here, I want an immediate and not a load (these bit patterns were chosen to be representable as immediates), but I want a name for it. Lacking macros, an always-inlined function returning the immediate is the closest I can get.
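A minimal sketch of that pattern, assuming a hypothetical CountAndFlags wrapper with a rawBits field (neither name is taken from the actual String implementation). The mask is the value quoted earlier in the thread; whether the compiler really emits an immediate depends on the optimizer and is not checked here:

```swift
// Hypothetical wrapper type; only the naming pattern is the point.
struct CountAndFlags {
  var rawBits: UInt64

  // A named constant expressed as an always-inlined computed property,
  // rather than a stored static that would need lazily initialized storage.
  @inline(__always)
  static var countMask: UInt64 { 0x0000_FFFF_FFFF_FFFF }

  // The low 48 bits hold the count; the upper bits are treated as flags.
  var count: Int { Int(truncatingIfNeeded: rawBits & Self.countMask) }
}
```

For example, CountAndFlags(rawBits: 0xFF00_0000_0000_0005).count masks off the upper bits and yields 5.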

Also, if that decl is reachable from @inlinable code, then the storage could become part of the ABI. I don't recall all the details here, CC @Slava_Pestov

Yeah, as I mentioned to @Nevin, I also view this as performing an experiment on the code maintainer (future me), and am interested in seeing the result.

I'm not sure I follow, you can write this today in Swift:

var isNotEmpty: Bool { !isEmpty }
func isMultiple(of n: Int) -> Bool { self % n == 0 }

What is the = buying you in your code?
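For reference, a self-contained sketch of the forms that compile today. The names isNotEmpty and isDivisible(by:) are hypothetical, picked to avoid clashing with standard library members. The last line shows what braces after = already mean: a closure value, which needs a function type rather than Bool:

```swift
extension Collection {
  // Single-expression computed property: no `=` and no `return`.
  var isNotEmpty: Bool { !isEmpty }
}

extension Int {
  // Single-expression method, in the spirit of the isMultiple(of:) example.
  func isDivisible(by n: Int) -> Bool { self % n == 0 }
}

// With `=`, the braces form a closure, so the declared type is a function type.
let alwaysTrue: () -> Bool = { true }
```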

4 Likes
(Adam Roberts) #11

Thanks for the wonderful case study @Michael_Ilseman. It shows the broad applicability of this feature. I look forward to updating a lot of code.

Your note on subexpressions strikes a chord with me. It is very common to have what is logically a single expression factored out into multiple subexpressions to avoid repeating computations or even just for readability. It would be great to have a way to write those as single expressions with their subordinate declarations somehow attached.

Here's an example with the normal distribution quantile function approximation from Abramowitz. Today I write it as two expressions:

func qapprox(p: Double) -> Double {
    let t = sqrt(-2 * log(p))
    return t - (2.30753 + t * 0.27061) / (1 + t * (0.99229 + t * 0.04481))
}

But t is just a factored out piece of what is really a single expression return. It would be nice to be able to write:

func qapprox(p: Double) -> Double {
    let t = sqrt(-2 * log(p)) in
    t - (2.30753 + t * 0.27061) / (1 + t * (0.99229 + t * 0.04481))
}

or:

func qapprox(p: Double) -> Double {
    t - (2.30753 + t * 0.27061) / (1 + t * (0.99229 + t * 0.04481))
    where
    t = sqrt(-2 * log(p))
}

This example has a single sub-expression but presumably there could be a list of sub-expressions in either syntax.

If we did ever get if / else / switch as expressions then making the sub-expressions lazily evaluated would be a big usability feature, too. Some sub-expressions might only be used in some cases.

The particular function I cited is only valid for p in (0,0.5] so it would also benefit either from a solution to your precondition use case or from if / else / switch expressions. In the actual current implementation I just use a guard.
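A runnable sketch of the guard-based version in today's Swift. Returning .nan outside the valid range is an assumption made for this example; the post does not say what the actual implementation does in that case:

```swift
import Foundation

// Approximate upper-tail normal quantile, valid for p in (0, 0.5].
func qapprox(p: Double) -> Double {
  // Assumption: out-of-range inputs yield NaN.
  guard p > 0 && p <= 0.5 else { return .nan }
  // Factored-out subexpression, which is why `return` is still required here.
  let t = sqrt(-2 * log(p))
  return t - (2.30753 + t * 0.27061) / (1 + t * (0.99229 + t * 0.04481))
}
```

As a sanity check, qapprox(p: 0.025) comes out close to the familiar 1.96, and qapprox(p: 0.5) is close to 0.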

(Jordan Rose) #12

I see that these are syntaxes you can write to factor out subexpressions. Why are they better, though? They're not any less ordered or imperative.

2 Likes
(Oliver Jones) #13

I guess I’m just demonstrating my ignorance then. :slight_smile:

(Adam Roberts) #14

So I agree with your statements and your question made me think about this.

I value having a function be single expression because for a certain category of coding tasks it is easier for me to reason about correctness when things are equal to just one thing.

The new elided return provides a nice visual indicator that I know a function is single expression. Even better than that, it gives me some amount of guarantee from the compiler that the function really is single expression. I wanted to have that with subexpressions added on as well (as their own single expression declarations) because it would cover more of the functions I write.

In the case of the second syntax I wrote there I also think it helps readability to have the single expression that the function is equal to at the top. That way it is near the function's declaration. It could be that just as many people find it harder to read that way though.

(Davide De Franceschi) #15

Cannot omit return in a func :eyes:

#16

You are commenting on a thread about SE-0255, the proposal that allows return to be omitted from (single-expression) functions.

1 Like
(Davide De Franceschi) #17

I was convinced it was only computed vars in the end :man_facepalming:

(Matthew Johnson) #18

I think it might be interesting to consider let... in and other constructs that allow more Swift to be written in an expression-oriented style. Another that comes to mind is guard...in. Rewriting your current implementation:

func qapprox(p: Double) -> Double {
    guard 0 < p && p <= 0.5 else { .nan } 
    in let t = sqrt(-2 * log(p)) 
    in t - (2.30753 + t * 0.27061) / (1 + t * (0.99229 + t * 0.04481))
}

I have always thought there was some similarity between Swift and ML and this would move it a bit further in that direction.

That said, I don't think this should be pursued immediately. I think the win from if and switch expressions is much larger and is already controversial enough. If we're able to get a proposal for these accepted then we could try to build on that with guard and let expressions in the future.

3 Likes
(Matthew Johnson) #19

I don't think they're strictly better, just as allowing return to be omitted is not strictly better than requiring it. It's subjective, but some people prefer expression-oriented code. If we pursue any of the directions that increase what you can do in an expression I'm sure there will be controversy. It's clear that the Swift community has a broad range of preferences in this regard. If somebody comes along to implement these ideas the core team will have to decide where to draw the line.

2 Likes
(Jordan Rose) #20

Hm. I'd say return omission had things going for it (and against it) besides being expression-oriented: it was argued that the return was not just unnecessary but that it made short getters and such more awkward than they needed to be. I don't feel like that extends to let/in since it isn't simpler syntactically than the thing you started with and doesn't give you any additional guarantees besides "this is an expression".

(The trailing where is a little different since it presents the content in a different order. I've used the equivalent in Haskell before, so I don't want to dismiss it out of hand here.)

1 Like