Proposal to improve the standard library stride types

moiseev · February 7, 2018, 7:43pm

The latest version will be available as a PR at apple/swift-evolution#786.

Revamp the StrideTo/StrideThrough types

Proposal: SE-0199
Authors: Max Moiseev, Xiaodi Wu
Review Manager: TBD
Status: Awaiting implementation
Implementation: apple/swift#14288

Introduction

The Swift standard library provides two sequence types StrideTo and
StrideThrough, instances of which are returned from stride(from:to:by:)
and stride(from:through:by:) respectively. These types are designed to be more
flexible versions of Swift ranges, allowing start values less than end values
and stride values other than 1.

Both of these types currently conform to Sequence. This is quite unfortunate,
because for integer values a StrideTo instance can conform to
RandomAccessCollection and provide more performant implementations for methods
like prefix(_:), dropFirst(_:), etc. That is O(1) for
RandomAccessCollection as opposed to O(n) for Sequence.

Motivation

We are going to use StrideTo type in the motivating examples for brevity, but
the same reasoning applies to StrideThrough as well.

Let's consider the following snippet:

let myStride = stride(from: 1, to: 10, by: 3)
_ = myStride.dropFirst(1)

The dropFirst(_:) method call will return a type-erased AnySequence; that
involves allocating a whole new class instance (although this is an
implementation detail) instead of just advancing the start value by one step.

With the addition of conditional conformances to the language, the result
of zip(_:_:) can now be a Collection if both containers being zipped conform
to Collection. (See
apple/swift#13941 for the details.)
Unfortunately, without StrideTo conforming to Collection as proposed,
expressions such as the following do not take advantage of zip’s conditional
conformance.

zip([1,2,3], stride(from: 2, to: 42, by: 2))

The result of this expression only conforms to Sequence and as such provides
none of the Collection APIs.

Proposed solution

We propose to make the following changes to the stride types StrideTo and
StrideThrough.

Make them conform to RandomAccessCollection on condition that the
Element.Stride conforms to BinaryInteger (see the Detailed design section for the explanation of the condition)

extension StrideTo : RandomAccessCollection
where Element.Stride : BinaryInteger { }

extension StrideThrough : RandomAccessCollection
where Element.Stride : BinaryInteger { }

Since currently both StrideTo and StrideThrough define SubSequence as AnySequence<Element>, and it only conforms to Sequence, another modification needs to be made.

Make both types their own SubSequence:

struct StrideTo<Element> {
  typealias SubSequence = StrideTo
}

struct StrideThrough<Element> {
  typealias SubSequence = StrideThrough
}

We believe these changes will solve the problems outlined in the
Motivation section and make code using results of
stride(from:to:by) and stride(from:through:by:) more powerful and more
performant without any modifications to the call sites.

Detailed design

Turning StrideTo and StrideThrough into slice types (making them their own
sub-sequences) only requires implementation, not much of a design. Conforming to
the Collection family of protocols, however, is another topic and is worth
discussing in a little more detail.

Condition for `Collection` conformance

There are other Strideable types in the standard library, including all
floating point number types. However, floating point types represent real
numbers, and striding over even a modest interval may involve a very large
number of iterations if the stride is small. The protocol Collection, however,
uses Int as the type of its count property. So reading the value of count
for an innocuous stride could easily result in trapping on integer overflow if
one is not careful.

This same argument can, of course, be applied to strides of integer values
containing many more elements than an Int can represent, but we believe this
is a far less frequent situation. Besides, the same behavior can already be
observed with the types provided by the standard library. For example, taking a
distance(from: startIndex, to: endIndex) of a range -1..<Int.max will result
in an overflow.

Besides, as mentioned at the very beginning of this proposal, StrideTo and
StrideThrough types are more general versions of Swift's ranges, that allow
stride or step other than 1, and ranges only conform to
RandomAccessCollection if they are countable, which is expressed as:

extension Range: RandomAccessCollection
where Bound: Strideable, Bound.Stride: SignedInteger { }

The proposed design for StrideTo and StrideThrough is therefore:

extension StrideTo : RandomAccessCollection
where Element.Stride : BinaryInteger { }

extension StrideThrough : RandomAccessCollection
where Element.Stride : BinaryInteger { }

Chosing the right `Index` type

Another item worth noting is that there are at least 2 possible ways to
represent indices for the new collection types.

Zero-based Int indices
Use typealias Index = Element
Use option 1 for StrideTo and re-use ClosedRange.Index for StrideThrough

Zero-based indexing is somewhat simpler to implement, but the downside is that
indices of such collections are no longer intechangeable with its sub-sequence's,
which is not the behavior all other standard library collection exhibit. For
example:

let r = 5..<10
let sub = r[7...]
r[sub.startIndex] == sub[sub.startIndex] // results in true

ClosedRange.Index type is designed to allow for an explicit pastEnd case
and might be a good candidate for StrideThrough, but stride-through is
slightly more complicated than a ClosedRange because the last value can be
jumped over when it does not fall on the stride boundary. This detail will
unnecessarily complicate the implementation of Collection conformance with no
benefits over option 2.

Therefore, we're left with what seems to be the only option for Index type:

extension StrideTo: Collection
where Element.Stride: BinaryInteger {
  typealias Index = Element
}

extension StrideThrough: Collection
where Element.Stride: BinaryInteger {
  typealias Index = Element
}

Source compatibility

This is a source breaking change, as demonstrated by the following example that
would no longer compile.

func consume(_ xs: AnySequence<Int>) { }
consume(stride(from: 1, to: 10, by: 2).dropFirst(1))

However, running the test compatiblity test suite on the branch that implements
this proposal did not reveal any related breakage. We also believe that results
of stride(from:to:by:) and stride(from:through:by:) functions are rarely
used in a way that would be incompatible with this change. The majority of the
usages would be to just loop over strides using for..in or calling Sequence
combinators, both of which are not affected.

Making this change in a backward-compatible way would be much harder and may
result in a worse API overall. (See Alternatives
considered section for details.)

Effect on ABI stability

This work is meant to fix the ABI of the standard library.

Alternatives considered

One other option, and a backward-compatible one in fact, would be to preserve
StrideTo conformance to Sequence with the default SubSequence and
introduce a new type StrideToCollection unconditionally contrained to
Element.Stride: BinaryInteger. This solution would also require introducing
new overloads for stride(from:to:by:) and stride(from:through:by:)
functions, similarly constrained, to produce instances of StrideToCollection /
StrideThroughCollection. We believe that this solution results in a less
understandable API that also creates extra work for the type checker.

Nevin · February 7, 2018, 8:58pm

Perhaps I am missing something, but conceptually a StrideTo is *always* countable, and is *always* random-access. In contrast, a Range only has a canonical step size if its bounds-type does, whereas a StrideTo explicitly specifies its step size.

So count me on the side of giving StrideTo an *unconditional* conformance to RandomAccessCollection. I don’t know what would be required to make that happen, but it seems the natural thing to do.

nnnnnnnn · February 7, 2018, 9:26pm

Overall, the proposal looks terrific, and I agree with holding off on Collection conformance for floating-point strides. I think the index issue could use some more exploration, however. Using a StrideTo's element as its index makes sense, but for StrideThrough will create the same problem as it did for ClosedRange — we can't define a "past the end" index that will include the maximum element of a type's range. What are the complexities that prevent using something like ClosedRange.Index?

xwu · February 8, 2018, 2:07am

Conceptually, yes, but an unconditional conformance is unimplementable.

(Why? Recall that a type can only conform to a protocol in one way. Because of rounding error in floating-point types, the way to compute count correctly for ranges with floating-point bounds cannot be expressed without floating-point specific methods not available for integer bounds (not to mention that even the most efficient way of performing that computation is both gross and expensive). And if you attempt to fix all such issues by dynamically dispatching to protocol requirements available on the numeric type itself, you'd end up having to transpose the entirety of the stride type API to the numeric types themselves as underscored protocol requirements with default implementations. This is plainly unacceptable.)

nnnnnnnn · February 8, 2018, 4:05am

Thinking more about the index type for these, I wonder if boxing Element even for StrideTo would be better than having Element as the index. An unboxed Element index leaves gaps that are easy to accidentally use and a little expensive to check for. For example, I could accidentally write code like this:

let myStride = stride(from: 0, to: 100, by: 10)
for i in 0..<myStride.count {
    print(myStride[i])
}
// or
let firstElement = myStride[myStride.startIndex]
let secondElement = myStride[myStride.startIndex + 1]

We'll need to perform a division to validate each index on subscripting. That's slower than a bounds comparison, and while it probably isn't too bad for our stdlib integer types, it could be significant for a bigint type. If we're not going to validate the indices this wouldn't be an issue, but it seems like it's probably easy to get wrong—we don't have any other collections with nonconsecutive indices like this.

If we box the value we could rely on just the bounds checks instead, and then both StrideTo and StrideThrough would have opaque index types:

extension StrideTo {
    struct Index : Comparable, Hashable {
        let _value: Element
        internal init(_ _value: Element) { ... }
    }
}

moiseev · February 8, 2018, 10:39pm

Great idea!

https://github.com/apple/swift/pull/14288/commits/3d82ca0404fb9ddd8cb2d3310b22b850c0ee4fd9

With this in mind, I'll take another shot at re-using ClosedRange.Index for StrideThrough.

moiseev · February 9, 2018, 1:24am

After having discussed the issue with @Ben_Cohen it looks like a good idea to make range types conform to Collection with Bound: Strideable, Bound.Stride: SignedNumeric as opposed to the current , Bound.Stride: SignedInteger.

Its iterator will then have to be implemented using the Strideable._step function in order to avoid negative effects of rounding error accumulation. Similar to how it was done for stride types by @xwu.

xwu · February 9, 2018, 1:55am

I'm concerned about this; count is unimplementable with such a relaxed constraint--at least, it is not implementable correctly (i.e. such that there are no off-by-one errors with respect to final round-off of floating-point values) unless it's an ~~O(N)~~ O(log(N)) computation traversing the entire stride.

Even if we punt this to a Strideable._count function, no default implementation of such a function can be written other than traversing the entire stride (recall that SignedNumeric does not have division), making this either a breaking change on all Strideable types (if no default implementation is supplied) or a very suboptimal design (if the default implementation is ~~O(N)~~).

Nevin · February 9, 2018, 2:39am

Perhaps we need a “Euclidean division” protocol

Nevin · February 9, 2018, 2:49am

Technically count can be found in O(log(n)) with a “galloping” (ie. exponential) binary search.

xwu · February 9, 2018, 3:12am

Fair. Still, not great.

Nevin · February 9, 2018, 2:52pm

If we want StrideTo to be its own SubSequence, *and* we want it to work for floating-point elements, do we need to store the “base” value as distinct from the first entry?

As in, if we simply rebase a slice to its first element, couldn’t that yield different results than “originalBase + n * stride” due to rounding?

Nobody1707 · February 9, 2018, 4:35pm

Maybe, but a SignedInteger couldn't conform to it. Euclidean division requires that the remainder be non-negative.

xwu · February 9, 2018, 5:06pm

Yes, you’re right, and that (i.e., storing the original start value) would blow up the fixed layout of StrideTo. Again showing how floating-point strides conforming to Collection are either easily subtly broken or very, very complicated for the model.

Nevin · February 9, 2018, 5:39pm

It’s not just Collection, this is a problem for Sequence as well. After all dropFirst() returns a SubSequence.

Example:

let a = 1 as Float
let b = a + 4 * a.ulp
let c = a.ulp * 4 / 3
let s = stride(from: a, to: b, by: c)
for x in s { print(String(x.bitPattern, radix: 16)) }
// Prints:
// 3f800000
// 3f800001
// 3f800003

let s2 = stride(from: a + c, to: b, by: c)
for x in s2 { print(String(x.bitPattern, radix: 16)) }
// Prints:
// 3f800001
// 3f800002

For added fun, try changing 4 to 3 in b, or (4 / 3) to (5 / 3) in c.

xwu · February 9, 2018, 6:04pm

...that is indeed a problem.