Lazy Filter Subscripts

Hello Swift Evolution,

In playing around with the standard library’s Sequence types, I noticed an unexpected behaviour.

The current implementation of LazyFilterCollection uses the indices of the base collection as its own, and also forwards any subscripts directly to the base collection. The relevant implementation is as shown below:

extension LazyFilterCollection: Collection {
	public subscript(position: Index) -> Element {
		return _base[position]
	}
	
	public subscript(bounds: Range<Index>) -> SubSequence {
		return SubSequence(_base: _base[bounds], _predicate)
	}
}

This means that it is possible to retrieve values via subscripting that shouldn’t exist in the filtered collection:

let evenDigits = Array(0 ..< 10).lazy.filter { $0.isMultiple(of: 2) }
print(evenDigits[3]) // prints 3

And it also means that indices which don’t exist in the filtered collection can be subscripted:

let evenDigits = Array(0 ..< 10).lazy.filter { $0.isMultiple(of: 2) }
print(evenDigits[5])               // prints 5
print(Array(eventDigits[5...]))    // prints [6, 8]

This behaviour isn’t currently documented.

Additionally since Sequence operators are often used in a chained fashion (lazy.compactMap, for instance, is implemented as lazy.map.filter.map) it would be hard to debug any errors caused by the existing subscripting behaviour, and especially with API vending type-erased or opaque collections, consumers might not even be aware that a lazy filter is being used under the hood, unless the creator decides to explicitly leak and maintain that implementation detail.

As for possible solutions, I have a draft Swift Evolution proposal that introduces a new LazyFilterCollection.Index subtype which handles the base index internally, with the working being similar to String.Index, with the init being private, and construction being limited to querying a given lazy filtered collection instance using the index(after:), index(before:) etc. methods. That proposal, however, would break the ABI.

I’m not sure what other possible solutions there are that can also work efficiently given the lazy execution model and also preserve ABI compatibility, so I’m curious to hear the forum and standard library team’s take on this problem.

3 Likes

I'm not sure how this is different from a bug where you use index not introduced by Collection's own index API, which already falls into undefined territory.

4 Likes

How about adding a runtime check in debug mode? It should be abi and api compatible as far as I know, would catch most of the bugs, and would educate people that you cannot use shady indices

1 Like

Do the easiest/simplest job that will work.

The actual filtering within a LazyFilterCollection is done within the index(after:) method. When the source collection is bidirectional, so is the lazy filter, and index(before:) also actively filters. No other part of the lazy filter does filtering; the other core methods forward to the base collection or to one of the filtering methods.

Remember that collections cannot take arbitrary valid values of their Index type; only states that correspond to an active element (or the past-the-end marker) of the current instance are legal to place into a method that takes Index parameters.

Remember that the range of a collection's indices is contiguous from the collection's perspective, but does not have to be a contiguous range from the Index type's perspective; those gaps, although valid states, must not be put in as index parameters. The purpose of LazyFilterCollection is to create gaps; so using a gap index in the post-filtered collection is just as undefined behavior as using a completely-out-of-range index in the pre-filtered collection. So there's no extra bug here.

Problem with that attitude is that Swift is supposed to be a safe language.
We don't expect people to learn all the invalid patterns in the language by heart and never forget them. If a programmer tries to do something illegal we either have an error at compile time or at runtime.

We don't expect people to remember that you cannot add two Ints that together give you more than Int.max: we crash instead

We don't expect people to remember that you cannot dereference a null pointer: we instead have Optionals that you have to unwrap explicitly

We don't expect people to remember that you cannot index past the end of an array: we instead show them a nice error message at runtime telling them what they've done wrong

5 Likes

The documentations always say something along the line of only use valid index here, not using invalid index will trap the execution. I’m not sure that always trap philosophy will always be a viable option. I’d prefer it where appropriate of course.

I’ve seen safety being mentioned a few times over the years. Though there’re at least two interpretations to that.

One being “memory safety”, which Swift has been subscribing to stupendously. So much so that, IIRC, the word unsafe has been use only when such kind of safety can potentially be violated.

Another one is “safety” in terms of no undefined behavior, which I see Swift as following where it doesn’t impact other core goals.

Whether or not Swift does subscribe to both/all notions of safety, as well as how prioritized those notions are, probably deserves a separate discussion thread.

This is true, though for the only other standard library type which has a similar problem with subscripting a given offset, String, this is also actively discouraged by using a custom Index type.

I disagree with this assessment. I agree that it’s undefined behaviour and might have to persist because of ABI compatibility requirements, but to say that it’s not a bug is letting your knowledge of the implementation excuse what to an end user is incorrect behaviour. Just because you know how a bug happened doesn’t mean that it stops being one.

2 Likes

At a minimum, document the behavior.

1 Like

It's already documented, in Collection. For any given instance, using valid states of Index that are not members of the sequence vended by indices is already invalid behavior. The current Standard library just doesn't guarantee a crash; it actually "works" here due to the shortcuts used in writing LazyFilterCollection. The only way to make it not "work" would be to either exhaustively test Index values (which is non-trivial) or use a wrapper index type. The latter would only exist to prevent cross-contamination in dereferencing; it wouldn't help with the operations of LFC.

I've mentioned this in some thread here long ago, but I copied a quote from a C++ FAQ: "guard against Murphy, not Machiavelli." The OP is treating this like a Murphy crisis, but as far as I, and the Standard library, are concerned, it's actually a Machiavelli concern. You can only trigger this by doing something you weren't supposed to do in the first place.

You missed the key word "extra." I'm saying it's a bug for the exact same reason @Lantua did.

Generally, we would add a _debugPrecondition() to check for this kind of API misuse. Adding a regular precondition would likely be too expensive, as it would cause us to evaluate the predicate more than once in many situations, and there is an expectation that subscripting with an index is the way to retrieve an element with lowest overhead.

However, in this case, it is tricky to check this precondition even in debug mode only because we would need to evaluate a user-defined predicate, which might be stateful, and people might be relying on the exact number and order in which their predicates are evaluated (even though I'd generally say they shouldn't). Furthermore, the same code would evaluate the user-defined predicate a different number of times in debug mode and in release mode.

4 Likes

I know I'm really late to the party but I just solved a really hard to understand bug that was caused by code subscripting into a lazily filtered array way down in some seemingly unrelated code. I've always considered Swift to be safe in the sense that it lets me know when the code is doing something that potentially has unintended consequences but this completely slipped under that radar. I strongly believe that subscripting a lazily filtered array should be disallowed by Swift. Or, at the very minimum, the compiler should warn about such usage.