There is no fundamental Swift bug here. Things are largely working as expected, with one minor exception.
Each step of this method chain is eagerly executed to completion, before the next one starts.
This statement is therefore expected to start by invoking isValid precisely seven times in filter, to produce a filtered version of the entire input array.
The prefix operation is then run on the resulting new array, visiting the first three indices. (At offsets 0, 1, and 2 from the start, with 2 being the end index of the resulting slice.)
This is working exactly as designed!
(Eager evaluation is pretty useful, too -- if you have a large amount of data, and you can spare the memory to hold all those intermediate temporary arrays, such methods chains can have better throughput than any of the lazy alternatives. And at the end of it, you get all results nicely packaged in a direct Array value -- which can be repeatedly accessed with no further computing.)
This is not true.
The addition of lazy switches the meaning of this method chain expression: instead of each step executing to completion on after the other, this is creating a chain of on-the-fly collection transformations, each doing as little work as possible to respond to each request.
The prefix method stops iterating the contents of its input collection as soon as it finds the full extent of the slice that it needs to return:
public __consuming func prefix(_ maxLength: Int) -> SubSequence {
_precondition(
maxLength >= 0,
"Can't take a prefix of negative length from a collection")
let end = index(startIndex,
offsetBy: maxLength, limitedBy: endIndex) ?? endIndex
return self[startIndex..<end]
}
This does not iterate over the entire input collection, unless the prefix reaches the end of it.
In the specific case of the original code sample, the bounds for the returned collection slice are expected to be at offsets 0 and 2 in the filtered collection. These positions correspond to offsets 0 and 4 in the original array. (I.e, the first and the third true value.)
To find these bounds, this method does the following steps:
- Invoke
isValid once to find the position of startIndex (at offset 0 in the original array).
- Invoke
isValid four more times to find the index at offset 2 from this start index. (It's at offset 4 in the original array.)
- Invoke
isValid once more to find startIndex again.
The code needs to visit array items at offsets between 0 and 4 (inclusive), and that is precisely what it is doing. Altogether, it invokes isValid six times while doing this. This is only one invocation more than the minimum number necessary to find the bounds of the returned slice, as item 0 is visited twice.
Items 5 and 6 are never visited.
A real (but very minor) performance issue is that prefix should only compute startIndex once. (It is probably possible to fix that.) Other than that, the current implementation is doing the minimum possible amount of work to get the desired result.
(Somewhat regrettably, FilteredCollection does not precompute its startIndex during its creation. That said, some people (mistakenly) disagree with the idea of doing nontrivial work while creating a lazy transformation, and in any case, this is not easily changed at this point, as that type is now frozen.)
Using the sequence-based, lazy prefix transformation does indeed eliminate the need to find an endIndex -- it can stop at the last element that the prefix needs. (This is in addition to avoiding calculating startIndex twice.)
In the case of this code sample, this variant visits the four array items between offsets 0 and 3 -- it does not need to visit item 4 and it doesn't visit item 0 twice.
Avoiding looking at a single additional index (or two) is rarely, if ever, going to be worth worrying about.
My recommendations:
-
If you have a seven-element array of booleans that you need to process once, then it simply does not matter how inefficiently your code is filtering it. Scan the whole array four times. Do a clever quadratic algorithm. Copy items into a temporary Set, then sort the results. Encode the array into JSON and spawn a subprocess to process it. Better yet, send it over a satellite connection to a remote machine. Figure out a way to use Morse code or Unicode grapheme breaking! Go wild!
-
If you have more than seven items, and you think your code is running too slow, remember to profile it before making changes (or blaming it on your dependencies). The code may indeed be slow, but for me, the bottleneck tends not to be where I expect it. (If you do end up making changes, remember to also profile the result, to see if there was an actual improvement.)
-
It is usually a very bad idea to try solving hypothetical performance issues by reaching for AnySequence. Doing that has a distinct tendency to turn imagined performance problems into real ones.
-
The lazy collection transformations aren't general drop-in replacements to the eager ones -- they produce different behavior. Passing closures with side effects to these lazy transformations (such as the print statements in this code sample) will cause major headaches and confusion, and, eventually, bugs. Do not do that.
-
In general terms, it can be a good idea to reach for the least capable tool that still fits the use case. So if you only need the results to iterate over them once, then it can indeed be technically better to generate them as a lazy sequence rather than a lazy collection: the latter is capable of far more interesting things, and subsequently it can sometimes be slightly slower. However, this does not mean that it is good practice to force every lazy expression to be limited to Sequence -- that might be a good trick to know about in the exceptionally rare cases when the difference actually matters. (Use a profiler.)