Creating a CollectionOfOne from a Sequence element

Strangely, multiple prototype ideas result in me needing to create a CollectionOfOne from a sequence element. I can copy out the element from the Collection, of course, but taking out an element from a Sequence is shockingly a lot harder. But maybe we can make the process easier:

extension CollectionOfOne {

    /// Creates an instance containing just the first element from the given
    /// sequence.
    ///
    /// - Parameters:
    ///   - sequence: The source of the element to store in the collection.
    ///   - requireSingleton: Whether the element to be stored must be the sole
    ///     element from the source sequence.  If not given, defaults to
    ///     `false`, *i.e.* this initializer will ignore whether `sequence` has
    ///     further elements.
    /// - Postcondition: `first!` is the same as the first element from
    ///   `sequence`.  Fails if either `sequence` is empty or if
    ///   `requireSingleton` is `true` and `sequence` has at least two elements.
    public init?<S: Sequence>(
        firstFrom sequence: S,
        noResiduum requireSingleton: Bool = false
    ) where S.Element == Element {
        var iterator = sequence.makeIterator()
        guard let first = iterator.next(),
              !requireSingleton || iterator.next() == nil else {
            return nil
        }

        self.init(first)
    }

}

Something to add someday?

let values = Array(sequence.prefix(2))
guard !values.isEmpty,
    values.count == 1 || !requireSingleton else {
  return nil
}

self.init(values[0])

Just going to rename the optional second parameter label to be positive.

extension CollectionOfOne {
    /// Creates an instance containing just the first element from the given
    /// sequence.
    public init?<S: Sequence>(
        firstFrom sequence: S,
        onlyOne requireSingleton: Bool = false
    ) where S.Element == Element {/*...*/}
}

Since Sequence is technically single-pass, this emptiness check can permanently discard an element.

Same here. Anything that reads elements from a generic Sequence should be considered mutating.

What if we allowed a way to recover that dropped second element:

public init?<S: Sequence>(
    firstFrom sequence: S,
    recoverSecondAfterRejecting possibleRecover: ((Element) throws -> Void)? = nil
) rethrows where S.Element == Element {
    var iterator = sequence.makeIterator()
    guard let first = iterator.next() else { return nil }

    if let recover = possibleRecover, let second = iterator.next() {
        try recover(second) // Error here
        return nil
    } else {
        self.init(first)
    }
}

This doesn't quite work. I think the error on the marked line:

Call can throw, but the error is not handled; a function declared 'rethrows' may only throw if its parameter does

which looks contradictory, is due to using Xcode 12 beta 4 on a Big Sur (public) beta 5 system. I had to use this workaround in my macOS playground:

    if possibleRecover != nil, let second = iterator.next() {
        try possibleRecover!(second)
        return nil
    } else {
        self.init(first)
    }

But maybe it'll be easier to never look past the first element, leaving it up to the user if that's important enough to check. (But you can't because there's no way to get the updated iterator. Unless I change the interface again to take an inout iterator parameter.)

Ugh,... fine:

extension CollectionOfOne {

    /// Creates an instance containing just the first element extracted from the
    /// given iterator.
    ///
    /// - Parameters:
    ///   - iterator: The source of the element to store in the collection.
    /// - Postcondition: `first!` is the same as the first element from
    ///   `iterator`.  Fails if `iterator` had no more elements; otherwise, that
    ///   element is removed from the iterator's vended sequence.
    public init?<I: IteratorProtocol>(
        firstFrom iterator: inout I
    ) where I.Element == Element {
        guard let first = iterator.next() else { return nil }

        self.init(first)
    }

    /// Creates an instance containing just the first element from the given
    /// sequence.
    ///
    /// If it is required to know if the extracted element was the sole element
    /// from `sequence`, use the iterator-based initializer with an iterator
    /// made from this sequence, and test the iterator for a second element
    /// afterwards.
    ///
    /// - Parameters:
    ///   - sequence: The source of the element to store in the collection.
    /// - Postcondition: `first!` is the same as the first element from
    ///   `sequence`.  Fails if `sequence` was empty.
    public init?<S: Sequence>(
        firstFrom sequence: S
    ) where S.Element == Element {
        var iterator = sequence.makeIterator()
        guard let protoSelf = Self(firstFrom: &iterator) else { return nil }

        self = protoSelf
    }

}

There isn't much leeway when dealing with Sequence. Lazy wrapping and consuming them is about all that you can do. Users should expect single-pass Sequence to be consumed/invalidated when passed around anyway.

Now you may just document that the function consumes exactly 2 elements, or just return the consumed-but-unused ones. Though it's quite a hassle from the user perspective either way.

Yeah I think Sequence is a bit of a useless protocol - it doesn't model the thing it is supposed to model (single-pass collections, with no notion of position) very well, and as it happens, the thing "above" it in the collection hierarchy (IteratorProtocol) models that same thing much better. Multi-pass sequences always have some notion of position, so Collection is a much better fit.

The reason Sequence exists is so that you can write for loops over both single and multi-pass sequences. Personally, I have no idea why that was felt to be an important feature in the language, and my rule of thumb is actually to avoid it entirely for generic code: if you can, use Collection, otherwise, use an inout IteratorProtocol.

Consider a super-simple snippet like sequence.prefix(2); sequence.prefix(2) (reading the first 2 elements, twice). This could yield [1, 2], [1, 2], or it could yield [1, 2, 3, 4] depending on implementation details your generic algorithm can't possibly know. Any other place in the collection hierarchy gives you consistent results - iterators always consume their elements (so reading the first 2 elements, twice, always yields [1, 2], [3, 4]), and accessing Collection elements by their indexes never consumes them (so the same operation always yields [1, 2], [1, 2]).

1 Like

The sequence stuff needs to be improved... with more protocols! Although IteratorProtocol is the protocol that does the visitation work, it doesn't have an inspection interface, nor a non-mutable one. That stuff is supposed to be done through the partner Sequence type. There's also the problem that Sequence is over-specified; it requires that its elements' publication order is part of its semantic, although for-in doesn't require that at all. A for-in loop only require's the source to publish its elements exactly once; technically, the order doesn't even have to be stable between runs.

I mentioned it in previous posts, but I imagine an improved hierarchy to be:

  • IteratorProtocol stays the same (for now).
  • Collective is the current Sequence except for ripping out the parts that assume the publication order is part of a conforming type's semantics.
  • PersistentCollective: Collective is a marker-only protocol for a multiple-pass object; the iteration states of every returned iterator cannot interfere with each other or the source collective.
  • Sequence: Collective keeps its current interface; it's a marker-only protocol with the addition of the extension methods that assume linear access.
  • typealias PersistentSequence = PersistentCollective & Sequence.
  • IndexedCollective: PersistentCollective lets you access any element with a token, without going through an arbitrary number of other elements first.
  • Collection: PersistentSequence, IndexedCollective.
  • MutableCollective: IndexedCollective adds mutable access to elements.
  • MutableCollection: Collection, MutableCollective.
  • BidirectionalCollection, RandomAccessCollection, and RangeReplaceableCollection are the same.

Note that adding intermediate protocols breaks ABI stability, so we have to wait until ABI Version 2 or retroactive base protocols.

I don't see anything in the documentation which requires any particular iteration order.

I would draw your attention to this part, however:

Repeated Access

The Sequence protocol makes no requirement on conforming types regarding whether they will be destructively consumed by iteration. As a consequence, don’t assume that multiple for - in loops on a sequence will either resume iteration or restart from the beginning:

for element in sequence {
   if ... some condition { break }
}
for element in sequence {
   // No defined behavior
}

In this case, you cannot assume either that a sequence will be consumable and will resume iteration, or that a sequence is a collection and will restart iteration from the first element. A conforming sequence that is not a collection is allowed to produce an arbitrary sequence of elements in the second for - in loop.

The problem isn't over-specification; it's the opposite. You should always be very cautious when you see the phrase "No defined behaviour".

(Note: this isn't C-style UB with nasal demons and the like: only 1 of 2 possible outcomes may occur for a valid conformance - either iteration will consume, or it won't, but which of those outcomes actually happens is not knowable from generic code. It might be better to call it "Unpredictable behaviour").
(Edit: Or is it? I'm not sure. That "arbitrary sequence of elements in the second for loop" bit would imply that more than those 2 outcomes may occur. Hmm... :thinking:. Could it produce bogus object pointers? What if it was just a sequence of plain integers which your loop interprets as memory addresses? I guess it could lead to C-style UB).

The thing about this documentation is that it is too high-level: to really grasp what this means, you need to go back to the first sentence and remember that this applies to all iteration, and that iteration is the only way to get the Sequence's elements (every method, like .prefix or .first(where:), uses iteration, so they are interchangeable with the for loops in the example). That's important because it's quite rare to write multiple consecutive for loops over the same Sequence. If you just read the above example, it might not be clear how this affects you and your code (not you personally, of course - Swift developers in general).

Essentially what this reduces to is that you can perform one and only one read operation on a Sequence, and every subsequent operation returns :man_shrugging:. Either you call makeIterator() and read via that iterator, or you call something like .first(where:), but not both. Otherwise the result is not predictable. If you pass the sequence in to a function, you have to assume that it might iterate it unless it is guaranteed not to (even logging/debugging code might iterate the sequence). Similarly, once you've done your read and possibly consumed the sequence (or not - who knows?), you must not pass that Sequence as an argument to another function - otherwise their results also become unpredictable ("A conforming sequence ... is allowed to produce an arbitrary sequence of elements in the second for - in loop.").

All of which leads me to say that Sequence is useless for writing generic algorithms. It might make a convenient entry-point, but all you should do in that function is get an iterator and pass that to your real implementation. Treat them as yucky, slimy things and touch them as little as you possibly can.

The underlying issue with Sequence is that it's a hack. It exists for language convenience - using for loops for single-pass sequences and allowing Collections easy access to algorithms written in terms of single-pass sequences. But from a conceptual/modelling perspective? It's all over the place. It achieves abstraction over single- and multi-pass sequences by just not defining its behaviour. A better design would likely require new (magic?) language features, but the current model is not fit for purpose IMHO.

I agree that Sequence does not allow to write many generic algorithms that work for all sequences. But there are many generic algorithms that accept quite a lot of sequences. For example, I'm happy I'm able to write infiniteSequence.first(where: { ... }), when I know for sure that this loop will terminate (because I know how my program works better than the standard library does).

Sequence is a weak protocol, but a strong ergonomic asset.

To be clear: Sequence is over-specified in that it insists on the publication order is part of an instance's semantics, although the for-in loop doesn't require that. It's under-specified in terms of your point of repeated access.

I came in around the start of Swift 3. As soon as I read that part of the Sequence docs, I thought that this was for a job for a marker protocol between Sequence and Collection that represents the multi-pass concept. It's all about semantics and adds no additional customization points (just like RandomAccessCollection doesn't over BidirectionalCollection). This would be the PersistentSequence protocol in my list. Were marker protocols not possible in older versions of Swift?