Count(of:) and contains(other:) for Collection

Having useless methods on unordered collections is senseless, and since the Standard Library doesn't have an appropriate protocol for ordered collections, it is best to place those methods in BidirectionalCollection, to which all ordered collections of the Standard Library currently conform. That is at least due to the fact that a bidirectional collection has to be ordered. Hence, regarding your second statement – it is very hard to believe Set or Dictionary will become bidirectional.
Sequence is not a choice not only for this very reason and the absence of count – it is undesirable to load hierarchically high-level protocols with extended functionality.

The "too bad, so sad" consequences for single pass sequences are at least sensible for the single element contains because that method only needs to visit each element in the sequence at most once. So the bad stuff in that case is that you might consume the entire sequence.

For the proposed additions, you can (potentially) visit each element more than once. Which may not be possible for all sequences.

That is a good point. In other words, contains(element:) is O(n) with a constant <= 1, the proposed methods are O(mn).

The problem is that you’re punishing all forward-only Collections to make sure Dictionary and Set don’t get the interface. That’s a price too high (unless the algorithm really needs two-way iteration). Design against Murphy, not Machiavelli; and this is deep in Machiavelli territory.

What’s the harm in letting Dictionary and Set have those methods? There is no (real) consequences. No one would be forcing you to call them on your objects.

2 Likes

I think it's reasonable to assume that a method that checks whether a sequence contains a possible subsequence should be added on Sequence.

Especially since Sequence already has starts(with: Sequence) and our new contains(Sequence) would just be a more general variant of that, checking for the possible subsequence at every position instead of only the first.

So, assuming we add contains(Sequence) to Sequence

(click to see implementation)
extension Sequence where Element: Equatable {
    /// Returns a Boolean value that indicates whether the sequence contains
    /// the given subsequence.
    ///
    /// - Parameter possibleSubsequence: The subsequence to find within the
    ///   the sequence. `possibleSubsequence` must be finite.
    /// - Returns: `true` if the sequence contains `possibleSubsequence`;
    ///   otherwise, `false`.
    public func contains<S>(_ possibleSubsequence: S) -> Bool
        where Element == S.Element, S : Sequence
    {
        // Pardon my quick and dirty implementation:
        var myIter = self.makeIterator()
        var subsIter = possibleSubsequence.makeIterator()
        guard let subsFirst = subsIter.next() else { return false }
        while let myNext = myIter.next() {
            if myNext == subsFirst {
                while true {
                    let myNext = myIter.next()
                    let subsNext = subsIter.next()
                    if myNext != subsNext {
                        if myNext == nil { return false }
                        if subsNext == nil { return true }
                        break
                    } else {
                        if myNext == nil { return true }
                    }
                }
                subsIter = possibleSubsequence.makeIterator()
                let _ = subsIter.next()
            }
        }
        return false
    }
}

… we can now do this:

let a = [1, 2, 3, 4, 5]
let b = [3, 4]
print(a.contains(b)) // true

But note that we can also do this:

let a = [1, 2, 3, 4, 5] as Set
let b = [3, 4] as Set
print(a.contains(b)) // false or true

I guess that the behavior of this last example is not in accordance with what most people would expect, from simply reading the code. Yet, we have done nothing wrong except forgetting that an unordered set is a sequence.

(Note that set a may or may not contain set b, depending on the order that their elements happens to be in, and this will be different between runs in recent (development snapshot) versions of the compiler.)

After a while I agree with @CTMacUser about generalizing for Collection. After all it isn't the consumer's problem that the Standard Library has this so far small issue with useless methods for some sequences. We must respect all collections, and besides, contributing to the aforementioned issue isn't a bad thing to do either.

But as you say, then comes the question of whether to further generalize for Sequence. First of all, it is worth mentioning both methods, even in their most optimized state, require the ability to non-destructively traverse a sequence multiple times. I wouldn't be punishing single-pass sequences in this case, since the methods wouldn't work at all. Though we might be punishing infinite sequences and other sequences that aren't collections.

Also, we must not forget that Sequence is a top-level abstraction and must contain minimum functionality to appropriately generalize and carry it's role. Loading Sequence with extended functionality or setting a precedent isn't desirable.

By the way, the illusive similarity of contains and isSubset might become a typical misusage case.

I think that already happened... please look for yourself instead of blindly believing my numbers (I didn't count very diligent ;-), but out of the 44 methods of Sequence, only five are actually credible for every Sequence:
Besides those with the problems of order, there are many methods that don't work for infinite sequences (although that's not really true -- some may work great for several minutes ;-)

Yes, this is (part of) my point.

The current design (Set, Dictionary, Sequence, etc) is a recipe for problems like this and eg elementsEqual.

Though take note the order-sensitivity of the methods in question has already been discussed here earlier.

@Tino I won't count, but please keep in mind I am aware that there is an issue. You've also voiced that quite a couple of times in the set & sequence thread. Let's prevent another similar discussion from spawning here.

I agree that the Set-as-Sequence discussion would be better confined to the other thread. To me, the problem here is again one of naming. You're not actually asking if the Sequence/Collection contains the elements of another sequence, which could be interpreted in many different ways, you're asking if the other Sequence is a subsequence of this Sequence. Note that this isn't just a problem for Set and Dictionary, because a reasonable interpretation for an Array might be that it contains at least one of each of the elements from other, e.g. [1, 2, 3, 4].contains([3, 1]) == true. So a name like containsSubsequence or sequentiallyContains (the outcome of the proposal to rename elementsEqual will provide some guidance here) makes much more sense than contains to me, which is inherently ambiguous.

Edit: Oh, and I think these would be useful methods to add, given they have correct names and it is decided how overlapping subsequences are handled in count(of:).

1 Like

If you mean the way we provide this option, currently I consider passing a parameter the most sensible choice.

Concerning the naming, perhaps you should address the contains(other:) method on StringProtocol, which is a special case of the proposed method for strings.

Right, I misspoke above. This is not a subsequence, it's a substring, which is a name that happens to work for both. A subsequence is a different concept entirely, and could also be provided, but it's much less useful in general. So this perhaps points to the name hasSubstring or similar, because I now realise “contains” is not particularly helpful here at all.

The existing contains(_:), kind of works for two Strings if you don't think about their Collection conformance, but it doesn't seem helpful for a Sequence generalisation. Looking into it, this isn't even actually a Swift String method, it's part of the NSString overlay and only available if you import Foundation. So I don't think we have to care much about matching the name.

"test".contains("es") // error: cannot convert value of type 'String' to expected argument type 'Character'

Edit: For anyone wondering about the different between a substring and a subsequence, the latter is a superset of the former which doesn't have to be contiguous. e.g. <2, 3, 5> is a subsequence of <1, 2, 3, 4, 5, 6>.

I do think we need to consider that StringProtocol will inherit the proposed methods and, in particular, the consistency of the existing contains and the introduced. The best option, for what it's worth, is for the existing contains to be part of what is proposed.

Substring though is String's subsequence, and is a special case of what you mention. You are right about subsequences and I have tweaked the proposed methods to accept Self, since only contiguous matches are meant to be considered. Thanks for pointing that out.

To be clear, the existing contains(_:) in Swift takes a single element, so that name seems fine and unambiguous for that function. The other contains(_:), for two StringProtocols, doesn't seem very relevant to me, since it's only available in Foundation and Swift has been generally happy to break with the NSString names for things. I'm not sure which of these two you are referring to when you say “the existing contains”.

Good point, I guess SubSequence is already (slightly incorrectly) named in Swift as being contiguous, so it's fine to use hasSubsequence/hasSubSequence or similar for this function. I'm not sure that this requires that the methods only accept Self, though. Why is that the case?

I am referring to the existing contains(other:). What I am trying to emphasize is that the proposed contains will be inherited by StringProtocol and thus become a method of, among others, NSString. It's either having a redundant method, merging it or replacing it – precisely where the naming question arises. Aside from my opinions on the naming, if we do not want redundant methods, we have to rename the existing contains to match the introduced, which is source-breaking (if the names are different).

Consider the following example. It is safe to assume that in the majority of cases, people will use sequences and expect the method to accept sequences. Subsequences for the purpose of the method simply aren't convenient. Furthermore, we need to account for non-contiguous matches if we accept subsequences. In other words, a method that accepts a subsequence is a separate method.

let array1 = [2, 1, 1]

print([2, 3, 1, 2, 1, 1].contains(array1)) 
// This wouldn't be possible if we use subsequences as an argument.
// array1 would have to be converted to ArraySlice<Int>

The redundant method name is not part of the Swift standard library, is my point, so I don't see the concern in providing this under a different name. There are many existing methods on NSString which are redundant and named entirely differently to the same functionality for String.

SubSequence in Swift is defined to be contiguous, so why would we need to account for non-contiguous matches? I think there's something fundamental that I'm not understanding here. It's fine to provide a faster implementation when the Sequence types match, or when you have certain Collection types, etc, but shouldn't this method be implementable even for entirely unrelated Sequence types (as long as they have the same Element type), even if the other Sequence needs to be first converted into an Array or something to ensure it is multipass?

You could choose contains(subsequence:) and rename the String method to the same. That way there would be no confusion and it would be consistently named.

I would appreciate an example.

Yes, you're right. But for Collection, not Sequence. Still, subsequences are very inconvenient in this case due to their types.

This is to avoid usage on semantically incompatible sequences. Rarely such functions are used in the context of different types. But more importantly, my goal is not to create opportunities for unsafe code and unexpected results by loosing the method's constraints.

let set: Set<Int>  = [1, 2, 3]

[1, 2, 3, 4].contains(set)

let array = [(0, 0), (1, 1), (2, 2)]

[0 : 0, 1 : 2, 3 : 4].contains(array)

These are just standard types. Who knows what kind of sequences exist in the code base.

Here's a quick skim of some of the NSString API from command-clicking it in Swift, because I'm not that familiar with it myself (I may have missed better matches):

NSString String Notes
var length var count Former is the number of UTF-16 code units, i.e. .utf16.count
character(at:) subscript(String.Index) Former returns the UTF-16 code unit
Various functions like getCharacters(_:range:), rangeOfComposedCharacterSequences(for:) subscript(Range<String.Index>)
compare(_:) ==, <, etc
components(separatedBy:) split(separator:maxSplits:omittingEmptySubsequences:)

I ran out of time here because it's quite hard to compare (plus I haven't been careful to check what is hidden in the overlay). But I would say generally that I don't think there's a desire to match NSString when it doesn't make sense. I think the contains(subsequence:) that @hlovatt mentions is an okay spelling if you think hasSubsequence is too radical.

I'm not sure this is true (and I'm going to ignore your Set and Dictionary examples because these sort of arguments should happen in the other thread). @xwu gave the example in the thread of working out how many times a particular sequence of bytes appears in Data, and it could be natural to provide the subsequence you're searching for as an Array. I might make an infinite sequence of even numbers using AnySequence and check if it contains [10, 12]. I might have done some processing of an Array of Ints on a lazy view, and now want to check if it contains a subsequence without having to somehow construct a matching LazyFilterCollection<LazyMapCollection<Array<Int>, Int>>. In general, people want to be able to consider two different Sequence types with the same Element type as compatible for operations like these (see the prior art in much of the Swift API, e.g. the existing starts(with:) which is very similar to your contains and also available on String, etc), and there's nothing semantically incompatible about such Sequences.

But these aren't redundancies, they exist separately. I just realized NSString doesn't conform to StringProtocol. So what we have left (if they are named differently) is two methods for the same purpose on String and Substring. I know one of them is part of the NSString compatibility stuff, but I would really like to avoid this redundancy without creating source-breaking changes. We already have a to-be-fixed precedent with split and components as a result of String becoming a collection again. And I think the core team will agree on this. It would allow to seamlessly remove the implementation from Foundation.

Hm. I must admit you have a point. Perhaps it's the consumer's responsibility to sensibly use a method. The worst that could happen is an unexpected result. This is similar to how Daryle made me overlook limiting the methods to BidirectionalCollection. Thank you for bringing this up. I will edit the thread back to how it was in favor of the generic approach and further think this over.