Count(of:) and contains(other:) for Collection

anthonylatsis · April 15, 2018, 5:36pm

After a while I agree with @CTMacUser about generalizing for Collection. After all it isn't the consumer's problem that the Standard Library has this so far small issue with useless methods for some sequences. We must respect all collections, and besides, contributing to the aforementioned issue isn't a bad thing to do either.

But as you say, then comes the question of whether to further generalize for Sequence. First of all, it is worth mentioning both methods, even in their most optimized state, require the ability to non-destructively traverse a sequence multiple times. I wouldn't be punishing single-pass sequences in this case, since the methods wouldn't work at all. Though we might be punishing infinite sequences and other sequences that aren't collections.

Also, we must not forget that Sequence is a top-level abstraction and must contain minimum functionality to appropriately generalize and carry it's role. Loading Sequence with extended functionality or setting a precedent isn't desirable.

By the way, the illusive similarity of contains and isSubset might become a typical misusage case.

Tino · April 15, 2018, 5:49pm

I think that already happened... please look for yourself instead of blindly believing my numbers (I didn't count very diligent ;-), but out of the 44 methods of Sequence, only five are actually credible for every Sequence:
Besides those with the problems of order, there are many methods that don't work for infinite sequences (although that's not really true -- some may work great for several minutes ;-)

Jens · April 15, 2018, 8:06pm

Yes, this is (part of) my point.

The current design (Set, Dictionary, Sequence, etc) is a recipe for problems like this and eg elementsEqual.

anthonylatsis · April 15, 2018, 10:15pm

Though take note the order-sensitivity of the methods in question has already been discussed here earlier.

@Tino I won't count, but please keep in mind I am aware that there is an issue. You've also voiced that quite a couple of times in the set & sequence thread. Let's prevent another similar discussion from spawning here.

jawbroken · April 15, 2018, 11:42pm

I agree that the Set-as-Sequence discussion would be better confined to the other thread. To me, the problem here is again one of naming. You're not actually asking if the Sequence/Collection contains the elements of another sequence, which could be interpreted in many different ways, you're asking if the other Sequence is a subsequence of this Sequence. Note that this isn't just a problem for Set and Dictionary, because a reasonable interpretation for an Array might be that it contains at least one of each of the elements from other, e.g. [1, 2, 3, 4].contains([3, 1]) == true. So a name like containsSubsequence or sequentiallyContains (the outcome of the proposal to rename elementsEqual will provide some guidance here) makes much more sense than contains to me, which is inherently ambiguous.

Edit: Oh, and I think these would be useful methods to add, given they have correct names and it is decided how overlapping subsequences are handled in count(of:).

anthonylatsis · April 16, 2018, 12:02am

If you mean the way we provide this option, currently I consider passing a parameter the most sensible choice.

Concerning the naming, perhaps you should address the contains(other:) method on StringProtocol, which is a special case of the proposed method for strings.

jawbroken · April 16, 2018, 12:46am

Right, I misspoke above. This is not a subsequence, it's a substring, which is a name that happens to work for both. A subsequence is a different concept entirely, and could also be provided, but it's much less useful in general. So this perhaps points to the name hasSubstring or similar, because I now realise “contains” is not particularly helpful here at all.

The existing contains(_:), kind of works for two Strings if you don't think about their Collection conformance, but it doesn't seem helpful for a Sequence generalisation. Looking into it, this isn't even actually a Swift String method, it's part of the NSString overlay and only available if you import Foundation. So I don't think we have to care much about matching the name.

"test".contains("es") // error: cannot convert value of type 'String' to expected argument type 'Character'

Edit: For anyone wondering about the different between a substring and a subsequence, the latter is a superset of the former which doesn't have to be contiguous. e.g. <2, 3, 5> is a subsequence of <1, 2, 3, 4, 5, 6>.

anthonylatsis · April 16, 2018, 1:01am

I do think we need to consider that StringProtocol will inherit the proposed methods and, in particular, the consistency of the existing contains and the introduced. The best option, for what it's worth, is for the existing contains to be part of what is proposed.

Substring though is String's subsequence, and is a special case of what you mention. You are right about subsequences and I have tweaked the proposed methods to accept Self, since only contiguous matches are meant to be considered. Thanks for pointing that out.

jawbroken · April 16, 2018, 1:14am

To be clear, the existing contains(_:) in Swift takes a single element, so that name seems fine and unambiguous for that function. The other contains(_:), for two StringProtocols, doesn't seem very relevant to me, since it's only available in Foundation and Swift has been generally happy to break with the NSString names for things. I'm not sure which of these two you are referring to when you say “the existing contains”.

Good point, I guess SubSequence is already (slightly incorrectly) named in Swift as being contiguous, so it's fine to use hasSubsequence/hasSubSequence or similar for this function. I'm not sure that this requires that the methods only accept Self, though. Why is that the case?

anthonylatsis · April 16, 2018, 1:31am

I am referring to the existing contains(other:). What I am trying to emphasize is that the proposed contains will be inherited by StringProtocol and thus become a method of, among others, NSString. It's either having a redundant method, merging it or replacing it – precisely where the naming question arises. Aside from my opinions on the naming, if we do not want redundant methods, we have to rename the existing contains to match the introduced, which is source-breaking (if the names are different).

Consider the following example. It is safe to assume that in the majority of cases, people will use sequences and expect the method to accept sequences. Subsequences for the purpose of the method simply aren't convenient. Furthermore, we need to account for non-contiguous matches if we accept subsequences. In other words, a method that accepts a subsequence is a separate method.

let array1 = [2, 1, 1]

print([2, 3, 1, 2, 1, 1].contains(array1)) 
// This wouldn't be possible if we use subsequences as an argument.
// array1 would have to be converted to ArraySlice<Int>

jawbroken · April 16, 2018, 1:48am

The redundant method name is not part of the Swift standard library, is my point, so I don't see the concern in providing this under a different name. There are many existing methods on NSString which are redundant and named entirely differently to the same functionality for String.

SubSequence in Swift is defined to be contiguous, so why would we need to account for non-contiguous matches? I think there's something fundamental that I'm not understanding here. It's fine to provide a faster implementation when the Sequence types match, or when you have certain Collection types, etc, but shouldn't this method be implementable even for entirely unrelated Sequence types (as long as they have the same Element type), even if the other Sequence needs to be first converted into an Array or something to ensure it is multipass?

hlovatt · April 16, 2018, 1:51am

You could choose contains(subsequence:) and rename the String method to the same. That way there would be no confusion and it would be consistently named.

anthonylatsis · April 16, 2018, 2:21am

I would appreciate an example.

Yes, you're right. But for Collection, not Sequence. Still, subsequences are very inconvenient in this case due to their types.

This is to avoid usage on semantically incompatible sequences. Rarely such functions are used in the context of different types. But more importantly, my goal is not to create opportunities for unsafe code and unexpected results by loosing the method's constraints.

let set: Set<Int>  = [1, 2, 3]

[1, 2, 3, 4].contains(set)

let array = [(0, 0), (1, 1), (2, 2)]

[0 : 0, 1 : 2, 3 : 4].contains(array)

These are just standard types. Who knows what kind of sequences exist in the code base.

jawbroken · April 16, 2018, 3:11am

Here's a quick skim of some of the NSString API from command-clicking it in Swift, because I'm not that familiar with it myself (I may have missed better matches):

NSString	String	Notes
`var length`	`var count`	Former is the number of UTF-16 code units, i.e. `.utf16.count`
`character(at:)`	`subscript(String.Index)`	Former returns the UTF-16 code unit
Various functions like `getCharacters(_:range:)`, `rangeOfComposedCharacterSequences(for:)`	`subscript(Range<String.Index>)`
`compare(_:)`	`==`, `<`, etc
`components(separatedBy:)`	`split(separator:maxSplits:omittingEmptySubsequences:)`

I ran out of time here because it's quite hard to compare (plus I haven't been careful to check what is hidden in the overlay). But I would say generally that I don't think there's a desire to match NSString when it doesn't make sense. I think the contains(subsequence:) that @hlovatt mentions is an okay spelling if you think hasSubsequence is too radical.

I'm not sure this is true (and I'm going to ignore your Set and Dictionary examples because these sort of arguments should happen in the other thread). @xwu gave the example in the thread of working out how many times a particular sequence of bytes appears in Data, and it could be natural to provide the subsequence you're searching for as an Array. I might make an infinite sequence of even numbers using AnySequence and check if it contains [10, 12]. I might have done some processing of an Array of Ints on a lazy view, and now want to check if it contains a subsequence without having to somehow construct a matching LazyFilterCollection<LazyMapCollection<Array<Int>, Int>>. In general, people want to be able to consider two different Sequence types with the same Element type as compatible for operations like these (see the prior art in much of the Swift API, e.g. the existing starts(with:) which is very similar to your contains and also available on String, etc), and there's nothing semantically incompatible about such Sequences.

anthonylatsis · April 16, 2018, 4:01am

But these aren't redundancies, they exist separately. I just realized NSString doesn't conform to StringProtocol. So what we have left (if they are named differently) is two methods for the same purpose on String and Substring. I know one of them is part of the NSString compatibility stuff, but I would really like to avoid this redundancy without creating source-breaking changes. We already have a to-be-fixed precedent with split and components as a result of String becoming a collection again. And I think the core team will agree on this. It would allow to seamlessly remove the implementation from Foundation.

jawbroken:

I’m not sure this is true (and I’m going to ignore your Set and Dictionary examples because these sort of arguments should happen in the other thread). @xwu gave the example in the thread of working out how many times a particular sequence of bytes appears in Data, and it could be natural to provide the subsequence you’re searching for as an Array. I might make an infinite sequence of even numbers using AnySequence and check if it contains [10, 12]. I might have done some processing of an Array of Ints on a lazy view, and now want to check if it contains a subsequence without having to somehow construct a matching LazyFilterCollection<LazyMapCollection<Array<Int>, Int>>. In general, people want to be able to consider two different Sequence types with the same Element type as compatible for operations like these (see the prior art in much of the Swift API, e.g. the existing starts(with:) which is very similar to your contains and also available on String, etc), and there’s nothing semantically incompatible about such Sequences.

Hm. I must admit you have a point. Perhaps it's the consumer's responsibility to sensibly use a method. The worst that could happen is an unexpected result. This is similar to how Daryle made me overlook limiting the methods to BidirectionalCollection. Thank you for bringing this up. I will edit the thread back to how it was in favor of the generic approach and further think this over.

jawbroken · April 16, 2018, 4:49am

Yes, which goes back to why I think it's very important to get the name right, even if it clashes with this strange contains(_:) method on StringProtocol that only exists when you import Foundation and is probably some accident of history rather than a carefully considered API. I don't want another elementsEqual situation where the name is very misleading, and I think calling this operation contains(_:) is misleading in the exactly the same way.

pyrtsa · April 16, 2018, 6:14am

In fact, either of these names – or anything involving the word subsequence should be reserved to mean the operation that returns true for "abcdef".hasSubsequence("bdf"), see Subsequence - Wikipedia.

jawbroken · April 16, 2018, 8:53am

I mentioned this above, when I said that the correct term is substring, but then retracted it. SubSequence is already defined in Swift to be contiguous so the ship has sailed on that definition a long time ago, and substring would probably be a confusing term in a language which has Strings and Substrings (also contiguous). So subsequence in Swift is definitely not a subsequence in the mathematical sense, and that's incredibly unlikely to change.

Tino · April 16, 2018, 10:50am

I think it's ok for the type to be called Subsequence (after all, it is a subsequence -- the mathematical definition is broader than ours), but when we change a method name to contain a term of art for more clarity, imho that term should be used right.

Also, I think the general (I'm not referring to this specific post here) attitude of "we are Swift - we don't care about math" does not serve us well:
We are not only in contradiction with a subject that may trigger bad memories from the school time of some people; we also violate what people expect after looking up definitions in Wikipedia.

Swift tries not to break expectations people coming from other languages might have, but at the same time, we rarely care if we break with common knowledge.

jawbroken · April 16, 2018, 11:03am

The method name is not being changed, because the method doesn't currently exist. I don't really understand the term of art reference, because I don't recall anyone in here justifying a naming choice by saying it's a term of art. Personally, I'm only saying that the definition of a subsequence in Swift is already long-settled and distributed throughout the language, so that term should be reused for consistency. It might be great to be consistent with the mathematical definition of a subsequence, though I'm not sure how widely known that definition is, but it probably isn't on the table to redefine it now, and the real mathematical term of substring is itself very problematic in a programming language.