Having useless methods on unordered collections is senseless, and since the Standard Library doesn't have an appropriate protocol for ordered collections, it is best to place those methods in BidirectionalCollection
, to which all ordered collections of the Standard Library currently conform. That is at least due to the fact that a bidirectional collection has to be ordered. Hence, regarding your second statement – it is very hard to believe Set
or Dictionary
will become bidirectional.
Sequence
is not a choice not only for this very reason and the absence of count
– it is undesirable to load hierarchically high-level protocols with extended functionality.
The "too bad, so sad" consequences for single pass sequences are at least sensible for the single element contains
because that method only needs to visit each element in the sequence at most once. So the bad stuff in that case is that you might consume the entire sequence.
For the proposed additions, you can (potentially) visit each element more than once. Which may not be possible for all sequences.
That is a good point. In other words, contains(element:)
is O(n) with a constant <= 1
, the proposed methods are O(mn).
The problem is that you’re punishing all forward-only Collection
s to make sure Dictionary
and Set
don’t get the interface. That’s a price too high (unless the algorithm really needs two-way iteration). Design against Murphy, not Machiavelli; and this is deep in Machiavelli territory.
What’s the harm in letting Dictionary
and Set
have those methods? There is no (real) consequences. No one would be forcing you to call them on your objects.
I think it's reasonable to assume that a method that checks whether a sequence contains a possible subsequence should be added on Sequence
.
Especially since Sequence
already has starts(with: Sequence)
and our new contains(Sequence)
would just be a more general variant of that, checking for the possible subsequence at every position instead of only the first.
So, assuming we add contains(Sequence)
to Sequence
…
(click to see implementation)
extension Sequence where Element: Equatable {
/// Returns a Boolean value that indicates whether the sequence contains
/// the given subsequence.
///
/// - Parameter possibleSubsequence: The subsequence to find within the
/// the sequence. `possibleSubsequence` must be finite.
/// - Returns: `true` if the sequence contains `possibleSubsequence`;
/// otherwise, `false`.
public func contains<S>(_ possibleSubsequence: S) -> Bool
where Element == S.Element, S : Sequence
{
// Pardon my quick and dirty implementation:
var myIter = self.makeIterator()
var subsIter = possibleSubsequence.makeIterator()
guard let subsFirst = subsIter.next() else { return false }
while let myNext = myIter.next() {
if myNext == subsFirst {
while true {
let myNext = myIter.next()
let subsNext = subsIter.next()
if myNext != subsNext {
if myNext == nil { return false }
if subsNext == nil { return true }
break
} else {
if myNext == nil { return true }
}
}
subsIter = possibleSubsequence.makeIterator()
let _ = subsIter.next()
}
}
return false
}
}
… we can now do this:
let a = [1, 2, 3, 4, 5]
let b = [3, 4]
print(a.contains(b)) // true
But note that we can also do this:
let a = [1, 2, 3, 4, 5] as Set
let b = [3, 4] as Set
print(a.contains(b)) // false or true
I guess that the behavior of this last example is not in accordance with what most people would expect, from simply reading the code. Yet, we have done nothing wrong except forgetting that an unordered set is a sequence.
(Note that set a
may or may not contain set b
, depending on the order that their elements happens to be in, and this will be different between runs in recent (development snapshot) versions of the compiler.)
After a while I agree with @CTMacUser about generalizing for Collection
. After all it isn't the consumer's problem that the Standard Library has this so far small issue with useless methods for some sequences. We must respect all collections, and besides, contributing to the aforementioned issue isn't a bad thing to do either.
But as you say, then comes the question of whether to further generalize for Sequence
. First of all, it is worth mentioning both methods, even in their most optimized state, require the ability to non-destructively traverse a sequence multiple times. I wouldn't be punishing single-pass sequences in this case, since the methods wouldn't work at all. Though we might be punishing infinite sequences and other sequences that aren't collections.
Also, we must not forget that Sequence
is a top-level abstraction and must contain minimum functionality to appropriately generalize and carry it's role. Loading Sequence
with extended functionality or setting a precedent isn't desirable.
By the way, the illusive similarity of contains
and isSubset
might become a typical misusage case.
I think that already happened... please look for yourself instead of blindly believing my numbers (I didn't count very diligent ;-), but out of the 44 methods of Sequence
, only five are actually credible for every Sequence
:
Besides those with the problems of order, there are many methods that don't work for infinite sequences (although that's not really true -- some may work great for several minutes ;-)
Yes, this is (part of) my point.
The current design (Set, Dictionary, Sequence, etc) is a recipe for problems like this and eg elementsEqual.
Though take note the order-sensitivity of the methods in question has already been discussed here earlier.
@Tino I won't count, but please keep in mind I am aware that there is an issue. You've also voiced that quite a couple of times in the set & sequence thread. Let's prevent another similar discussion from spawning here.
I agree that the Set
-as-Sequence
discussion would be better confined to the other thread. To me, the problem here is again one of naming. You're not actually asking if the Sequence
/Collection
contains the elements of another sequence, which could be interpreted in many different ways, you're asking if the other Sequence
is a subsequence of this Sequence
. Note that this isn't just a problem for Set
and Dictionary
, because a reasonable interpretation for an Array
might be that it contains at least one of each of the elements from other
, e.g. [1, 2, 3, 4].contains([3, 1]) == true
. So a name like containsSubsequence
or sequentiallyContains
(the outcome of the proposal to rename elementsEqual
will provide some guidance here) makes much more sense than contains
to me, which is inherently ambiguous.
Edit: Oh, and I think these would be useful methods to add, given they have correct names and it is decided how overlapping subsequences are handled in count(of:)
.
If you mean the way we provide this option, currently I consider passing a parameter the most sensible choice.
Concerning the naming, perhaps you should address the contains(other:)
method on StringProtocol
, which is a special case of the proposed method for strings.
Right, I misspoke above. This is not a subsequence, it's a substring, which is a name that happens to work for both. A subsequence is a different concept entirely, and could also be provided, but it's much less useful in general. So this perhaps points to the name hasSubstring
or similar, because I now realise “contains” is not particularly helpful here at all.
The existing contains(_:)
, kind of works for two String
s if you don't think about their Collection
conformance, but it doesn't seem helpful for a Sequence
generalisation. Looking into it, this isn't even actually a Swift String
method, it's part of the NSString
overlay and only available if you import Foundation
. So I don't think we have to care much about matching the name.
"test".contains("es") // error: cannot convert value of type 'String' to expected argument type 'Character'
Edit: For anyone wondering about the different between a substring and a subsequence, the latter is a superset of the former which doesn't have to be contiguous. e.g. <2, 3, 5>
is a subsequence of <1, 2, 3, 4, 5, 6>
.
I do think we need to consider that StringProtocol
will inherit the proposed methods and, in particular, the consistency of the existing contains
and the introduced. The best option, for what it's worth, is for the existing contains
to be part of what is proposed.
Substring
though is String
's subsequence, and is a special case of what you mention. You are right about subsequences and I have tweaked the proposed methods to accept Self
, since only contiguous matches are meant to be considered. Thanks for pointing that out.
To be clear, the existing contains(_:)
in Swift takes a single element, so that name seems fine and unambiguous for that function. The other contains(_:)
, for two StringProtocol
s, doesn't seem very relevant to me, since it's only available in Foundation
and Swift has been generally happy to break with the NSString
names for things. I'm not sure which of these two you are referring to when you say “the existing contains
”.
Good point, I guess SubSequence
is already (slightly incorrectly) named in Swift as being contiguous, so it's fine to use hasSubsequence
/hasSubSequence
or similar for this function. I'm not sure that this requires that the methods only accept Self
, though. Why is that the case?
I am referring to the existing contains(other:)
. What I am trying to emphasize is that the proposed contains
will be inherited by StringProtocol
and thus become a method of, among others, NSString
. It's either having a redundant method, merging it or replacing it – precisely where the naming question arises. Aside from my opinions on the naming, if we do not want redundant methods, we have to rename the existing contains
to match the introduced, which is source-breaking (if the names are different).
Consider the following example. It is safe to assume that in the majority of cases, people will use sequences and expect the method to accept sequences. Subsequences for the purpose of the method simply aren't convenient. Furthermore, we need to account for non-contiguous matches if we accept subsequences. In other words, a method that accepts a subsequence is a separate method.
let array1 = [2, 1, 1]
print([2, 3, 1, 2, 1, 1].contains(array1))
// This wouldn't be possible if we use subsequences as an argument.
// array1 would have to be converted to ArraySlice<Int>
The redundant method name is not part of the Swift standard library, is my point, so I don't see the concern in providing this under a different name. There are many existing methods on NSString
which are redundant and named entirely differently to the same functionality for String
.
SubSequence
in Swift is defined to be contiguous, so why would we need to account for non-contiguous matches? I think there's something fundamental that I'm not understanding here. It's fine to provide a faster implementation when the Sequence
types match, or when you have certain Collection
types, etc, but shouldn't this method be implementable even for entirely unrelated Sequence
types (as long as they have the same Element
type), even if the other Sequence
needs to be first converted into an Array
or something to ensure it is multipass?
You could choose contains(subsequence:)
and rename the String
method to the same. That way there would be no confusion and it would be consistently named.
I would appreciate an example.
Yes, you're right. But for Collection
, not Sequence
. Still, subsequences are very inconvenient in this case due to their types.
This is to avoid usage on semantically incompatible sequences. Rarely such functions are used in the context of different types. But more importantly, my goal is not to create opportunities for unsafe code and unexpected results by loosing the method's constraints.
let set: Set<Int> = [1, 2, 3]
[1, 2, 3, 4].contains(set)
let array = [(0, 0), (1, 1), (2, 2)]
[0 : 0, 1 : 2, 3 : 4].contains(array)
These are just standard types. Who knows what kind of sequences exist in the code base.
Here's a quick skim of some of the NSString API from command-clicking it in Swift, because I'm not that familiar with it myself (I may have missed better matches):
NSString | String | Notes |
---|---|---|
var length |
var count |
Former is the number of UTF-16 code units, i.e. .utf16.count |
character(at:) |
subscript(String.Index) |
Former returns the UTF-16 code unit |
Various functions like getCharacters(_:range:) , rangeOfComposedCharacterSequences(for:) |
subscript(Range<String.Index>) |
|
compare(_:) |
== , < , etc |
|
components(separatedBy:) |
split(separator:maxSplits:omittingEmptySubsequences:) |
I ran out of time here because it's quite hard to compare (plus I haven't been careful to check what is hidden in the overlay). But I would say generally that I don't think there's a desire to match NSString
when it doesn't make sense. I think the contains(subsequence:)
that @hlovatt mentions is an okay spelling if you think hasSubsequence
is too radical.
I'm not sure this is true (and I'm going to ignore your Set
and Dictionary
examples because these sort of arguments should happen in the other thread). @xwu gave the example in the thread of working out how many times a particular sequence of bytes appears in Data
, and it could be natural to provide the subsequence you're searching for as an Array
. I might make an infinite sequence of even numbers using AnySequence
and check if it contains [10, 12]
. I might have done some processing of an Array
of Int
s on a lazy view, and now want to check if it contains a subsequence without having to somehow construct a matching LazyFilterCollection<LazyMapCollection<Array<Int>, Int>>
. In general, people want to be able to consider two different Sequence
types with the same Element
type as compatible for operations like these (see the prior art in much of the Swift API, e.g. the existing starts(with:)
which is very similar to your contains
and also available on String
, etc), and there's nothing semantically incompatible about such Sequence
s.
But these aren't redundancies, they exist separately. I just realized NSString
doesn't conform to StringProtocol
. So what we have left (if they are named differently) is two methods for the same purpose on String
and Substring
. I know one of them is part of the NSString
compatibility stuff, but I would really like to avoid this redundancy without creating source-breaking changes. We already have a to-be-fixed precedent with split
and components
as a result of String
becoming a collection again. And I think the core team will agree on this. It would allow to seamlessly remove the implementation from Foundation
.
Hm. I must admit you have a point. Perhaps it's the consumer's responsibility to sensibly use a method. The worst that could happen is an unexpected result. This is similar to how Daryle made me overlook limiting the methods to BidirectionalCollection
. Thank you for bringing this up. I will edit the thread back to how it was in favor of the generic approach and further think this over.