Counting occurrences of a substring in a string

aamir-nazir · March 14, 2020, 7:31pm

Given the fact that Swift String class doesn't has any method for counting occurrences of a substring in a string. i.e
Counting occurrences of "Swift" in "Hello Swift Swift" => 2

There are many alternatives available for counting occurrences of substring but I think it would be more intuitive if we add support of counting occurrences of a substring in a string. i.e

func occurences(of substring: String, compareOption: CompareOptions? = nil) -> Int {
	....
}

Lantua · March 14, 2020, 7:40pm

I feel like it could be part of a more general regex api, though NSRegularExpression already exist.

aamir-nazir · March 14, 2020, 7:44pm

Yes true, but it requires few lines of code and not very succinct version. Perhaps adding a specific method for only counting of occurrences using NSRegularExpression API makes more sense.

Nevin · March 14, 2020, 7:59pm

I’m not sure if it needs to be in the standard library, and I’m also not sure why the pitch is limited to String. Wouldn’t the same idea work for any Sequence?

Also, what do you want to happen for overlaps?

"nananana".countOccurrences("nana")
// 2 or 3?

And what about returning an array of indices where each match begins?

aamir-nazir · March 14, 2020, 8:05pm

Regarding overlaps it should works akin to the following method of standard library and its answer will be 2.

func components(separatedBy separator: String) -> [String]

I think it will be a great idea to generalise this pitch to Sequence.

xwu · March 14, 2020, 8:32pm

Wildchild9 · March 15, 2020, 8:50pm

I think that a generic version of this, that counts the occurrences of a subsequence (a collection with the same element type) in a collection, would be helpful. Something along the lines of the following I think would fit the bill:

extension Collection where Element: Equatable {
    
    /// Returns the number of occurences of a subsequence within the collection.
    ///
    /// Any collection with the same element type as `self` may be passed to the
    /// `sequence` parameter. In the following example, both a `String` and a
    /// `Substring`'s occurrences can be counted in a string because they both have
    /// the same element type as `String`,  `Character`.
    ///
    ///     let someString = "abcabcbc"
    ///     let matchString = "abc"  // "abc"
    ///     let matchSubstring = someString.suffix(2)  // "bc"
    ///
    ///     print(someString.occurrences(of: matchString))
    ///     // Prints "2"
    ///
    ///     print(someString.occurrences(of: matchSubstring))
    ///     // Prints "3"
    ///
    /// - Parameter subsequence: The collection whose occurrences of its sequence
    /// of elements, in order, are being counted.
    ///
    ///
    /// - Returns: The number of occurrences of `subsequence` in the collection.
    ///
    /// - Complexity: O(*n*), where *n* is the length of the collection.
    func occurrences<OtherCollection>(of subsequence: OtherCollection) -> Int where OtherCollection: Collection, OtherCollection.Element == Element {
        guard !isEmpty, !subsequence.isEmpty else {
            return 0
        }
        var runningCount = 0
        var currentIndex = startIndex
        var currentMatchIndex = subsequence.startIndex
        var isMatching = false
        while currentIndex < endIndex {
            let element = self[currentIndex]
            let otherElement = subsequence[currentMatchIndex]
            if element == otherElement {
                isMatching = true
                currentMatchIndex = subsequence.index(after: currentMatchIndex)
            }
            currentIndex = self.index(after: currentIndex)
            if isMatching, currentMatchIndex == subsequence.endIndex {
                runningCount += 1
                isMatching = false
                currentMatchIndex = subsequence.startIndex
            }
        }
        return runningCount
    }
}

Lantua · March 15, 2020, 9:05pm

Of course, the question is whether we want subsequence or substring, or both.

Wildchild9 · March 15, 2020, 9:56pm

I feel like both would be the way to go, like in the generic method that I showed, because why limit yourself when you could use any collection with the same element type as the initial collection.

Also, I think that a generic method should also probably include an overlaps parameter to allow for overlapped matches to be counted. Here is an altered version of my generic implementation with an overlaps argument.

extension Collection where Element: Equatable {
    
    /// Returns the number of occurences of a subsequence within the collection.
    ///
    /// Any collection with the same element type as `self` may be passed to the
    /// `sequence` parameter. In the following example, both a `String` and a
    /// `Substring`'s occurrences can be counted in a string because they both have
    /// the same element type as `String`,  `Character`.
    ///
    ///     let someString = "abcabcbc"
    ///     let matchString = "abc"  // "abc"
    ///     let matchSubstring = someString.suffix(2)  // "bc"
    ///
    ///     print(someString.occurrences(of: matchString))
    ///     // Prints "2"
    ///
    ///     print(someString.occurrences(of: matchSubstring))
    ///     // Prints "3"
    ///
    /// In some cases, a subsequence may have overlapping occurrences in the collection. The
    /// following example shows the effect of the `overlaps` parameter when counting occurrences
    /// of a subsequence in a collection that contains overlapping occurrences of said subsequence.
    ///
    ///     let source = "bbbabbabbb"
    ///     let match = "bb"
    ///
    ///     // Overlapping matches
    ///     print(source.occurrences(of: match, overlaps: true))
    ///     // Prints "5"
    ///
    ///     // Non-overlapping matches
    ///     print(source.occurrences(of: match))
    ///     // Prints "3"
    ///
    /// - Parameter subsequence: The collection whose occurrences of its sequence
    /// of elements, in order, are being counted.
    /// - Parameter overlaps: A Boolean value indicating whether overlapping occurrences
    /// of `subsequence` will be counted as separate occurrences. Default value is `false`.
    ///
    /// - Returns: The number of occurrences of `subsequence` in the collection.
    ///
    /// - Complexity:
    ///    * If `overlaps` is `false`: O(*n*), where *n* is the length of the
    ///      collection.
    ///    * If `overlaps` is `true`:  O(*nm*) where *n* is the length of the collection and *m*
    ///      is the length of the subsequence.
    func occurrences<OtherCollection>(of subsequence: OtherCollection, overlaps: Bool = false) -> Int where OtherCollection: Collection, OtherCollection.Element == Element {
        guard !isEmpty, !subsequence.isEmpty, subsequence.count <= count else {
            return 0
        }
        var runningCount = 0
        var currentIndex = startIndex
        var currentOffset = 0
        var currentMatchIndices = [subsequence.startIndex]
        repeat {
            let element = self[currentIndex]
            for (i, matchIndex) in currentMatchIndices.enumerated().reversed() {
                let otherElement = subsequence[matchIndex]
                if element == otherElement {
                    let nextMatchIndex = subsequence.index(after: matchIndex)
                    if nextMatchIndex == subsequence.endIndex {
                        runningCount += 1
                        currentMatchIndices.remove(at: i)
                    } else {
                        currentMatchIndices[i] = nextMatchIndex
                    }
                } else {
                    currentMatchIndices.remove(at: i)
                }
            } while currentIndex < endIndex 
            currentIndex = self.index(after: currentIndex)
            currentOffset += 1
            if overlaps, count - currentOffset >= subsequence.count {
                currentMatchIndices.append(subsequence.startIndex)
            }
        }
        return runningCount
    }
}

Lantua · March 15, 2020, 10:35pm

Just a few things:

The outer loop could use for-in for more Swifty code.
The second algorithm should be O(nm), not O(n).
Your first algorithm is maximally counting non-overlapping subsequences. The second one is counting substrings.

You’re using the same readme for both of them, so I’m not sure if you intend to have different algorithms, or not.
- If you meant to have two algorithms, maybe we should name them differently to avoid confusion?

Wildchild9 · March 16, 2020, 1:46am

I don't understand the difference you are getting at between subsequences and substrings. Substring is String.SubSequence. Substrings are subsequences of strings. Also how is the overlapping occurrences algorithm O(nm)?

Also, bikeshedding some alternative names:

We could use elements instead of subsequence for the first parameter
We could use omittingOverlappingOccurrences instead of the overlaps parameter
We could call the method count(of:omittingOverlappingOccurrences:) instead of occurrences

Wildchild9 · March 16, 2020, 2:18am

Actually I understand why the latter algorithm is O(nm), I'll update the function.

Lantua · March 16, 2020, 6:08am

See both wikis. Subsequence can be non-consecutive. Substring (not String.Substring) needs to be consecutive.

Note: Some use subarray as a more generic substring. Maybe we can switch to that.

xwu · March 16, 2020, 11:56am

No, that’s incorrect:

https://developer.apple.com/documentation/swift/collection/1641276-subsequence

https://developer.apple.com/documentation/swift/string/subsequence

cukr · March 16, 2020, 12:06pm

That's not the subsequence Lantua is talking about. Check out the posted wiki links

Lantua · March 16, 2020, 12:26pm

Yes, thank you.

Vikram · February 1, 2023, 10:59am

   var arr=Array(password)
   var consecutiveOccurence = 3             // Number of consecutive occurence u want to find
    var l=0
    var h=1
    
 var res = 0
    while(h<arr.count){

        if(arr[l]==arr[h]){
            if(arr[l]==arr[h+1]){
                res+=1
                l=l+consecutiveOccurence                 
               h=h+consecutiveOccurence
            }
            else{
                l=h
                h=h+(consecutiveOccurence - 1)
            }
        }
 else{
           l+=1
           h+=1
        }
     }

print(res)