Search for substr from position

Can I search for a substring starting from a specific position:

firstIndex(of: String, at: String.Index) -> String.Index?

Like this in C++:

// search from position 5
string const s = "This is a string";
string::size_type n = s.find("is", 5);

or in Go:

s := "This is a string"
n := strings.Index(s[5:], "is")

I don't think there's any such function in the standard library, not even an O(nm) one. Closest I can think of is to use NSRegularExpression.

There are some NSString searching methods in Foundation, but I donā€™t know if theyā€™ll fit your needs.

If you decide to write your own, here are some thoughts:

When the string youā€™re searching for is short, the naive algorithm is fast. When itā€™s long, youā€™re better off with a linear-time algorithm.

If youā€™re searching for whole words, you can skip ahead to the start of the next word each time thereā€™s a mismatch.

Thereā€™s a whole body of research on string search algorithms (see, eg. the wiki article or this list or this Stack Overflow post).

Iā€™m not an expert in the field, but my understanding is that the two-way algorithm is held in high regard.

ā€¢ ā€¢ ā€¢

If this functionality were ever added to Swift, I suspect it would be written as a method on Collection, with a signature like so:

extension Collection where Element: Equatable {
  func firstRange<C: Collection>(of target: C) -> Range<Index>?
    where C.Element == Element
  {
    // Insert implementation here
  }
}

For optimizability, it would probably also be a protocol requirement of Collection, so that eg. BidirectionalCollection could provide its own implementation.

1 Like

You may be interested in the following proposed addition to Swift Algorithms.

With that, the requested functionality is spelled:

s.dropFirst(5).firstRange(of: "is")
6 Likes

I found that @SDGGiesbrecht's solution is nice.
As for the pitfalls, if I keep using value of type String.Index for the as: argument, will it be all good? (Integer value cannot be passed as argument type String.Index, right?)

var s = "This also is a string"
var sep = "is"
var i1 = s.firstIndex(of: sep, at: s.startIndex)
if var i1 = i1 {
    var i2 = s.firstIndex(of: sep, at: s.index(after: i1))
    if var i2 = i2 {
        print("'\(s)'")
        print("'\(s[i1...i2])'")
        for _ in sep {
            i1 = s.index(after: i1)
        }
        i2 = s.index(before: i2)
        print("'\(s[i1...i2])'")
        print("'\(s[i1...i2].trimmingCharacters(in: .whitespacesAndNewlines))'")
    }
}

Yes, that will work, and it does not stumble into any of the repeated integer conversion pitfalls I was talking about.

(Caveat: At this point I am assuming you can guarantee sep will never be "", in which case I donā€™t know off the top of my head what will happen when it reaches range(of:) in the extension method.)

Three parts of it might still be doing slightly more work than necessary.

  1. Twice you have used this pattern (with i1 and with i2):

    var x = y()
    if var x = x {
      // ...
    

    Those two var declarations create two separate variables, even though one shadows the other. (The compiler is probably even warning you that the first one is never changed and could be switched to a let.) You can compress that pattern directly into this:

    if var x = y() {
      // ...
    

    That way you are only storing one variable.

  2. The innermost loop...

    for _ in sep {
      i1 = s.index(after: i1)
    }
    

    ...could be reduced to...

    i1 = s.index(i1, offsetBy: sep.count)
    

    (count must be doing a similar loop of some form under the hood, but it might have inside information allowing it to do so without the overhead of dispatching to the index(after:) method in each iteration.)

  3. The conversion to an open range...

    i2 = s.index(before: i2)
    print("'\(s[i1...i2])'")
    

    ...could be simplified by just using a closed range directly:

    print("'\(s[i1..<i2])'")
    
1 Like

Thanks for help.

I rewrite my code following your suggestion.

The code won't crash even sep is "".

import Foundation
extension String {
    func firstIndex(of: String, at: String.Index) -> String.Index? {
        return self[at...].range(of: of)?.lowerBound
    }
}

let s = "This also is a string"
let sep = "is"
if var i1 = s.firstIndex(of: sep, at: s.startIndex) {
    if let i2 = s.firstIndex(of: sep, at: s.index(after: i1)) {
        print("'\(s)'")
        print("'\(s[i1...i2])'")
        i1 = s.index(i1, offsetBy: sep.count)
        print("'\(s[i1..<i2])'")
        print("'\(s[i1..<i2].trimmingCharacters(in: .whitespacesAndNewlines))'")
    }
}

If we're doing a code review I'll point out that one of Swift's design principles is Fluency. range(of: of) isn't very fluent IMO. Same with [at...]. I recommend you add additional variable labels here. Something like

    func firstIndex(of subString: String, at index: String.Index) -> String.Index? {
        return self[index...].range(of: subString)?.lowerBound
    }

You might prefer different names but range(of: of)and [at...] are odd.

You should be able to read your code out loud and it should sound normal.

I would call this firstRange(of:after:)

But anyways, thereā€™s a whole series of operations youā€™d want to do before or after a certain point in a string (capitalizing, lower casing, sorting, searching, replacing, etc.). Thereā€™s no point in bloating each API with after: Index parameters.

Instead, you can compose orthogonal components like slicing (to pick which part or act on) and a normal API like firstIndex(of:)

1 Like