Am I overcomplicating String.Index?

Imagine you have a string of the format: one junk character, a real name, then a '@', like:
%Michael@
or
^Daniel@

How do you retrieve the name? My solution was:

String(personName[personName.index(after: personName.startIndex)..<personName.firstIndex(of: "@")!])
where the personName variable holds the content.

I would avoid the use of indexes entirely.

let justName: Substring = personName.dropFirst().dropLast()
let justNameString: String = String(personName.dropFirst().dropLast())
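If the "@" is not guaranteed to be the very last character, a variation on the same idea is to drop the junk character and then keep everything up to the first "@". This is only a sketch under the input format from the question; the function name extractName is made up for illustration.

```swift
// Hypothetical helper: drops the leading junk character, then keeps
// characters up to (but not including) the first "@".
// No indexes and no force unwrap are needed.
func extractName(from personName: String) -> String {
    String(personName.dropFirst().prefix(while: { $0 != "@" }))
}

print(extractName(from: "%Michael@"))   // Michael
print(extractName(from: "^Daniel@"))    // Daniel
```

Unlike the force-unwrapped firstIndex(of:) version, this returns the rest of the string instead of crashing when no "@" is present.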

That is just beautiful and brilliant at the same time! Thanks! I guess this would be no better than using indexes if the junk characters were at random positions, though. Right?


Give us an example of “junk characters would be at a random position”: what the input looks like, and what output you want.

For example D@an%iel where you'd want to get rid of @ and %.

let input = "D@an%iel"
let output = input.filter { $0.isLetter }
           // or: .filter(\.isLetter)
// output == "Daniel"

Right! Thanks again! So just one last question: when should indexes be used?

To a first approximation, never. There may be some generic collection algorithms which can make good use of indexes and operate on strings just as well as other collections. However, if you're using string indexes directly, assume that there's some better way unless demonstrated otherwise.


Note that with the regular expression support in Swift 5.7, you can do your processing task with something like:

let name = inputString.wholeMatch(of: /.(.+)@/)?.1

or if you prefer to spell it out:

import RegexBuilder
let pattern = Regex {
  One(.any)
  Capture { OneOrMore(.any) }
  "@"
}

inputString.wholeMatch(of: pattern)?.1

(You can make these regexes more efficient by matching one-or-more not-@ characters, but the simple form suffices for illustration.)
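For what it's worth, the not-@ variant mentioned in the parenthetical might look like the sketch below. It assumes Swift 5.7 or later and uses the extended #/.../# delimiters so that it does not depend on the bare-slash regex compiler setting; the variable names are only illustrative.

```swift
// One arbitrary junk character, then a captured run of one or more
// characters that are not "@", then the "@" itself.
let pattern = #/.([^@]+)@/#

let inputString = "%Michael@"
if let match = inputString.wholeMatch(of: pattern) {
    // match.1 is the first capture group, as a Substring.
    print(match.1)   // Michael
}
```

Because the capture cannot run past an "@", the engine has no reason to backtrack over the name to find the closing delimiter.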


The idea is that you (or, even better, standard library authors like me!) build more useful algorithms like filter and dropFirst that internally use indices, and then ideally don’t have to mess with them directly after that. Of course there are always cases where you’re doing something unusual that isn’t covered by the available algorithms, but we hope as we continue to expand the toolbox, these will become increasingly rare.
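As a rough illustration of that idea, here is how one might write such an algorithm once, with the index handling hidden inside, so that callers never touch an index. The method name suffix(afterFirst:) is hypothetical and not part of the standard library.

```swift
extension Collection where Element: Equatable {
    /// Hypothetical helper: everything after the first occurrence of
    /// `element`, or nil if `element` does not occur at all.
    func suffix(afterFirst element: Element) -> SubSequence? {
        guard let i = firstIndex(of: element) else { return nil }
        // The only place an index is manipulated directly.
        return self[index(after: i)...]
    }
}

let name = "dfsafa:Dan:Smith"
print(name.suffix(afterFirst: ":") ?? "")            // Dan:Smith
print([1, 2, 0, 3, 4].suffix(afterFirst: 0) ?? [])   // [3, 4]
```

Since it is written against Collection rather than String, the same code works for strings, arrays, and any other collection with equatable elements.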


Ok, another use case: imagine you'd like to retrieve all characters following the first occurrence of the character ":". I guess this does need String.Index? Or is there a better way than this?

let name = "dfsafa:Dan"

name[name.index(after: name.firstIndex(of: ":")!)...] //  Dan

name.split(separator: ":").last!

Or

let name = "dfsafa:Dan:Smith"

let dan = name.split(separator: ":").dropFirst().joined(separator: ":")

Or

name.drop(while: { $0 != ":" }).dropFirst()
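Another option along the same lines is to let split stop after the first separator. A small sketch, with the variable names chosen only for illustration:

```swift
let name = "dfsafa:Dan:Smith"

// maxSplits: 1 splits only at the first ":", so later colons stay in the tail.
// omittingEmptySubsequences: false keeps an empty first piece for input
// such as ":Dan", so the tail is always at position 1 when a colon exists.
let parts = name.split(separator: ":", maxSplits: 1, omittingEmptySubsequences: false)

// With no colon in the input there is only one piece, so there is no tail.
let rest: String? = parts.count == 2 ? String(parts[1]) : nil
print(rest ?? "no colon found")   // Dan:Smith
```

This avoids both the force unwrap and the split-then-rejoin step.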


I still don't quite understand the need for String.Index. Why not just have integer abstraction at the level of views? Like:

myString.utf8View[3] // fourth byte (offset 3) in the UTF-8 representation

String's indexing model has always had a tradeoff: it's more complex than other languages, but in exchange it produces correct results where many other languages don't.

There's nothing wrong with using it; this is just an acknowledgement that many people would prefer not to :)
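A small example of what "correct results" means here: the same text has a different length in each view, so a single integer offset cannot mean the same thing everywhere. The string below is just an illustrative choice.

```swift
// "café" written as a plain "e" followed by a combining acute accent (U+0301).
let word = "cafe\u{301}"

print(word.count)                  // 4  (Characters, i.e. grapheme clusters)
print(word.unicodeScalars.count)   // 5  (Unicode scalars)
print(word.utf16.count)            // 5  (UTF-16 code units)
print(word.utf8.count)             // 6  (UTF-8 code units)

// String comparison treats the precomposed form (U+00E9) as equal.
print(word == "caf\u{E9}")         // true
```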


Having a bytewise view of String contents would be fine, but we wouldn't call it utf8View, since UTF8 code points are not single bytes.

UTF-8 code unit:
https://docs.swift.org/swift-book/LanguageGuide/StringsAndCharacters.html

Yes, the question is whether randomly indexing into that is a useful operation (vs something more structured). If you're writing a UTF8 decoder it is, but for most stuff it doesn't seem very helpful.

(I do agree it's a little odd that iterating it produces the bytes but indexing tries to be more structured)
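To make that asymmetry concrete, here is a short sketch using the existing utf8 view: iteration hands back raw UInt8 code units, while reaching a particular offset goes through the view's own index type. The sample string is arbitrary.

```swift
let s = "h\u{E9}llo"   // "héllo"; the é takes two UTF-8 code units

// Iterating the view yields plain bytes.
print(Array(s.utf8))   // [104, 195, 169, 108, 108, 111]

// Indexing is not by Int: an index is derived from startIndex first.
let utf8 = s.utf8
let i = utf8.index(utf8.startIndex, offsetBy: 2)
print(utf8[i])         // 169 (0xA9, the second code unit of é)
```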


Also, Strings created in Swift are always UTF-8, but on Apple platforms String may be backed by a UTF-16 NSString, where indexing into the UTF-8 code units is not an O(1) operation. (And I think most of String’s design was done when Swift strings were UTF-16 by default as well.)


If you don't mind slow index operations (including accidentally quadratic algorithms if you, say, go through each character of a string and subscript with an integer index), then consider this simple wrapper. This might be OK for tests and short strings; just avoid using it in production apps.
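The wrapper itself is not reproduced in this thread, so what follows is not the poster's code, only a guess at the general shape such a convenience might take. The subscript label offset is an assumption made for this sketch.

```swift
extension String {
    /// Hypothetical convenience: the Character at an integer offset.
    /// Each access walks from startIndex, so it costs O(offset);
    /// an out-of-range offset traps, like any invalid index.
    subscript(offset offset: Int) -> Character {
        self[index(startIndex, offsetBy: offset)]
    }
}

let s = "Daniel"
print(s[offset: 0])   // D
print(s[offset: 5])   // l

// Accidentally quadratic: every iteration re-walks the string from the start.
for k in 0..<s.count {
    _ = s[offset: k]
}
```

The loop at the end is the trap described above: it looks linear but does work proportional to the square of the string's length.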