Am I overcomplicating String.Index?

danmihai · July 10, 2022, 3:20pm

Imagine you have a string of the format: one junk character, a real name, then a '@', like:
%Michael@
or
^Daniel@

How do you retrieve the name? My solution was:

String(personName[personName.index(after: personName.startIndex)..<personName.firstIndex(of: "@")!])
where the personName variable holds the input string.
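For reference, here is a minimal sketch of the same index-based approach without the force unwrap. The function name extractName(from:) is made up for illustration, and it assumes the name ends at the first "@".

```swift
// Hypothetical helper: returns the text between the first character and the first "@".
func extractName(from personName: String) -> String? {
    // firstIndex(of:) returns nil when there is no "@"; return nil instead of crashing.
    // The second condition rejects input that starts with "@", where the range would be invalid.
    guard let at = personName.firstIndex(of: "@"),
          at > personName.startIndex else { return nil }
    // Skip the single leading junk character.
    let start = personName.index(after: personName.startIndex)
    return String(personName[start..<at])
}

print(extractName(from: "%Michael@") ?? "no match") // Michael
print(extractName(from: "Michael") ?? "no match")   // no match
```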

mayoff · July 10, 2022, 3:37pm

I would avoid the use of indexes entirely.

let justName: Substring = personName.dropFirst().dropLast()
let justNameString: String = String(personName.dropFirst().dropLast())
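A small sketch of how these calls behave, assuming the sample input from the question. The prefix(while:) variant is an addition of mine for the case where more text might follow the "@".

```swift
let personName = "%Michael@"

// dropFirst() and dropLast() return Substring views that share storage with personName.
let justName: Substring = personName.dropFirst().dropLast()
// Copy into a String before storing the result long term.
let justNameString = String(justName)

// Neither call traps on short input; an empty string just stays empty.
let empty = String("".dropFirst().dropLast())

// If text may follow the "@", stop at the first "@" instead of dropping the last character.
let upToAt = String("%Michael@example".dropFirst().prefix(while: { $0 != "@" }))

print(justNameString) // Michael
print(upToAt)         // Michael
```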

danmihai · July 10, 2022, 3:43pm

That is just beautiful and brilliant at the same time! Thanks! I guess this wouldn't work any better than indexes if the junk characters were at random positions, though, right?

mayoff · July 10, 2022, 3:44pm

Give us an example of “junk characters would be at a random position”: what the input looks like, and what output you want.

danmihai · July 10, 2022, 3:45pm

For example D@an%iel where you'd want to get rid of @ and %.

mayoff · July 10, 2022, 3:48pm

let input = "D@an%iel"
let output = input.filter { $0.isLetter }
           // or: .filter(\.isLetter)
// output == "Daniel"

danmihai · July 10, 2022, 3:49pm

Right! Thanks again! So just one last question: when should indexes be used?

xwu · July 10, 2022, 5:00pm

To a first approximation, never. There may be some generic collection algorithms which can make good use of indexes and operate on strings just as well as other collections. However, if you're using string indexes directly, assume that there's some better way unless demonstrated otherwise.
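As a rough illustration of a generic collection algorithm that uses indexes internally and works on strings like any other collection, here is a sketch. The helper firstSlice(between:and:) is hypothetical, not a standard library API.

```swift
extension Collection where Element: Equatable {
    // Hypothetical helper: the elements strictly between the first `open`
    // and the next `close` that follows it, or nil if either is missing.
    func firstSlice(between open: Element, and close: Element) -> SubSequence? {
        guard let openIndex = firstIndex(of: open) else { return nil }
        let start = index(after: openIndex)
        // Search only the part of the collection after `open`.
        guard let end = self[start...].firstIndex(of: close) else { return nil }
        return self[start..<end]
    }
}

// The same code serves a String and an Array; callers never touch an index.
let fromString = "%Michael@".firstSlice(between: "%", and: "@")
let fromArray = [1, 0, 5, 6, 9, 7].firstSlice(between: 0, and: 9)
```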

scanon · July 10, 2022, 6:28pm

Note that with the regular expression support in 5.7, you can do your processing task with something like:

let name = inputString.wholeMatch(of: /.(.+)@/)?.1

or if you prefer to spell it out:

import RegexBuilder
let pattern = Regex {
  One(.any)
  Capture { OneOrMore(.any) }
  "@"
}

inputString.wholeMatch(of: pattern)?.1

(You can make these regexes more efficient by matching one-or-more not-@ characters, but the simple form suffices for illustration.)
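A sketch of the not-@ variant mentioned above, assuming Swift 5.7 or later (and, on Apple platforms, an OS version that ships the regex runtime). It uses the #/.../# delimiters, which do not need the bare-slash regex compiler setting.

```swift
let inputString = "%Michael@"

// [^@]+ matches one or more characters that are not "@", so the capture
// stops at the delimiter instead of consuming it and backtracking.
let name = inputString.wholeMatch(of: #/.([^@]+)@/#)?.1
// name is an optional Substring: nil when the input does not have this shape.

print(name ?? "no match") // Michael
```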

David_Smith · July 10, 2022, 8:39pm

The idea is that you (or, even better, standard library authors like me!) build more useful algorithms like filter and dropFirst that internally use indices, and then ideally don’t have to mess with them directly after that. Of course there are always cases where you’re doing something unusual that isn’t covered by the available algorithms, but we hope as we continue to expand the toolbox, these will become increasingly rare.

danmihai · July 11, 2022, 8:41pm

OK, another use case: imagine you'd like to retrieve all the characters following the first occurrence of the character ":". I guess this does need String.Index? Or is there a better way than this?

let name = "dfsafa:Dan"

name[name.index(after: name.firstIndex(of: ":")!)...] //  Dan

Avi · July 11, 2022, 8:51pm

name.split(separator: ":").last!

Or

let name = "dfsafa:Dan:Smith"

let dan = name.split(separator: ":").dropFirst().joined(separator: ":")

Or

name.drop(while: { $0 != ":" }).dropFirst()
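One more variant, sketched under the assumption that you want everything after the first colon and nil when there is no colon at all. The variable names are only for illustration.

```swift
let name = "dfsafa:Dan:Smith"

// Split at most once and keep empty pieces, so "a:" yields ["a", ""] rather than ["a"].
let parts = name.split(separator: ":", maxSplits: 1, omittingEmptySubsequences: false)
// With these options there are exactly two pieces if and only if a ":" was found.
let afterColon: Substring? = parts.count == 2 ? parts[1] : nil

print(afterColon ?? "no colon") // Dan:Smith
```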

danmihai · July 11, 2022, 9:08pm

I still don't quite understand the need for String.Index. Why not just have an integer abstraction at the level of views? Like:

myString.utf8View[3] // fourth byte in the UTF-8 representation

David_Smith · July 11, 2022, 9:18pm

String's indexing model has always had a tradeoff: it's more complex than other languages, but in exchange it produces correct results where many other languages don't.

There's nothing wrong with using it; this is just an acknowledgement that many people would prefer not to.
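A short sketch of what "correct results" can mean in practice: the same text has different lengths depending on the unit you count, so an integer offset is ambiguous unless you name the view. The sample strings are my own.

```swift
let cafe = "cafe\u{301}"            // "café" spelled as "e" plus a combining acute accent
let flag = "\u{1F1EF}\u{1F1F5}"     // one flag emoji built from two regional-indicator scalars

// Characters, Unicode scalars, UTF-16 code units, UTF-8 code units.
print(cafe.count, cafe.unicodeScalars.count, cafe.utf16.count, cafe.utf8.count) // 4 5 5 6
print(flag.count, flag.unicodeScalars.count, flag.utf16.count, flag.utf8.count) // 1 2 4 8

// String comparison uses canonical equivalence, so the precomposed spelling is equal.
print(cafe == "caf\u{E9}") // true
```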

David_Smith · July 11, 2022, 9:27pm

Having a bytewise view of String contents would be fine, but we wouldn't call it utf8View, since UTF8 code points are not single bytes.

danmihai · July 11, 2022, 9:29pm

UTF-8 code unit:
https://docs.swift.org/swift-book/LanguageGuide/StringsAndCharacters.html

David_Smith · July 11, 2022, 9:33pm

Yes, the question is whether randomly indexing into that is a useful operation (vs something more structured). If you're writing a UTF8 decoder it is, but for most stuff it doesn't seem very helpful.

(I do agree it's a little odd that iterating it produces the bytes but indexing tries to be more structured)
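To make the iterate-versus-index difference concrete, here is a small sketch using the existing utf8 view. The sample string is an assumption chosen so that one character takes two bytes.

```swift
let s = "h\u{E9}llo"   // "héllo"; the "é" takes two bytes in UTF-8

// Iterating the view yields raw bytes (UInt8 code units).
let bytes = Array(s.utf8)   // [0x68, 0xC3, 0xA9, 0x6C, 0x6C, 0x6F]

// Subscripting the view needs a String.Index, so an Int offset is converted first.
let i = s.utf8.index(s.utf8.startIndex, offsetBy: 3)
let fourthByte = s.utf8[i]  // 0x6C, the first "l"

// Copying into an Array gives plain Int subscripts, at the cost of an O(n) copy.
print(bytes[3] == fourthByte) // true
```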

jrose · July 11, 2022, 9:54pm

Also, Strings created in Swift are always UTF-8, but on Apple platforms String may be backed by a UTF-16 NSString, where indexing into the UTF-8 code units is not an O(1) operation. (And I think most of String’s design was done when Swift strings were UTF-16 by default as well.)

tera · July 12, 2022, 7:36pm

If you don't mind slow index operations (including accidentally quadratic algorithms in case you, say, go through each character of a string and do a subscript with integer index) then consider this simple wrapper. This might be ok for tests and short strings, just avoid using it in real apps in production.

From the thread "How to write my own extension to enable Python-like String indexing?":

let text = "hello"
print(text[slow: 0..<2]) // "he"
print(text[slow: 0...2]) // "hel"
print(text[slow: 1...]) // "ello"
print(text[slow: ..<2]) // "he"
print(text[slow: ...2]) // "hel"
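The wrapper itself is not shown here. A minimal sketch of what such a slow: subscript could look like follows; this is my assumption about its shape, not necessarily the code from the linked thread.

```swift
extension String {
    // Hypothetical sketch; the `slow:` label flags the O(n) cost at every call site.
    subscript<R: RangeExpression>(slow range: R) -> Substring where R.Bound == Int {
        // Resolve partial ranges such as `1...` or `..<2` against 0..<count.
        // Note that `count` is itself O(n) for a String.
        let offsets = range.relative(to: 0..<count)
        // Walking from startIndex is O(n) in the number of characters,
        // and traps if the offsets fall outside the string.
        let lower = index(startIndex, offsetBy: offsets.lowerBound)
        let upper = index(lower, offsetBy: offsets.count)
        return self[lower..<upper]
    }
}

let greeting = "hello"
print(greeting[slow: 1..<3]) // el
```

Calling this inside a loop over every offset is what makes an algorithm accidentally quadratic, since each call walks the string from the start again.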