Subscripting a string should be possible or have an easy alternative

I'm doing Advent of Code day 4 and got quite frustrated that you cannot easily subscript a string in Swift.

I tried string[i], but the subscript requires a String.Index, and initialising one was hard to work out from Xcode's autocomplete. Advancing the String.Index by 1 using += 1 is also not possible because String.Index is not an Int.

I found myself doing this

var entities: [[Character]] {
    data.split(separator: .newlineSequence).map { Array(String($0)) }
}

I understand at a high level why it's not possible to subscript a string with an Int: it's not going to work with emojis and so on. But what about indexing using string.utf8[SOMEINT]? Or some other API that just acknowledges that you are going to do something with potentially weird outcomes? Or maybe let me just advance the String.Index by 1.

It just feels like a frustrating developer experience for anyone doing coding challenges when string manipulation is this unintuitive.

This all may be a HUGE skill issue on my side. Happy for that to be the case and learn.

2 Likes

If you want to access low-level functions, you need to use low-level data structures. String is a high-level data structure, and UTF-8 characters are not always 1 byte long. You can just use [CChar] when working with ASCII strings.
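A minimal sketch of that suggestion, assuming the input is known to be ASCII (as Advent of Code puzzle input usually is). It uses [UInt8] taken from the utf8 view rather than [CChar]; the names `line` and `bytes` are illustrative only.

```swift
// Assumption: `line` holds only ASCII, so one byte corresponds to one character.
let line = "XMAS"
let bytes = Array(line.utf8)            // [UInt8], indexable by Int

let second = bytes[1]                   // the byte at offset 1
let isM = second == UInt8(ascii: "M")   // compare against an ASCII literal

// Convert a single byte back to a Character when needed.
let asCharacter = Character(UnicodeScalar(second))
print(second, isM, asCharacter)
```

The trade-off is that every comparison goes through UInt8(ascii:), which is the verbosity discussed later in the thread.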

8 Likes

Subscripting String is easy and performant. I encourage you to read Accessing and Modifying a String from the official swift.org documentation.

You can take it one step further by adding your own extensions to String/Substring that take an Int for an index.

3 Likes

If the data is pure ASCII, like day 4 of AoC, you can use string.utf8.withContiguousStorageIfAvailable { b in
which gives you a buffer indexable by Int.
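As a rough illustration of that call, here is a sketch that counts one ASCII letter through the buffer. The sample string and the names `row`, `countOfX`, and `total` are made up for the example; the call returns nil when the string has no contiguous UTF-8 storage, so a fallback is included.

```swift
let row = "MMMSXXMASM"

// The closure receives an UnsafeBufferPointer<UInt8>, which is indexable by Int.
// The result is nil if the string has no contiguous UTF-8 storage.
let countOfX = row.utf8.withContiguousStorageIfAvailable { buffer -> Int in
    var count = 0
    for i in 0..<buffer.count where buffer[i] == UInt8(ascii: "X") {
        count += 1
    }
    return count
}

// Fall back to walking the UTF-8 view when no contiguous storage was available.
let total = countOfX ?? row.utf8.filter { $0 == UInt8(ascii: "X") }.count
print(total)
```

The buffer is only valid inside the closure, so anything needed afterwards has to be returned or copied out.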

2 Likes

I thought I'd mention String/withUTF8(_:) since it is non-optional:

import Foundation.NSString

var hello = "ヾ(˶ᵔ ᗜ ᵔ˶)"
var again = String(NSString(string: hello))

print(hello.utf8.withContiguousStorageIfAvailable(Array.init) as Any) // OK
print(hello.withUTF8(Array.init)) // OK

print(again.utf8.withContiguousStorageIfAvailable(Array.init) as Any) // :(
print(again.withUTF8(Array.init)) // OK

1 Like

Sounds like you just want an array of integers instead of String, because you're working with an abstract sequence of symbols taken from a small finite alphabet, and not text written in human language, with all the richness the latter representation must entail.

It's worth pointing out that in mathematics and computer science, "string" has a technical meaning, so "string manipulation" in a coding challenge tends to be quite different in nature than "String manipulation" of Unicode text.

6 Likes

A bunch of people have mentioned .utf8 and .withUTF8 already. For AoC type stuff, though, I usually map the string to get a non-transient version of the data in whatever type I actually want to work with, since you're really not going to do any "string" processing on it at all:

let values = string.map(\.asciiValue!) // [UInt8]

Note that this will trap if the string contains any non-ASCII characters, but that's fine for AoC.
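If trapping is not acceptable, one possible non-trapping variant is sketched below. The helper name `asciiBytes` is hypothetical and not part of the standard library; it returns nil as soon as it meets a character without an ASCII value.

```swift
// Hypothetical helper: converts a string to ASCII bytes, or returns nil
// if any character has no ASCII value.
func asciiBytes(_ string: String) -> [UInt8]? {
    var result: [UInt8] = []
    result.reserveCapacity(string.utf8.count)
    for character in string {
        guard let value = character.asciiValue else { return nil }
        result.append(value)
    }
    return result
}

print(asciiBytes("AoC") as Any)
print(asciiBytes("caf\u{E9}") as Any)
```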

17 Likes

I don't think this is easy or intuitive. Below is an opinion.

let greeting = "Guten Tag!"
greeting[greeting.startIndex]
// G
greeting[greeting.index(before: greeting.endIndex)]
// !
greeting[greeting.index(after: greeting.startIndex)]
// u
let index = greeting.index(greeting.startIndex, offsetBy: 7)
greeting[index]
// a

You have to understand String.Index. You also must keep a reference to the original string at all times.

var c = MyFancyCollection([10, 20, 30, 40, 50])
var i = c.startIndex
while i != c.endIndex {
    c[i] /= 5
    i = c.index(after: i)
}

This looks promising at first, but you can't advance the index without having access to the original collection.

So I really am being asked to do entities[i][entities[i].index(after: j)] if I want to continue using the string collection. And entities[i][entities[i].index(j, offsetBy: 3)]. That, to me, looks verbose and not user-friendly in comparison to other languages.

Finally,

You can take it one step further by adding your own extensions to String/Substring that take an Int for an index.

The language should provide this for me, like just about every other language does. I feel like most Swift developers may just refuse to do coding challenges in the language and revert to Python :person_shrugging:.

1 Like

Some other responses also go into utf8, but then the comparison is with UInt8(ascii: "X"), which is a long way round just to compare the element at an index to a char. Maybe I just don't like or understand the Swift language design around strings.

I feel like I read the coding challenge in English, and it is intuitive to try to solve it in English using strings, not by converting to UInt8 or some other sequence of integers representing the format of the question. That doesn't make sense to me if the immediate comparison is easier to understand. I get that this is ONE way to think about it, but I don't think it's the first way most people think about it.

Is there no API that does exactly what I did, let you iterate the Character values of a string?

Swift is primarily intended to be a language for writing shipping software in, not coding challenges. In nearly all cases like this that we've seen, the coding challenge in question is explicitly requiring people to write code that would not be acceptable in a real application, and modifying it to actually be correct ends up re-introducing the complexity in other languages. When there are cases where that isn't true, we should address them by adding higher level API on String to accomplish the task at hand directly.

So far we haven't seen examples of things that meet all of these criteria:

  • exist in real world code
  • don't have critical bugs when done the simple way
  • aren't attempting to accomplish some higher level goal that could be expressed more directly

but if we did, that would be interesting and worth discussing what to do about.

Adding integer indexing is probably not going to be the solution though, because it has unsolvable downsides.

19 Likes

Also does Array<Character> meet your needs? Rather than Array<UInt8> or similar.

4 Likes

That's 16 bytes per element, but it probably won't matter unless your strings are huge.
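To check the per-element cost on your own platform, MemoryLayout can be queried directly. This is only a sketch: the 140 x 140 grid size is a hypothetical puzzle size chosen for illustration, and the printed numbers depend on the platform.

```swift
// Stride is the per-element footprint inside an Array.
let characterStride = MemoryLayout<Character>.stride
let byteStride = MemoryLayout<UInt8>.stride

// Rough element storage for a 140 x 140 grid (hypothetical puzzle size).
let cells = 140 * 140
print("[[Character]]:", cells * characterStride, "bytes")
print("[[UInt8]]:    ", cells * byteStride, "bytes")
```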

Surely the Array<Character> comparison is much more intuitive for most programmers than the Array<UInt8> one.

Who would write array[i] == UInt8(ascii: "X") over array[i] == "X"? Why do I ever need to know what a UInt8 is, or what UTF-8 is, or any of that?

I guess if there's no real-world case for it, that's fine. But I am here to voice my frustration, because a programming language should help you get to solutions, and in this scenario it got in the way enough for me to make this thread. Just starting the problem became difficult, before I had even begun addressing the question being asked.

Don't do this. Every newcomer is tempted to implement these operators because they're familiar with them from other languages, without understanding that there's a reason why the standard library intentionally omits them.

2 Likes

It's a deceptive convenience, because nearly every other language (the only exceptions I know of are Swift, Rust and Go) has String subscripting behaviour which is flat wrong, as far as humans are concerned.

Try this in your JS console:

"👋🏻"[0]

Correctly handling human language is more important than accommodating programming challenges, which involve atypical code. E.g. when's the last time a real-world problem required you to sort the characters in a string? It's a nonsense operation in human language terms, but it comes up commonly in coding challenges.
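For comparison, JavaScript indexes strings by UTF-16 code unit, so the expression above yields a lone surrogate rather than the emoji. The sketch below shows how Swift exposes each level of the same string as a separate view; the emoji is written with Unicode escapes only to keep the snippet encoding-safe.

```swift
// Waving hand (U+1F44B) followed by a light skin tone modifier (U+1F3FB).
let wave = "\u{1F44B}\u{1F3FB}"

let characterCount = wave.count                 // grapheme clusters
let scalarCount = wave.unicodeScalars.count     // Unicode scalars
let utf16Count = wave.utf16.count               // UTF-16 code units (what JS indexes)
let utf8Count = wave.utf8.count                 // UTF-8 bytes

print(characterCount, scalarCount, utf16Count, utf8Count)  // prints "1 2 4 8"
```

An Int subscript would have to pick one of these four meanings, and each choice gives a different answer for index 0.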

In both my Rust and Swift solutions to AoC, I've been using Vec<Vec<char>>/[[Character]], and honestly I found it quite ergonomic.
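A small sketch of the [[Character]] approach on the Swift side, using a made-up three-line input. The helper `cell(_:_:)` is hypothetical; it adds bounds checking so that neighbour lookups at the edges of the grid return nil instead of trapping.

```swift
let input = """
MMMS
MSAM
AMXS
"""

// Split into lines, then turn each line into an array of Character.
let grid: [[Character]] = input.split(separator: "\n").map { Array($0) }

// Hypothetical helper: bounds-checked lookup by integer row and column.
func cell(_ row: Int, _ column: Int) -> Character? {
    guard grid.indices.contains(row),
          grid[row].indices.contains(column) else { return nil }
    return grid[row][column]
}

// Direct comparison against a character literal, no UInt8 involved.
if grid[1][2] == "A", cell(2, 2) == "X" {
    print("found A with X diagonally below")
}
```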

7 Likes

You should be able to roll your own, by extending the String type.

Tip: (from TSPL) Use the indices property to access all of the indices of individual characters in a string.

for index in greeting.indices {
    print("\(greeting[index]) ", terminator: "")
}
// Prints "G u t e n   T a g ! "

F: surfaceIndex -> deepIndex -> Character

The indices property is a collection of String.Index values. Copy it into an array, index that array with the surface index (an Int in [0, count)) to get the deep index, then use the deep index to get the character.

Code
// Print the individual characters in this string, in three different functions
let s = "Hello, \u{274C} \u{274E} \u{2708}\u{1F33B}"

@main
enum A {
    static func main ()  {
        print (s)

        let cs = ContinuousClock ()
        let mf = cs.measure {
            f ()
        }
        print ("f took", mf)
        print ()
        
        let mg = cs.measure {
            g ()
        }
        print ("g took", mg)
        print ()
        
        let mh = cs.measure {
            h ()
        }
        print ("h took", mh)
    }
}


func f () {
    for x in 0..<s.count {
        print (s [x])
    }
}

func g () {
    let sx = StringIndexer (s)
    for x in 0..<s.count {
        print (sx [x])
    }
}

func h () {
    for c in s.indices {
        print (s [c])
    }
}

// Naive approach...
extension String {
    subscript (_ x: Int) -> Character {
        // require a valid Int index, the surface index
        precondition (x >= 0)
        precondition (x < self.count)
        
        // prepare the deep index vector
        var uv: [String.Index] = []
        for u in self.indices {
            uv.append (u)
        }
        
        // surface index -> deep index -> Character
        return self [uv [x]]
    }
}

// Slightly better approach...
//
// Let String be indexable by Int values
struct StringIndexer {
    // String to index
    let s: String
    
    // Deep index vector
    private let uv: [String.Index]
    
    init (_ s: String) {
        var uv: [String.Index] = []
        for u in s.indices {
            uv.append (u)
        }
        self.s = s
        self.uv = uv
    }
}

extension StringIndexer {
    subscript (_ x: Int) -> Character {
        // require a valid surface index
        precondition (x >= 0)
        precondition (x < uv.count)
        
        // surface index -> deep index -> Character
        return s [uv [x]]
    }
}

Output
Hello, ❌ ❎ ✈🌻
H
e
l
l
o
,
 
❌
 
❎
 
✈
🌻
f took 0.000767474 seconds

H
e
l
l
o
,
 
❌
 
❎
 
✈
🌻
g took 5.1812e-05 seconds

H
e
l
l
o
,
 
❌
 
❎
 
✈
🌻
h took 3.6901e-05 seconds
Program ended with exit code: 0

Hang in there, you will appreciate the String type. :grinning:

Hold up, you can just iterate a String directly, because it's a Sequence of Character:

for c in "abc" {
    print(c)
}

Was there some restriction in the thread above that I missed?

3 Likes

The issue was iterating via a subscript, so that you can adjust the subscript value and then check the previous or next character without having to go through the original string every time.

for index in greeting.indices {
    index += 1 // Does not work
    if index % 2 == 0 {}  // does not work 
    greeting[greeting.index(index, offsetBy: 3)] // This is the way
    index.addSomeInt() // Does not exist but I think should be possible 
}
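For the "look a few characters ahead or behind" case, the standard library's index(_:offsetBy:limitedBy:) avoids trapping at the ends of the string. The wrapper below is only a sketch; the function name `character(in:at:offsetBy:)` is hypothetical, and each call still walks the string, so it costs time proportional to the distance.

```swift
let greeting = "Guten Tag!"

// Hypothetical helper: returns the character `distance` positions away from
// `index`, or nil when that would run past either end of the string.
func character(in string: String, at index: String.Index, offsetBy distance: Int) -> Character? {
    let limit = distance >= 0 ? string.endIndex : string.startIndex
    guard let target = string.index(index, offsetBy: distance, limitedBy: limit),
          target < string.endIndex else { return nil }
    return string[target]
}

let start = greeting.startIndex
print(character(in: greeting, at: start, offsetBy: 7) as Any)   // three past "n" and beyond
print(character(in: greeting, at: start, offsetBy: -1) as Any)  // before the start
```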

When you're working with linear sequencing, it typically pays dividends to lean heavily into the Sequence and Collection nature of all of the types you're working with, without indexing directly.

To use your example:

for (i, (c, c_next)) in zip(greeting, greeting.dropFirst()).enumerated() {
    // c == greeting[i]
    // c_next == greeting[i + 1]
    // `i % 2 == 0` works just fine
}

No indices needed at all! But if you do really want to use them, you can also rely on String.indices being a Sequence too:

let indices = greeting.indices
for (i, (idx, idx_next)) in zip(indices, indices.dropFirst()).enumerated() {
    // idx_next == idx "+ 1"
    // `i % 2 == 0` works just fine
}

The vast majority of operations folks are used to solving with indices are typically covered much more nicely with these types of APIs, but it's hard to know to reach for them because

  1. Very often, developers reach for indices because that's what they're used to in other languages (which might not offer similar higher-level tools), so they aren't looking for other approaches, and
  2. They're not easily highlighted in documentation

In the vanishing minority of cases where you truly need random accesses to grapheme clusters/Unicode scalars/UTF-8 code units, Array<Character>/Array<UnicodeScalar>/Array<UInt8> really are the types you want to be reaching for, since strings are not random-access collections.
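As a sketch of those three conversions, each array below is a one-time copy that then supports constant-time Int subscripting. The sample text is made up for illustration and written with Unicode escapes to keep it encoding-safe.

```swift
// "café" followed by a space and a waving hand with a skin tone modifier.
let text = "caf\u{E9} \u{1F44B}\u{1F3FB}"

let characters = Array(text)                  // [Character], grapheme clusters
let scalars = Array(text.unicodeScalars)      // [Unicode.Scalar]
let bytes = Array(text.utf8)                  // [UInt8], UTF-8 code units

// The same Int offset means something different in each array.
print(characters.count, scalars.count, bytes.count)  // prints "6 7 14"
print(characters[3])                                  // prints "é"
```

Which array to build depends on what "one element" should mean for the problem at hand.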

15 Likes

@itaiferber, I fully agree, but I am not afraid to say that the current documentation can be improved to make it easier for the newcomers to appreciate the complexity of the String operations.

See my cryptic example above. An example like that could be used to demonstrate that a String can still be indexed by using Int-valued indices but doing so would be inefficient.