Substring syntax is getting ridiculous!

As of Swift 4, this line of code is the official substring syntax:
str[..<str.index(str.startIndex, offsetBy: 8)]

It is the most unreadable line of code for something as simple as taking a substring. If you add the conversion to String, it gets even uglier :)
Leave beginners aside: when I look at something like this in my old code, I have to spend a few seconds of valuable thinking time to be sure what it is doing! That is not normal and it is not good...
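
For readers following along, here is a minimal sketch of what that line does, including the conversion to String mentioned above. The sample string and variable names are made up for illustration.

```swift
let str = "Hello, playground"

// Swift 4 partial-range subscript: everything before the index 8 characters in.
// Note: index(_:offsetBy:) traps at runtime if str has fewer than 8 characters.
let end = str.index(str.startIndex, offsetBy: 8)
let slice = str[..<end]            // a Substring that shares storage with str

// Converting to String makes an independent copy.
let firstEight = String(slice)
print(firstEight)                  // prints "Hello, p"
```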

I suggest looking at Shorthand for Offsetting startIndex and endIndex, and other threads.

So it is 1000 lame comments! The whole idea of using collection syntax is wrong. As a rule, using one thing for many different tasks in development leads to unwanted complexity.
There should just be a simple method like this:
substr = str.substring(start,end)
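
A method of that shape can be added in user code today. The following is only a sketch: `substring(_:_:)` is a hypothetical helper, not a standard library API, and it assumes the offsets count Characters and that out-of-range values should be clamped rather than trap.

```swift
extension String {
    // Hypothetical helper, not part of the standard library.
    // Offsets count Characters (grapheme clusters); out-of-range values are clamped.
    // It still walks the string from the start, so each call is O(n).
    func substring(_ start: Int, _ end: Int) -> String {
        let lower = max(0, start)
        let upper = max(lower, end)
        return String(dropFirst(lower).prefix(upper - lower))
    }
}

let str = "Hello, playground"
let substr = str.substring(7, 11)
print(substr)                      // prints "play"
```

The readable call site does not remove the linear cost; it only hides it, which is the trade-off the rest of this thread discusses.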

Avoiding “unwanted complexity” is precisely what led to the current API design, as far as I understand it. By offering easy integer subscripts, Swift would give people an easy way to shoot themselves in the foot performance-wise, since Unicode strings do not support random access. You may disagree with the solution, but it’s a choice that makes sense. Do you know of any languages that get Unicode right and offer fast integer subscripting on strings?

My experience with other languages is that they index strings by scalar or code point, not by grapheme. Whether this is "right" is subjective, but it does allow fast integer access.

C++ and Java handle UTF-16 well and fast, and that is good enough for half of the world. As I said, if it is a simple and readable method, you can hide the performance optimisations inside it, and if it is as good as C++ for UTF-16, you have done well!
This line of code will be the most frequent use case, and since Swift's users are other developers, it should be simple and readable so they can focus on solving their own problems rather than thinking about Swift's Unicode-related problems!

I understand your goal, and many people are displeased with the current string ergonomics, which is why the work on making them better is still somewhat in motion. That thread I linked upstream is part of that work!

Rather than recreating all the arguments already stated elsewhere, I think we should be summarizing the prior work, and putting forward solutions to any issues identified previously. And if it looks like the previous discussion died, but was mostly fleshed out, then an implementation and a pitch would be needed.

I use the following extension in projects with lots of string manipulation:

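extension String {
    // Integer-offset range subscript; bounds are clamped to 0...count, so it never traps.
    // Each call walks the string from the start (O(n)), since String is not random-access.
    subscript (r: Range<Int>) -> String {
        let range = Range(uncheckedBounds: (lower: max(0, min(count, r.lowerBound)),
                                            upper: min(count, max(0, r.upperBound))))
        let start = index(startIndex, offsetBy: range.lowerBound)
        let end = index(start, offsetBy: range.upperBound - range.lowerBound)
        return String(self[start ..< end])
    }
}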

Sure, it's not as nice as it being part of the standard library, but it's not that much of a burden to add as-needed.

Note that you would get the same result with str.prefix(8) – the only difference is that this also accepts a string with less than 8 characters.
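
A small sketch of that difference, using made-up sample strings:

```swift
let long = "Hello, playground"
let short = "Hi"

// prefix(_:) returns at most 8 Characters and never traps on short input.
let a = String(long.prefix(8))
let b = String(short.prefix(8))

print(a)   // prints "Hello, p"
print(b)   // prints "Hi"

// By contrast, this would trap at runtime because "Hi" has fewer than 8 characters:
// short[..<short.index(short.startIndex, offsetBy: 8)]
```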

OK, so I am leaving the "new" solution to the Swift developers, and I hope for clean, simple and readable syntax :)

As someone who grew up in a non-ASCII language and up to this day receives post packages with his name garbled on it, I welcome every language that decides to be Unicode-correct by default.

Swift also uses UTF-16 internally for non-ASCII string storage. That’s not the point, though – the point is what the public API is based on. Many common things like emoji don’t fit into a single UTF-16 code unit, so if you based the indexing on UTF-16 code units, we would have to accept "😂".count returning 2. And then different people would complain about Swift getting it wrong.
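
To make the code-unit point concrete, here is a short sketch comparing the views String exposes for that emoji:

```swift
let joy = "😂"                      // U+1F602

print(joy.count)                   // prints 1 (Characters, i.e. grapheme clusters)
print(joy.unicodeScalars.count)    // prints 1 (Unicode scalars)
print(joy.utf16.count)             // prints 2 (a surrogate pair)
print(joy.utf8.count)              // prints 4 (UTF-8 bytes)
```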

Unicode is hard. As Erik said, let’s please read the past discussions before demanding improvements.

I am sorry, but I think you got the wrong impression. I am not advocating any changes. I like that Swift indexes on graphemes (Foundation does not). But it is not "incorrect" to index on scalars. It's just different.
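
A quick sketch of where the two views differ, assuming a decomposed "é" as the sample input:

```swift
let decomposed = "e\u{301}"        // "e" followed by COMBINING ACUTE ACCENT
let precomposed = "\u{E9}"         // "é" as a single scalar

print(decomposed.count)                  // prints 1 (one grapheme cluster)
print(decomposed.unicodeScalars.count)   // prints 2 (two scalars)
print(precomposed.unicodeScalars.count)  // prints 1

// String equality uses canonical equivalence, so the two spellings compare equal.
print(decomposed == precomposed)         // prints true
```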

It is not trivial, but it is not hard :)
"😂".count can easily return one, and you can easily use Int for characters when enumerating and still use 1, 2, 3 or 4 bytes to store them internally. All these things are the Swift developers' decisions and I do not mind them. However, the syntax is something that is VERY important for the readability and productivity of their colleagues who are users of Swift.

Also, it is VERY bad practice to change the syntax in a way that breaks legacy code. Swift is old enough now to keep this in mind, but in the case of substrings the change is badly needed!

The proposal PR and implementation are here. Is this what you meant or have I misinterpreted your statement?

The PR has been there for a while, since mid February, though this isn't really that long in terms of swift-evolution 🙂. I'm not sure how scheduling works or what the priority of this is. Maybe @Ben_Cohen or @dabrahams might be able to speak about it?

I'm not sure how substr = str[offset: start..<end] is so much worse?

Also IIRC we have support for doing:

let substr = str[...]
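
For reference, a short sketch of what the unbounded and one-sided range subscripts return; the sample string is arbitrary:

```swift
let str = "Hello"

// Unbounded range: a Substring covering the entire string.
let whole = str[...]

// One-sided range starting 2 characters in.
let tail = str[str.index(str.startIndex, offsetBy: 2)...]

print(whole)   // prints "Hello"
print(tail)    // prints "llo"
```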

If I cannot write var char = str[5], why should I have slice syntax? That is why I do not like it.
You are reusing collection syntax for Strings, which do not support its most important feature: the integer subscript. So it just gets complex. It looks the same, but it is not!

Could you please elaborate?

Collection syntax, as you call it, defines a subscript that works on the associated type Index, which is not necessarily Int. So Strings do support subscripting as defined for Collection in Swift.
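
A minimal sketch of that point, with Dictionary as a second example of a Collection whose Index is not Int (the sample values are made up):

```swift
let str = "Hello, playground"

// String's subscript takes a String.Index, not an Int.
let i = str.index(str.startIndex, offsetBy: 5)
let ch: Character = str[i]
print(ch)                          // prints ","

// Dictionary is also a Collection with a non-Int Index.
let ages = ["ann": 31]
let di = ages.index(forKey: "ann")!
let entry = ages[di]               // a (key: String, value: Int) tuple
print(entry.value)                 // prints 31
```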

This sounds like an argument against what you want? The offset label is used to make it look different.

Can you please send me a link explaining how development is managed?

Referring to things as "ridiculous" and your fellow commenters as "lame" is disrespectful and inappropriate for this forum. Combativeness is not how we do things around here. Your posts touch on issues we care about and are looking into solutions for, but your arguments about indexing and UTF-16 are reiterating aspects of String's design that have been considered and iterated thoroughly already. I would recommend participating in one of the existing threads on the subject of string and/or indexing ergonomics (as @nuclearace noted, there's Shorthand for Offsetting startIndex and endIndex), or if you have novel solutions to the ergonomics problems, starting off a thread with more concrete ideas for how things could be improved.
