Cannot assign value of type 'String' to subscript of type 'String.SubSequence'

bartvk · October 8, 2021, 12:13pm

I have the following Swift code:

let awayTeam = "William Riker,Tasha Yar,Data,Beverly Crusher"
var values = awayTeam.split(separator: ",")
values[1] = String(repeating: "X", count: values[1].count)
print(values.joined(separator: ","))

It fails to compile, the error is: Cannot assign value of type 'String' to subscript of type 'String.SubSequence' (aka 'Substring')

The solution is to do:
values[1] = Substring(String(repeating: "X", count: values[1].count))

But why does the following then work?
values[1] = "X"

Casting to Substring seems like a hassle to me. Is there a good rationale for why? Is Swift trying to protect me in some way?

Avi · October 8, 2021, 12:25pm

Assigning a string literal works because Substring conforms to ExpressibleByStringLiteral.

sveinhal · October 8, 2021, 12:50pm

In this example, you are not in fact assigning the string "X" to values[1].

Although it might feel counterintuitive, "X" is not a String, but rather a source code string literal, subject to interpretation and type inference. In this case, the type is inferred to be Substring, and since string literals are indeed convertible to substrings, it is not an error.

In the same vein, [foo] might be an Array or a Set or even an OptionSet depending on context.
And 123 might be an Int or a Double or some other numeric type.

The literal values that you type out in your source code can be interpreted as a number of different types, but the type is always determined statically at compile time.
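
A minimal sketch of that inference, using only standard-library types (the constant names are made up for illustration):

```swift
// The same literal takes whatever type the surrounding context asks for.
let asString: String = "X"
let asSubstring: Substring = "X"
let asCharacter: Character = "X"

let asInt: Int = 123
let asDouble: Double = 123

let asArray: [Int] = [1, 2, 3]
let asSet: Set<Int> = [1, 2, 3]

// With no other context, the default literal type is used.
let inferred = "X"
print(type(of: inferred))      // String
print(type(of: asSubstring))   // Substring
print(type(of: asDouble))      // Double
```

None of these conversions happen at run time; the compiler picks the type for each literal while type checking.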

anon9791410 · October 8, 2021, 2:41pm

I have not given this a lot of thought:

Considering the solution is to just make a new Substring…

values[1] = .init(repeating: "X", count: values[1].count)

…and that's as easy as using String, why do we even have String? The initial literal could be a Substring also…

let awayTeam: Substring = "William Riker,Tasha Yar,Data,Beverly Crusher"

and the end effect is the same. But I'm sure that's bad practice. Is it documented?

Lantua · October 8, 2021, 4:29pm

A small thing I'd like to add is that values[1] = String(...) doesn't work because values[1] is of type Substring, and Swift requires you to write the needed conversion (String -> Substring) explicitly*. Though you can also construct the Substring directly as @Jessy has pointed out.

Substring keeps the original string around. So one may meaningfully decide to continue referencing the original string (and use Substring) or to make a new copy (and use String). This is also the reason that the use of long-lived Substrings is normally discouraged.

The distinction becomes more benign in smaller scale, like in your awayTeam example. We even have string[...] subscript to quickly create a Substring from any given String.
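
To make the two choices concrete, here is a rough sketch based on the awayTeam example from the first post. The variable names slices, names and whole are only for illustration:

```swift
let awayTeam = "William Riker,Tasha Yar,Data,Beverly Crusher"

// Choice 1: keep referencing the original string and work with Substrings.
var slices = awayTeam.split(separator: ",")                 // [Substring]
slices[1] = Substring(String(repeating: "X", count: slices[1].count))
print(slices.joined(separator: ","))
// William Riker,XXXXXXXXX,Data,Beverly Crusher

// Choice 2: copy each piece into its own String up front.
var names = awayTeam.split(separator: ",").map { String($0) }   // [String]
names[1] = String(repeating: "X", count: names[1].count)
print(names.joined(separator: ","))
// William Riker,XXXXXXXXX,Data,Beverly Crusher

// The unbounded-range subscript yields a Substring covering the whole String.
let whole = awayTeam[...]
print(type(of: whole))   // Substring
```

Both choices print the same result; the difference is only whether the array elements still point into awayTeam or own their own storage.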

That said, it is indeed possible to have a Collection be its own SubSequence, like what Data does. Though I'm not sure that's a good idea given how different they are.

* There are some implicit conversions, but those are hard-coded into the language because the ergonomics outweigh such ideology.

young · October 8, 2021, 6:38pm

I wonder why the spelling inconsistencies:

Substring <== this is correct, not SubString

the others:

SubSequence

ArraySlice

It seems to be inconsistent with the others, Substring should be SubString

?

Lantua · October 8, 2021, 6:51pm

IMO, SubSequence is actually the odd one, especially since they should be substrings (wiki), array slices, and subsequences (wiki).

IIRC, SubSequence had been with Sequence since the very beginning. So maybe it was created before the API naming guideline took the shape it has to date.

Peter-Schorn · October 9, 2021, 12:10am

You should read the documentation for Substring.

anon9791410 · October 9, 2021, 9:21am

Right. What I'm saying is, why do we have an entirely separate public type, String, whose only job* is to do what this property does?

extension Substring {
  var withMinimalBase: Self { .init(String(self)) }
}

values[1] // "Tasha Yar"
values[1].base // "William Riker,Tasha Yar,Data,Beverly Crusher"
values[1].withMinimalBase // "Tasha Yar"
values[1].withMinimalBase.base // "Tasha Yar"

* I've seen "You can perform many string operations on a substring" there. What would be a lot more informative is to know what String can do that Substring can't.

E.g. currently, you can't use a Substring wherever you can use a String. Why?

Peter-Schorn · October 9, 2021, 4:00pm

Just because a type presents the same interface as another type doesn't mean it has the same behavior. Substring exists as a performance optimization. This is the part of the documentation I wanted you to read:

When you create a slice of a string, a Substring instance is the result. Operating on substrings is fast and efficient because a substring shares its storage with the original string. The Substring type presents the same interface as String, so you can avoid or defer any copying of the string’s contents.

For example, imagine you have a 100-character string and you access characters 0..<90:

let slice = largeString.prefix(90)

slice, a Substring, now shares the same memory as largeString. Without this optimization, you would have to copy most of the storage of largeString into slice, which would result in almost twice the memory usage. This is why Substring exists.
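
A small sketch of the same idea that compiles as written. It assumes a 100-character largeString built just for the demonstration, and it uses prefix(90) in place of an Int range because String cannot be subscripted with Int ranges:

```swift
// Hypothetical 100-character string, built only for this demonstration.
let largeString = String(repeating: "a", count: 100)

// Taking a slice yields a Substring that refers back to largeString.
let slice = largeString.prefix(90)
print(type(of: slice))    // Substring
print(slice.count)        // 90
print(slice.base.count)   // 100, base is the string the slice was taken from

// Converting to String is the point where the 90 characters get copied.
let copy = String(slice)
print(copy.count)         // 90
```

The copy is deferred until you explicitly ask for a String, which is the "avoid or defer any copying" part of the documentation quote.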

anon9791410 · October 9, 2021, 4:51pm

You have it backwards. String is a specialized memory-optimization of Substring.
I'm saying it's not worth a type.
But it's possible that it is a type because a type is the only marking system that Swift offers to assert that the Substring has been minimized. I want documentation on that.

Peter-Schorn · October 9, 2021, 4:53pm

What do you mean by this?

And why is String a specialized memory-optimization of Substring?

itaiferber · October 9, 2021, 6:47pm

From SE-0163:

String is currently serving as its own subsequence, allowing substrings to share storage with their "owner". This can lead to memory leaks when small substrings of larger strings are stored long-term (see here for more detail on this problem). Introducing a separate type of Substring to serve as String.Subsequence is recommended to resolve this issue, in a similar fashion to ArraySlice.

and

A new type, Substring, will be introduced. Similar to ArraySlice it will be documented as only for short- to medium-term storage:

Important
Long-term storage of Substring instances is discouraged. A substring holds a reference to the entire storage of a larger string, not just to the portion it presents, even after the original string’s lifetime ends. Long-term storage of a substring may therefore prolong the lifetime of elements that are no longer otherwise accessible, which can appear to be memory leakage.

Aside from minor differences, such as having a SubSequence of Self and a larger size to describe the range of the subsequence, Substring will be near-identical from a user perspective.

The introduction of Substring came about from practical experience of code holding on to Strings permanently, when those Strings were really slices of much larger data, effectively wasting memory.

But it's possible that it is a type because a type is the only marking system that Swift offers to assert that the Substring has been minimized

Yep, this is it. The API between String and Substring is meant to be as close to identical as possible (so much so that most operations on String-like objects should likely be using StringProtocol itself, which abstracts over the two), but the type of String indicates to you that it is the owner of its entire buffer, whereas Substring is not (i.e., direct storage of Substrings should be a code smell that indicates that you really want conversion to String to "minimize" the slice).
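
As a rough illustration of both points (not taken from the proposal), a function can be generic over StringProtocol so that it accepts either type, while stored properties convert to String. The names redacted and CrewMember are hypothetical:

```swift
// Works for String and Substring alike because it only needs StringProtocol.
func redacted<S: StringProtocol>(_ text: S) -> String {
    String(repeating: "X", count: text.count)
}

// Stored properties use String, so each value owns exactly its own storage.
struct CrewMember {
    let name: String

    init<S: StringProtocol>(name: S) {
        self.name = String(name)   // copies the slice out of its base string
    }
}

let awayTeam = "William Riker,Tasha Yar,Data,Beverly Crusher"
let crew = awayTeam.split(separator: ",").map { CrewMember(name: $0) }

print(crew[1].name)                   // Tasha Yar
print(redacted(crew[1].name))         // XXXXXXXXX (called with a String)
print(redacted(awayTeam.prefix(7)))   // XXXXXXX (called with a Substring)
```

The conversion in the initializer is the "minimize" step: after it, nothing in crew keeps awayTeam's storage alive.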

The full rationale behind this perhaps isn't spelled out as clearly as it could be in the Substring docs, but it does at least say

Important
Don’t store substrings longer than you need them to perform a specific operation. A substring holds a reference to the entire storage of the string it comes from, not just to the portion it presents, even when there is no other reference to the original string. Storing substrings may, therefore, prolong the lifetime of string data that is no longer otherwise accessible, which can appear to be memory leakage.

tera · October 9, 2021, 7:35pm

    import Foundation  // range(of:) on String comes from Foundation

    let a = "very pretty string here"
    var substring = a[a.range(of: "very pretty string")!]
    print(substring)        // "very pretty string"
    print(substring.base)   // "very pretty string here"
    substring.removeSubrange(substring.range(of: "y str")!)
    print(substring)        // "very pretting"
    print(substring.base)   // "very pretting here"

is the resulting base of a substring a bug here? it's not shared with any other string to keep it that way.

xwu · October 9, 2021, 7:42pm

Nope.

itaiferber · October 9, 2021, 7:46pm

To add to @xwu's comment, this is expected. Substrings are mutable through RangeReplaceableCollection, and have value semantics — when they are mutated, they make a copy of the string slice they're holding on to (if not uniquely held) and are now a slice of that string.

The fact that the base isn't inherently held onto by anything else other than substring.base isn't an issue.
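
A short sketch of the value-semantics part, using only the standard library. The names original and substring are illustrative, and the sketch deliberately does not print the base after mutation since its exact contents are an implementation detail:

```swift
let original = "very pretty string here"

// The first 18 characters: "very pretty string".
var substring = original.prefix(18)

// Mutating the slice does not touch the string it was taken from.
substring.append(contentsOf: " indeed")

print(substring)   // very pretty string indeed
print(original)    // very pretty string here
```

The original string is unchanged because the substring works on its own storage once it is mutated.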

tera · October 9, 2021, 8:15pm

the "here" portion of the base in the last example is a waste, as there is no way to use it. this waste can be quite big. a better implementation would drop everything from the base but the slice itself upon mutation.

itaiferber · October 10, 2021, 3:05pm

That's a good point — I believe this may be caused by Substring being implemented on top of Slice<String> directly without additional handling on mutation, so changes are applied to the underlying string first, and then the substring is reformed atop the underlying string without getting rid of characters beyond the slice boundaries. I can't think of a case off the top of my head where keeping the full underlying mutated string is necessary, but @Michael_Ilseman or @David_Smith might know better. It seems like a worthwhile optimization to consider, but for now, this isn't semantically incorrect, at least.

sveinhal · October 12, 2021, 8:50am

I think Java used to (until around 2012 or thereabouts) implement String like our Substring type. That is, a string would hold a character buffer combined with an offset and a length. Calling .substring on a string would return a String type (but again, similar to our Substring) with the same buffer, but a different length and offset.

They changed that because the overwhelming number of string manipulations weren't parsers, scanners and other cases where that optimisation mattered. Their String is now like our String.

Swift opted for a middle ground. The Swift project realised that although Java made the right decision when they changed their String, there are still cases where keeping the old "lens" type made sense. So Swift got two distinct types.

It has a different set of tradeoffs. For the most part, the inconvenience of dealing with two distinct types is mitigated by type inference and function overloads, but it sometimes surfaces to the user/programmer, as it did for the OP.

I still think it is far preferable to having a single String type with Substring semantics.
As the Java team learnt the hard way.

bartvk · October 12, 2021, 9:14am

@sveinhal Your explanation really made sense to me. Much appreciated.