Cannot assign value of type 'String' to subscript of type 'String.SubSequence'

IMO, SubSequence is actually the odd one, especially since they should be substrings (wiki), array slices, and subsequences (wiki).

IIRC, SubSequence had been with Sequence since the very beginning. So maybe it was created before the API naming guideline took the shape it has to date.

You should read the documentation for Substring.

Right. What I'm saying is, why do we have an entirely separate public type, String, whose only job* is to do what this property does?

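    extension Substring {
      var withMinimalBase: Self { .init(String(self)) }
    }

    // assumed: values came from splitting this string on commas
    let values = "William Riker,Tasha Yar,Data,Beverly Crusher".split(separator: ",")
    values[1] // "Tasha Yar"
    values[1].base // "William Riker,Tasha Yar,Data,Beverly Crusher"
    values[1].withMinimalBase // "Tasha Yar"
    values[1].withMinimalBase.base // "Tasha Yar"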

* I've seen "You can perform many string operations on a substring" there. What would be far more informative is knowing what String can do that Substring can't.

E.g. currently, you can't use a Substring everywhere you can use a String. Why?

Just because a type presents the same interface as another type doesn't mean it has the same behavior. Substring exists as a performance optimization. This is the part of the documentation I wanted you to read:

When you create a slice of a string, a Substring instance is the result. Operating on substrings is fast and efficient because a substring shares its storage with the original string. The Substring type presents the same interface as String, so you can avoid or defer any copying of the string’s contents.

For example, imagine you have a 100-character string and you access characters 0..<90:

    let slice = largeString.prefix(90)  // characters 0..<90, as a Substring

slice, a Substring, now shares the same memory as largeString. Without this optimization, you would have to copy most of the storage of largeString into slice, which would result in almost twice the memory usage. This is why Substring exists.
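
A quick way to see this sharing in code. This is a sketch under a couple of assumptions: Swift strings can't be subscripted with integer ranges, so it takes the first 90 characters with `prefix(90)`, and `largeString` is just a stand-in built by repeating ten letters ten times.

```swift
// Stand-in for the "100-character string" in the example above.
let largeString = String(repeating: "abcdefghij", count: 10)

// prefix(_:) on a String returns a Substring that points into largeString's storage.
let slice = largeString.prefix(90)

print(slice.count)               // 90
// The slice carries the whole string it was cut from, not just its 90 characters.
print(slice.base == largeString) // true

// String(_:) is the explicit point where the 90 characters are copied
// into storage of their own.
let copy = String(slice)
print(copy.count)                // 90
```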

You have it backwards. String is a specialized memory-optimization of Substring.
I'm saying it's not worth a type.
But it's possible that it is a type because a type is the only marking system that Swift offers to assert that the Substring has been minimized. I want documentation on that.

What do you mean by this?

And why is String a specialized memory-optimization of Substring?

From SE-0163:

String is currently serving as its own subsequence, allowing substrings to share storage with their "owner". This can lead to memory leaks when small substrings of larger strings are stored long-term (see here for more detail on this problem). Introducing a separate type of Substring to serve as String.Subsequence is recommended to resolve this issue, in a similar fashion to ArraySlice.

and

A new type, Substring, will be introduced. Similar to ArraySlice it will be documented as only for short- to medium-term storage:

Important
Long-term storage of Substring instances is discouraged. A substring holds a reference to the entire storage of a larger string, not just to the portion it presents, even after the original string’s lifetime ends. Long-term storage of a substring may therefore prolong the lifetime of elements that are no longer otherwise accessible, which can appear to be memory leakage.

Aside from minor differences, such as having a SubSequence of Self and a larger size to describe the range of the subsequence, Substring will be near-identical from a user perspective.
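
Both differences can be checked directly. A small sketch; the exact byte sizes depend on the platform, so only the comparison between them is annotated:

```swift
// Substring is its own SubSequence: slicing a Substring yields another Substring
// rather than some nested slice type.
print(Substring.SubSequence.self == Substring.self)  // true
print(String.SubSequence.self == Substring.self)     // true

// A Substring stores its base String plus the bounds of the slice,
// so its inline size is larger than a String's.
print(MemoryLayout<String>.size, MemoryLayout<Substring>.size)
print(MemoryLayout<Substring>.size > MemoryLayout<String>.size)  // true
```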

The introduction of Substring came about from practical experience of code holding on to Strings permanently, when those Strings were really slices of much larger data, effectively wasting memory.

But it's possible that it is a type because a type is the only marking system that Swift offers to assert that the Substring has been minimized

Yep, this is it. The API between String and Substring is meant to be as close to identical as possible (so much so that most operations on String-like objects should likely be using StringProtocol itself, which abstracts over the two), but the type of String indicates to you that it is the owner of its entire buffer, whereas Substring is not (i.e., direct storage of Substrings should be a code smell that indicates that you really want conversion to String to "minimize" the slice).
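
A sketch of the "write it once against StringProtocol" advice. `vowelCount` and the sample data are made up for illustration; the point is that the same function accepts a String or a Substring, and conversion to String happens only where a value is kept around:

```swift
// Generic over StringProtocol, so it accepts both String and Substring
// and callers can pass a slice without copying it first.
func vowelCount<S: StringProtocol>(_ text: S) -> Int {
    text.filter { "aeiouAEIOU".contains($0) }.count
}

let crew = "William Riker,Tasha Yar,Data,Beverly Crusher"
let names = crew.split(separator: ",")  // [Substring], sharing crew's storage

print(vowelCount(crew))      // called with a String
print(vowelCount(names[1]))  // called with a Substring: 3 ("Tasha Yar")

// Where a value is stored long-term, converting to String "minimizes" it.
let stored: String = String(names[1])
```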

The full rationale behind this perhaps isn't spelled out as clearly as it could be in the Substring docs, but it does at least say:

Important
Don’t store substrings longer than you need them to perform a specific operation. A substring holds a reference to the entire storage of the string it comes from, not just to the portion it presents, even when there is no other reference to the original string. Storing substrings may, therefore, prolong the lifetime of string data that is no longer otherwise accessible, which can appear to be memory leakage.

    import Foundation  // needed for range(of:)

    let a = "very pretty string here"
    var substring = a[a.range(of: "very pretty string")!]
    print(substring)        // "very pretty string"
    print(substring.base)   // "very pretty string here"
    substring.removeSubrange(substring.range(of: "y str")!)
    print(substring)        // "very pretting"
    print(substring.base)   // "very pretting here"

Is the resulting base of the substring a bug here? It isn't shared with any other string, so there seems to be no reason to keep it that way.

Nope.

To add to @xwu's comment, this is expected. Substrings are mutable through RangeReplaceableCollection, and have value semantics — when they are mutated, they make a copy of the string slice they're holding on to (if not uniquely held) and are now a slice of that string.

The fact that the base isn't held onto by anything other than substring.base isn't an issue.
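
The value-semantics part can be seen without Foundation. A minimal sketch, assuming nothing beyond the standard library: mutating a copied Substring leaves both the original slice and the source String untouched.

```swift
let text = "very pretty string here"
let original = text.prefix(18)   // "very pretty string"

var edited = original            // copies the Substring value
edited.append("!")               // mutating the copy...

print(original)                  // very pretty string  (...leaves the original slice alone)
print(edited)                    // very pretty string!
print(text)                      // very pretty string here  (the source String is unchanged)
```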

The "here" portion of the base in the last example is wasted, as there is no way to use it, and this waste can be quite large. A better implementation would drop everything from the base except the slice itself upon mutation.

That's a good point — I believe this may be caused by Substring being implemented on top of Slice<String> directly without additional handling on mutation, so changes are applied to the underlying string first, and then the substring is reformed atop the underlying string without getting rid of characters beyond the slice boundaries. I can't think of a case off the top of my head where keeping the full underlying mutated string is necessary, but @Michael_Ilseman or @David_Smith might know better. It seems like a worthwhile optimization to consider, but for now, this isn't semantically incorrect, at least.

I think Java used to (until around 2012 or thereabouts) implement String like our Substring type. That is, a string would hold a character buffer combined with an offset and a length. Calling .substring on a string would return a String type (but again, similar to our Substring) with the same buffer, but a different length and offset.
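
To make the old Java layout concrete, here is a hypothetical Swift model of it: one shared character buffer plus an offset and a length. `CharBuffer` and `LensString` are invented names, not any real API, and the sketch ignores everything except slicing:

```swift
// Hypothetical model of a "lens" string: a shared buffer plus offset and length.
final class CharBuffer {
    let chars: [Character]
    init(_ text: String) { chars = Array(text) }
}

struct LensString: CustomStringConvertible {
    let buffer: CharBuffer  // shared by every substring taken from this string
    let offset: Int
    let length: Int

    init(_ text: String) {
        let buffer = CharBuffer(text)
        self.init(buffer: buffer, offset: 0, length: buffer.chars.count)
    }

    private init(buffer: CharBuffer, offset: Int, length: Int) {
        self.buffer = buffer
        self.offset = offset
        self.length = length
    }

    /// Characters [start, end) of this string; shares the buffer instead of copying.
    func substring(_ start: Int, _ end: Int) -> LensString {
        precondition(0 <= start && start <= end && end <= length, "range out of bounds")
        return LensString(buffer: buffer, offset: offset + start, length: end - start)
    }

    var description: String {
        String(buffer.chars[offset..<(offset + length)])
    }
}

let big = LensString("William Riker,Tasha Yar,Data,Beverly Crusher")
let name = big.substring(14, 23)
print(name)                        // Tasha Yar
print(name.buffer === big.buffer)  // true: the whole original buffer stays alive
```

Note that `name` keeps the entire 44-character buffer alive, which is the same long-term-storage concern the Substring documentation warns about.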

They changed that because the overwhelming number of string manipulations weren't parsers, scanners and other cases where that optimisation mattered. Their String is now like our String.

Swift opted for a middle ground. The Swift project realised that although Java made the right decision when they changed their String, there are still cases where keeping the old "lens" type makes sense. So Swift got two distinct types.

It has a different set of tradeoffs. For the most part, the inconvenience of dealing with two distinct types is mitigated by type inference and function overloads, but it sometimes surfaces to the user/programmer, as it did for the OP.
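
We can't see the OP's code, so this is an assumption, but one common way to run into the error in the thread title is to split a string (which yields `[Substring]`, i.e. `[String.SubSequence]`) and then assign a `String` into that array. A sketch of the problem and two usual fixes:

```swift
let crew = "William Riker,Tasha Yar,Data,Beverly Crusher"
var names = crew.split(separator: ",")  // [Substring], a.k.a. [String.SubSequence]
let replacement: String = "Worf"

// names[1] = replacement
// does not compile: the array's element type is Substring, not String

// Option 1: convert the String into a Substring at the assignment.
names[1] = Substring(replacement)

// Option 2: work with [String] from the start, copying each piece once.
var ownedNames = crew.split(separator: ",").map { String($0) }
ownedNames[1] = replacement
```

Option 1 keeps the shared storage; option 2 copies each piece once, which is usually the better choice if the array is stored long-term.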

I still think it is far preferable to having a single String type with Substring semantics, as the Java team learnt the hard way.

@sveinhal Your explanation really made sense to me. Much appreciated.

Indeed. Swift's solution seems a reasonable compromise, but not a silver bullet. Here's an example that would have benefited from "Substring semantics baked into String":

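    // do_something is a placeholder for whatever work each iteration does
    func do_something(_ s: String) { print(s) }

    var s = "very long string here ..."
    while !s.isEmpty {
        s.removeLast()
        do_something(s)
    }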

With "substring semantics", this loop could run without memory allocation or string copying, just adjusting the length and leaving the base of the string as is. Ditto for removal from the head, or from both ends.

If substring semantics were baked into String itself (so there was no need for a separate Substring type), I'd expect to see some "compact" API to convert a String to its trimmed form:

    s.removeLast()  // etc.
    s.compact()     // opt-in on an as-needed basis

If I were designing the thing, I'd probably go with this latter approach: just one small API method to surface (which users would need to learn when to use and when not to), instead of the heavier Substring / StringProtocol approach.
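
For comparison, the proposed `compact()` can be approximated today as an extension on Substring. This is a hypothetical helper, not a standard library API, and it uses the same `Substring(String(self))` trick as `withMinimalBase` earlier in the thread:

```swift
extension Substring {
    /// Hypothetical helper: rebuilds this substring on top of a fresh String
    /// that holds only the visible characters, so the old base can be released.
    mutating func compact() {
        self = Substring(String(self))
    }
}

let line = "very long string here ..."
var s = line.prefix(16)          // "very long string", sharing line's storage
s.removeLast(7)                  // "very long"
print(s.base.count > s.count)    // true: the base still holds characters outside the slice
s.compact()
print(s.base == "very long")     // true
```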

That's a fair preference. But I'm not convinced this approach would work out better in practice. It would be easy, especially for newcomers, to forget the .compact() calls, or including them everywhere to avoid memory leaks could quickly become noisy.

But I guess it boils down to different trade-offs.

It does if you use actual slicing operations (dropLast() over removeLast()), or operate on a slice (var s = "string"[...]). And that works for all collections.
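
A sketch of the slicing version of the earlier loop, assuming each iteration's work can take a Substring (here the stand-in "work" is just collecting the prefixes). Each `dropLast()` narrows the bounds of the same slice rather than copying characters:

```swift
let text = "very long string here ..."
var s = text[...]                 // a Substring over all of text; no copy
var prefixes: [Substring] = []

while !s.isEmpty {
    s = s.dropLast()              // narrows the slice's bounds; the base is untouched
    prefixes.append(s)            // stand-in for handing the prefix to other code
}

print(prefixes.first!)            // very long string here ..
print(prefixes.allSatisfy { $0.base == text })  // true: every prefix is a view of text
```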

Isn't the situation with regard to memory leaks and noise exactly the same now with Substring? There's extra noise from "String(substring)" here and there, and the warning in the documentation is very easy for newcomers to miss or forget.

Sure. But it's a different set of easy omissions and extra noise. I happen to like this set better.

But yes, you are right.

IME, heavy string manipulation code (the kind that warrants Substring) and other, lighter string usage tend to be separated relatively cleanly. That means I'd only need String(substring), or .compact(), when crossing between those two areas. In that regard, I do enjoy having type-level information to remind me to trim unused string portions.
