The Relationship Between StringProtocol.SubSequence and Substring

I'm writing some code that takes in a generic type T that conforms to StringProtocol and returns a Substring, and I encountered some issues.

func foo<T>(_ str: T) -> Substring where T: StringProtocol {
    return str.dropFirst()[...].prefix(20).dropLast(3).suffix(5)
}

For some reason, this code fails with the following compile-time error:

error: cannot convert return expression of type 'T.SubSequence' to return type 'Substring'
    return str.dropFirst()[...].prefix(20).dropLast(3).suffix(5)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~
                                                                 as! Substring

This confused me because I thought that StringProtocol.SubSequence was just a type alias for Substring. So I thought, "okay, let me check the definition of SubSequence on StringProtocol just to confirm my suspicions." Lo and behold, I saw that Substring is in fact only the default value for StringProtocol.SubSequence, and as such the two are not necessarily equal in a generic context. This surprised me, and I tried to think about why StringProtocol.SubSequence wouldn't just be a typealias for Substring (because in practice, afaik, it always is), but to no avail. Then I checked the documentation for StringProtocol and saw the following:

Do not declare new conformances to StringProtocol. Only the String and Substring types in the standard library are valid conforming types.

Since String.SubSequence is Substring and Substring.SubSequence is Substring, why would StringProtocol.SubSequence not just be equal to Substring? I literally cannot think of any reason/benefit for this. It seems needlessly restrictive. Does anyone know why this is the case?

I noticed that changing the return type of foo to T.SubSequence fixes the problem above, but that confused me a bit too. Shouldn't the return type be T.SubSequence.SubSequence.SubSequence.SubSequence.SubSequence instead? After some inspection, it seems that all of these operations return the SubSequence of the Collection they are called on. If that is the case and all of them produce the SubSequence of their input (even though in a non-generic context it is always Substring), why isn't the type the longer one I just mentioned? How does the compiler know that all subsequent SubSequences of T are the same type? Can someone explain the black magic going on here?

StringProtocol refines Collection, and Collection's SubSequence is its own SubSequence.
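To make that concrete, here is a minimal sketch (the function name firstTwoOfRest is hypothetical) showing how Collection's requirement that slicing a slice yields the same slice type lets a whole chain of slicing operations collapse to a single SubSequence:

```swift
// Collection constrains SubSequence.SubSequence == SubSequence, so
// chained slicing operations all produce the same type.
func firstTwoOfRest<C: Collection>(_ c: C) -> C.SubSequence {
    // dropFirst() returns C.SubSequence; prefix(2) on that returns
    // C.SubSequence.SubSequence, which equals C.SubSequence.
    return c.dropFirst().prefix(2)
}

let result = firstTwoOfRest("abcdef")
print(result)           // "bc"
print(type(of: result)) // Substring, since String.SubSequence == Substring
```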


SubSequence is a component of Sequence, and is only inherited by StringProtocol. If it were being designed today, StringProtocol would likely have been defined like this:

protocol StringProtocol: Sequence where SubSequence == Substring {}

But I think StringProtocol is older than the ability to define such constraints. Thus the default value which was the best that could be done at that point.

This would be the better fix, to prevent passing the same problem on to the caller at the next level:

func foo<T>(_ str: T) -> Substring where T: StringProtocol, T.SubSequence == Substring {
  // ...
}
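Filling in the body from the original post, a caller can then pass either a String or a Substring without hitting the conversion problem (the sample strings here are just for illustration):

```swift
func foo<T>(_ str: T) -> Substring where T: StringProtocol, T.SubSequence == Substring {
    return str.dropFirst()[...].prefix(20).dropLast(3).suffix(5)
}

// Both standard-library conformers satisfy T.SubSequence == Substring:
print(foo("the quick brown fox jumps"))             // T == String
print(foo("the quick brown fox jumps".dropFirst())) // T == Substring
```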

@Lantua beat me to it.

Sequence.SubSequence was removed in SE-0234: Remove Sequence.SubSequence. Gotta move to Collection.


Also, as a side note: a large number of algorithms work with Substring as well as with generic StringProtocol. So I do suggest that you use Substring instead, for convenience (and performance too, I believe).


Thanks @Lantua and @SDGGiesbrecht for your explanations. I had a feeling that it had something to do with legacy. Also, I totally forgot that Collection.SubSequence.SubSequence is equal to Collection.SubSequence, that totally makes sense then.

Anyways, I do like the suggestion of adding the extra requirement T.SubSequence == Substring on foo. That prevents a user from somehow passing in their own type that conforms to StringProtocol, yet for some reason has a SubSequence that is not Substring. I know no one would ever practically do this, but the restriction enforces the intended equality of the two types and fixes the problem.

Right, but Sequence.SubSequence still existed when StringProtocol was created, didn’t it?

Regardless, swap in the name Collection every place I wrote Sequence to translate to the present and the point still stands.

It is the way it is because of an accident of history. It's stuck that way because of stability guarantees. That's the only reason there is.


But of course. These protocols are ancient, and the main point still stands: it has something to do with the protocols it is rooted in.

Note that you can always convert a Collection into its SubSequence by subscripting with UnboundedRange_. So you can just take Substring, though I don't know if this performance note by @Joe_Groff (How to make `String.Index` with `Int` offset (distance) in O(1) time?) still holds.
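That conversion is just the unbounded-range subscript, e.g.:

```swift
// Slicing with the unbounded range `...` turns a String (or any
// Collection) into its SubSequence without copying the storage.
let s = "hello, world"
let sub: Substring = s[...]   // whole-string slice sharing s's storage
print(type(of: sub))          // Substring
```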


Why not just work with T.SubSequence? Taking a quick look at StringProtocol, any conforming type's SubSequence must also conform to StringProtocol. So, unless you are using members that are strictly in Substring, you should be OK. Further, if the Swift team decides to go "screw it" and make more conforming types, your code won't choke.
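That variant might look like this (a sketch reusing the chain from the original post, with no constraint tying SubSequence to Substring):

```swift
// Returning T.SubSequence works for any StringProtocol conformer,
// since StringProtocol's SubSequence itself conforms to StringProtocol.
func foo<T: StringProtocol>(_ str: T) -> T.SubSequence {
    return str.dropFirst()[...].prefix(20).dropLast(3).suffix(5)
}

print(foo("the quick brown fox jumps")) // T == String, so this is a Substring
```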

You are generally correct. In my own experience, I have never actually had to write any code that mentions StringProtocol or Substring directly. Generic extensions to Collection and friends have always sufficed. The function discussed here could just as easily have been this instead:

func foo<T>(_ str: T) -> T.SubSequence where T: Collection {
  return str.dropFirst()[...].prefix(20).dropLast(3).suffix(5)
}

But the original poster seemed to want the return type to be Substring for some reason, and the constraint I suggested expressed his apparent intention.

The provided code looks more like a concocted example than a verbatim use case, so I was trying to stick to what the poster actually asked for. Diverging from that runs the risk of the solution not being applicable to whatever the poster was actually trying to solve.