Why does `Character` conform to `LosslessStringConvertible`?

the LosslessStringConvertible protocol has a requirement:

init?(_ description: String)

the init is failable, which suggests that the implementation should return nil on failure, instead of trapping.

but in Swift, a non-failable init can satisfy a failable requirement (because we cannot overload on init?). and Character has an unconditional init(_:) that traps on multi-character input:

extension Character
{
    init(_ s: String)
    {
        ...
    }
}

and Character conforms to LosslessStringConvertible.

which means the following crashes at runtime:

func test<S>(s: S.Type) -> S? where S: LosslessStringConvertible
{
    S.init("abc")
}

print(test(s: Character.self) as Any)
Swift/Character.swift:177: Fatal error:
Can't form a Character from a String containing more than one extended
grapheme cluster

this is not how Unicode.Scalar, Int, or even Bool behaves.
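for comparison, all of those witnesses reject out-of-domain input by returning nil:

```swift
// other LosslessStringConvertible conformances reject
// out-of-domain input by returning nil rather than trapping:
print(Int("abc") as Any)            // nil
print(Bool("abc") as Any)           // nil
print(Unicode.Scalar("abc") as Any) // nil
```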


That protocol means if you start with an instance of the type and convert it into a string, then that same exact string can be used to instantiate an instance of the type (and the resulting instance will be equivalent to the one you started with).

There is no expectation that arbitrary strings can be used to instantiate instances of the type. Only strings that were produced from valid instances in the first place.
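For example, the only round trip the protocol guarantees is value → description → value (a minimal sketch):

```swift
// the round trip LosslessStringConvertible guarantees:
// a string produced by `description` re-creates an equivalent value
let original: Character = "é"
let recovered = Character(original.description) // safe: the string came from a Character
assert(recovered == original)
```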


A failable initializer would be useful. We could add:

extension Character {

  // FIXME: @_implements(LosslessStringConvertible, init(_:))
  public init?(_ string: some StringProtocol) {
    guard
      let character = string.first,
      string.index(after: string.startIndex) == string.endIndex
    else {
      return nil
    }
    self = character
  }
}
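The guard above succeeds only when the string is exactly one extended grapheme cluster. The same logic can be sketched as a standalone helper (singleCharacter is a hypothetical name, not part of the suggestion), which also shows why the check is per grapheme cluster rather than per scalar:

```swift
// hypothetical helper mirroring the suggested initializer's guard:
// accept the string only if it is one extended grapheme cluster
func singleCharacter(_ string: some StringProtocol) -> Character? {
    guard
        let character = string.first,
        string.index(after: string.startIndex) == string.endIndex
    else {
        return nil
    }
    return character
}

print(singleCharacter("abc") as Any)      // nil
print(singleCharacter("e\u{301}") as Any) // Optional("é"): two scalars, one grapheme cluster
```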

this is why the init(_:) is failable, so that it can reject strings that were not produced from valid instances in the first place.

after all, this is how RawRepresentable.init(rawValue:) behaves, even though RawRepresentable has a way better claim to trapping behavior than LosslessStringConvertible does.
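for comparison, a raw-value enum (a hypothetical example) returns nil on out-of-domain input:

```swift
// RawRepresentable.init(rawValue:) rejects out-of-domain
// input by returning nil, not by trapping:
enum Direction: String {
    case north, south
}

Direction(rawValue: "north")    // .north
Direction(rawValue: "sideways") // nil
```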

and this is not how the LSC witnesses Unicode.Scalar.init(_:), Bool.init(_:), Int.init(_:), etc. behave. they all return nil on out-of-domain input.

finally, as a general design principle, i think that calling a protocol requirement should not crash at runtime unless someone has violated a precondition that is documented as part of that requirement. and there’s nothing in the docs for LosslessStringConvertible.init(_:) that says that the string must have been generated from the description property of an instance of the same type.

what are the chances we could get this in as a bug fix for Swift 5.8?


Perhaps the initializer's documentation could be more explicit about that, because it is certainly implied by the docs for the LosslessStringConvertible protocol:

The description property of a conforming type must be a value-preserving representation of the original value. As such, it should be possible to re-create an instance from its string representation.

It starts by talking about the description, and says that you should be able to re-create the value from "its string representation" (which I read to mean the value returned by description).

I don't think there is any implication that the initializer should support other strings.

i think there is a difference between "does not support other strings" and "does not crash on other strings".

Int.init("abc")

does not support "abc" as an input string. it shouldn’t support it. that is why it returns nil.

Character.init("abc")

crashes. this is different from saying that Character.init(_:) does not support "abc" as an input string.


Sure, but that's not what you said:

You are formally violating its preconditions, because it does say that the string should have come from the description property.

Now, it is reasonable to expect, since the protocol allows for the initializer to be failable, that misuse should be signalled by returning nil (and I agree that it would be friendlier to do that), but actually that's the part that is not formally specified.

i read that as a warning that the description witness must not discard any information needed to reconstruct the value, not as saying it is unsafe to pass any value to the init that did not come from description.

My hunch is that it's precisely because an unfailable init can satisfy this requirement, as it does in Character, that the standard library protocol documentation deliberately remains silent on this point.

I do agree it would make sense for Character to do something friendlier if possible, such as what @benrimmington suggests. However, seeing as @_implements is not a public feature, making it a semantic requirement of the protocol to behave in this way would prohibit any end-user type with an unfailable init from conforming to the protocol, and I don't think we'd want to do that.

The protocol documentation doesn't say anything about the behavior of init? when the value does not come from description, and there is no general rule that a failable init? must return nil to signal all failures unless otherwise documented.

I don't think you'll get any argument that this isn't the most user-friendly state of affairs, nor do I think it's a bad idea to make standard library types behave more consistently on this point where possible (from an ABI compatibility and performance perspective), but I don't see a way out in the general case short of major changes to the language.


@_implements is needed because Character.init(_:) doesn't have an argument label, so adding the failable overload leaves two unlabeled init(_:)s, and something has to disambiguate which one witnesses the requirement. it has nothing to do with the semantics of LSC.

other types can continue to have inits with trapping behavior by just giving the parameter an argument label, like init(description:).
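a sketch of what that could look like for a user type (Digit is a hypothetical example):

```swift
struct Digit: LosslessStringConvertible {
    var value: Int

    var description: String { String(value) }

    // failable witness for LosslessStringConvertible:
    // returns nil on out-of-domain input
    init?(_ description: String) {
        guard let value = Int(description), (0...9).contains(value) else {
            return nil
        }
        self.value = value
    }

    // trapping convenience, distinguished by its argument label
    init(description: String) {
        guard let digit = Digit(description) else {
            fatalError("'\(description)' is not a single decimal digit")
        }
        self = digit
    }
}

Digit("7")?.value // 7
Digit("42")       // nil
```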

Right, but if you want to formalize that the semantics of LosslessStringConvertible require init? never to trap, then any conforming type with an unlabeled, trapping init(_: String) would have to use @_implements in order to conform. Since @_implements is not a public feature, any user type with such an initializer would effectively be prohibited from conforming to LosslessStringConvertible if init? were specified with never-trapping semantics.


any witness is "allowed to trap"; i don't think we have many requirements in the standard library that explicitly say "the witness for this requirement cannot trap".

so LSC, like all the other protocols, is silent on this matter, and that's okay.

Character, on the other hand, is a pretty commonly-used database schema type, and it would be tremendously helpful if it could take advantage of the error-handling paths people writing LSC-based generic code have already implemented, instead of crashing the server.

@_implements is just a way to make this change to the standard library without breaking existing code that uses the non-failable initializer in a concrete type context.

i'm kind of surprised by the amount of pushback on this idea. is there really any code out there that relies on the trapping behavior?


No pushback from me on making Character more user-friendly, if it can be done in an ABI-compatible way without a performance hit for the non-failing case.

The pushback is against the claim that this is what LosslessStringConvertible's semantics require — i.e. that your example func test<S: LosslessStringConvertible>(s: S.Type) must never crash at runtime for an arbitrary, correctly conforming type S.


Exactly this. LosslessStringConvertible is not Parsable.

If you try to construct unknown types from arbitrary strings, some of them may fail by returning nil, but others may be stricter and decide to trap - because you violated the protocol's semantics.

I would compare it to something like calling index(after: endIndex) on a Collection. It's an invalid operation and the protocol does not specify any particular behaviour (it just says it must be well defined, and return the same value every time) - some implementations trap, other implementations may decide to keep returning endIndex forever. In a generic context, you can't expect any particular outcome.

My suggestion was to add a failable overload, with some StringProtocol parameter, which would need an evolution proposal. It's probably too late for Swift 5.8, because the release branch will be cut on December 19 (and then locked for the holidays).

The existing initializer would remain (for ABI and source compatibility), and would continue to satisfy the protocol requirement, unless @_implements can be made to work.

that's… disappointing to hear.

i guess i will probably end up working around this by adding concrete Character overloads to my own APIs that are currently generic over LosslessStringConvertible so that those are called (in concrete type contexts) instead of the generic API.
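something like this (parse is a hypothetical stand-in for my actual APIs):

```swift
// generic API: for Character this would hit the trapping LSC witness
func parse<T: LosslessStringConvertible>(_ string: String, as _: T.Type) -> T? {
    T(string)
}

// concrete overload: preferred in concrete type contexts, never traps
func parse(_ string: String, as _: Character.Type) -> Character? {
    string.count == 1 ? string.first : nil
}

parse("abc", as: Int.self)       // nil
parse("abc", as: Character.self) // nil, instead of trapping
```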
