[Pitch] String Input-validating Initializers

I gave a full example in the post above for the existing String(decoding:as:) initialiser.

I can copy/paste that in to an Xcode project and it works, and has autocomplete. Does it not work for you?

Ah, I see. I'd missed a part of your post (user error) and misconstrued the example as hypothetical.

Here is a draft implementation of this proposal as a staged package:

Note that up to this point the (all-or-nothing) validation code has not been updated. Making it better in a variety of ways is follow-up work to this proposal. A final implementation in the standard library would be slightly different, but functionally the same.

I think that throwing an error with no information other than "we failed" is functionally identical to returning an optional (you can even use try? to turn it into one), and it means that the future direction is simply adding more cases to the error with more fine-grained information. Is there a reason that ending up with both a throwing and an optional initializer is preferable?
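The equivalence described above can be sketched in a few lines: an all-or-nothing throwing initializer collapses to an Optional at the call site with `try?`. This is a minimal illustration, assuming a hypothetical `UTF8ValidationError` type and a hypothetical `init(validatingUTF8Bytes:)` initializer (neither is the pitched API); it validates with the standard library's `UTF8` codec.

```swift
// Hypothetical error type for illustration; the pitch's real error design may differ.
struct UTF8ValidationError: Error {}

extension String {
    // Hypothetical all-or-nothing validating initializer (not the pitched API name).
    init(validatingUTF8Bytes bytes: [UInt8]) throws {
        var iterator = bytes.makeIterator()
        var codec = UTF8()
        var result = ""
        decodeLoop: while true {
            switch codec.decode(&iterator) {
            case .scalarValue(let scalar):
                result.unicodeScalars.append(scalar)
            case .emptyInput:
                break decodeLoop
            case .error:
                // Any ill-formed sequence fails the whole initializer.
                throw UTF8ValidationError()
            }
        }
        self = result
    }
}

// Call sites that don't care why validation failed can use try? to get an Optional.
let valid = try? String(validatingUTF8Bytes: [0x68, 0x69])   // Optional("hi")
let invalid = try? String(validatingUTF8Bytes: [0x68, 0xFF]) // nil
```

Adding more cases to the error later would not change the `try?` call sites at all.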


Interestingly, I do not see returning Result<String, Error> discussed as an alternative. FWIW, it's somewhat more "round-trip compatible" with throw than Optional is. Has Result gone out of vogue?

Result was never really in vogue, especially for the core team. It was added to the standard library largely out of necessity: as a hedge against the time true async support would take, for the niche functionality it could provide immediately (typed error propagation), and because those uses had made it popular in the community. While I've used it extensively in the past for complex typed-error flows and, with async, more modern promise-like behavior, its overall ergonomics don't recommend it for normal error production. Its typed errors can be beneficial in some circumstances, but most users don't care about them most of the time. And really, since Result<String, Error> is just Result { try stringProducer() }, there's no benefit to producing it directly at all. Additionally, the same performance issues that arise with throws arise with Result, and may be worse given throws' slightly better runtime behavior, so it doesn't gain us anything here.
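To illustrate why a Result-returning initializer would add nothing: `Result(catching:)` already wraps any throwing call, and `get()` converts it back. A minimal sketch with a hypothetical `stringProducer` function and `DecodeFailure` error; its validity check (comparing the repaired decoding with the input bytes) is an assumption made for brevity, not the pitched implementation.

```swift
// Hypothetical error type for illustration.
struct DecodeFailure: Error {}

// Hypothetical throwing producer standing in for a throwing String initializer.
func stringProducer(_ bytes: [UInt8]) throws -> String {
    let decoded = String(decoding: bytes, as: UTF8.self)
    // Decoding repairs invalid input with U+FFFD, so any change means the bytes were invalid.
    guard decoded.utf8.elementsEqual(bytes) else { throw DecodeFailure() }
    return decoded
}

// Result(catching:) wraps any throwing call; trailing-closure syntax drops the label.
let ok = Result { try stringProducer([0x6F, 0x6B]) }
let failed = Result { try stringProducer([0xFF]) }

// get() turns the Result back into a throwing call, so nothing is lost either way.
let value = try ok.get()
```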


If there's enough interest we could spin this off. To me the simplest solution would be to make throw a function accepting a closure; that closure would be evaluated when the function in question is called with "try" and not evaluated when it is called with "try?":

    throw { SomeError.error(param1, makeCostlyParam()) }

(Those who fancy autoclosures would prefer this form:)

    throw(SomeError.error(param1, makeCostlyParam()))

Plus the relevant (hopefully not massive) changes to the compiler.

If this is done, "try?" would really become a zero-cost abstraction, similar in performance to returning an Optional.
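Today, `try?` still runs whatever code builds the error before discarding it, which is exactly the cost the closure-based `throw` idea would remove. The nearest thing available now is a separate Optional-returning entry point. A minimal sketch of that trade-off, using a hypothetical `ASCIIValidator` whose counter stands in for expensive error construction (it checks ASCII rather than UTF-8 only to keep the example short):

```swift
struct InvalidByteError: Error {
    let offset: Int
    let message: String
}

// Hypothetical ASCII-only validator, used to keep the example small.
struct ASCIIValidator {
    // How many times the (pretend) costly diagnostic has been built.
    private(set) var diagnosticsBuilt = 0

    private func firstNonASCIIOffset(_ bytes: [UInt8]) -> Int? {
        bytes.firstIndex(where: { $0 >= 0x80 })
    }

    // Stands in for expensive error construction (formatting, context capture).
    private mutating func makeDiagnostic(_ bytes: [UInt8], at offset: Int) -> String {
        diagnosticsBuilt += 1
        return "non-ASCII byte 0x\(String(bytes[offset], radix: 16, uppercase: true)) at offset \(offset)"
    }

    // Optional path: failure costs nothing beyond the scan.
    func string(_ bytes: [UInt8]) -> String? {
        guard firstNonASCIIOffset(bytes) == nil else { return nil }
        return String(decoding: bytes, as: UTF8.self)
    }

    // Throwing path: the diagnostic is built on every failure,
    // even when the caller immediately discards it with try?.
    mutating func stringOrThrow(_ bytes: [UInt8]) throws -> String {
        if let offset = firstNonASCIIOffset(bytes) {
            throw InvalidByteError(offset: offset, message: makeDiagnostic(bytes, at: offset))
        }
        return String(decoding: bytes, as: UTF8.self)
    }
}

var validator = ASCIIValidator()
_ = validator.string([0x41, 0xE9])             // nil; diagnosticsBuilt is still 0
_ = try? validator.stringOrThrow([0x41, 0xE9]) // nil; diagnosticsBuilt is now 1
```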


OTOH, this optimises the failure path. Normally throwing an error (or an exception in other languages) is considered an exceptional event, and typically only the "happy path" is worth making "zero cost", as the "unhappy path" won't happen too often.

Well, that's the age-old philosophical religious war, isn't it? :laughing:

Objective-C has true exceptions that are genuinely only used for really exceptional stuff - where you're basically going to crash most of the time anyway - and has manual error handling (NSError etc.) instead for things that are softer errors; things that often make sense to handle gracefully. [1]

Swift doesn't really have the same setup. There's returning nil (plus failable initialisers) but that's not really equivalent to NSError since it provides absolutely no information on why the failure occurred. So Swift exceptions have to serve that purpose.

Thus, I don't think Swift realistically allows you to avoid throwing exceptions - potentially frequently - in real-world, correctly-functioning code.

Easy to test:

Then just put "Swift throw" in Xcode's console output filter and see throws happening in realtime, along with the current count. They don't happen too often in my app.

it is a lot harder to write a chain type over two (or more) base collections that conforms to Collection than it is to write a chain type that only conforms to Sequence, because Sequence only requires an iterator.

i have written the latter kind of wrapper type countless times. the former is something i only attempt as a last resort.
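To illustrate the Sequence half of that comparison: a chain only needs an iterator that drains the first base and then the second. A minimal sketch with a hypothetical `Chain2` name. A Collection conformance would additionally need an index type that can point into either base, plus comparison and `index(after:)` logic across the boundary, which is where the extra difficulty comes from.

```swift
// A minimal chain over two sequences that conforms only to Sequence.
struct Chain2<First: Sequence, Second: Sequence>: Sequence
where First.Element == Second.Element {
    let first: First
    let second: Second

    struct Iterator: IteratorProtocol {
        var firstIterator: First.Iterator
        var secondIterator: Second.Iterator
        var firstDone = false

        mutating func next() -> First.Element? {
            if !firstDone {
                if let element = firstIterator.next() { return element }
                // Stop touching the first iterator once it is exhausted.
                firstDone = true
            }
            return secondIterator.next()
        }
    }

    func makeIterator() -> Iterator {
        Iterator(firstIterator: first.makeIterator(), secondIterator: second.makeIterator())
    }
}

let chained = Array(Chain2(first: [1, 2], second: 3...4)) // [1, 2, 3, 4]
```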


To refine my position: I think that if this pitch proposes convenience initializers for validation, it should also include convenience initializers for (error-correcting) decoding, which is the more common/preferred API path. This would elevate the visibility of decoding.

E.g.:

    extension String {
      // Error-correcting decoding: ill-formed input is repaired with U+FFFD.
      @_alwaysEmitIntoClient
      public init(decodingUTF8 bytes: some Collection<UInt8>) {
        self.init(decoding: bytes, as: UTF8.self)
      }

      // UTF-16 code units are UInt16, not UInt8.
      @_alwaysEmitIntoClient
      public init(decodingUTF16 codeUnits: some Collection<UInt16>) {
        self.init(decoding: codeUnits, as: UTF16.self)
      }

      // UTF-32 code units are UInt32.
      @_alwaysEmitIntoClient
      public init(decodingUTF32 codeUnits: some Collection<UInt32>) {
        self.init(decoding: codeUnits, as: UTF32.self)
      }
    }

Future work includes normalizing initializers and picking a default for String(utf8: myBytes).
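For comparison with validation, the existing error-correcting `String(decoding:as:)` never fails: ill-formed input is replaced with U+FFFD REPLACEMENT CHARACTER. A small sketch (the byte values are chosen only for illustration):

```swift
// 0xFF can never appear in well-formed UTF-8.
let utf8Bytes: [UInt8] = [0x61, 0xFF, 0x62]
let repairedUTF8 = String(decoding: utf8Bytes, as: UTF8.self)
// repairedUTF8 == "a\u{FFFD}b"

// A lone high surrogate is ill-formed UTF-16. Note the element type is UInt16, not UInt8.
let utf16Units: [UInt16] = [0x0068, 0xD800]
let repairedUTF16 = String(decoding: utf16Units, as: UTF16.self)
// repairedUTF16 == "h\u{FFFD}"

// A validating initializer would instead report failure for both inputs.
```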

This is still a lot of API surface area.

I made a new Xcode project, only imported Foundation, and here's what autocomplete gives me for String.init:

[Screenshot: Xcode autocomplete list for String.init]

Now we're considering adding init(validatingFromUTF8:), and maybe init(decodingFromUTF8:) and init(normalizingFromUTF8:)? And then all of those again for UTF-16? And then again for UTF-32?

It's too much. If we accept that there are discoverability issues, might I suggest that we are overwhelming users with too many initialisers? Adding yet more initialisers might not be the answer we seek and may even be counterproductive.

--

Also, the 3 Unicode codecs are not of equal importance. I don't think they all deserve convenience initialisers.

  1. UTF8 is obviously vital - it doesn't have endianness concerns, so it's the best format to use for documents and data which may be used on multiple systems (e.g. basically everything on the internet or stored to disk). A UTF8 initialiser is also good for ASCII strings. I would support a convenience initialiser for UTF8.

  2. UTF16 is occasionally useful, but at least an order of magnitude less so than UTF8. For us, it's mostly important for bridging to NSString, but we mostly do that through actual bridging. I doubt that so many users manually bridge strings by decoding UTF16 code-units that it's worth adding specific validatingUTF16: and decodingUTF16: initialisers.

  3. UTF32 is another order of magnitude less common than UTF16 (or even two); it is almost never used by regular programmers. It's useful for implementing Unicode algorithms, but Swift-native implementations should probably prefer to work in terms of Unicode.Scalar. It's very hard to justify a set of convenience initialisers for UTF32.
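As a sketch of what working in terms of `Unicode.Scalar` can look like, the generic decoding initializer and a direct scalar-based construction produce the same string from UTF-32 code units, because each valid UTF-32 code unit is exactly one scalar value (the values below are arbitrary):

```swift
let utf32Units: [UInt32] = [0x48, 0x1F600]

// Through the generic decoding initializer.
let viaCodec = String(decoding: utf32Units, as: UTF32.self)

// Directly through Unicode.Scalar. The UInt32 initializer is failable because
// surrogate values and values above 0x10FFFF are not scalars. Note that
// compactMap drops such values, whereas the decoder replaces them with U+FFFD.
let scalars = utf32Units.compactMap { Unicode.Scalar($0) }
var viaScalars = ""
viaScalars.unicodeScalars.append(contentsOf: scalars)
```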

--

Final note: as shown above, some existing APIs are named dangerously closely to the proposed ones:

  • init?(utf8String:)
  • init?(validatingUTF8:)

These take [CChar] parameters, and will both fatalError if the data isn't null-terminated.

I don't think users will be happy that init(validatingFromUTF8:) works but init(validatingUTF8:) crashes their programs. Ditto for init(decodingFromUTF8:) vs. init(utf8String:).

There's also init(utf16CodeUnits:count:) from Foundation, which takes an UnsafePointer<unichar>. I don't even know why we're still exposing that API. It should be deprecated.

Maybe we could also rename Foundation's init(bytes:encoding:) to init(bytes:legacyEncoding:) or something? To emphasise that it's for ancient encodings only.


I agree that the three don't have the same importance. I wonder if we could just add the UTF-8 case and use its documentation to point users towards the generic initializer.

The stdlib's init?(validatingUTF8:) is motivated by C interop and is badly named; I propose renaming it. The main validating initializer pitched here is necessary if we ever want the ability to deprecate Foundation's init?(utf8String:), though what happens to Foundation is not determined by this pitch.


I am also supportive of (and generally in favor of) convenience inits only for UTF8.


I've updated the proposal in "Proposal for String Validating Initializers" by glessard (apple/swift-evolution Pull Request #2110 on GitHub). This removes the UTF-16 and UTF-32 convenience initializers.

From your other recent pitch:

If strlen is commonly reimplemented, then should it be in the standard library, possibly as an UnsafeBufferPointer initializer?

(It probably wouldn't validate as UTF-8/16/32, so maybe off-topic here.)


I don’t think anybody should reimplement strlen, and it’s probably not needed in the standard library beyond the wrappers that already exist. One such wrapper does happen to be the first library call in String.init(cString:), which is why I used that idea as an example.

One direction of interest is a wrapper for pointer+terminator, for C interop. That is probably different than our pointer+length wrappers.
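A rough sketch of the pointer+terminator idea, to contrast it with `UnsafeBufferPointer`'s pointer+count: the hypothetical `TerminatedElements` type below can only be a Sequence, since its length is unknown until the terminator is reached. It assumes the caller guarantees that the terminator is present in valid memory; nothing here checks that.

```swift
// Hypothetical wrapper pairing a pointer with a terminator value.
// The caller must guarantee a terminator exists in valid memory.
struct TerminatedElements<Element: Equatable>: Sequence, IteratorProtocol {
    var position: UnsafePointer<Element>
    let terminator: Element

    mutating func next() -> Element? {
        let element = position.pointee
        // Stay parked on the terminator so later calls keep returning nil.
        guard element != terminator else { return nil }
        position += 1
        return element
    }
}

let cString: [CChar] = [0x68, 0x69, 0]
let elements = cString.withUnsafeBufferPointer { buffer in
    Array(TerminatedElements(position: buffer.baseAddress!, terminator: 0))
}
// elements == [0x68, 0x69]; the terminator itself is not included.
```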
