[Pitch] String Input-validating Initializers

Karl · July 19, 2023, 2:16pm

I gave a full example in the post above for the existing String(decoding:as:) initialiser.

I can copy/paste that in to an Xcode project and it works, and has autocomplete. Does it not work for you?

glessard · July 19, 2023, 2:27pm

Ah, I see. I'd missed a part of your post (user error) and misconstrued the example as hypothetical.

glessard · July 19, 2023, 5:34pm

Here is a draft implementation of this proposal as a staged package:

Note that up to this point the (all-or-nothing) validation code has not been updated. Making it better in a variety of ways is follow-up work to this proposal. A final implementation in the standard library would be slightly different, but functionally the same.

griotspeak · July 19, 2023, 7:05pm

I think that throwing an error with no information other than "we failed" is functionally identical to optional (you can even use try? to make it an optional) and it means that the future direction is simply adding more cases to the error with more fine-grained information. Is there a reason that ending up with a throwing and an optional initializer is preferable?
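A minimal sketch of that equivalence, assuming a hypothetical function `validatedUTF8` and a hypothetical error type `UTF8ValidationFailure` (neither is the pitched API). The validation is done by a round trip through the existing repairing initializer, purely for illustration:

```swift
// Hypothetical error type carrying no detail beyond "we failed".
struct UTF8ValidationFailure: Error {}

// Hypothetical stand-in for a throwing validating initializer.
// The input is valid exactly when repairing-decoding changes no bytes.
func validatedUTF8(_ bytes: [UInt8]) throws -> String {
  let repaired = String(decoding: bytes, as: UTF8.self)
  guard repaired.utf8.elementsEqual(bytes) else {
    throw UTF8ValidationFailure()
  }
  return repaired
}

let validBytes: [UInt8] = [0x68, 0x69]    // "hi"
let invalidBytes: [UInt8] = [0x68, 0xFF]  // 0xFF never appears in UTF-8

// `try?` turns the throwing call into an optional-returning one.
let validated: String? = try? validatedUTF8(validBytes)
let rejected: String? = try? validatedUTF8(invalidBytes)
```

With an error type like this, adding detail later means adding stored properties or cases, and callers using `try?` are unaffected.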

tera · July 20, 2023, 12:07am

Interestingly I do not see returning Result<String, Error> being discussed as an alternative. FWIW it is somewhat more "round trip compatible" with throw compared to optional. Has Result got out of vogue?

Jon_Shier · July 20, 2023, 1:56am

Result was never really in vogue, especially for the core team. It was a type of necessity added to the standard library both as a hedge against the time it would take for true async support as well as the niche functionality it could provide immediately (typed error propagation), and its resulting popularity in the community due to those uses. While I've used it extensively in the past for complex typed-error flows and more modern promise-like behavior with async, its overall ergonomics don't recommend it for normal error production. While its typed errors can be beneficial in some circumstances, most users don't care about that most of the time. And really, since Result<String, Error> is just Result { try stringProducer() }, there's no benefit to producing it directly at all. Additionally, the same performance issues that arise with throws arise with Result, and may be worse given throws' slightly more optimal runtime behavior, so it doesn't gain us anything here.
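To make the round trip concrete, here is a small sketch. `stringProducer(fail:)` and `ProducerError` are hypothetical names made up for the example; `Result { try ... }` and `get()` are the existing standard library API:

```swift
struct ProducerError: Error {}

// Hypothetical throwing producer.
func stringProducer(fail: Bool) throws -> String {
  if fail { throw ProducerError() }
  return "ok"
}

// Throwing -> Result: wrap the call in Result's catching initializer.
let success: Result<String, Error> = Result { try stringProducer(fail: false) }
let failure: Result<String, Error> = Result { try stringProducer(fail: true) }

// Result -> throwing: `get()` rethrows the stored error.
let roundTripped = try? success.get()
let roundTrippedFailure = try? failure.get()
```

Since either form can be derived from the other at the call site, an API only needs to offer the throwing one.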

tera · July 20, 2023, 2:33am

wadetregaskis:

Can this be solved generally, as a lateral solution to this debate? e.g. could the Swift compiler automagically produce specialisations of throwing functions that just return nil on failure (stripping out any then 'dead' code related to exception generation) if the caller is ignoring the exception anyway (e.g. with try?)?

Aside from making the code simpler - no need to manually write both versions - it could be advantageous to leave that decision to the compiler, as it can holistically evaluate whether the specialisation is actually worth it (in code size etc, potentially guided by PGO, taking into account the ripple effect into child functions, etc).

Or perhaps this can be done - a little more explicitly and manually - via a macro that can be attached to an exception-producing function to synthesise a failable variant of the function?

If there's enough interest we could spin this off. To me the simplest solution would be to make throw a function accepting a closure. That closure would be evaluated when the function in question is called with "try" and not evaluated when it is called with "try?":

    throw { SomeError.error(param1, makeCostlyParam()) }

(those who fancy autoclosure would prefer that form:)

    throw(SomeError.error(param1, makeCostlyParam()))

Plus the relevant (hopefully not massive) changes to the compiler.

If this is done, the "try?" would really become a zero cost abstraction similar in performance to returning an Optional.

OTOH.. This optimises the failure path. Normally throwing an error (or an exception in other languages) is considered an exceptional event, and typically only the "happy path" is worth making "zero cost" as "unhappy path" won't happen too often.
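For reference, the cost in question can be observed in today's Swift: the error value is fully constructed even when the caller discards it with "try?". A small sketch, where `SomeError`, `makeCostlyParam` and `alwaysFails` are hypothetical names following the snippet above:

```swift
enum SomeError: Error {
  case error(Int, String)
}

func demo() -> (result: Int?, costlyCalls: Int) {
  var costlyCalls = 0

  // Hypothetical expensive computation of an error payload.
  func makeCostlyParam() -> String {
    costlyCalls += 1
    return String(repeating: "x", count: 1_000)
  }

  func alwaysFails(_ param1: Int) throws -> Int {
    // The payload is evaluated before the throw, whoever the caller is.
    throw SomeError.error(param1, makeCostlyParam())
  }

  // The caller ignores the error, but the payload was still built.
  let result = try? alwaysFails(1)
  return (result, costlyCalls)
}

let outcome = demo()
```

Here `outcome.costlyCalls` ends up as 1 even though the error itself is never inspected.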

wadetregaskis · July 20, 2023, 5:39am

Well, that's the age-old philosophical religious war, isn't it?

Objective-C has true exceptions that are genuinely only used for really exceptional stuff - where you're basically going to crash most of the time anyway - and has manual error handling (NSError etc) instead for things that are softer errors; things that often make sense to handle gracefully. [1]

Swift doesn't really have the same setup. There's returning nil (plus failable initialisers) but that's not really equivalent to NSError since it provides absolutely no information on why the failure occurred. So Swift exceptions have to serve that purpose.

Thus, I don't think Swift realistically allows you to avoid throwing exceptions - potentially frequently - in real-world, correctly-functioning code.

tera · July 20, 2023, 11:55am

Easy to test:

Then just put "Swift throw" in Xcode's console output filter and see throws happening in realtime, along with the current count. They don't happen too often in my app.

taylorswift · July 20, 2023, 3:10pm

it is a lot harder to write a chain type over two (or more) base collections that conforms to Collection than it is to write a chain type that only conforms to Sequence, because Sequence only requires an iterator.

i have written the latter kind of wrapper type countless times. the former is something i only attempt as a last resort.
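a minimal sketch of the Sequence-only kind of wrapper, to show how little it needs. the name `ChainedSequence` is made up for this example:

```swift
// A chain over two base sequences that share an element type.
struct ChainedSequence<A: Sequence, B: Sequence>: Sequence
where A.Element == B.Element {
  var first: A
  var second: B

  struct Iterator: IteratorProtocol {
    var first: A.Iterator
    var second: B.Iterator

    mutating func next() -> A.Element? {
      // Drain the first base, then fall through to the second.
      first.next() ?? second.next()
    }
  }

  func makeIterator() -> Iterator {
    Iterator(first: first.makeIterator(), second: second.makeIterator())
  }
}

// The two bases can be different types as long as the elements match.
let chained = Array(ChainedSequence(first: [1, 2], second: 3...4))
```

a Collection conformance would additionally need an index type that spans both bases, plus startIndex, endIndex, index(after:) and a subscript that agree with each other.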

Michael_Ilseman · July 24, 2023, 6:38pm

To refine my position: I think that if this pitch proposes convenience initializers for validation, it should also include convenience initializers for (error-correcting) decoding, which is the more common/preferred API path. This would elevate the visibility of decoding.

E.g.:

    extension String {
      // Convenience wrappers around the existing repairing initializer.
      @_alwaysEmitIntoClient
      public init(decodingUTF8 bytes: some Collection<UInt8>) {
        self.init(decoding: bytes, as: UTF8.self)
      }

      // UTF-16 code units are UInt16.
      @_alwaysEmitIntoClient
      public init(decodingUTF16 codeUnits: some Collection<UInt16>) {
        self.init(decoding: codeUnits, as: UTF16.self)
      }

      // UTF-32 code units are UInt32.
      @_alwaysEmitIntoClient
      public init(decodingUTF32 codeUnits: some Collection<UInt32>) {
        self.init(decoding: codeUnits, as: UTF32.self)
      }
    }

Future work includes normalizing inits and picking a default for String(utf8: myBytes).
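For readers less familiar with the error-correcting path, this is how the existing String(decoding:as:) behaves: ill-formed input is repaired with U+FFFD rather than rejected. The byte values are made up for the example:

```swift
let wellFormed: [UInt8] = [0x68, 0x69]        // "hi"
let illFormed: [UInt8] = [0x68, 0x69, 0xFF]   // trailing byte is not valid UTF-8

// Decoding never fails; invalid input is replaced with U+FFFD.
let decodedWellFormed = String(decoding: wellFormed, as: UTF8.self)
let decodedIllFormed = String(decoding: illFormed, as: UTF8.self)
```

A validating initializer would reject the second input instead of repairing it.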

Karl · July 24, 2023, 9:34pm

This is still a lot of API surface area.

I made a new Xcode project, only imported Foundation, and here's what autocomplete gives me for String.init:

[Screenshot: autocomplete list of String initialisers]

Now we're considering adding init(validatingFromUTF8:), and maybe init(decodingFromUTF8:) and init(normalizingFromUTF8:)? And then all of those again for UTF-16? And then again for UTF-32?

It's too much. If we accept that there are discoverability issues, might I suggest that we are overwhelming users with too many initialisers? Adding yet more initialisers might not be the answer we seek and may even be counterproductive.

--

Also, the 3 Unicode codecs are not of equal importance. I don't think they all deserve convenience initialisers.

UTF8 is obviously vital - it doesn't have endianness concerns, so it's the best format to use for documents and data which may be used on multiple systems (e.g. basically everything on the internet or stored to disk). A UTF8 initialiser is also good for ASCII strings. I would support a convenience initialiser for UTF8.
UTF16 is occasionally useful, but at least an order of magnitude less so than UTF8. For us, it's mostly important for bridging to NSString, but we mostly do that through actual bridging. I doubt that so many users manually bridge strings by decoding UTF16 code-units that it's worth adding specific validatingUTF16: and decodingUTF16: initialisers.
UTF32 is another order of magnitude less common than UTF16 (or even two); it is almost never used by regular programmers. It's useful for implementing Unicode algorithms, but Swift-native implementations should probably prefer to work in terms of Unicode.Scalar. It's very hard to justify a set of convenience initialisers for UTF32.
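For context, UTF-16 and UTF-32 input is already covered by the existing generic initialiser, and scalar-level code can build a string from Unicode.Scalar values directly. A short sketch using only existing standard library API (the sample values are made up):

```swift
let utf16Units: [UInt16] = [0x0068, 0x0069]   // "hi"
let utf32Units: [UInt32] = [0x0001_F600]      // U+1F600

// The generic initialiser is parameterised over the encoding.
let fromUTF16 = String(decoding: utf16Units, as: UTF16.self)
let fromUTF32 = String(decoding: utf32Units, as: UTF32.self)

// Working in terms of Unicode.Scalar instead of UTF-32 code units.
let scalars: [Unicode.Scalar] = ["h", "i"]
let fromScalars = String(String.UnicodeScalarView(scalars))
```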

--

Final note: as shown above, some existing APIs are named dangerously closely to the proposed ones:

init?(utf8String:)
init?(validatingUTF8:)

These take [CChar] parameters, and will both fatalError if the data isn't null-terminated.

I don't think users will be happy that init(validatingFromUTF8:) works but init(validatingUTF8:) crashes their programs. Ditto for init(decodingFromUTF8:) vs. init(utf8String:).

There's also init(utf16CodeUnits:count:) from Foundation, which takes an UnsafePointer<unichar>. I don't even know why we're still exposing that API. It should be deprecated.

Maybe we could also rename Foundation's init(bytes:encoding:) to init(bytes:legacyEncoding:) or something? To emphasise that it's for ancient encodings only.

glessard · July 24, 2023, 9:58pm

I agree that the three don't have the same importance. I wonder if we could just add the UTF-8 case and use its documentation to point users towards the generic initializer.

The stdlib's init?(validatingUTF8:) is motivated by C interop and is badly named; I propose renaming it. The main validating initializer pitched here is necessary if we ever want the ability to deprecate Foundation's init?(utf8String:), though what happens to Foundation is not determined by this pitch.

Michael_Ilseman · July 24, 2023, 10:04pm

I am also supportive of (and generally in favor of) convenience inits only for UTF8.

glessard · August 1, 2023, 11:44pm

I've updated the proposal in "Proposal for String Validating Initializers" (Pull Request #2110, apple/swift-evolution). This removes the UTF-16 and UTF-32 convenience initializers.

benrimmington · August 2, 2023, 5:23am

From your other recent pitch:

If strlen is commonly reimplemented, then should it be in the standard library, possibly as an UnsafeBufferPointer initializer?

(It probably wouldn't validate as UTF-8/16/32, so maybe off-topic here.)

glessard · August 2, 2023, 5:42am

I don’t think anybody should reimplement strlen, and it’s probably not needed in the standard library beyond the wrappers that already exist. One such wrapper does happen to be the first library call in String.init(cString:), which is why I used that idea as an example.

One direction of interest is a wrapper for pointer+terminator, for C interop. That is probably different than our pointer+length wrappers.
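To illustrate the two shapes with existing API only: String(cString:) takes a pointer and finds the terminator itself, while String(decoding:as:) takes a collection whose length is already known. The sample bytes below are made up, and this is only a sketch of the distinction, not of any proposed wrapper:

```swift
// "hi", a NUL terminator, then unrelated data.
let storage: [UInt8] = [0x68, 0x69, 0x00, 0x78]

// pointer + terminator: the initializer scans up to the first NUL.
let terminated = storage.withUnsafeBufferPointer { buffer in
  String(cString: buffer.baseAddress!)
}

// pointer + length: the caller states how many bytes to decode.
let counted = storage.withUnsafeBufferPointer { buffer in
  String(decoding: buffer.prefix(2), as: UTF8.self)
}
```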