[Returned for revision] SE-0089: Renaming String.init<T>(_: T)

gribozavr · May 30, 2016, 6:31am

Right, this was the intent. The intent was that Streamable is
something that is a container of string-like data, as opposed to other
things that have-a string representation.

Dmitri

···

On Sun, May 29, 2016 at 11:23 PM, Brent Royal-Gordon via swift-evolution <swift-evolution@swift.org> wrote:

There are *very* few conformances to Streamable in the standard library—just Character, String, and UnicodeScalar. I think that Streamable is for data that can be *directly* written to an output stream, whereas CustomStringConvertible is a way to convert an instance that *isn't* directly Streamable into something Streamable.

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr@gmail.com>*/

trs · May 31, 2016, 3:12pm

I agree with Vladimir. Having a value preserving representation is orthogonal to a human readable representation.

-Thorsten

···

Am 31.05.2016 um 00:04 schrieb Vladimir.S via swift-evolution <swift-evolution@swift.org>:

Thank you, Brent. But for me you just described a serialization to/from string ;)

So, using your example, if I have

struct A: CustomStringConvertible {
   var a = 0, b = 0
   var description: String { return "a:\(a) b:\(b)" }
}

and I want to use it in your code, will I be able to do this?
For example, I'm OK with giving it a 'lossless' representation as "\(a)/\(b)" (e.g. 1/2) and providing an init() for strings of that format.

I.e. it seems I should just create an extension and conform to LosslessStringConvertible, but as I understand it I can't, because I would need not only to introduce an init(_: String) but also to modify the `description` property?
I.e. assume you have no rights to modify the A type itself, or don't want to.

This is why I don't understand why LosslessStringConvertible should share the same .description (i.e. if it were .losslessDescription, there would be no problem).
Most likely I don't understand something.
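A rough sketch of what the question comes down to, assuming a Swift version in which LosslessStringConvertible exists as a refinement of CustomStringConvertible with a single `init?(_:)` requirement. The parsing code is hypothetical. Because the conformance reuses `description`, a retroactive extension has to parse the existing "a:1 b:2" text, not the "1/2" format Vladimir would prefer:

```swift
struct A: CustomStringConvertible {
    var a = 0, b = 0
    var description: String { return "a:\(a) b:\(b)" }
}

// Retroactive conformance: `description` is left untouched, so the
// initializer must accept exactly the text that `description` produces.
extension A: LosslessStringConvertible {
    init?(_ description: String) {
        // Expect two space-separated fields: "a:<int>" and "b:<int>".
        let fields = description.split(separator: " ")
        guard fields.count == 2,
              fields[0].hasPrefix("a:"),
              fields[1].hasPrefix("b:"),
              let a = Int(fields[0].dropFirst(2)),
              let b = Int(fields[1].dropFirst(2)) else {
            return nil
        }
        self.init(a: a, b: b)
    }
}

let original = A(a: 1, b: 2)
let restored = A(original.description)   // parses the existing description
let rejected = A("1/2")                  // the preferred format is not accepted
```

So the extension is possible without touching A, but only for the format A already prints; a separate lossless format would need a separate property, which is the point being raised.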

Also there is a question regarding your example: what about the Double type? We can have configuration items of type Double, but how would LosslessStringConvertible be used for them?

On 31.05.2016 0:22, Brent Royal-Gordon wrote:

I can't understand this. For me ValuePreservingStringConvertible will usually be different from CustomStringConvertible. Can't I want both a string view of my struct for presenting data (also in UI) *and* a value-preserving string for the same struct?
So my .description will show something like "12.345-6.78" and the value-preserving string will contain something like "19083749347923847293487293483" to encode the data of the struct. No?

Rather than thinking of LosslessStringConvertible as a protocol for serializing data into a string, think of it as a protocol for those cases where the human-readable description is also parseable and can be used to completely recreate the instance. It's something you would use for things like command-line arguments, environment variables, interactive command-line programs, and configuration files that you expect humans to read and write by hand.

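   func prompt<T: LosslessStringConvertible>(for field: String, of type: T.Type) -> T {
       while true {
           print("What's your \(field)?")

           // readLine() returns nil at end of input; stop rather than loop forever.
           guard let line = readLine() else {
               fatalError("No more input while reading \(field)")
           }

           if !line.isEmpty,
               let value = T(line) { // how the hell do you indent this stupid syntax?
               return value
           }
       }
   }

   // A metatype argument needs an explicit `.self`.
   let name = prompt(for: "name", of: String.self)
   let age = prompt(for: "age", of: Int.self)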

   let answer = age < 13 ? " not" : ""
   print("\(name), you are\(answer) too old to have a favorite color.")

In other words, write the `description` first, and then decide if you can write a good `init(_ description:)` to match it.
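As a small illustration of that order of work, here is a sketch with a hypothetical `Version` type (not from the proposal): the readable `description` comes first, and the initializer is then written to accept exactly that text and to fail on anything else.

```swift
struct Version: Equatable, LosslessStringConvertible {
    var major: Int
    var minor: Int

    init(major: Int, minor: Int) {
        self.major = major
        self.minor = minor
    }

    // Step 1: the human-readable form, e.g. "2.3".
    var description: String { return "\(major).\(minor)" }

    // Step 2: a parser that matches the description and fails rather than guessing.
    init?(_ description: String) {
        let parts = description.split(separator: ".", omittingEmptySubsequences: false)
        guard parts.count == 2,
              let major = Int(parts[0]),
              let minor = Int(parts[1]) else {
            return nil
        }
        self.init(major: major, minor: minor)
    }
}

let parsed = Version("2.3")
let printed = Version(major: 2, minor: 3).description
```

If no sensible parser can be written for the description you want, that suggests the type should stay with plain CustomStringConvertible.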

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

Patrick_Smith · June 1, 2016, 8:33am

Thank you Brent. I like your points, and agree that localisation is not a simple problem. Also interesting to see the latest discussion (from Vladimir, Dave, Austin, and Hooman).

Replied inline below:

Thanks Chris. I just meant where is that string going?

To a developer -> CustomDebugStringConvertible / Reflection
To standard output -> Streamable
To a user -> NSLocalizedString — no protocol (yet?)
To an API / for serialisation -> LosslessStringConvertible
To a playground -> CustomPlaygroundQuickLookable

CustomStringConvertible is left over, but doesn’t have a use case? Unless it’s an alternative to Streamable, but then why have Streamable?

There are *very* few conformances to Streamable in the standard library—just Character, String, and UnicodeScalar. I think that Streamable is for data that can be *directly* written to an output stream, whereas CustomStringConvertible is a way to convert an instance that *isn't* directly Streamable into something Streamable.

OK, that’s an interesting distinction (and interesting protocol in Streamable — it feels as though it has some functionality to come).
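A minimal sketch of that distinction, assuming Swift 3 or later, where Streamable was renamed TextOutputStreamable and its requirement is spelled `write(to:)`. The `Repeated` and `Point` types are made up for illustration:

```swift
// Directly streamable: writes its pieces into any output stream
// without first building an intermediate String.
struct Repeated: TextOutputStreamable {
    var unit: String
    var count: Int

    func write<Target: TextOutputStream>(to target: inout Target) {
        for _ in 0..<count {
            target.write(unit)
        }
    }
}

// Has-a string representation: conversion goes through `description`.
struct Point: CustomStringConvertible {
    var x = 0, y = 0
    var description: String { return "(\(x), \(y))" }
}

var output = ""   // String is itself a TextOutputStream
Repeated(unit: "ab", count: 3).write(to: &output)
print(Point(x: 1, y: 2), to: &output)   // print uses `description` here
```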

For developers, like ourselves, it seems straightforward that a string is this simple primitive. We get them in, we process them, and we spit them back out. However, this is a flawed system, as it is one that is made easiest for the programmer, and is really designed for a context where the user is also a programmer. It best suits technical scenarios such as configuration files, environment variables, and command line arguments, as Brent suggests. However, I don’t think this is the best case to design for.

So, here's my version of your table:

User-readable, nonlocalized: CustomStringConvertible
User- and machine-readable, nonlocalized: LosslessStringConvertible
User-readable, localized: (nothing)
Developer-readable: CustomDebugStringConvertible

(Playground isn't necessarily working with strings, so it doesn't belong in this list.)

The first item in your table ‘User-readable, non-localised’, is the big problem area to me. Ideally in my mind all of these should be moved to other areas, such as the second area that LosslessStringConvertible occupies, which command line arguments and configuration keys certainly could. And user-readable should use a system that always allows localisation to be added progressively, by use of type extensions or protocols.

In a UI application, everything that is displayed should be using a system which allows localisation. I would argue a command line tool is also a UI application. I would not advocate for a full-on locale system like the one Foundation has to be brought to the Swift standard library (unless eventually it’s easy to integrate a standard a la Unicode).

Localization is an obvious hole in our string conversions, but I think the reality here is that localization is part of a higher layer than the standard library. From what I can see, all of the "standard library" APIs which handle localization are actually part of Foundation. I'm sure that, if we build any localization-related features into the language, we'll add basic supporting code to the standard library if needed, but other than that, I don't think the standard library is the right place.

I believe best practices can be put in place with a system no more complicated for the programmer than the one we have now. This could be possible with protocols: core protocols in the standard library that are then fleshed out in a Foundation-level framework above, with Locale / CultureCode / etc types extending or conforming.

I’m not sure if anyone else shares the concern, so I’ll leave it. I do believe it’s important however.

I do think this is an important concern, and I also think it's important to ask how interpolation interacts with it. For instance, I think it would be very useful to be able to say "interpolate developer representations" or "interpolate user representations" or "interpolate localized user representations", and have the compiler reject interpolated expressions which don't have the necessary representation.

I like this idea. I think “interpolate localised user representations” should not be distinct from “interpolate user representations”. Instead non-localised is specifically denoted as ‘technical’ or perhaps ‘en-US’. Locales, or more broadly ‘contexts’, are not something additional, instead, everything already has a context, and the context of a string could be made more explicit.
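Nothing like this existed when the thread was written. As a hedged sketch only: custom string interpolation, added to Swift later (5.0 onward), can approximate "reject interpolated expressions which don't have the necessary representation". The `UserDisplayable`, `UserMessage` and `Color` names below are hypothetical.

```swift
// Hypothetical marker protocol for values that have a user-facing form.
protocol UserDisplayable {
    var userText: String { get }
}

// A string type whose interpolation only accepts UserDisplayable values.
struct UserMessage: ExpressibleByStringInterpolation {
    var text: String

    init(stringLiteral value: String) {
        text = value
    }

    init(stringInterpolation: StringInterpolation) {
        text = stringInterpolation.text
    }

    struct StringInterpolation: StringInterpolationProtocol {
        var text = ""

        init(literalCapacity: Int, interpolationCount: Int) {
            text.reserveCapacity(literalCapacity)
        }

        mutating func appendLiteral(_ literal: String) {
            text += literal
        }

        // The only interpolation overload: any other type fails to compile.
        mutating func appendInterpolation<T: UserDisplayable>(_ value: T) {
            text += value.userText
        }
    }
}

struct Color: UserDisplayable {
    var userText: String
}

let favorite = Color(userText: "teal")
let message: UserMessage = "Your favorite color is \(favorite)."
// let bad: UserMessage = "Count: \(42)"   // rejected: Int is not UserDisplayable
```

A "developer" or "localized" flavor would be another message type with a different constraint on `appendInterpolation`.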

···

On 30 May 2016, at 4:23 PM, Brent Royal-Gordon <brent@architechies.com> wrote:

--
Brent Royal-Gordon
Architechies

dabrahams · May 31, 2016, 10:22pm

I too think he's making a good point. I could be missing something, but
it seems to me we don't fully understand this design yet.

···

on Tue May 31 2016, Thorsten Seitz <swift-evolution@swift.org> wrote:

I agree with Vladimir. Having a value preserving representation is
orthogonal to a human readable representation.

--
Dave

Austin · May 31, 2016, 10:43pm

My original v2 proposal draft suggested making value preserving and human
readable descriptions orthogonal, but Chris and a few other people
suggested they liked the originally suggested design better. I have a link
in the "Alternatives" section to that original draft.

Austin


hooman · May 31, 2016, 11:02pm

This raises a different question: Should `description` always return the same value? For example: Can `description` of “May 31st, 2016” return “Today” if we evaluate it today and return “Yesterday” if we evaluate it tomorrow? Are such side-effects (using a volatile global value) permitted? Then how about localization? Can description be locale-aware without breaking the protocol?


Chris_Lattner · June 4, 2016, 6:36pm

In my opinion, no, these would be inappropriate for a “value preserving” implementation.

-Chris

···

On May 31, 2016, at 4:02 PM, Hooman Mehr via swift-evolution <swift-evolution@swift.org> wrote:

This raises a different question: Should `description` always return the same value? For example: Can `description` of “May 31st, 2016” return “Today” if we evaluate it today and return “Yesterday” if we evaluate it tomorrow? Are such side-effects (using a volatile global value) permitted? Then how about localization? Can description be locale-aware without breaking the protocol?
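One way to see why a context-dependent `description` clashes with a value-preserving conversion is a round-trip check. This is a sketch under assumptions: `roundTrips`, `DayOfMay` and the mutable `today` variable are all hypothetical, and the mutable global is there only to mimic the "volatile global value" in the question (it assumes the default Swift 5 language mode).

```swift
// Returns true when converting to a string and back yields the same value.
func roundTrips<T: LosslessStringConvertible & Equatable>(_ value: T) -> Bool {
    return T(value.description) == value
}

// Stand-in for a volatile global such as the current date.
var today = 31

struct DayOfMay: LosslessStringConvertible, Equatable {
    var day: Int

    init(day: Int) { self.day = day }

    // Context-dependent output: the same value prints differently over time.
    var description: String {
        return day == today ? "Today" : "May \(day)"
    }

    // Only the stable "May <n>" spelling can be parsed back.
    init?(_ description: String) {
        guard description.hasPrefix("May "),
              let day = Int(description.dropFirst(4)) else {
            return nil
        }
        self.day = day
    }
}

let date = DayOfMay(day: 31)
let stableToday = roundTrips(date)   // checked while today == 31
today = 32
let stableLater = roundTrips(date)   // checked after "today" has moved on
```

The same value passes the check on one day and fails it on another, which is the kind of behavior a value-preserving protocol cannot tolerate.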

Brent_Royal-Gordon · June 5, 2016, 6:31am

Sorry, I meant to reply to this but forgot.

For developers, like ourselves, it seems straightforward that a string is this simple primitive. We get them in, we process them, and we spit them back out. However, this is a flawed system, as it is one that is made easiest for the programmer, and is really designed for a context where the user is also a programmer. It best suits technical scenarios such as configuration files, environment variables, and command line arguments, as Brent suggests. However, I don’t think this is the best case to design for.

It is *a* case we need to design for. It is also the *base* case: localization is layered on top of non-localized constructs.

So, here's my version of your table:

User-readable, nonlocalized: CustomStringConvertible
User- and machine-readable, nonlocalized: LosslessStringConvertible
User-readable, localized: (nothing)
Developer-readable: CustomDebugStringConvertible

(Playground isn't necessarily working with strings, so it doesn't belong in this list.)

The first item in your table ‘User-readable, non-localised’, is the big problem area to me. Ideally in my mind all of these should be moved to other areas, such as the second area that LosslessStringConvertible occupies, which command line arguments and configuration keys certainly could. And user-readable should use a system that always allows localisation to be added progressively, by use of type extensions or protocols.

In a UI application, everything that is displayed should be using a system which allows localisation.

In theory, yes. In practice? We write code that will be used only once (like an ad-hoc fix for some problem). We write code that will never be exposed to users (like a background process). We write code that is kept in one limited environment (like a company internal app). We write code that simply isn't going to be localized for business reasons (like an app that wouldn't be profitable to translate).

We write code that constructs strings without caring what their contents are (think of a Markdown converter). We write code that emits strings which are primarily for machines, but formatted to be convenient for humans—particularly human programmers working with the formats—to understand (think of a reporting tool that emits CSV with column headings in English). We write code where we know everyone will understand a certain language (air traffic control is conducted entirely in English worldwide). We write code that's too low-level to be localized. We write unit tests (hopefully).

And we write code when we're just learning how to program, and printing the result of 1 + 2 in French is the last thing on our minds.

So yes, in a meticulously-engineered ideal application, you would have little call for "user-readable, nonlocalized". But that's not what people write a lot of the time.

To be clear: If there is a *low-cost* way to make sure that UI text is localizable by default, I'm all for it. (And I even have an idea or two in that area.) But I don't think bringing localization into the standard library is how you make it low-cost. Remember, Foundation can always add localization to any standard library type it wants through extensions.

I would argue a command line tool is also a UI application.

Sure, but see the above. (Plus, command line tools *do* have a stronger legitimate need for non-localized stuff—think of things like command-line switches and environment variables, communicating over filehandles and pipes, "text" that's actually UI like twirlers and progress bars, etc.)

Localization is an obvious hole in our string conversions, but I think the reality here is that localization is part of a higher layer than the standard library. From what I can see, all of the "standard library" APIs which handle localization are actually part of Foundation. I'm sure that, if we build any localization-related features into the language, we'll add basic supporting code to the standard library if needed, but other than that, I don't think the standard library is the right place.

I believe best practices can be put in place with a system no more complicated for the programmer than the one we have now. This could be possible with protocols: core protocols in the standard library that are then fleshed out in a Foundation-level framework above, with Locale / CultureCode / etc types extending or conforming.

I'm not sure what the purpose would be of having a protocol in the standard library which didn't offer even a lick of the promised functionality without a higher-level framework. What do we gain by having `localizedDescription` in the standard library if nothing written against only the standard library can actually emit a localized description?

I’m not sure if anyone else shares the concern, so I’ll leave it. I do believe it’s important however.

I do think this is an important concern, and I also think it's important to ask how interpolation interacts with it. For instance, I think it would be very useful to be able to say "interpolate developer representations" or "interpolate user representations" or "interpolate localized user representations", and have the compiler reject interpolated expressions which don't have the necessary representation.

I like this idea. I think “interpolate localised user representations” should not be distinct from “interpolate user representations”. Instead non-localised is specifically denoted as ‘technical’ or perhaps ‘en-US’. Locales, or more broadly ‘contexts’, are not something additional, instead, everything already has a context, and the context of a string could be made more explicit.

I mean, you can call it "non-localized" or you can call it "technical", but a rose by any other name smells just as sweet.

···

--
Brent Royal-Gordon
Architechies

Patrick_Smith · June 5, 2016, 9:20am

Sorry, I meant to reply to this but forgot.

No worries Brent! Thanks for the thoughtful reply.

For developers, like ourselves, it seems straightforward that a string is this simple primitive. We get them in, we process them, and we spit them back out. However, this is a flawed system, as it is one that is made easiest for the programmer, and is really designed for a context where the user is also a programmer. It best suits technical scenarios such as configuration files, environment variables, and command line arguments, as Brent suggests. However, I don’t think this is the best case to design for.

It is *a* case we need to design for. It is also the *base* case: localization is layered on top of non-localized constructs.

So, here's my version of your table:

User-readable, nonlocalized: CustomStringConvertible
User- and machine-readable, nonlocalized: LosslessStringConvertible
User-readable, localized: (nothing)
Developer-readable: CustomDebugStringConvertible

(Playground isn't necessarily working with strings, so it doesn't belong in this list.)

The first item in your table ‘User-readable, non-localised’, is the big problem area to me. Ideally in my mind all of these should be moved to other areas, such as the second area that LosslessStringConvertible occupies, which command line arguments and configuration keys certainly could. And user-readable should use a system that always allows localisation to be added progressively, by use of type extensions or protocols.

In a UI application, everything that is displayed should be using a system which allows localisation.

In theory, yes. In practice? We write code that will be used only once (like an ad-hoc fix for some problem). We write code that will never be exposed to users (like a background process). We write code that is kept in one limited environment (like a company internal app). We write code that simply isn't going to be localized for business reasons (like an app that wouldn't be profitable to translate).

We write code that constructs strings without caring what their contents are (think of a Markdown converter). We write code that emits strings which are primarily for machines, but formatted to be convenient for humans—particularly human programmers working with the formats—to understand (think of a reporting tool that emits CSV with column headings in English). We write code where we know everyone will understand a certain language (air traffic control is conducted entirely in English worldwide). We write code that's too low-level to be localized. We write unit tests (hopefully).

And we write code when we're just learning how to program, and printing the result of 1 + 2 in French is the last thing on our minds.

So yes, in a meticulously-engineered ideal application, you would have little call for "user-readable, nonlocalized". But that's not what people write a lot of the time.

Strings are a very flexible type. The only validation I know of that the String type currently performs is conformance to the various Unicode encodings.

I think it’s similar to pointer safety. A pointer in C can point to anything. The programmer might be sure it’s valid, but the computer isn’t until it dereferences it. It could be null (crash) or it could be pointing to an already deallocated object or something else entirely (worse than a crash). Swift tries to save us from making these mistakes. Similarly, the programmer could be 100% sure her cast will succeed, that this object is of a certain class or conforms to a certain protocol. Swift makes these casts safe, by either crashing immediately or letting the programmer decide what to do with nil, or avoids them by use of generics.

A String in Swift can mean anything. It could be empty, it could be 7000 characters long, it could be formatted incorrectly or contain illegal characters. `String` says as much to me as `id` does in Objective-C. It’s up to me to decide what the meaning is and whether it’s valid yet or not.

I wonder if most data-facing strings could use a string-represented enum or struct instead?

e.g.

enum GitCommand : String {
  case clone = "clone"
  case `init` = "init" // `init` is a keyword, so the case name needs backticks
  case add = "add"
  case mv = "mv"
  ...
}

which can be conveniently shortened to:

enum GitCommand : String {
  case clone, `init`, add, mv, ...
}

An initialized GitCommand value can only be valid, which leads to clearer and safer code.
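A short sketch of how that plays out at the boundary where raw text comes in. The four cases are just an illustrative subset (the full list is elided above), and `parseCommand` is a hypothetical helper:

```swift
// Illustrative subset of commands; raw values default to the case names.
enum GitCommand: String {
    case clone, `init`, add, mv
}

// Hypothetical helper: turn a raw command-line word into a validated command.
func parseCommand(_ word: String) -> GitCommand? {
    return GitCommand(rawValue: word)
}

let known = parseCommand("init")      // a valid command
let unknown = parseCommand("comit")   // nil: the typo never becomes a GitCommand
```

Everything downstream of `parseCommand` can then take a `GitCommand` and skip re-validating a String.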

Loose string identifiers such as CSV column headings could use a struct that conforms to RawRepresentable / LosslessStringConvertible. The failable initializer could trim whitespace and validate, and generally conform it into an ideal form. There are no flags for `isValidated` and none of the assumptions that you bring by using a naked String. If you have a CSVHeading value in hand, you know that it is valid:

import Foundation // for CharacterSet and the String trimming/search helpers

struct CSVHeading : RawRepresentable {
  typealias RawValue = String

  var rawValue: String

  init?(rawValue: String) {
    // Normalize: strip leading and trailing whitespace before validating.
    let trimmed = rawValue.trimmingCharacters(in: .whitespaces)

    // Reject headings containing characters from Foundation's illegal set.
    guard trimmed.rangeOfCharacter(from: .illegalCharacters) == nil else {
      return nil
    }

    // More validations here as per RFC 4180 - Common Format and MIME Type for Comma-Separated Values (CSV) Files

    self.rawValue = trimmed
  }
}

(This raises a point — what’s the difference between the proposed LosslessStringConvertible and RawRepresentable where RawValue = String? They both have a failable init. Is it due to current limitations with typealiases that makes this hard?)

Swift makes this so easy compared to Objective-C, where you would have worried about the overhead of allocating a wrapper object. In Swift, as I understand it, a struct with a single String member should be of similar weight in memory and performance to using that String by itself. A whole bunch of them in a typed Collection would take up the same amount of memory?
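The memory half of that question can be checked directly with MemoryLayout. A quick sketch, where `Heading` is a hypothetical stand-in for a single-String wrapper such as CSVHeading:

```swift
// Hypothetical wrapper with a single String member.
struct Heading {
    var rawValue: String
}

let wrapperSize = MemoryLayout<Heading>.size
let stringSize = MemoryLayout<String>.size
let wrapperStride = MemoryLayout<Heading>.stride
let stringStride = MemoryLayout<String>.stride

// The stride is what each element occupies inside an Array.
print("Heading: size \(wrapperSize), stride \(wrapperStride)")
print("String:  size \(stringSize), stride \(stringStride)")
```

On the toolchains I am aware of, the two sizes and the two strides come out equal, so an array of wrappers should occupy the same storage as an array of plain Strings. This says nothing about run-time cost in the validating initializer.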

Note the String type would still be used for situations such as parsing and formatting. But I don’t think it needs to be used for everything where something better constructed can be used. And formatting could have a whole range of interesting designs too.

To be clear: If there is a *low-cost* way to make sure that UI text is localizable by default, I'm all for it. (And I even have an idea or two in that area.) But I don't think bringing localization into the standard library is how you make it low-cost. Remember, Foundation can always add localization to any standard library type it wants through extensions.

I would love to find a low-cost way because I think Swift opens many opportunities to enable it, has an amazing team of library designers in the Swift standard library and from the Cocoa frameworks, and we have a chance here with a blank canvas to raise the bar like with Swift’s Unicode support. Glad to hear you have some ideas — look forward to hearing them!

I would argue a command line tool is also a UI application.

Sure, but see the above. (Plus, command line tools *do* have a stronger legitimate need for non-localized stuff—think of things like command-line switches and environment variables, communicating over filehandles and pipes, "text" that's actually UI like twirlers and progress bars, etc.)

Localization is an obvious hole in our string conversions, but I think the reality here is that localization is part of a higher layer than the standard library. From what I can see, all of the "standard library" APIs which handle localization are actually part of Foundation. I'm sure that, if we build any localization-related features into the language, we'll add basic supporting code to the standard library if needed, but other than that, I don't think the standard library is the right place.

I believe best practices can be put in place with a system no more complicated for the programmer than the one we have now. This could be possible with protocols: core protocols in the standard library that are then fleshed out in a Foundation-level framework above, with Locale / CultureCode / etc types extending or conforming.

I'm not sure what the purpose would be of having a protocol in the standard library which didn't offer even a lick of the promised functionality without a higher-level framework. What do we gain by having `localizedDescription` in the standard library if nothing written against only the standard library can actually emit a localized description?

I had tried to design something as I was writing the email. I wasn’t thinking a `localizedDescription` method (which would rely on global state, an issue with Foundation’s current design), but a context that is used generically or as a type to customise string conversion. Here’s one design idea, but I’m sure there are many others possible:

// StringDisplayable, Swift.PrintDisplay and Foundation.CultureCode are part of this design idea, not existing APIs.
enum Fruit : String {
  case raspberry, guava, passionFruit
}

extension Fruit : StringDisplayable {
  // Extension with `Self : RawRepresentable where RawValue = String` could add this by default one day.
  func toDisplayString(context: Swift.PrintDisplay) -> String {
    return rawValue
  }

  func toDisplayString(context: Foundation.CultureCode.EnglishUS) -> String {
    switch self {
    case .raspberry: return "Raspberry"
    case .guava: return "Guava"
    case .passionFruit: return "Passion Fruit"
    }
  }
}

I’m not sure if anyone else shares the concern, so I’ll leave it. I do believe it’s important however.

I do think this is an important concern, and I also think it's important to ask how interpolation interacts with it. For instance, I think it would be very useful to be able to say "interpolate developer representations" or "interpolate user representations" or "interpolate localized user representations", and have the compiler reject interpolated expressions which don't have the necessary representation.

I like this idea. I think “interpolate localised user representations” should not be distinct from “interpolate user representations”. Instead non-localised is specifically denoted as ‘technical’ or perhaps ‘en-US’. Locales, or more broadly ‘contexts’, are not something additional, instead, everything already has a context, and the context of a string could be made more explicit.

I mean, you can call it "non-localized" or you can call it "technical", but a rose by any other name smells just as sweet.

Non-localised can mean ‘my language’, ‘US English’, ‘we haven’t localised this yet but might in the future’, or ‘a domain-specific key word or phrase’. Technical means just ‘a domain-specific key word or phrase’, and could have the additional properties of losslessness or robustness (conservative in what you send, liberal in what you accept).

···

On 5 Jun 2016, at 4:31 PM, Brent Royal-Gordon <brent@architechies.com> wrote:

--
Brent Royal-Gordon
Architechies

Patrick_Smith · June 5, 2016, 10:39am

To answer my own question, relooking at the proposal, RawRepresentable wouldn’t be suitable for Bool, Character, UnicodeScalar, Integer, etc.

···

On 5 Jun 2016, at 7:20 PM, Patrick Smith <pgwsmith@gmail.com> wrote:

(This raises a point — what’s the difference between the proposed LosslessStringConvertible and RawRepresentable where RawValue = String? They both have a failable init. Is it due to current limitations with typealiases that makes this hard?)

Brent_Royal-Gordon · June 5, 2016, 11:05am

To answer my own question, relooking at the proposal, RawRepresentable wouldn’t be suitable for Bool, Character, UnicodeScalar, Integer, etc.

(This raises a point — what’s the difference between the proposed LosslessStringConvertible and RawRepresentable where RawValue = String? They both have a failable init. Is it due to current limitations with typealiases that makes this hard?)

I would add that RawRepresentable does not promise the representation is human-readable.
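Both differences can be put side by side in a small sketch. The `Status` enum and its opaque raw values are hypothetical; the conformances of Int and Bool are those in current standard libraries:

```swift
// Raw values are free to be opaque codes rather than readable text.
enum Status: String {
    case active = "A1"
    case suspended = "S9"
}

// Hypothetical conformance whose description is the readable spelling.
extension Status: LosslessStringConvertible {
    var description: String {
        switch self {
        case .active: return "active"
        case .suspended: return "suspended"
        }
    }

    init?(_ description: String) {
        switch description {
        case "active": self = .active
        case "suspended": self = .suspended
        default: return nil
        }
    }
}

let fromRaw = Status(rawValue: "A1")   // machine-oriented code
let fromText = Status("active")        // human-readable spelling
let number = Int("42")                 // Int has a lossless string form but no rawValue
let flag = Bool("true")                // likewise for Bool
```

The two failable initializers look alike, but only the second one promises that its input is the text a person would read in `description`.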

···

On 5 Jun 2016, at 7:20 PM, Patrick Smith <pgwsmith@gmail.com> wrote:

--
Brent Royal-Gordon
Architechies