Faster/lower-level external String initialization

Sorry for not being clear. This is a breaking API change, so I think it would be nice to keep the existing API in place, mark it as deprecated for Swift 2.2, and remove it in Swift 3. Here is an example <https://github.com/apple/swift/blob/master/stdlib/public/core/Index.swift#L76> of what I meant by ‘deprecation magic’.
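
A minimal sketch of the deprecate-then-remove path, assuming it relies on the `@available` attribute. `Text` and its members are hypothetical names used only for illustration, and the code uses current Swift syntax rather than the Swift 2.2 spelling discussed in this thread:

```swift
// Hypothetical wrapper type used only for illustration; not a stdlib API.
struct Text {
    let value: String

    // The new, preferred spelling.
    init(cString: UnsafePointer<CChar>) {
        value = String(cString: cString)
    }

    // The old spelling stays for one release and forwards to the new one.
    // Existing callers still compile but get a deprecation warning.
    @available(*, deprecated, message: "Use init(cString:) instead")
    static func fromCString(_ cString: UnsafePointer<CChar>) -> Text {
        return Text(cString: cString)
    }
}

let viaInit = "abc".withCString { Text(cString: $0) }
let viaOldName = "abc".withCString { Text.fromCString($0) }  // compiles, with a warning
```

Once the deprecation period is over, the old method can simply be deleted, or marked unavailable so the compiler still points callers at the replacement.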

max

On Jan 13, 2016, at 1:14 AM, Zach Waldowski <zach@waldowski.me> wrote:

I might've mis-parsed the meaning of "'deprecation' magic". What's the
best path forward in the near-term? Would `decodeCString` be the only
one that becomes generic? Or, phrased differently, should there still be
`UnsafePointer<CChar>`+`strlen` versions?

Oops, that first struct should be this:

struct F {
     static func makeF() -> (F?, Int) { … }
}

- Alex

On Jan 12, 2016, at 5:25 PM, Alex Migicovsky via swift-evolution <swift-evolution@swift.org> wrote:

I was trying to say that any tuple-returning factory method can be turned into an initializer with an inout param, e.g.

struct F {
     func makeF() -> (F?, Int)
}

can be made into:

struct F {
     init?(inout result: Int) { … }
}

I think you should still be able to call an initializer like that with your `swiftApi(String(cString))` example, right? It would just be `swiftApi(String(cString, foo: &otherTupleValue))`. I thought the proposed alternative would look more like `swiftApi(String.fromCString(cString).0)` (I’ve lost track at this point of what the exact API proposal is, sorry).

With this approach, every time you need to create a String you go through a String initializer; you don’t need to think about whether it’s a factory method or an initializer. That’s what I was trying to get at about keeping the “initializer story” consistent.
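
To make the two shapes concrete, here is a small sketch in current Swift syntax (where `inout` is written after the label). `Parsed`, `make(from:)`, and `skippedSpaces` are hypothetical names chosen for illustration; nothing here is a proposed stdlib API:

```swift
// Hypothetical type: parses an integer and also reports how many
// leading spaces were skipped (the "extra" value in the tuple).
struct Parsed {
    let value: Int

    // Initializer shape: the extra value comes back through an inout parameter.
    init?(_ text: String, skippedSpaces: inout Int) {
        let trimmed = text.drop(while: { $0 == " " })
        skippedSpaces = text.count - trimmed.count
        guard let number = Int(trimmed) else { return nil }
        value = number
    }

    // Factory shape: the same information returned as a tuple.
    static func make(from text: String) -> (Parsed?, Int) {
        var skipped = 0
        let parsed = Parsed(text, skippedSpaces: &skipped)
        return (parsed, skipped)
    }
}

var skipped = 0
let viaInit = Parsed("  42", skippedSpaces: &skipped)
let viaFactory = Parsed.make(from: "  42")
```

Both forms carry the same information; the difference is only in where the second value lands at the call site.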

- Alex

On Jan 12, 2016, at 5:01 PM, Max Moiseev <moiseev@apple.com> wrote:

Hi Alex,

If you mean that we still need to have initializers for both cases, we do. It’s just that in one of them (the repairing one) we throw away the information about whether repairs were made, which a) we don’t care about in many cases and b) can still be obtained using String.decodeCString.

Having an inout parameter in an initializer will break the (I think) common use case, where you get a CString from some C API and want to call some Swift API that accepts String. I would do it like `swiftApi(String(cString))`; with inout it gets weird.

What do you think?

max

On Jan 12, 2016, at 1:00 PM, Alex Migicovsky <migi@apple.com> wrote:

On Jan 12, 2016, at 12:08 PM, Max Moiseev via swift-evolution <swift-evolution@swift.org> wrote:

It would be nice to get some feedback from someone at Apple as to why fromCString() was implemented as a type method instead of a failable initializer. Presumably it was because there is both a repairing and a failable, non-repairing version.

There probably were no failable initializers when it was first implemented. The other thing is that `fromCStringRepairingIllFormedUTF8` returns a tuple, so it cannot be an initializer.

Can the initializer take an inout parameter instead? Seems like it would be better to keep a consistent "initializer story."

On Jan 12, 2016, at 11:18 AM, Charles Kissinger via swift-evolution <swift-evolution@swift.org> wrote:

It would be nice to get some feedback from someone at Apple as to why fromCString() was implemented as a type method instead of a failable initializer. Presumably it was because there is both a repairing and a failable, non-repairing version.

## Detailed design

See [full implementation](https://github.com/apple/swift/compare/master...zwaldowski:string-from-code-units).

This is a fairly straightforward renaming of the internal APIs.

The initializer, its labels, and their order were chosen to match other
non-cast initializers in the stdlib. "Sequence" was removed, as it was a
misnomer. "input" was kept as a generic name in order to allow for future
refinements.

The static initializer made the same changes, but was otherwise kept as a
factory function due to its multiple return values.

`String.Type._fromWellFormedCodeUnitSequence(_:input:)` was kept as-is for
internal use. I assume it wouldn't be good to expose publicly because, for
lack of a better phrase, we only "trust" the stdlib to accurately know the
well-formedness of its code units. Since it is a simple call-through, its
use could be elided throughout the stdlib.

## Impact on existing code

This is an additive change to the API.

## Alternatives considered

* A protocol-oriented API.

Some kind of `func decode<Encoding>(_:)` on `SequenceType`. It's not really
clear this method would be related to string processing, and it would require
some kind of bounding (like `where Generator.Element: UnsignedIntegerType`),
but that would be introducing a type bound that doesn't exist on

* Do nothing.

This seems suboptimal. `String` lacking this constructor is a limiting
factor on performance for many kinds of pure-Swift implementations.
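
For comparison, the protocol-oriented alternative listed above might look roughly like the sketch below in current Swift. The method name `decodedString(as:)` is hypothetical, and the sketch constrains the element type to the codec's code unit, which is one possible bound and not necessarily the one the proposal had in mind:

```swift
extension Collection {
    // Hypothetical method, not a stdlib API: decode the receiver's elements
    // as code units of the given Unicode encoding. Invalid sequences are
    // repaired, because String(decoding:as:) always repairs.
    func decodedString<Codec: Unicode.Encoding>(as codec: Codec.Type) -> String
        where Element == Codec.CodeUnit
    {
        return String(decoding: self, as: codec)
    }
}

let utf8Units: [UInt8] = [0x68, 0x69]        // "hi" in UTF-8
let utf16Units: [UInt16] = [0x0068, 0x0069]  // "hi" in UTF-16

let fromUTF8 = utf8Units.decodedString(as: UTF8.self)
let fromUTF16 = utf16Units.decodedString(as: UTF16.self)
```

As the proposal notes, a method like this shows up on every collection, including ones that have nothing to do with text.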

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

I think you should still be able to call an initializer like that with your `swiftApi(String(cString))` example, right? It would just be `swiftApi(String(cString, foo: &otherTupleValue))`. I thought the proposed alternative would look more like `swiftApi(String.fromCString(cString).0)`

Yes, you can do that, but the variable you pass as inout has to be declared and initialized somewhere, so your call becomes `var inoutVar = false; swiftApi(String(cString, foo: &inoutVar))`. Slightly more complicated, that is.
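
A sketch of that call-site difference in current Swift. `swiftApi` and `init(repairing:repairsMade:)` are hypothetical stand-ins, and the way the flag is computed (comparing the decoded string's UTF-8 with the input) is just one simple choice for the example:

```swift
// Hypothetical Swift API that accepts a String.
func swiftApi(_ text: String) -> Int {
    return text.count
}

extension String {
    // Hypothetical initializer, not part of the standard library:
    // decodes UTF-8 bytes, repairing invalid sequences, and reports
    // through the inout flag whether any repair happened.
    init(repairing bytes: [UInt8], repairsMade: inout Bool) {
        self = String(decoding: bytes, as: UTF8.self)
        repairsMade = Array(self.utf8) != bytes
    }
}

let bytes: [UInt8] = [0x68, 0x69, 0xFF]  // "hi" followed by an invalid byte

// Without inout: a single expression.
let plain = swiftApi(String(decoding: bytes, as: UTF8.self))

// With inout: a variable must be declared and initialized first.
var repairsMade = false
let withFlag = swiftApi(String(repairing: bytes, repairsMade: &repairsMade))
```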

(I’ve lost track at this point about what the exact API proposal is, sorry).

That is what we are trying to figure out here =)

max

Oh, alright!

So the complete API set would be, as I see it right now:
* `decodeCString(_:encoding:repairingInvalidCodeUnits:)`, generic over
both collection and codec for maximum flexibility
* `init(cString:)`, generic over collection
* `init?(validatingUTF8:)`, generic over collection

2.2 (by this proposal) would also have `init(cString:)` and
`init?(validatingUTF8:)` that take `UnsafePointer<CChar>` for
compatibility.
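
For orientation, here is how the three entry points can be exercised with the pointer-based forms that exist in current Swift (`String.decodeCString(_:as:repairingInvalidCodeUnits:)`, `String(cString:)`, and `String(validatingUTF8:)`, the last of which newer toolchains may flag as deprecated). The collection-generic versions discussed above are not shown, and the helper `decode(_:repairing:)` is a hypothetical wrapper for the example:

```swift
let valid: [UInt8] = [0x48, 0x69, 0x00]    // "Hi" plus NUL terminator
let invalid: [UInt8] = [0x48, 0xFF, 0x00]  // "H", an invalid byte, NUL

// Hypothetical convenience wrapper around the stdlib factory method.
func decode(_ bytes: [UInt8], repairing: Bool) -> (result: String, repairsMade: Bool)? {
    return bytes.withUnsafeBufferPointer { buffer in
        String.decodeCString(buffer.baseAddress, as: UTF8.self,
                             repairingInvalidCodeUnits: repairing)
    }
}

let clean = decode(valid, repairing: false)      // decoded, no repairs needed
let repaired = decode(invalid, repairing: true)  // invalid byte replaced
let rejected = decode(invalid, repairing: false) // nil: input is not valid UTF-8

// Repairing initializer: always produces a String.
let lenient = invalid.withUnsafeBufferPointer { String(cString: $0.baseAddress!) }

// Validating initializer: fails on invalid input.
let strict = "Hi".withCString { String(validatingUTF8: $0) }
```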

Does that make sense, or should the new inits take buffer pointers?

----
Zach Waldowski
zach@waldowski.me

On Wed, Jan 13, 2016, at 12:32 PM, Max Moiseev wrote:

On Jan 13, 2016, at 1:14 AM, Zach Waldowski <zach@waldowski.me> wrote:

I might've mis-parsed the meaning of "'deprecation' magic". What's the
best path forward in the near-term? Would `decodeCString` be the only
one that becomes generic? Or, phrased differently, should there still be
`UnsafePointer<CChar>`+`strlen` versions?

Sorry for not being clear. This is a breaking API change, so I think it would be nice to keep the existing API in place, mark it as deprecated for Swift 2.2, and remove it in Swift 3. Here is an example <https://github.com/apple/swift/blob/master/stdlib/public/core/Index.swift#L76> of what I meant by ‘deprecation magic’.

max

The way I see it:
Since we are designing a new API here, I don’t see why it should not be similar in both 2.2 and 3.
I’m thinking of two different versions of decodeCString, though: one accepting a CollectionType and another accepting an UnsafePointer and calling strlen, since there are still cases where the length of a C string is unknown and all we know is that it is null-terminated.
All the existing APIs will remain in place for Swift 2.2 but should be annotated with a deprecation attribute.
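
A rough sketch of that two-version split in current Swift. The function name `decodeUTF8` is hypothetical and fixed to UTF-8 for brevity; the pointer version measures the string with `strlen` and forwards to the collection version:

```swift
import Foundation  // for strlen

// Hypothetical helper, not a stdlib API: the caller already knows the length,
// so any collection of UTF-8 code units can be decoded directly.
func decodeUTF8<C: Collection>(_ codeUnits: C) -> String where C.Element == UInt8 {
    return String(decoding: codeUnits, as: UTF8.self)
}

// Hypothetical helper: only a NUL-terminated pointer is available, so the
// length is measured with strlen before forwarding to the version above.
func decodeUTF8(cString: UnsafePointer<CChar>) -> String {
    let length = strlen(cString)
    return cString.withMemoryRebound(to: UInt8.self, capacity: length) { bytes in
        decodeUTF8(UnsafeBufferPointer(start: bytes, count: length))
    }
}

let knownLength: [UInt8] = [0x68, 0x69]  // "hi", no terminator needed
let fromCollection = decodeUTF8(knownLength)
let fromPointer = "hi".withCString { decodeUTF8(cString: $0) }
```

The pointer overload pays for one extra pass over the bytes, which is the cost the collection overload avoids when the length is already known.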

On Jan 13, 2016, at 9:41 AM, Zach Waldowski <zach@waldowski.me> wrote:

Oh, alright!

So the complete API set would be, as I see it right now:
* `decodeCString(_:encoding:repairingInvalidCodeUnits:)`, generic over
both collection and codec for maximum flexibility
* `init(cString:)`, generic over collection
* `init?(validatingUTF8:)`, generic over collection

2.2 (by this proposal) would also have `init(cString:)` and
`init?(validatingUTF8:)` that take `UnsafePointer<CChar>` for
compatibility.

Does that make sense, or should the new inits take buffer pointers?
