[Draft proposal] Faster/lower-level external String initialization

Color me interested, then…

Zachary Waldowski
zach@waldowski.me

···

On Mon, Feb 1, 2016, at 05:07 PM, Dave Abrahams via swift-evolution wrote:

on Mon Feb 01 2016, Zach Waldowski <swift-evolution@swift.org> wrote:

> Due to the semantics of _StringCore and _StringBuffer (as far as I
> understand them), such a method would not be more efficient than
> creating another String with the new initializer and concatenating the
> two, and would require more significant plumbing changes to
> _StringBuffer.

We are very interested in making significant plumbing changes to String,
FWIW.

>
>
> It would be good to shop around for this proposal, though; maybe if
> someone on the core team wants to chime in.
>
> Cheers,
> Zachary Waldowski
> zach@waldowski.me
>
> On Mon, Feb 1, 2016, at 03:07 AM, Charles Kissinger wrote:
>> It occurred to me that this proposal provides a way to efficiently
>> initialize Strings from UTF code unit sequences, but it doesn’t provide a
>> way to *append* code unit sequences to existing strings. String has an
>> existing method to append Character sequences:
>>
>> String.appendContentsOf<S : SequenceType where S.Generator.Element == Character>(_: S)
>>
>> The equivalent for code units would presumably be:
>>
>> String.appendContentsOf<S : SequenceType, Encoding: UnicodeCodecType
>> where Encoding.CodeUnit == S.Generator.Element>(_: S, encoding:
>> Encoding.Type)
>>
>> Is there any interest in adding that to the proposal? It would only have
>> a lot of value if it could be implemented in a more efficient way than
>> just calling String.append() for each decoded Character. From looking at
>> the code, that might not be straightforward.
>>
>> —CK
>>
>> > On Jan 26, 2016, at 3:14 PM, Zach Waldowski via swift-evolution <swift-evolution@swift.org> wrote:
>> >
>> > Since this seems to have gone quiet, and the code was already done, I've
>> > posted the PR to Swift itself:
>> >
>> > [stdlib] String from code units API by zwaldowski · Pull Request #1109 · apple/swift · GitHub
>> >
>> > The existing proposal PR:
>> >
>> > Proposal: String from code units API by zwaldowski · Pull Request #101 · apple/swift-evolution · GitHub
>> >
>> > --
>> > Sincerely,
>> > Zachary Waldowski
>> > zach@waldowski.me
>> >
>> > On Wed, Jan 20, 2016, at 06:08 PM, Zach Waldowski via swift-evolution wrote:
>> >> Thanks, Dave.
>> >>
>> >> I definitely wasn't hard to convince on this. The change has already
>> >> been made to the proposal, its PR, and the pending PR to the stdlib.
>> >>
>> >> Cheers!
>> >> Zach Waldowski
>> >> zach@waldowski.me
>> >>
>> >> On Wed, Jan 20, 2016, at 01:23 PM, Dave Abrahams via swift-evolution wrote:
>> >>>
>> >>> on Fri Jan 15 2016, Zach Waldowski via swift-evolution <swift-evolution-m3FHrko0VLzYtjvyW6yDsg-AT-public.gmane.org> wrote:
>> >>>
>> >>>> Charles -
>> >>>>
>> >>>> I shared the same concern, and mentioned it in the proposal. I thought
>> >>>> `decode(_:as:)` was too simple, to the point of being
>> >>>> non-descriptive,
>> >>>
>> >>> The names of methods don't need to be descriptive. It's the use-sites
>> >>> (and secondarily, declarations) that need to be clear. Trying to make
>> >>> the names of methods descriptive by themselves just hurts readability at
>> >>> the use-site.
>> >>>
>> >>> -Dave
>> >>>
>> >>> _______________________________________________
>> >>> swift-evolution mailing list
>> >>> swift-evolution@swift.org
>> >>> https://lists.swift.org/mailman/listinfo/swift-evolution

--
-Dave

That'd seem reasonable.

I guess I'm not entirely sold on the benefit of the extra method here,
and all the maintenance weight that'd entail. Obviously I get the
benefit of skipping the storage reservation, but I can't imagine a
scenario where building something up using
`appendContentsOf(_:encoding:)` would be that much better than plain
concatenation. I'd love to hear an example, though.

Cheers!
Zach Waldowski
zach@waldowski.me

···

On Mon, Feb 1, 2016, at 08:36 PM, Charles Kissinger via swift-evolution wrote:

> On Feb 1, 2016, at 2:07 PM, Dave Abrahams via swift-evolution <swift-evolution@swift.org> wrote:
>
>
> on Mon Feb 01 2016, Zach Waldowski <swift-evolution@swift.org> wrote:
>
>> Due to the semantics of _StringCore and _StringBuffer (as far as I
>> understand them), such a method would not be more efficient than
>> creating another String with the new initializer and concatenating the
>> two, and would require more significant plumbing changes to
>> _StringBuffer.
>
> We are very interested in making significant plumbing changes to String, FWIW.
>

In that case, perhaps it would make sense to add String.append() for code
unit sequences over the existing plumbing just for completeness of the
API, on the assumption that efficiency would come later when String gets
its makeover.

—CK
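Charles's suggestion of a code-unit append built over the existing plumbing can be sketched in present-day Swift. This is an editorial sketch, not the proposal's API: the thread predates Swift 3, so `SequenceType`, `UnicodeCodecType`, and `appendContentsOf` appear here under their later names `Sequence`, `UnicodeCodec`, and `append`, and the method name `append(codeUnits:as:)` is hypothetical. It decodes one scalar at a time and appends through `unicodeScalars`, so it shows the API shape only and makes no efficiency claim.

```swift
// Editorial sketch in Swift 5 syntax. `append(codeUnits:as:)` is a
// hypothetical name, not a standard-library method.
extension String {
    /// Decodes `codeUnits` with `Codec` and appends the decoded scalars.
    /// Ill-formed input is replaced with U+FFFD REPLACEMENT CHARACTER.
    mutating func append<Units: Sequence, Codec: UnicodeCodec>(
        codeUnits: Units,
        as _: Codec.Type
    ) where Units.Element == Codec.CodeUnit {
        var decoder = Codec()
        var iterator = codeUnits.makeIterator()
        decoding: while true {
            // Each call consumes code units and yields one scalar, an error,
            // or a signal that the input is exhausted.
            switch decoder.decode(&iterator) {
            case .scalarValue(let scalar):
                unicodeScalars.append(scalar)
            case .emptyInput:
                break decoding
            case .error:
                unicodeScalars.append("\u{FFFD}")
            }
        }
    }
}
```

For example, `greeting.append(codeUnits: Array("world".utf8), as: UTF8.self)` turns `"Hello, "` into `"Hello, world"`. Decoding straight into `unicodeScalars` avoids building a temporary `String` per call, but whether that saves any allocations depends on the standard library's internals.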

> That'd seem reasonable.
>
> I guess I'm not entirely sold on the benefit of the extra method here,
> and all the maintenance weight that'd entail. Obviously I get the
> benefit of skipping the storage reservation, but I can't imagine a
> scenario where building something up using
> `appendContentsOf(_:encoding:)` would be that much better than plain
> concatenation. I'd love to hear an example, though.

Zach,

Here’s a real-world example:

I have a case where I am assembling a String from five short ASCII character sequences scattered around different parts of each line of an input file. The maximum length of the resulting String is predictable, so in an ideal world I could create an empty string, call String.reserveCapacity(), and then suck up all of the ASCII character sequences with a series of String.appendContentsOf(_:encoding:) calls, all with just a single memory allocation per String. (But as you mentioned, it would appear to require a significant change in the String implementation for things to be that efficient.)
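The "ideal world" approach above can be approximated in present-day Swift, with `unicodeScalars.append(contentsOf:)` standing in for the `appendContentsOf(_:encoding:)` method that did not exist. A minimal sketch, assuming ASCII input; the function name `assembleReserved(line:fields:)` and the representation of a line as a byte array plus field ranges are hypothetical. Whether the appends stay within the reserved storage is up to the standard library's implementation, so the sketch does not guarantee a single allocation.

```swift
/// Builds one String from several ASCII byte ranges of a line.
/// `line` and `fields` are hypothetical stand-ins for the parsed input.
func assembleReserved(line: [UInt8], fields: [Range<Int>]) -> String {
    // The result can be no longer than the fields combined.
    let maxLength = fields.reduce(0) { $0 + $1.count }
    var result = ""
    result.reserveCapacity(maxLength)
    for field in fields {
        // Assumes ASCII input: each byte maps directly to one Unicode scalar.
        result.unicodeScalars.append(contentsOf: line[field].lazy.map { Unicode.Scalar($0) })
    }
    return result
}
```

With `line = Array("ID=42;NAME=abc;X".utf8)` and `fields = [3..<5, 11..<14]`, the result is `"42abc"`.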

Obviously, the alternative approach of instantiating a string for each of the subsequences and concatenating them would involve a minimum of six allocations. It matters in my case, because the input files are large (sometimes millions of lines).
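For comparison, the concatenation approach can be sketched with the same hypothetical inputs; `assembleByConcatenation(line:fields:)` is an illustrative name. With five fields, the count above is five temporary strings plus one for the result, 5 + 1 = 6 allocations at minimum.

```swift
/// The "one String per field, then concatenate" approach.
/// `line` and `fields` are hypothetical stand-ins for the parsed input.
func assembleByConcatenation(line: [UInt8], fields: [Range<Int>]) -> String {
    // One temporary String per field (five fields give five strings)...
    let pieces = fields.map { String(decoding: line[$0], as: UTF8.self) }
    // ...and then storage for the concatenated result.
    return pieces.joined()
}
```

Present-day Swift stores strings of up to 15 UTF-8 code units inline on 64-bit platforms, so short fields may not allocate at all today; the count above describes the String implementation of early 2016.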

Right now, my approach is to allocate a byte buffer, assemble the substrings in it, null-terminate it, and call String.fromCString(). That performs reasonably well, but it still involves an extra copy of the characters and the byte buffer allocation, neither of which would be necessary with the String.appendContentsOf(_:encoding:) method.
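The current workaround (one byte buffer, a null terminator, then a C-string conversion) can be sketched the same way. Here `String(cString:)` stands in for Swift 2's `String.fromCString()`, which returned an optional; the helper name `assembleViaCString(line:fields:)` is hypothetical, and the sketch assumes no field contains a zero byte, since that would end the string early.

```swift
/// Copies the fields into one byte buffer, null-terminates it, and converts it.
/// `line` and `fields` are hypothetical stand-ins for the parsed input.
func assembleViaCString(line: [UInt8], fields: [Range<Int>]) -> String {
    var buffer: [UInt8] = []
    // Room for every field plus the terminator.
    buffer.reserveCapacity(fields.reduce(0) { $0 + $1.count } + 1)
    for field in fields {
        buffer.append(contentsOf: line[field])
    }
    buffer.append(0) // null terminator; assumes no zero bytes inside fields
    // String(cString:) copies the bytes a second time, into the String.
    return buffer.withUnsafeBufferPointer { String(cString: $0.baseAddress!) }
}
```

The two copies (fields into the buffer, buffer into the String) are the extra copy and extra allocation described above.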

I hope that example was clear. If single-character String.append() became more efficient, that would reduce the need for the function I’m proposing. And if Swift strings were to get short-string optimization, it would make this all much easier, but I have no idea if that is in the cards.

—CK

···

On Feb 1, 2016, at 8:53 PM, Zach Waldowski via swift-evolution <swift-evolution@swift.org> wrote:

Charles —

This certainly makes a lot of sense. My primary response is that I think
the bad behavior of reserveCapacity should be reported by one of us as a
bug. My second thought is that the extra method should be proposed
separately; whereas the current proposal surfaces things that already
exist, what you need is purely additive but would require underlying
changes. I don't see a point in implementing it now for API completeness
if it can't make good on its performance; that's the exact predicament
we're in today with reserveCapacity and append/appendContentsOf.

Zach Waldowski
zach@waldowski.me

···

On Tue, Feb 2, 2016, at 03:24 AM, Charles Kissinger wrote:

> Charles —
>
> This certainly makes a lot of sense. My primary response is that I think
> the bad behavior of reserveCapacity should be reported by one of us as a
> bug.

From the little bit of poking around that I’ve done, the problem might not lie with reserveCapacity() itself. It *appears* that when calling String.append(_: Character), each character is converted to a String and then concatenated. So the savings in memory allocations that reserveCapacity() provides might be getting swamped by the per-character allocations of temporary strings. I only took a quick look, though, and haven’t verified this.

> My second thought is that the extra method should be proposed
> separately; whereas the current proposal surfaces things that already
> exist, what you need is purely additive but would require underlying
> changes. I don't see a point in implementing it now for API completeness
> if it can't make good on its performance; that's the exact predicament
> we're in today with reserveCapacity and append/appendContentsOf.

Fair enough. We can always revisit the “API completeness” argument when the proposal actually undergoes review.

Thanks again for putting together the proposal and code!

—CK

···

On Feb 3, 2016, at 10:18 AM, Zach Waldowski <zach@waldowski.me> wrote:
