I've been thinking about a standard library 'Data' type for a while, analogous to NSData in the same way Swift's Arrays and Dictionaries are analogous to NSArrays and NSDictionaries. A first-class container for binary data that is available to every Swift user, conforms to Swift semantics, and is safer and easier to work with than UnsafeBufferPointer seems like a natural fit for the standard library.
As such, I've put together a very preliminary proposal, which can be found here: https://github.com/austinzheng/swift-evolution/blob/d2/proposals/XXXX-stdlib-data.md. I present it not as a way to impose a vision of what such a Data type should look like, but rather as a way to catalyze discussion (including discussion as to whether or not a Data type is even a good idea in the first place).
Some thoughts:
- It's not clear if the methods to convert to and from base-64 encoded data are necessary. The state flag that tries to mark whether or not a Data represents a base-64-encoded string stored as data may be unnecessary as well.
- I didn't really go into how NSData should be bridged. Special consideration needs to be given to how any native Data type would interact with the overlays described in https://github.com/apple/swift-evolution/blob/master/proposals/0069-swift-mutability-for-foundation.md. It's possible (though only if a compelling technical reason exists) that the Foundation implementation of NSData could in the future be moved into Swift or supplanted by such a native data type, with API extensions providing conformance to the Objective-C Foundation API. This proposal should not be seen as an attempt to usurp Foundation's job, though; there are plenty of Foundation types slated to become value types whose inclusion directly in the standard library makes little sense.
- Perhaps Data should be generic over various types of fixed-width integers (signed and unsigned; 8-, 16-, 32-, 64-bit, machine-width, etc.). In that case it might also provide generic views (for example, to allow iteration over a Data<UInt64> as if it were a collection of UInt8 bytes). I'm not yet sure if this is feasible or desirable.
Finally, it's possible that this is strictly Swift 4 territory, in which case I'm happy to withdraw from discussion until the time is right later this year.
One thing that I would like to suggest for us to consider is
justifying why Data needs to be a separate type from Array<Int8> and
Array<UInt8>. We can add conditional extensions to Array of Int8 and
UInt8 if we find that existing NSData/dispatch_data_t use cases need a
few special APIs that won't make sense on arrays in general.
For example, something that I would imagine people want to do with
"data buffer" types is being able to make an unaligned or type punned
load or store. For example, in Java, this is one of the primary
use cases for a type similar in spirit, java.nio.ByteBuffer
(https://docs.oracle.com/javase/7/docs/api/java/nio/ByteBuffer.html).
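As a rough illustration of the unaligned, type-punned load mentioned above, here is a minimal sketch for `[UInt8]`, written for a recent Swift toolchain. `loadUnalignedLittleEndian(_:at:)` is a hypothetical name, not an existing or proposed API, and reading the bytes as little-endian is an assumption; the bytes are copied into an aligned local so the offset can be any byte position.

```swift
// Hypothetical helper, not an existing or proposed API: an unaligned,
// little-endian integer load from a byte array.
extension Array where Element == UInt8 {
    func loadUnalignedLittleEndian<T: FixedWidthInteger>(_ type: T.Type, at offset: Int) -> T {
        let size = MemoryLayout<T>.size
        precondition(offset >= 0 && offset + size <= count, "read out of bounds")
        var value: T = 0
        withUnsafeBytes { source in
            // `Swift.` is needed because Array has its own withUnsafeMutableBytes method.
            Swift.withUnsafeMutableBytes(of: &value) { destination in
                // Copying into an aligned local means `offset` may be any byte position.
                destination.copyMemory(from: UnsafeRawBufferPointer(rebasing: source[offset ..< offset + size]))
            }
        }
        // The bytes are assumed to be little-endian; convert to the host's byte order.
        return T(littleEndian: value)
    }
}

let packet: [UInt8] = [0xFF, 0x78, 0x56, 0x34, 0x12]
let word = packet.loadUnalignedLittleEndian(UInt32.self, at: 1)
print(String(word, radix: 16)) // 12345678
```

Copying through an aligned temporary is the conservative choice; it avoids relying on platform-specific rules for misaligned pointer loads.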
Another use case that is a crossover between Array and Data: allowing
Array to (unsafely) adopt ownership of an existing initialized unsafe
buffer pointer. We had quite a few requests for this. Do you think
this is an interesting use case? Does it overlap with this discussion?
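Array still has no way to adopt a buffer it did not allocate. A related facility added to the standard library much later, `Array(unsafeUninitializedCapacity:initializingWith:)` (SE-0245), at least lets a producer write directly into a new array's storage. The sketch below assumes a hypothetical C-style producer, `produceBytes(into:)`; it is not the ownership-adoption feature described above.

```swift
/// Hypothetical stand-in for a C-style API that fills a caller-provided buffer
/// and returns how many bytes it wrote.
func produceBytes(into buffer: UnsafeMutableBufferPointer<UInt8>) -> Int {
    let message: [UInt8] = Array("hi".utf8)
    let written = min(buffer.count, message.count)
    for i in 0 ..< written {
        buffer[i] = message[i] // UInt8 is trivial, so plain assignment initializes it
    }
    return written
}

// The producer writes straight into the new array's storage; no intermediate buffer is copied.
let received = [UInt8](unsafeUninitializedCapacity: 16) { buffer, initializedCount in
    initializedCount = produceBytes(into: buffer)
}
print(received) // [104, 105]
```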
Dmitri
···
On Wed, May 11, 2016 at 2:37 AM, Austin Zheng via swift-evolution <swift-evolution@swift.org> wrote:
- It's not clear if the methods to convert to and from base-64 encoded data are necessary. The state flag that tries to mark whether or not a Data represents a base-64-encoded string stored as data may be unnecessary as well.
I would definitely vote for having the base64 conversions in. The state flag would have less utility IMO.
I would also ask that you consider conversions to and from hex encoded strings. If nothing else, it would make it easier to convert git commit hashes to a readable form in my forthcoming git client vapourware.
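The hex conversions requested here are small enough to sketch on `[UInt8]`. `hexNibble(_:)`, `hexString` and `init?(hexString:)` are hypothetical names chosen for illustration, not taken from the proposal. A 20-byte SHA-1 commit hash would render as 40 hex digits.

```swift
/// Hypothetical helper: value of one hexadecimal digit, or nil if `scalar` is not one.
func hexNibble(_ scalar: Unicode.Scalar) -> UInt8? {
    switch scalar.value {
    case 0x30...0x39: return UInt8(scalar.value - 0x30)      // "0"..."9"
    case 0x61...0x66: return UInt8(scalar.value - 0x61 + 10) // "a"..."f"
    case 0x41...0x46: return UInt8(scalar.value - 0x41 + 10) // "A"..."F"
    default: return nil
    }
}

// Hypothetical conversions; the names are illustrative, not taken from the proposal.
extension Array where Element == UInt8 {
    /// Lowercase hex, two digits per byte.
    var hexString: String {
        let digits = [Character]("0123456789abcdef")
        var result = ""
        result.reserveCapacity(count * 2)
        for byte in self {
            result.append(digits[Int(byte >> 4)])
            result.append(digits[Int(byte & 0x0F)])
        }
        return result
    }

    /// Parses an even number of hex digits (either case); nil on any other input.
    init?(hexString: String) {
        let scalars = [Unicode.Scalar](hexString.unicodeScalars)
        guard scalars.count % 2 == 0 else { return nil }
        var bytes: [UInt8] = []
        bytes.reserveCapacity(scalars.count / 2)
        for pairStart in stride(from: 0, to: scalars.count, by: 2) {
            guard let high = hexNibble(scalars[pairStart]),
                  let low = hexNibble(scalars[pairStart + 1]) else { return nil }
            bytes.append(high << 4 | low)
        }
        self = bytes
    }
}

let hash: [UInt8] = [0xde, 0xad, 0xbe, 0xef]
print(hash.hexString)                         // deadbeef
print([UInt8](hexString: "DEADbeef") == hash) // true
print([UInt8](hexString: "abc") == nil)       // true (odd length)
```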
···
On 11 May 2016, at 10:37, Austin Zheng via swift-evolution <swift-evolution@swift.org> wrote:
The proposal looks well fleshed out! Another alternative to consider is the ‘DispatchData’ struct from libdispatch, which is currently being reviewed. Could some of these additions be added as an extension to that type? Or perhaps a ‘DataProtocol’ protocol could be made, with a base set of required methods and a further set of extensions built on that base. NSData and DispatchData could then conform by implementing those base methods, and each would get the extra functionality. Personally, though, I think it would be nice to make DispatchData the native Swift data type. I don’t know whether the libdispatch team would accept extensions like this in the future, but I think it would be interesting.
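A "small base plus extensions" protocol of the kind suggested above might look roughly like this. `RegionedBytes`, `forEachRegion(_:)`, `ContiguousStore` and `ChunkedStore` are hypothetical names; the region-visiting requirement is only loosely inspired by how dispatch_data_t exposes its regions through `dispatch_data_apply`, and it is not libdispatch's API.

```swift
// Hypothetical protocol; names are illustrative, not libdispatch or Foundation API.
protocol RegionedBytes {
    /// The one base requirement: visit each contiguous region of bytes, in order.
    func forEachRegion(_ body: (UnsafeRawBufferPointer) -> Void)
}

// Everything else is derived from the base requirement in an extension.
extension RegionedBytes {
    var byteCount: Int {
        var total = 0
        forEachRegion { total += $0.count }
        return total
    }

    /// Flattens all regions into one contiguous array (this copies).
    func copiedBytes() -> [UInt8] {
        var result: [UInt8] = []
        result.reserveCapacity(byteCount)
        forEachRegion { result.append(contentsOf: $0) }
        return result
    }
}

/// A contiguous store: exactly one region.
struct ContiguousStore: RegionedBytes {
    var bytes: [UInt8]
    func forEachRegion(_ body: (UnsafeRawBufferPointer) -> Void) {
        bytes.withUnsafeBytes(body)
    }
}

/// A non-contiguous store holding several separately allocated regions.
struct ChunkedStore: RegionedBytes {
    var chunks: [[UInt8]]
    func forEachRegion(_ body: (UnsafeRawBufferPointer) -> Void) {
        for chunk in chunks {
            chunk.withUnsafeBytes(body)
        }
    }
}

let a = ContiguousStore(bytes: [1, 2, 3])
let b = ChunkedStore(chunks: [[1], [2, 3]])
print(a.byteCount, b.byteCount)           // 3 3
print(a.copiedBytes() == b.copiedBytes()) // true
```

Because the only requirement is region-by-region access, a single-buffer type and a multi-region type can both conform without either pretending to be the other.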
Patrick
···
On 11 May 2016, at 7:37 PM, Austin Zheng via swift-evolution <swift-evolution@swift.org> wrote:
dispatch_data_t certainly is an intriguing alternative. I don't think it would be a good fit for a Swift standard library type verbatim, given that it's written in C and contains APIs that would not really be relevant within a Swift environment, but it should definitely be considered as a model.
One question that this brings up is whether supporting non-contiguous data regions in a native Swift data type is worth the complexity costs. There are good reasons for dispatch_data_t to be implemented the way it is, but NSData has always assumed that it is modeling a contiguous area in memory, and it provides users with raw access to the underlying buffer. A cursory examination of a few other languages (Java, Python, Haskell) shows that these languages all model binary data as some sort of contiguous array-like construct containing bytes.
I do think, at the very least, overlays should exist to provide initializers that convert between dispatch_data_t in Swift's libdispatch, the native Data type, and NSData (if NSData is still to be a separate type after Data is implemented). If contiguity is not important (and there is no compelling reason for access to raw pointers to be exposed in a public interface), it makes a lot of sense to 'unify' the various data types under a DataProtocol protocol that inherits from RandomAccessCollection.
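One way to write code that doesn't care about contiguity, while still using raw pointers when they happen to be available, is to target `Sequence` and try `withContiguousStorageIfAvailable(_:)`, a standard library method added well after this thread (SE-0237). `byteSum` is a hypothetical toy checksum used only to show the pattern.

```swift
/// Hypothetical toy checksum (sum of all bytes, wrapping at 2^32), written once for any byte sequence.
func byteSum<S: Sequence>(_ bytes: S) -> UInt32 where S.Element == UInt8 {
    // Fast path: use a raw buffer when the storage happens to be contiguous.
    if let sum = bytes.withContiguousStorageIfAvailable({ buffer -> UInt32 in
        var total: UInt32 = 0
        for byte in buffer { total &+= UInt32(byte) }
        return total
    }) {
        return sum
    }
    // Fallback: plain iteration works for non-contiguous storage too.
    var total: UInt32 = 0
    for byte in bytes { total &+= UInt32(byte) }
    return total
}

let contiguous: [UInt8] = [1, 2, 3, 250]
let chunks: [[UInt8]] = [[1, 2], [3, 250]]
print(byteSum(contiguous))      // 256 (fast path)
print(byteSum(chunks.joined())) // 256 (fallback: a flattened view has no single buffer)
```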
Another question is how a Data type's API will interact with however Swift eventually decides to handle native serialization/deserialization, but that's almost certainly a >= Swift 4 topic and I won't go into detail right now.
Austin
···
On May 11, 2016, at 3:57 AM, Patrick Smith <pgwsmith@gmail.com> wrote:
On 11 May 2016, at 7:37 PM, Austin Zheng via swift-evolution <swift-evolution@swift.org> wrote:
I think hex-encoded string conversion is an important use case. Another idea: instead of making the Data type itself generic, there could be a generic Data.View<T : IntegerType> into the data, or the Data type could come with a View for each of the fixed-width integer types.
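A view in the spirit of `Data.View<T : IntegerType>` could be sketched as follows. `IntegerType` predates Swift 4's integer protocols, so the sketch uses `FixedWidthInteger`; `IntegerView` is a hypothetical name, and little-endian byte order and ignoring a trailing partial element are assumptions.

```swift
/// Hypothetical read-only view of a byte array as a collection of fixed-width integers.
/// Assumptions: little-endian encoding; trailing bytes that do not fill a whole T are ignored.
struct IntegerView<T: FixedWidthInteger>: RandomAccessCollection {
    let bytes: [UInt8]

    var startIndex: Int { 0 }
    var endIndex: Int { bytes.count / MemoryLayout<T>.size }

    subscript(position: Int) -> T {
        precondition(position >= startIndex && position < endIndex, "index out of range")
        let base = position * MemoryLayout<T>.size
        var value: T = 0
        // Assemble the integer byte by byte, so alignment never matters.
        for offset in 0 ..< MemoryLayout<T>.size {
            value |= T(truncatingIfNeeded: bytes[base + offset]) << (8 * offset)
        }
        return value
    }
}

let raw: [UInt8] = [0x01, 0x00, 0x02, 0x00, 0xFF]
let halfwords = IntegerView<UInt16>(bytes: raw)
print(Array(halfwords)) // [1, 2]
```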
Austin
···
On May 11, 2016, at 7:09 AM, Jeremy Pereira <jeremy.j.pereira@googlemail.com> wrote:
On 11 May 2016, at 10:37, Austin Zheng via swift-evolution <swift-evolution@swift.org> wrote:
This is good to know, thanks! I will look into dispatch_data_t's
implementation more closely; I didn't know it was bridged to NSData.
I completely agree that if there are no requirements for a contiguous
buffer, then there should be no requirement to implement a Data object as a
contiguous buffer. There is nothing about the Collection abstraction that
requires a contiguous buffer, anyways.
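To make the point about `Collection` concrete, here is a minimal non-contiguous byte collection. `ChunkedBytes` is hypothetical, and dropping empty chunks up front is a simplifying assumption; it only conforms to `Collection`, since random access would need extra bookkeeping such as a table of chunk start offsets.

```swift
/// Hypothetical non-contiguous byte store: several chunks presented as one Collection.
struct ChunkedBytes: Collection {
    /// Position of a byte: which chunk, and where inside that chunk.
    struct Index: Comparable {
        var chunk: Int
        var offset: Int
        static func < (lhs: Index, rhs: Index) -> Bool {
            (lhs.chunk, lhs.offset) < (rhs.chunk, rhs.offset)
        }
    }

    private let chunks: [[UInt8]]

    init(_ chunks: [[UInt8]]) {
        // Simplifying assumption: dropping empty chunks means every valid index names a real byte.
        self.chunks = chunks.filter { !$0.isEmpty }
    }

    var startIndex: Index { Index(chunk: 0, offset: 0) }
    var endIndex: Index { Index(chunk: chunks.count, offset: 0) }

    subscript(position: Index) -> UInt8 {
        chunks[position.chunk][position.offset]
    }

    func index(after i: Index) -> Index {
        if i.offset + 1 < chunks[i.chunk].count {
            return Index(chunk: i.chunk, offset: i.offset + 1)
        }
        return Index(chunk: i.chunk + 1, offset: 0)
    }
}

let store = ChunkedBytes([[0x01, 0x02], [], [0x03]])
print(store.count)          // 3
print(Array(store))         // [1, 2, 3]
print(store.contains(0x03)) // true
```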
Austin
···
On Wed, May 11, 2016 at 10:33 AM, Zach Waldowski via swift-evolution <swift-evolution@swift.org> wrote:
On Wed, May 11, 2016, at 11:38 AM, Austin Zheng via swift-evolution wrote:
One question that this brings up is whether supporting non-contiguous data
regions in a native Swift data type is worth the complexity costs. There
are good reasons for dispatch_data_t to be implemented the way it is, but
NSData has always assumed that it is modeling a contiguous area in memory,
and it provides users with raw access to the underlying buffer. A cursory
examination of a few other languages (Java, Python, Haskell) shows that
these languages all model binary data as some sort of contiguous array-like
construct containing bytes.
I do not find this convincing.
NSData has not "always assumed" this; that it is transparently bridged
with dispatch_data_t on Darwin contradicts that directly.
Requiring the lowest common denominator of contiguous bytes for data to be
useful in Swift would be prohibitively inefficient. XPC and NSURLSession,
among others, use dispatch_data_t via NSData to efficiently push large
buffers across process boundaries.
That there are complexities involved should not be a reason not to address
them. It's 2016 and we don't always deal with buffers of a conveniently
small size, just like we don't always deal with Strings that are conveniently
UTF-8. If only sufficiently small buffers are addressed, for the sake of
simplicity, then I don't find the described API much more valuable than
[UInt8] and UnsafeBufferPointer.
Another language having represented it a certain way does not make it a
foregone conclusion that Swift must do the same. Other languages also lack
value types, Unicode-correct strings, or memory safety. Swift is living proof
that doing things the way C or Java did is not the automatic solution.
(By the way, I re-read the quoted part of your message, and I realized the
question I posed sounded like a rhetorical question. That wasn't my intent.
I'm sorry about that, and I appreciate you explaining why non-contiguous
data buffers have been important for certain use cases.)
Austin
···
On Wed, May 11, 2016 at 10:47 AM, Austin Zheng <austinzheng@gmail.com> wrote:
Thanks for the feedback! I'm glad that we could start a conversation on the
lists, and happy to see people offering their unvarnished opinions.
I think conditional conformances on Array<UInt8> are definitely an avenue
worth exploring. I'm not sure what the performance implications are - Zach
brought up use cases in which the ability for a data type to be backed by
non-contiguous storage was important. More generally, I wanted to open up
discussion as to what people wanted from a native Data type.
It seems like a DataProtocol-like protocol may be a good idea. Array<UInt8>
could conform through conditional conformances to provide an implementation
for people wanting a simple contiguous buffer that could be punned to an
array or other linear collection, while a more robust dispatch_data_t-like
conforming Swift stdlib type could be provided for more demanding use
cases. This actually seems to be a good fit - if you only care about a data
buffer as an arbitrary collection of bytes, the abstract protocol interface
gives you flexibility, while if you need a specific in-memory representation
of data, you should use a concrete type.
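The layering described above can be sketched with conditional conformances, which landed in a later Swift release (SE-0143). `DataLike` is a hypothetical stand-in for the DataProtocol under discussion, and `indexOfFirstZeroByte(in:)` is an invented example of an algorithm written once against it.

```swift
/// Hypothetical stand-in for the DataProtocol discussed in this thread:
/// any random-access collection of bytes.
protocol DataLike: RandomAccessCollection where Element == UInt8 {}

// Conditional conformances: only byte arrays (and slices of them) qualify.
extension Array: DataLike where Element == UInt8 {}
extension ArraySlice: DataLike where Element == UInt8 {}

/// Written once against the protocol; neither conformer needs special handling.
func indexOfFirstZeroByte<D: DataLike>(in data: D) -> D.Index? {
    data.firstIndex(of: 0)
}

let header: [UInt8] = [0x47, 0x49, 0x46, 0x00, 0x01]
print(indexOfFirstZeroByte(in: header) ?? -1)       // 3
print(indexOfFirstZeroByte(in: header[1...]) ?? -1) // 3 (slices share their base's indices)
```

A more demanding, dispatch_data_t-like concrete type could adopt the same protocol, and algorithms such as the one above would accept it unchanged.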
Best,
Austin
···
On Wed, May 11, 2016 at 11:01 AM, Dmitri Gribenko <gribozavr@gmail.com> wrote:
On Wed, May 11, 2016 at 2:37 AM, Austin Zheng via swift-evolution <swift-evolution@swift.org> wrote:
Hi Austin,
This is an interesting territory!
One thing that I would like to suggest for us to consider is
justifying why Data needs to be a separate type from Array<Int8> and
Array<UInt8>. We can add conditional extensions to Array of Int8 and
UInt8 if we find that existing NSData/dispatch_data_t use cases need a
few special APIs that won't make sense on arrays in general.
For example, something that I would imagine people want to do with
"data buffer" types is being able to make an unaligned or type punned
load or store. For example, in Java, this is one of the primary
use cases for a type similar in spirit, java.nio.ByteBuffer
(https://docs.oracle.com/javase/7/docs/api/java/nio/ByteBuffer.html).
Another use case that is a crossover between Array and Data: allowing
Array to (unsafely) adopt ownership of an existing initialized unsafe
buffer pointer. We had quite a few requests for this. Do you think
this is an interesting use case? Does it overlap with this discussion?
NSData has not "always assumed" this; that it is transparently bridged
with dispatch_data_t on Darwin contradicts that directly.
Requiring the lowest common denominator of contiguous bytes for data to be
useful in Swift would be prohibitively inefficient. XPC and NSURLSession,
among others, use dispatch_data_t via NSData to efficiently push large
buffers across process boundaries.
That there are complexities involved should not be a reason not to address
them. It's 2016 and we don't always deal with buffers of a conveniently
small size, just like we don't always deal with Strings that are conveniently
UTF-8. If only sufficiently small buffers are addressed, for the sake of
simplicity, then I don't find the described API much more valuable than
[UInt8] and UnsafeBufferPointer.
Another language having represented it a certain way does not make it a
foregone conclusion that Swift must do the same. Other languages also lack
value types, Unicode-correct strings, or memory safety. Swift is living proof
that doing things the way C or Java did is not the automatic solution.
Zachary Waldowski
zach@waldowski.me
···
On Wed, May 11, 2016, at 11:38 AM, Austin Zheng via swift-evolution wrote:
One question that this brings up is whether supporting non-contiguous
data regions in a native Swift data type is worth the complexity
costs. There are good reasons for dispatch_data_t to be implemented
the way it is, but NSData has always assumed that it is modeling a
contiguous area in memory, and it provides users with raw access to
the underlying buffer. A cursory examination of a few other languages
(Java, Python, Haskell) shows that these languages all model binary
data as some sort of contiguous array-like construct containing bytes.
In short, much of the API has been extracted into a `Data` protocol; two concrete implementations (one exploiting Swift 3's conditional protocol conformances) can be used for different purposes. The API should properly model data objects using both contiguous and non-contiguous backing stores.
Further thoughts, opinions, criticism, or just ideas as to what a great `Data` type would be capable of doing are much appreciated. Thanks again!
Best,
Austin
···
I think there is a typo in the initialiser that takes a hex tuple string: the external name of its first parameter should not be `base64EncodedString`. In fact, I’m not sure why `base64String` and `hexTupleString` aren’t OK as the external names of the first parameters of those initialisers.
···
On 12 May 2016, at 11:42, Austin Zheng via swift-evolution <swift-evolution@swift.org> wrote:
On May 11, 2016, at 11:29 AM, Austin Zheng <austinzheng@gmail.com> wrote: