[late pitch] UnsafeBytes proposal

Michael_Ilseman · August 19, 2016, 5:39pm

It seems like there’s a potential for confusion here, in that people may see “UInt8” and assume there is some kind of typed-ness, even though the whole point is that this is untyped. Adjusting the header comments slightly might help:

/// A non-owning view of raw memory as a collection of bytes.
///
/// Reads and writes on memory via `UnsafeBytes` are untyped operations that
/// do not require binding the memory to a type. These operations are expressed
/// in terms of `UInt8`, though the underlying memory is untyped.

…
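To make "untyped, but expressed in terms of `UInt8`" concrete, here is a minimal sketch. It assumes current Swift, where the view discussed in this thread is spelled `UnsafeRawBufferPointer`, reached here through the standard `withUnsafeBytes(of:_:)` function. `Pair` is an invented type used only for illustration.

```swift
// Sketch (Swift 5): the proposed UnsafeBytes view corresponds to the
// shipped UnsafeRawBufferPointer. `Pair` is a made-up example type.
struct Pair {
    var a: UInt16
    var b: UInt16
}

var pair = Pair(a: 0x0102, b: 0x0304)

// The closure receives an UnsafeRawBufferPointer over pair's storage.
// Its Element is UInt8, but the memory stays bound to Pair: nothing is
// rebound, each element is simply loaded as a UInt8 value.
let byteCount = withUnsafeBytes(of: &pair) { raw in raw.count }

// Copying the elements out yields ordinary [UInt8] values.
let copied: [UInt8] = withUnsafeBytes(of: &pair) { raw in Array(raw) }

print(byteCount)        // 4
print(copied.sorted())  // [1, 2, 3, 4] (sorted, so byte order does not matter)
```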

You could go even further towards hinting this fact with a `typealias Byte = UInt8`, and use Byte throughout. But I don’t know if that’s getting too excessive.

I don't think that's too excessive at all. I might even go further and say that we should call it "Untyped" instead of "Byte", to really drive home the point (many people see "byte" and think "8-bit int", which is merely a side effect of CPUs generally not having support for types *other* than ints and floats, rather than a reflection of the true "type" of the data).

- Dave Sweeris
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution

‘Byte’ is sufficient, I think.

In some sense, it is typed as bytes. It reflects the fact that anything that is representable to the computer must be expressible as a sequence of bits (the same way we have string de/serialisation — which of course is not to say that the byte representation is good for serialisation purposes). “withUnsafeBytes” can be seen as doing a reversible type conversion the same way LosslessStringConvertible does; only in this case the conversion is free.
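The "reversible, free conversion" reading can be sketched as a round trip through a value's raw bytes. This assumes Swift 5. The names `encode` and `decode` are hypothetical helpers, and the approach is only meaningful for trivial types such as integers, floats and plain structs of them. As noted above, the byte order is the host's, so this is not a serialisation format.

```swift
// Hypothetical helpers: copy a Double's in-memory bytes out, then back in.
func encode(_ value: Double) -> [UInt8] {
    var copy = value
    // The raw buffer's elements are UInt8, so Array(...) gives [UInt8].
    return withUnsafeBytes(of: &copy) { Array($0) }
}

func decode(_ bytes: [UInt8]) -> Double {
    precondition(bytes.count == MemoryLayout<Double>.size, "wrong byte count")
    var value = 0.0
    // Overwrite the Double's storage with the saved bytes; no bits change.
    withUnsafeMutableBytes(of: &value) { raw in
        raw.copyBytes(from: bytes)
    }
    return value
}

let original = 3.5
let saved = encode(original)
print(saved.count)               // 8
print(decode(saved) == original) // true
```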

Yes. Byte clearly refers to a value's in-memory representation. But `typealias Byte = UInt8` would imply the opposite of what needs to be conveyed. The name Byte refers to raw memory being accessed, not the value being returned by the collection. The in-memory value's bytes are loaded from memory and reinterpreted as UInt8 values. UInt8 is the correct type for the value after it is loaded. Calling the collection’s element type Byte sends the wrong message; for example, [Byte] or UnsafePointer<Byte> would be nonsense.
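A quick illustration of why the alias cannot carry that meaning: in Swift a `typealias` introduces a second name, not a second type. With the suggested alias, `[Byte]` is literally `[UInt8]`, and the two names cannot even be overloaded separately. The `describe` function is invented for the example.

```swift
// The alias suggested above: a second name for UInt8, not a new type.
typealias Byte = UInt8

let loaded: [UInt8] = [0x41, 0x42]
let asBytes: [Byte] = loaded  // compiles: [Byte] is exactly [UInt8]

func describe(_ x: UInt8) -> String { "UInt8 \(x)" }
// func describe(_ x: Byte) -> String { "Byte \(x)" }
// would not compile: invalid redeclaration of 'describe'

print(describe(asBytes[0]))             // UInt8 65
print(type(of: asBytes) == [UInt8].self) // true
```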

Keep in mind the important use case is code that needs to work with a collection of UInt8 values without knowing the type of the values in memory. This makes it intuitive and convenient to implement correctly without needing to reason about the Swift-specific notions of raw vs. typed pointers and binding memory to a type.
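A sketch of that use case, assuming Swift 5: a routine written once against "any collection of UInt8" also works directly on the raw bytes of a value, with no binding or rebinding of memory. The `checksum` helper is hypothetical. It uses a simple 8-bit additive checksum with wrapping overflow, chosen only for brevity.

```swift
// Hypothetical helper: generic over any Collection whose Element is UInt8.
func checksum<C: Collection>(_ bytes: C) -> UInt8 where C.Element == UInt8 {
    // &+ wraps on overflow instead of trapping.
    bytes.reduce(0) { $0 &+ $1 }
}

// Works on an ordinary [UInt8] buffer...
let buffer: [UInt8] = [1, 2, 3, 250]
print(checksum(buffer))  // 0 (1 + 2 + 3 + 250 = 256, which wraps to 0)

// ...and on the raw bytes of a value of any type, seen as UInt8 elements.
var words: (UInt32, UInt32) = (1, 2)
let sum = withUnsafeBytes(of: &words) { raw in checksum(raw) }
print(sum)  // 3 (the nonzero bytes are 1 and 2, whatever the byte order)
```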

I agree, and sorry for the diversion.

···

On Aug 19, 2016, at 10:35 AM, Andrew Trick via swift-evolution <swift-evolution@swift.org> wrote:

On Aug 16, 2016, at 7:13 PM, Karl via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

On 16 Aug 2016, at 01:14, David Sweeris via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

On Aug 15, 2016, at 13:55, Michael Ilseman via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

The documentation should be fixed to clarify that the in-memory value is not the same as the loaded value.

-Andy
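One concrete way the in-memory value differs from the loaded values: the `UInt8`s loaded from a `UInt32` depend on the host's byte order. Below is a minimal sketch, assuming Swift 5. Storing `value.bigEndian` is one way to fix the order explicitly.

```swift
var value: UInt32 = 0x0102_0304

// The bytes as they sit in memory, loaded as UInt8 values.
let inMemory: [UInt8] = withUnsafeBytes(of: &value) { Array($0) }

// `littleEndian` is a no-op on little-endian hosts, so this detects byte order.
let hostIsLittleEndian = UInt32(1).littleEndian == 1
let expected: [UInt8] = hostIsLittleEndian ? [4, 3, 2, 1] : [1, 2, 3, 4]
print(inMemory == expected)  // true

// Storing the big-endian representation makes the loaded bytes portable.
var bigEndian = value.bigEndian
let portable: [UInt8] = withUnsafeBytes(of: &bigEndian) { Array($0) }
print(portable)  // [1, 2, 3, 4]
```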

Karl · August 19, 2016, 7:32pm

Well, a byte is a numerical type as much as a UInt8 is. We attach meaning to it (e.g. a memory location), but it’s just a number. Perhaps it shouldn’t be a typealias then (if the alias would have some kind of impure semantics), but its own type which is exactly the same as UInt8. Typing raw memory accesses with `Byte` to indicate that the number was read from raw memory is a good idea for type-safety IMO.
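For illustration only, a distinct type along those lines might look like the sketch below. `Byte` here is hypothetical: it is not in the standard library and is not part of this proposal. It keeps the size of `UInt8` but no longer converts implicitly, so overloads can tell a raw byte from a number. The cost is that any arithmetic has to unwrap `rawValue` first.

```swift
// Hypothetical distinct Byte type wrapping UInt8.
struct Byte: Equatable {
    var rawValue: UInt8
}

// Unlike with a typealias, these two overloads can coexist.
func describe(_ x: UInt8) -> String { "number \(x)" }
func describe(_ x: Byte) -> String { "raw byte \(x.rawValue)" }

var word: UInt16 = 0xFFFF
// Raw memory accesses produce Byte values instead of UInt8.
let raw: [Byte] = withUnsafeBytes(of: &word) { buffer in
    buffer.map { Byte(rawValue: $0) }
}

print(MemoryLayout<Byte>.size)    // 1
print(describe(raw[0]))           // raw byte 255
print(describe(raw[0].rawValue))  // number 255
// let n: UInt8 = raw[0]          // would not compile: Byte is not UInt8
```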

You’d wonder if we could have initialisers for other integer types which take a fixed-size array of `Byte`s - e.g. UInt16(_: [2 * Byte]). That wouldn’t make as much sense with two UInt8s.
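For comparison, this is how two plain `UInt8` values can be combined into a `UInt16` today without any fixed-size array type. It is a sketch with a hypothetical initializer name, not a standard library API. It computes the result arithmetically with an explicit little-endian order rather than reading memory.

```swift
// Hypothetical initializer: low byte first, high byte second.
extension UInt16 {
    init(littleEndianBytes low: UInt8, _ high: UInt8) {
        self = UInt16(low) | (UInt16(high) << 8)
    }
}

let combined = UInt16(littleEndianBytes: 0x34, 0x12)
print(combined == 0x1234)  // true
```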

Karl

···

On 19 Aug 2016, at 19:35, Andrew Trick <atrick@apple.com> wrote:

On Aug 16, 2016, at 7:13 PM, Karl via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

On 16 Aug 2016, at 01:14, David Sweeris via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

On Aug 15, 2016, at 13:55, Michael Ilseman via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:


Andrew_Trick · August 19, 2016, 8:48pm


Well, a byte is a numerical type as much as a UInt8 is. We attach meaning to it (e.g. a memory location), but it’s just a number.

But I thought what Andy's saying is that he's proposing to standardize the usage of the word byte to mean raw memory and not a number?

That’s right. That’s exactly how the name “bytes” is being used in APIs and method names. A byte is not itself a number but it is common practice to reinterpret a byte as a number in [0,256). IMO this isn’t a problem that needs to be fixed.

Perhaps it shouldn’t be a typealias then (if the alias would have some kind of impure semantics), but its own type which is exactly the same as UInt8. Typing raw memory accesses with `Byte` to indicate that the number was read from raw memory is a good idea for type-safety IMO.

You’d wonder if we could have initialisers for other integer types which take a fixed-size array of `Byte`s - e.g. UInt16(_: [2 * Byte]). That wouldn’t make as much sense with two UInt8s.

You would always go through memory to reinterpret the bits. There’s nothing wrong with this if you know the underlying pointer is aligned:

bytes.load(as: UInt16.self)
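A runnable version of that load, assuming Swift 5, where the raw buffer type offers `load(fromByteOffset:as:)` with a default offset of 0. The second half shows one way to handle storage whose alignment is not known: copy the bytes into an aligned local first. `loadUInt16` is a hypothetical helper. Recent toolchains (Swift 5.7 and later) also provide `loadUnaligned(fromByteOffset:as:)` for this case.

```swift
// The tuple's storage is aligned for UInt16, so offsets 0 and 2 are safe.
var pair: (UInt16, UInt16) = (0x0102, 0x0304)
let second = withUnsafeBytes(of: &pair) { raw in
    raw.load(fromByteOffset: 2, as: UInt16.self)
}
print(second == 0x0304)  // true

// Hypothetical helper for possibly misaligned offsets: copy, then read.
func loadUInt16(from raw: UnsafeRawBufferPointer, offset: Int) -> UInt16 {
    precondition(offset >= 0 && offset + 2 <= raw.count, "out of bounds")
    var result: UInt16 = 0
    withUnsafeMutableBytes(of: &result) { dst in
        dst.copyBytes(from: raw[offset ..< offset + 2])
    }
    return result
}

let packed: [UInt8] = [0xAA, 0x01, 0x02]
let unaligned = packed.withUnsafeBytes { loadUInt16(from: $0, offset: 1) }
// Host byte order decides the value: 0x0201 on little-endian machines.
print(unaligned == UInt16(littleEndian: 0x0201))  // true
```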

UInt8 is the right default for the collection API because it’s common practice to work with buffers of [UInt8].

Most use cases are not going to exercise the numeric properties of UInt8, but I don’t see that as a problem in practice.

-Andy

···

On Aug 19, 2016, at 12:43 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:
On Fri, Aug 19, 2016 at 2:32 PM, Karl via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

On 19 Aug 2016, at 19:35, Andrew Trick <atrick@apple.com <mailto:atrick@apple.com>> wrote:

On Aug 16, 2016, at 7:13 PM, Karl via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

On 16 Aug 2016, at 01:14, David Sweeris via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

On Aug 15, 2016, at 13:55, Michael Ilseman via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

xwu · August 19, 2016, 7:43pm


Well, a byte is a numerical type as much as a UInt8 is. We attach meaning to it (e.g. a memory location), but it’s just a number.

But I thought what Andy's saying is that he's proposing to standardize the usage of the word byte to mean raw memory and not a number?
