Tiny Pitch: `ByteOrder` type

YOCKOW · August 19, 2024, 6:29am

Latest version is available at gist: ByteOrderType.md · GitHub

`ByteOrder` type

Introduction

You often have to take endianness (or byte order) into account when you work on, for example, stream programming. While network byte order is defined as big endian, you may have to handle other streams whose endianness is not known in advance. In such cases, you want a type that identifies endianness so that you can write code like if theStream.byteOrder == .littleEndian { ... }.

Motivation

CoreFoundation has such a type that represents endianness. Its related definitions look like the following when seen from Swift.

public struct __CFByteOrder: RawRepresentable, Equatable {
    public init(_ rawValue: UInt32)
    public init(rawValue: UInt32)
    public var rawValue: UInt32
}
public var CFByteOrderUnknown: __CFByteOrder { get }
public var CFByteOrderLittleEndian: __CFByteOrder { get }
public var CFByteOrderBigEndian: __CFByteOrder { get }
public typealias CFByteOrder = CFIndex

public func CFByteOrderGetCurrent() -> CFByteOrder

If you use __CFByteOrder directly, you can implement a type that represents a stream as below.

import CoreFoundation

public struct TheirStream {
  // Some implementation here.

  /// The byte order of this stream.
  public var byteOrder: __CFByteOrder {
    // returns CFByteOrderUnknown, CFByteOrderLittleEndian, or CFByteOrderBigEndian
  }
}

The issues here are:

__CFByteOrder is double-underscored, which suggests that the type is not meant to be public API.
CFByteOrder{Unknown|LittleEndian|BigEndian} are global variables, not enum cases.
You can use CFByteOrder instead, but it is a typealias of CFIndex, which is just an integer.
On some platforms, CoreFoundation is not available from Swift in the first place.

Proposed solution

The way to solve these issues is simple: add a type for byte order.
Such a type has been proposed as part of another pitch but has not been realized yet.

Detailed design

If it mirrors the current CF* definitions described above, the type would be implemented as follows.

/// A type that identifies byte order.
@frozen
public enum ByteOrder: /* UInt32, */ Equatable, Sendable {
  /// The byte order is unknown.
  case unknown // = 0

  /// Multi-byte values are stored with the least-significant bytes stored first.
  /// Pentium CPUs are little endian.
  case littleEndian // = 1

  /// Multi-byte values are stored with the most-significant bytes stored first.
  /// PowerPC CPUs are big endian.
  case bigEndian // = 2

  /// The byte order of the current computer.
  public static var current: ByteOrder {
    #if _endian(little)
    return .littleEndian
    #elseif _endian(big)
    return .bigEndian
    #else
    return .unknown
    #endif
  }
}
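As a rough illustration of how the pitched type could be used by a stream type, here is a minimal sketch. `SampleStream` and `readUInt32()` are hypothetical names that do not appear in the pitch, and the enum is a simplified local copy (without `@frozen` or `current`) so that the example compiles on its own.

```swift
// Simplified local copy of the pitched type, so the sketch is self-contained.
enum ByteOrder: Equatable, Sendable {
  case unknown
  case littleEndian
  case bigEndian
}

// Hypothetical stream type; not part of the pitch.
struct SampleStream {
  var bytes: [UInt8]
  var byteOrder: ByteOrder

  /// Reads the first four bytes as a UInt32 according to `byteOrder`.
  /// Returns nil when fewer than four bytes are available
  /// or when the byte order is unknown.
  func readUInt32() -> UInt32? {
    guard bytes.count >= 4 else { return nil }
    let head = Array(bytes.prefix(4))

    // Arrange the bytes so that the most significant byte comes first.
    let mostSignificantFirst: [UInt8]
    switch byteOrder {
    case .bigEndian:
      mostSignificantFirst = head
    case .littleEndian:
      mostSignificantFirst = Array(head.reversed())
    case .unknown:
      return nil
    }

    var value: UInt32 = 0
    for byte in mostSignificantFirst {
      value = (value << 8) | UInt32(byte)
    }
    return value
  }
}
```

Note that the decoding depends only on the byte order of the data, never on the byte order of the host.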

Source compatibility

There are no source compatibility concerns because this proposal only adds a new type.

ABI compatibility

Adding the new type does not affect ABI.

Implications on adoption

This feature can be freely adopted and un-adopted in source
code with no deployment constraints and without affecting source or ABI
compatibility.

Future directions

Become a basis for other features

This proposal is inspired by another pitch that includes adding this kind of type. This proposal could become a part of that feature.

Furthermore, ByteOrder is so versatile (in other words, so simple) that it could be adopted by many other features.
For example, you could use it when you want to implement something like Data.View.

Alternatives considered

Other type name

We can choose Endianness or other names instead of ByteOrder.
However, ByteOrder is preferred since the name of the existing type from CoreFoundation comprises "byte-order".

Other static property name

There could also be alternative names for static var current: ByteOrder { get }, such as host or native. However, for the same reason as above, current is preferred because it derives from CFByteOrderGetCurrent.

Omit `unknown` endian

There are unusual orderings that are generically called "middle-endian" or "mixed-endian". Swift doesn't currently support such architectures.
We can leave out the unknown case if Swift decides never to support such rare byte orders. Otherwise, the type should include an unknown case so that it can be @frozen.

`RawRepresentable` conformance

The ByteOrder type should conform to RawRepresentable with a RawValue of UInt32 if we want it to be compatible with __CFByteOrder.

Wait for other evolutions

The change made by this proposal is so small that it could be a part of another evolution proposal. Indeed, as described above, this proposal is inspired by another pitch. On the other hand, there are points to discuss about this type, e.g. the alternatives mentioned in this section. That's why we should discuss this small change separately from other proposals.

Use other packages

For example, swift-syntax and swift-nio each have their own enum Endianness ^[1] ^[2]. Although you can import them, they are not compatible with each other. Generally speaking, it is not desirable for definitions that serve the same purpose to be scattered across packages. In conclusion, this type should be defined in the standard library.

maartene · August 19, 2024, 7:14am

Sounds like a good idea!

Do you also propose to create bridges to/from __CFByteOrder?

Nickolas_Pohilets · August 19, 2024, 8:37am

Not a fan of having unknown as part of the enumeration. I would prefer to have a type that models known byte order, and combine it with Optional<> when byte order can be unknown.
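A minimal sketch of that alternative, for illustration only: the enum has just the two known cases, and "unknown" is expressed with Optional at the use site. The names `KnownByteOrder`, `ProbedStream`, and `describe(_:)` are hypothetical and do not come from the thread.

```swift
// Two-case enum: models only byte orders that are actually known.
enum KnownByteOrder: Equatable, Sendable {
  case littleEndian
  case bigEndian
}

// Hypothetical stream type whose byte order may not have been determined yet.
struct ProbedStream {
  /// nil means the byte order has not been determined.
  var byteOrder: KnownByteOrder?
}

func describe(_ stream: ProbedStream) -> String {
  // The compiler forces callers to handle the undetermined case first.
  guard let order = stream.byteOrder else {
    return "byte order not determined"
  }
  // This switch is exhaustive over the known orders only.
  switch order {
  case .littleEndian: return "little endian"
  case .bigEndian: return "big endian"
  }
}
```

With this shape, code that has already established the byte order can take a non-optional `KnownByteOrder` and never has to handle an "unknown" branch.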

benrimmington · August 19, 2024, 8:39am

I think CFByteOrderGetCurrent() and NSHostByteOrder() are only available on Apple platforms.

C++20 has std::endian::native in the <bit> header.

C23 will have __STDC_ENDIAN_NATIVE__ in the <stdbit.h> header.

ktoso · August 19, 2024, 8:55am

Wouldn't it be better to hold off introducing such types for when we tackle the long standing issue of introducing a "good" ByteBuffer type? I think we're getting close to being able to do so with the introduction of move-only and non-escaping types. Such new type may have APIs which benefit from having a ByteOrder type then.

Karl · August 19, 2024, 10:04am

Whenever I see code that asks what the native byte order is, it's almost certain the code is either wrong or misguided. [...]

The byte order of the computer doesn't matter much at all except to compiler writers and the like, who fuss over allocation of bytes of memory mapped to register pieces. Chances are you're not a compiler writer, so the computer's byte order shouldn't matter to you one bit.

Notice the phrase "computer's byte order". What does matter is the byte order of a peripheral or encoded data stream, but--and this is the key point--the byte order of the computer doing the processing is irrelevant to the processing of the data itself. If the data stream encodes values with byte order B, then the algorithm to decode the value on computer with byte order C should be about B, not about the relationship between B and C.

Rob Pike - The byte order fallacy

The current APIs offered on Swift's fixed-width integer types help you focus on the endianness of data. If your data contains a big-endian Int32, you call Int32(bigEndian: value) and it gives you the correct numeric value on every machine. Similarly, if you need to produce a big-endian Int32, value.bigEndian.
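To make this concrete, here is a small sketch using only existing standard library APIs. It assumes Swift 5.7 or later for `loadUnaligned(as:)`; the variable names are illustrative.

```swift
// Four bytes as they might arrive from a big-endian data stream.
let wire: [UInt8] = [0x00, 0x00, 0x01, 0x02]

// Load the bytes in memory order, then state that they encode a big-endian value.
let raw = wire.withUnsafeBytes { $0.loadUnaligned(as: UInt32.self) }
let decoded = UInt32(bigEndian: raw)  // 258 on every machine

// Going the other way: produce the big-endian byte sequence for a value.
let encoded = withUnsafeBytes(of: decoded.bigEndian) { Array($0) }
```

Nothing here asks what the host byte order is; the code only states the byte order of the data.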

Other than querying the machine's native endianness, I'm not sure what additional value a ByteOrder type would offer over the existing APIs.

1-877-547-7272 · August 19, 2024, 11:25am

As the post you quoted points out, it is often useful to know the byte order of a data stream so that you can process it correctly. The Unicode Processing APIs pitch contains a type like this because it’s important to know the endianness of a Unicode stream when processing it.

YOCKOW · August 19, 2024, 11:56am

Honestly I'm reluctant to do it because CoreFoundation via Swift is not flawlessly cross-platform.

At least CFByteOrderGetCurrent() is also available on Linux: Code
At any rate, CF-APIs seem not to be available on Windows.
A bona-fide cross-platform type/API is desirable.

Definitely.
I was too affected by CoreFoundation’s C-style API.

Sorry, I didn’t provide enough background for this pitch.
This pitch is partly a response to the other pitch, “Unicode Processing APIs”, which includes introducing such a type.
I thought it was necessary to discuss the type itself separately.

j-f1 · August 19, 2024, 1:13pm

It looks like there are two groups of APIs in the standard library that already use endianness: the littleEndian and bigEndian initializers/properties on FixedWidthInteger and the endianness-aware UTF-16/32 String encodings. Would it make sense to also pitch new APIs for those that take a ByteOrder parameter to allow for dynamic conversion?
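As a sketch of what such dynamic conversion could look like on FixedWidthInteger: the initializer label `storedAs:` and the method `representation(in:)` are hypothetical names invented for this illustration, not proposed API, and the two-case enum is a local stand-in for the pitched type.

```swift
// Local stand-in for the pitched type.
enum ByteOrder: Equatable, Sendable {
  case littleEndian
  case bigEndian
}

extension FixedWidthInteger {
  /// Hypothetical: interprets `value` as having been stored in `byteOrder`
  /// and converts it to the host representation.
  init(_ value: Self, storedAs byteOrder: ByteOrder) {
    switch byteOrder {
    case .littleEndian: self.init(littleEndian: value)
    case .bigEndian: self.init(bigEndian: value)
    }
  }

  /// Hypothetical: returns the representation of this value in `byteOrder`.
  func representation(in byteOrder: ByteOrder) -> Self {
    switch byteOrder {
    case .littleEndian: return littleEndian
    case .bigEndian: return bigEndian
    }
  }
}
```

Both members simply dispatch to the existing `littleEndian`/`bigEndian` APIs, so the byte order can be chosen at run time, for example after inspecting a header or byte-order mark.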

YOCKOW · August 20, 2024, 1:00am

It can be one of future directions.
The reason I focus on the ByteOrder type itself is that I think there are quite a few points to discuss.

YOCKOW · August 22, 2024, 7:36am

I'm not sure how many folks are interested in this, but I have updated the pitch and uploaded it to the gist.

What's changed:

Explicitly mentions which proposal inspired this proposal.
Explicitly mentions that this proposal focuses on the type itself.
Removed the unknown case and made current return an optional.
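For readers who have not opened the gist, the revised shape might look roughly like the following. This is a sketch reconstructed from the change list above, not a copy of the gist, so details such as attributes and documentation comments are assumptions.

```swift
/// A type that identifies byte order (sketch of the revised pitch).
enum ByteOrder: Equatable, Sendable {
  /// Least-significant bytes are stored first.
  case littleEndian

  /// Most-significant bytes are stored first.
  case bigEndian

  /// The byte order of the current computer,
  /// or nil if it is neither little endian nor big endian.
  static var current: ByteOrder? {
    #if _endian(little)
    return .littleEndian
    #elseif _endian(big)
    return .bigEndian
    #else
    return nil
    #endif
  }
}
```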

dwaite · August 30, 2024, 2:57am

It is specifically a parameter in the initializer for processing a buffer, e.g. no different from having init(bigEndian: ) and the like.

The need to capture byte order alongside some unprocessed data is really an application consideration.

For instance, an application that wants to defer the expensive conversion of received text in various character encodings to a UTF-8 buffer until it is needed might encapsulate a buffer along with an input character encoding. These could be supplied in an initializer to a particular type. But the ability to read this character encoding out of the created object breaks SOLID principles.

Let's also not forget that middle-endianness is a thing - there aren't just two endiannesses (sp?). Codifying that only two will ever be supportable might be fine for Apple's CoreFoundation (e.g. endianness as a platform-wide concept), but not appropriate for a programming language (endianness as a language-wide concept).

YOCKOW · August 30, 2024, 6:32am

A ByteOrder type provides one solution for every application.
Currently, each application or module implements its own such type independently. As I wrote in the "Alternatives considered" section, swift-syntax and swift-nio, for example, have their own enum Endianness types, which are incompatible with each other.
If a ByteOrder type is added to the standard library, they can be unified.

We are aware that such endianness exists, but are undecided about how to express it:

case unknown (original pitch)
Optional<ByteOrder>.none (current pitch)
case middleEndian or case mixedEndian (alternatives)

mattcox · September 3, 2024, 8:24am

This would be useful to have.

I do something similar in my Pack library, and it would be helpful to have a built in.

https://github.com/mattcox/Pack/blob/main/Sources/Pack/ByteOrder.swift

johannesweiss · September 3, 2024, 10:09am

I agree with @Karl here. Querying the native byte order is almost always wrong.

YOCKOW · September 5, 2024, 9:43am

I find it interesting that the raison d'être of an API for getting the current (host, or native) byte order is itself open to discussion.

tevamerlin · September 5, 2024, 1:00pm

The way I see it, it’s more useful to have functions for converting data to/from a given byte order. You don’t really need to know the current byte order, then.

But maybe this is not what the previous posters had in mind.

YOCKOW · September 7, 2024, 2:05am

I updated the gist of this pitch to mention the raison d'être of an API for getting the current (host, or native) byte order.