Constant u16 data

bbrk24 · October 13, 2023, 7:06pm

I know that string literals are emitted into the binary, but array literals are constructed at runtime. So, in some cases, it's preferable to use string literals over array literals for constant data.
What if the constant data has 16-bit elements rather than 8-bit? As I see it my options are:

- Use an array literal and accept the penalties.
- Use a macro to create a string literal, and either:
  - design it so that the UTF-16 view has the elements I want. Unfortunately, iteration will convert it one value at a time rather than storing it in UTF-16 in the first place, so I'm not sure whether this is better or worse than the previous option.
  - use it as a StaticString, and unsafely treat the UTF-8 pointer as UnsafePointer<UInt16>. I don't know if this is guaranteed to have the right alignment, so I may have to use UnsafeRawPointer.loadUnaligned instead. There's also a potential endianness concern.
- Put it in a separate binary file and include that as a resource, rather than embedding the data directly in the code. I have a feeling that file I/O will be slower than any of the other options.
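
For reference, the two string-literal options might look roughly like the sketch below. This is only an illustration under assumptions: the table values and the names (`viaUTF16View`, `packed`, `loadElement`) are made up, the literals are written by hand rather than produced by a macro, and the packed bytes are assumed to be little-endian. Because string literals are stored as UTF-8, only byte values below 0x80 map to a single byte in the StaticString variant, and the UTF-16 variant cannot contain lone surrogates (0xD800...0xDFFF), so a real table would need some encoding scheme on top of this. `loadUnaligned` needs Swift 5.7 or later.

```swift
// Option A: a literal whose UTF-16 view is the table itself.
// Each element is written as one Unicode scalar.
let viaUTF16View = "\u{0001}\u{0002}\u{1234}"
let optionA = Array(viaUTF16View.utf16)

// Option B: a StaticString whose UTF-8 bytes are the packed table.
// Assumption: two bytes per element, least significant byte first.
let packed: StaticString = "\u{01}\u{00}\u{02}\u{00}\u{34}\u{12}"

func loadElement(_ index: Int, from data: StaticString) -> UInt16 {
    precondition(data.hasPointerRepresentation)
    precondition(index >= 0 && index * 2 + 1 < data.utf8CodeUnitCount)
    let raw = UnsafeRawPointer(data.utf8Start)
    // loadUnaligned avoids assuming 2-byte alignment of the literal.
    let stored = raw.loadUnaligned(fromByteOffset: index * 2, as: UInt16.self)
    // Interpret the stored bytes as little-endian regardless of the host.
    return UInt16(littleEndian: stored)
}

let optionB = (0..<packed.utf8CodeUnitCount / 2).map {
    loadElement($0, from: packed)
}
```

Both `optionA` and `optionB` end up as ordinary `[UInt16]` arrays here, which is only for easy comparison; in practice the point of option B would be to read elements straight from the literal without building an array.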

David_Smith · October 13, 2023, 7:08pm

Another option (which the standard library uses in some places) is to just declare it in a C header and import it

wadetregaskis · October 13, 2023, 8:07pm

I've wondered about this myself, re. why array (and dictionary) literals aren't "pre-compiled" into their final form by the compiler. I haven't checked properly, but I get the impression from all the @inlinable properties & methods that their layout is essentially baked into the calling module anyway…? Though I'm not sure how bridging integrates with that.

My interest is just academic - I haven't noticed any performance bottlenecks from this - but especially with growing interest in using Swift in the embedded space, any deviation from C's ability to trivially declare such things seems like a potential barrier to broader adoption.

dmt · October 13, 2023, 11:06pm

I'd prefer this to be solved in a more generalized way than some arbitrary ad-hoc hacks for simple cases like arrays of integers.
Unlike C structs or TypeScript's "types", structs in Swift aren't just conjunctions or disjunctions of their internals. They can only be constructed with their inits. Some inits are trivial: simple memberwise assignment from arguments to fields, pretty much like in tuples. But in general they can be much more complicated: they can be failable, perform validation, transformation, etc. This advanced logic can sometimes depend on things that are only available at runtime (OS version checking, reading shared memory, or simply calling malloc). Moreover, structs can be resilient and their respective inits can be non-inlinable, which means the actual code of the init will be available only at runtime. So in general it's impossible to construct a "final" binary representation of a struct at compile time.
But in many cases it certainly is possible. And to achieve this we need to somehow separate those inits from the others. C++ does this with constexpr. Requiring a constructor to be constexpr imposes a lot of limitations on what it can do inside its body, but as a result it can be invoked at compile time.
Ideally I'd like to have a full featured VM in the compiler to interpret any language statements including loops and if-else statements. But even under the restrictions of constexpr we can go pretty far.
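
To make the distinction between kinds of inits concrete, here is a small sketch. All three types are hypothetical and exist only for illustration; nothing here is an existing or proposed language feature.

```swift
// Trivial: the implicit memberwise init only copies arguments into
// fields, so a value's final bytes could in principle be computed
// ahead of time.
struct Point {
    var x: Int
    var y: Int
}

// Failable with validation: still a pure function of its argument,
// but evaluating it ahead of time requires running the init's logic.
struct Percentage {
    let value: Int

    init?(_ raw: Int) {
        guard (0...100).contains(raw) else { return nil }
        value = raw
    }
}

// Runtime-dependent: the result depends on how the process was
// launched, so it cannot be known at compile time.
struct ArgumentCount {
    let count: Int

    init() {
        count = CommandLine.arguments.count
    }
}
```

Only the first two could plausibly be evaluated by a compiler; something like constexpr would be a way of telling them apart from the third.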

bbrk24 · October 13, 2023, 11:36pm

This is all very true, but constexpr in C++ is a Whole Thing. From my experience it’s a function color and a lot of things tend to just be marked constexpr almost by default, even if it’ll never be evaluated at compile-time.

jrose · October 14, 2023, 1:03am

I dislike function colors as much as the next person, but constexpr is a function color in C++ (and Rust) because it’s an API stability promise. If Swift gets a generalized form of constexpr, it will almost certainly be an explicit annotation on public/open functions.

Meanwhile, the compiler often does optimize an array into constant data already (though not dictionaries or anything else), with only the class pointer being lazily initialized at run time. What we lack is a way to guarantee that, and honestly I’d take a hardcoded solution for certain stdlib types sooner as long as it didn’t preclude something generalized in the future.
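
As a sketch of the pattern in question (the `Tables` namespace, the `square(of:)` helper, and the values are hypothetical), a 16-bit table written as a plain array literal looks like this. Whether the elements end up as constant data depends on the optimizer and build settings; nothing in the source guarantees it.

```swift
enum Tables {
    // Hypothetical 16-bit lookup table. In optimized builds the
    // compiler may emit the elements as constant data, but that is
    // an optimization rather than a guarantee.
    static let squares: [UInt16] = [0, 1, 4, 9, 16, 25, 36, 49]
}

// Bounds-checked lookup into the table.
func square(of n: Int) -> UInt16? {
    Tables.squares.indices.contains(n) ? Tables.squares[n] : nil
}
```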

Karl · October 14, 2023, 1:17am

Yeah, I think you should be able to rely on the optimiser to emit static POD values (and arrays of PODs) as constant data.

Like, when I use Array<Int>, I don't have to explicitly tell the compiler to generate a specialisation of Array for Int using some special attribute; I can just trust that it will do so given the optimisation guidelines I give it. This is clearly a win for optimised code, so I shouldn't need to make my Arrays constexpr to get this.

If compile-time constants are to become part of the language model, it should be because there is a semantically important reason for such things to exist -- e.g. because the array needs to have a fixed static address (which is not guaranteed or exposed at all right now). The problem is that it becomes infectious, and everything becomes constexpr by default, just in case somebody wants to use it as part of a constexpr computation.

FWIW: Here's how the largest Unicode tables used by WebURL compile down. There's a lot of compiler-generated setup code in main: that I would really love to not exist, but I think/hope we can do that with just optimiser improvements. I shouldn't need to annotate anything with constexpr for that to happen.

dmt · October 14, 2023, 5:42pm

I think it would be a mistake to think about something like constexpr in Swift as just an optimization machinery. Inherently it's a much broader concept. One of the other use cases is Generic value parameters.
If we want to build types representing a fixed-size Vector (or matrix/tensor), the declaration will look something like struct Vector<Element, let Count: Int>, and its usage like let v: Vector<Int, 42>. The compiler has to somehow transform the literal 42 into something that can be used in both generic and static contexts. And this is semantically different from optimization.
Ok, we might be able to hardcode this into the compiler for easy types. But how far should we go with this? Should we support Ranges? E.g. should we be able to construct type Clamped<Scalar: Comparable, let Range: ClosedRange<Scalar>>? What if Scalar type is a user defined type?
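
For comparison, without value parameters the count can only be approximated by smuggling it through a type. The sketch below is an assumption-laden workaround, not the syntax discussed above: `StaticCount`, `Three`, and this `Vector` are hypothetical names, and the length is checked at runtime rather than by the type checker.

```swift
// A type-level stand-in for an integer constant.
protocol StaticCount {
    static var value: Int { get }
}

enum Three: StaticCount {
    static let value = 3
}

struct Vector<Element, Count: StaticCount> {
    private(set) var storage: [Element]

    // Fails at runtime if the element count does not match the
    // count carried by the type parameter.
    init?(_ elements: [Element]) {
        guard elements.count == Count.value else { return nil }
        storage = elements
    }

    subscript(index: Int) -> Element {
        storage[index]
    }
}

let v = Vector<Int, Three>([10, 20, 30])
```

Every distinct count needs its own type here, which is exactly the kind of boilerplate a real compile-time value would remove.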
Another example can be found in this recent thread.
Yet another interesting example is using compile time constants in type contexts which is particularly useful with Union types:

let x: "foo" ∪ "bar"

x is a variable with a type that's isomorphic to Bool, but whose values could be either "foo" or "bar".
These kinds of "types" can be used as discriminators when you parse something:

struct Foo: Codable {
  var type: "foo"
  var f: Int
}
struct Bar: Codable {
  var type: "bar"
  var f: Float
}
let fooOrBar = (Foo ∪ Bar).init(from: decoder)
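
The union syntax above is hypothetical. In current Swift, a similar discriminated decode can be sketched with an enum and a hand-written init(from:). The `FooOrBar` enum, its case names, and the sample JSON below are assumptions made for illustration.

```swift
import Foundation

enum FooOrBar: Decodable, Equatable {
    case foo(f: Int)
    case bar(f: Float)

    private enum CodingKeys: String, CodingKey {
        case type, f
    }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        // The "type" field plays the role of the literal-typed member.
        let tag = try container.decode(String.self, forKey: .type)
        switch tag {
        case "foo":
            self = .foo(f: try container.decode(Int.self, forKey: .f))
        case "bar":
            self = .bar(f: try container.decode(Float.self, forKey: .f))
        default:
            throw DecodingError.dataCorruptedError(
                forKey: .type,
                in: container,
                debugDescription: "Unknown type tag: \(tag)"
            )
        }
    }
}

let json = Data(#"{"type": "bar", "f": 1.5}"#.utf8)
let decoded = try? JSONDecoder().decode(FooOrBar.self, from: json)
```

The difference from the proposed form is that the mapping from tag to payload lives in handwritten code rather than in the types themselves.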

benrimmington · October 15, 2023, 10:56am

SwiftPM 5.9 generates array literals for embedded resources:

Basic support for a new .embedInCode resource rule which allows embedding the contents of the resource into the executable code by generating a byte array, e.g.
struct PackageResources {
  static let best_txt: [UInt8] = [104,101,108,108,111,32,119,111,114,108,100,10]
}

Slow debug builds have been reported when using larger resources.
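
As a rough sketch of consuming such a resource: the struct below is copied by hand from the quoted example to stand in for what SwiftPM would generate, and the manifest line in the comment is my understanding of how the rule is spelled, so treat both as assumptions.

```swift
// In Package.swift, the rule would be applied to a target roughly as:
//     resources: [.embedInCode("best.txt")]
// Stand-in for the generated code, copied from the quoted example.
struct PackageResources {
    static let best_txt: [UInt8] = [104,101,108,108,111,32,119,111,114,108,100,10]
}

// The embedded bytes are an ordinary array, so they can be decoded
// or indexed like any other [UInt8].
let text = String(decoding: PackageResources.best_txt, as: UTF8.self)
```

Since the generated table is a plain array literal, it is subject to the same runtime-construction concerns raised at the top of the thread.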