How to create a static/constant `UnsafePointer`?

Using StaticString.utf8Start, it is possible to get the contents of a static/constant string as an UnsafePointer<UInt8>. Presumably, the contents of the string are stored in the data segment of the program, and therefore have a constant reusable pointer that doesn't need to be heap allocated.
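
For reference, a minimal sketch of the StaticString approach described above. The names `greeting` and `copied` are made up for illustration; `utf8Start` and `utf8CodeUnitCount` are standard library properties.

```swift
// A string literal typed as StaticString; its UTF-8 bytes are emitted into the binary.
let greeting: StaticString = "hello"

// utf8Start is only valid when the StaticString stores a pointer, which is
// the case for an ordinary multi-character literal like this one.
let start: UnsafePointer<UInt8> = greeting.utf8Start
let count = greeting.utf8CodeUnitCount

// Copy the bytes out just to inspect them.
let copied = Array(UnsafeBufferPointer(start: start, count: count))
print(copied) // [104, 101, 108, 108, 111]
```

No closure scope is involved here, which is the property the question wants for arbitrary bytes.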

But what if I want to create a pointer to a byte array that is not a valid UTF8 string, but rather, a set of arbitrary bytes?

I reckon I could always define a const byte array in C, and then use it from Swift. But I'd rather have a pure-Swift solution, if at all possible.

Is there any such option in Swift?

Make it global?

// at top level:
typealias B = UInt8
var bytes: (B, B, B, .....) = (42, 24, .....)

That creates a tuple, though. As far as I know, there's no way to convert a tuple to an UnsafePointer that isn't scoped (that is, limited to a function call, so it can't be stored as an immutable global).
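
To illustrate what "scoped" means here, a small sketch using the standard library's `withUnsafeBytes(of:_:)`. The tuple `smallTable` and its values are hypothetical, and the sketch assumes the usual contiguous layout of a homogeneous tuple.

```swift
// Hypothetical 4-byte table; a real one would be longer.
let smallTable: (UInt8, UInt8, UInt8, UInt8) = (42, 24, 7, 1)

// The raw buffer is only valid inside the closure, so only a value
// computed from it (not the pointer itself) should leave the closure.
let sum = withUnsafeBytes(of: smallTable) { raw in
    raw.reduce(0) { $0 + Int($1) }
}
print(sum) // 74
```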

I believe this works "de facto" for a global tuple:

let address = withUnsafePointer(to: &bytes) { $0 }

Although, as they say, this triggers UB "de jure", so it could send the moon crashing into the earth in the next Swift release. To spare us all, you could put your code in some function, right?

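withUnsafeBytes(of: &bytes) { raw in
    // View the tuple's storage as UInt8 so it matches main's parameter type.
    main(raw.baseAddress!.assumingMemoryBound(to: UInt8.self))
}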

func main(_ address: UnsafePointer<UInt8>) {
    // use address here
}

or stash it into a global to not pass it through as a parameter:

var address: UnsafePointer<UInt8>!

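withUnsafeBytes(of: &bytes) {
    // View the tuple's storage as UInt8 so it matches the type of `address`.
    address = $0.baseAddress!.assumingMemoryBound(to: UInt8.self)
    main()
    address = nil
}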

func main() {
    // use address here
}

I want to use it in a library, so wrapping main is not an option. I'd also appreciate it if it didn't generate any initialization code, since almost all platforms have a data segment, and Swift already uses it for string constants.

The UB option seems decent. Too bad it's UB. And also probably generates at least a bit of a static initializer. But at least the address is recognized as global and constant.

Slight caveat: address can be immutable in the UB case, but bytes cannot. It doesn't matter for the usage, but I assume it matters for the generated code.

Why not declare it as Data? Or you can export it as a function that gives a temporary UnsafePointer<UInt8> to a closure argument?
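
A rough sketch of the closure-based export suggested here. The names `blob` and `withBlob` are hypothetical, and the array stands in for whatever storage the library would actually use.

```swift
// Hypothetical library-internal table.
private let blob: [UInt8] = [0x01, 0x02, 0x03]

// Hypothetical accessor: the buffer pointer is only valid for the duration of `body`.
public func withBlob<R>(_ body: (UnsafeBufferPointer<UInt8>) throws -> R) rethrows -> R {
    try blob.withUnsafeBufferPointer(body)
}

// Client-side usage: compute something from the bytes inside the scope.
let total = withBlob { buffer in
    buffer.reduce(0) { $0 + Int($1) }
}
print(total) // 6
```

This stays within the documented rules, at the cost of giving clients a temporary pointer rather than a global one.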

As I've said, the point is for it to be global, not temporary. I cannot wrap main. It needs to be directly usable.

I don't see how Data could be converted to UnsafePointer. If anything, the opposite is easier, albeit it would involve a dynamic allocation and a memcpy.

When you write:

// global
const char data[] = { 42, 24, ... };

in C, it is memory-mapped from the app executable into a read-only data segment. You can't modify it by normal language means, not even with type-casting tricks; you'd first have to change the memory protection of that particular segment, if that's possible at all (probably not from within an unprivileged app). It looks like the OP wants to get as close to this as possible without using C, or at least to avoid the heap allocation and the ability to accidentally change memory that happens to live in a "var" instead of a "let".

If you have time for that, you could work on a proposal for StaticData, similar to StaticString. But as part of proposal text, you would need to provide a convincing motivation for the feature.

I'm curious how you are using this pointer, and why temporary pointers don't suit you.

Here's a heavyweight idea: put your static data into a standalone file, put that file into the app's resources, and memory-map that file as read-only. It's a bit heavy, but it ticks the other boxes (read-only, avoids heap allocation). Looking at this option, declaring static data in a C file starts looking appealing :)

Taking the address of a mutable variable and escaping it is de facto UB, regardless of whether it's global or not, because you cannot mutate a variable outside of an exclusivity scope.

Using the tools the language has today, you could declare an immutable global array constant, and escape the result of withUnsafeBufferPointer. The array constant will probably be optimized into a static constant in the binary, and its global-ness and immutability ensures the buffer won't be reallocated, copied, or deallocated, and no exclusivity scopes will be asserted on it. There isn't a defined way in Swift today to reserve global mutable static state that you can safely persist a pointer to.

I wasn't looking for a global mutable static state. If anything, immutable is preferred.

Can you explain the array method in more detail? If I do [UInt8]([1, 2, 3]).withUnsafeBufferPointer, and just return the result, is this considered "safe", and is the compiler likely to optimize away the function call into a compile-time constant?

This is currently the best/most robust way, especially if you want a stable pointer to the contents.

You won't get any guarantees of static initialisation other than StaticString. You could be like me and stubbornly generate static tables as Swift anyway, but I'm not sure I'd recommend it. For me it's more about experimentation and trying to prompt compiler improvements.

Strictly speaking, it's against the rules, but there's no realistic way it could become unsafe in practice. This would be formalized using nonescaping types, where a value dependent on an immortal value is itself immortal. And it looks like it does compile down to a global array constant and a direct pointer access: Compiler Explorer

But this is just an optimization. In a debug build it's very clearly a violation, because we allocate the array in the function and escape the pointer from the function.

Part of me wishes that StaticString didn't require UTF-8 valid text. It'd be ugly, but I could get by with encoding large binary blobs of data if I could just do something like let x: StaticString = "\xF0..." (an escape sequence which of course doesn't exist in Swift) and not worry about obscure type checker, lowering, or optimizer behavior if I try to do it with tuples or arrays.
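
A small sketch of why the existing `\u{...}` escape doesn't fill that gap: it names a Unicode scalar rather than a raw byte, so values above 0x7F come out as multi-byte UTF-8 sequences. The constant names here are made up for illustration.

```swift
// Two scalars, U+00F0 and U+009F, written with Swift's Unicode escape.
let text: StaticString = "\u{F0}\u{9F}"

// Copy out the UTF-8 code units that actually end up in the binary.
let encoded = text.withUTF8Buffer { Array($0) }
print(encoded) // [195, 176, 194, 159], not [0xF0, 0x9F]
```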

Good point. You still have to actually assign the array literal to a global let constant.
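
As a sketch of the global-constant variant being discussed (the names `table` and `tableBase` are hypothetical): note that escaping the pointer from `withUnsafeBufferPointer` is still formally outside the method's contract, and this relies on the immutable global array never being moved or freed.

```swift
// Immutable global array; per the discussion above, the optimizer will
// probably turn it into a static constant, but that is not guaranteed.
let table: [UInt8] = [0xDE, 0xAD, 0xBE, 0xEF]

// Escapes the base address. Formally against the rules of
// withUnsafeBufferPointer; works in practice because `table` is a
// global `let` that is never mutated or deallocated.
let tableBase: UnsafePointer<UInt8> = table.withUnsafeBufferPointer { $0.baseAddress! }

print(tableBase[0], tableBase[3]) // 222 239
```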

Using a global let and without -O, it seems the array pointer is being treated as a lazily evaluated value. That is, it generates a function that calculates the value of the pointer, caches it, and returns the value. And that function is called every time the value is accessed.

Then again, while inefficient, I think it retains correctness: the array is stored globally as immutable, so the pointer produced from it is unlikely to move. Additionally, lazy initialization means the library itself doesn't need to export some "initialization" function for the client to call. I was worried that parallel first accesses could make the one-time initialization not-so-one-time, but Swift runs lazy global initializers with once semantics, so that's a lesser concern.
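
The lazy behaviour can be observed with a small sketch. It uses a static property, which Swift always initializes lazily on first access, and a hypothetical `Tables` namespace; the `print` inside the initializer is only there to make the timing visible.

```swift
enum Tables {
    // Initialized on first access, not at program start.
    static let bytes: [UInt8] = {
        print("initializing Tables.bytes")
        return [1, 2, 3]
    }()
}

print("before first access")
print(Tables.bytes.count) // triggers the initializer, then prints 3
print(Tables.bytes.count) // initializer does not run again; prints 3
```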

But this does make me think maybe I do need to make a pitch for something that guarantees code-free initialization, with or without optimizations. And all of this without considering more "complex" data that is knowable at compile time, like hard-coded pointers to other data buffers, or function addresses.

This is what I am getting "de facto":

import Foundation

typealias B = UInt8

struct S {
    static var bytes: (B, B, B, B, B, B, B, B, B, B, B, B, B, B, B, B, B, B) = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18)
}
func bar() -> UnsafeMutablePointer<UInt8> {
    withUnsafeMutableBytes(of: &S.bytes) { bp in
        bp.baseAddress!.assumingMemoryBound(to: UInt8.self)
    }
}

let r = bar()
print("bar address", r) // 0x000000010000c018 (Static data)
print("bar", r[0], r[1], r[2]) // bar 1 2 3
print(S.bytes) // (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18)
r[1] = 42
print("bar", r[0], r[1], r[2]) // bar 1 42 3
print(S.bytes) // (1, 42, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18)

Same happens when I declare "bytes" as a global variable. I can see the static address is being returned (checked with vmmap).

So while it is triggering UB, it at least works for me on all tested configurations today (tested on both godbolt and macOS with debug and release).


I can see this working on godbolt (debug + release) and macOS (release), but interestingly not on macOS + debug:

import Foundation
func foo() -> UnsafePointer<UInt8> {
    [UInt8(1),2,3].withUnsafeBufferPointer { $0.baseAddress! }
}
let p = foo()
print("foo", p[0], p[1], p[2]) // 0, 0, 0 (macOS + debug)

When it doesn't work, I can see a heap address being returned (and where it works, the static data address is being returned).

More elaborate examples have been and will be broken by UB. Don't do it.

Yep, like @Alejandro noted, the literal needs to be assigned to a global constant.

We are already looking at improving things in this area soon.
