Usability of pointers in Swift

audulus · March 30, 2020, 3:13pm

I'm thinking of a pointer type that includes a count just for bounds-checking purposes. Treating pointers as collections in such fancy ways leads to the annoying and hard-to-understand situation we have in Swift.

For any pointer vended from a C API, it will be known somehow to the programmer what the length of the collection is.

Note I said "provided it doesn't make pointers annoying to use".

audulus · March 30, 2020, 3:19pm

Maybe there's a bit of Swift groupthink here.

You have to ask yourself: Why didn't other languages do this? Rust doesn't, AFAICT. C++ could have added a bunch of fancy pseudo-safe pointer types doing similar things in the standard library. I'm sure someone did it in some C++ library, but it never caught on.

I mean, it's not like these pointer types are rocket science. Kind of an obvious idea. Why haven't I seen them anywhere else?

What amazes me further is that Swift has to talk to C APIs a fair amount, especially in some of the graphics code I write. So you'd think that easy-as-possible interoperability for pointers would be a priority.

lukasa · March 30, 2020, 3:52pm

Sure, that can totally be done. I'm just explaining what the buffer pointers are for, and why all pointers cannot be buffer pointers. Not constraining what the pointer types could do. FWIW though, most pointers vended from Swift already are buffer pointers.

Yes, but not known to Swift. I'm only talking about why it's not possible for Swift to vend buffer pointers from C APIs, and therefore there are separate pointer types.

That's possible. But I think you might be mistaking me for someone who thinks the current situation is great, and I don't. I've filed repeated bugs about different parts of the pointer API, I've bent @Andrew_Trick's ear at great length over problems I've had with the API, and many of the complaints above I echo.

All I'm doing is talking about buffer pointers. Why? Because (in my opinion) buffer pointers are great. They're conceptually extremely straightforward (pointer + length), they give pointers a bunch of useful superpowers (safe iteration! functional APIs! straightforward application to generic collection algorithms such as sort!), and they don't excessively burden your work because if you have a buffer pointer and don't want one you can just say .baseAddress and move on with your day. I think they are straight-up the easiest part of the Swift pointer API to understand when coming from C-based languages, and the rewards they grant are vastly larger than the complexity they bring. If you wanted to delete types from the API, I would strongly resist deleting those, and if you did delete them I'd simply re-invent them in my own code.
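A small sketch of the "superpowers" listed above, using buffer pointers obtained from an ordinary Array (the array and its values are illustrative, not from the thread): an in-place generic sort, reduce/map, and dropping back to a plain pointer through baseAddress.

```swift
var samples: [Int32] = [5, 3, 9, 1]

// UnsafeMutableBufferPointer is a MutableCollection & RandomAccessCollection,
// so generic algorithms such as sort() work on it directly, in place.
samples.withUnsafeMutableBufferPointer { buf in
    buf.sort()
}

// Functional APIs come for free; iteration is bounded by the buffer's count,
// and subscripting is bounds-checked in debug builds.
let total = samples.withUnsafeBufferPointer { buf in buf.reduce(0, +) }
let doubled = samples.withUnsafeBufferPointer { buf in buf.map { $0 * 2 } }

// When a plain pointer is needed, .baseAddress gives it back. It is Optional
// because only an empty buffer may lack a start address.
let firstSample = samples.withUnsafeBufferPointer { buf in buf.baseAddress!.pointee }
```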

The complexity in buffer pointers derives only from the complexity of the types they wrap. Many people see types that vend an UnsafeRawBufferPointer, get into a mess around type binding, and then go "well clearly buffer pointer is dumb". But the problem wasn't the word "buffer", it was the word "raw". UnsafeRawPointer has exactly the same problems, whereas UnsafeBufferPointer<T> does not.
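To make the "buffer" vs. "raw" distinction concrete, here is a sketch (the array and names are illustrative) that reads the same three Floats first through a typed UnsafeBufferPointer<Float> and then through an UnsafeRawBufferPointer. The typed view behaves like a collection of Float with no binding involved; the raw view is a collection of bytes, so every typed read has to name both a type and a byte offset.

```swift
let values: [Float] = [1, 2, 3]

// Typed buffer pointer: the elements already are Floats.
let typedSum = values.withUnsafeBufferPointer { buf in buf.reduce(0, +) }

// Raw buffer pointer: a collection of UInt8. Getting Floats back means
// explicitly loading each one at a byte offset. Array storage is aligned
// for Float, which load(fromByteOffset:as:) requires.
let rawSum = values.withUnsafeBytes { raw -> Float in
    let step = MemoryLayout<Float>.stride
    var total: Float = 0
    for offset in stride(from: 0, to: raw.count, by: step) {
        total += raw.load(fromByteOffset: offset, as: Float.self)
    }
    return total
}
```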

I have many concerns with Swift's pointer APIs, but my complaints are basically identical to @Andrew_Trick's. (Except that OpaquePointer is terrible and should be burned.)

Don't mistake someone defending and explaining what they see as the good parts of the API as someone defending the whole.

audulus · March 30, 2020, 3:56pm

Oh don't worry @lukasa, I was thinking of the whole discussion, not just your part. Glad you have some healthy skepticism about this stuff.

Cool. I will try my best to warm up to them.

audulus · March 30, 2020, 6:48pm

Ok, humor me for a moment @lukasa. Say every pointer is a buffer pointer. If it comes from a C API, then the count is INT_MAX. If you try to create a collection from a pointer with an INT_MAX count, that's a runtime error. Still do bounds checking in debug mode. Seems like that would improve the ergonomics of this a bit, no? What's wrong with that approach?

And what is this Raw stuff? Why not have a pointer to an Int8?

Joe_Groff · March 30, 2020, 6:52pm

Typed pointers in Swift are subject to type-based aliasing rules, like in C or C++. The load and store operations on the Raw variations are explicitly untyped, so you can safely type-pun or load heterogeneous data from them without worrying about undefined behavior from invalid aliasing.
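As an illustration of the two uses mentioned above (a sketch; the byte layout is made up for the example): a raw pointer lets you store a Float and read the same bytes back as a UInt32, and lets you pack differently typed fields into one allocation, all without binding the memory to any type.

```swift
// 8 bytes, 4-byte aligned, not bound to any type.
let scratch = UnsafeMutableRawPointer.allocate(byteCount: 8, alignment: 4)

// Type punning: write a Float, then read the same 4 bytes as a UInt32.
scratch.storeBytes(of: Float(1.0), as: Float.self)
let bits = scratch.load(as: UInt32.self)  // same bits as Float(1.0).bitPattern

// Heterogeneous fields in the remaining 4 bytes. Each offset is aligned
// for the type stored there, as load(fromByteOffset:as:) requires.
scratch.storeBytes(of: UInt16(0xBEEF), toByteOffset: 4, as: UInt16.self)
scratch.storeBytes(of: UInt8(7), toByteOffset: 6, as: UInt8.self)
scratch.storeBytes(of: UInt8(42), toByteOffset: 7, as: UInt8.self)

let tag = scratch.load(fromByteOffset: 4, as: UInt16.self)
let major = scratch.load(fromByteOffset: 6, as: UInt8.self)
let minor = scratch.load(fromByteOffset: 7, as: UInt8.self)

scratch.deallocate()
```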

audulus · March 30, 2020, 6:59pm

Whoosh, that flew right over my head.

Can you (or someone) give a concrete example of that? I know what pointer aliasing is at least.

Lantua · March 30, 2020, 7:12pm

IIRC in C (I haven't used it in a very long time), the behaviour of reading a union member as a different type from the last-assigned one is an implementation detail. It was even outright undefined in C++. Maybe it's something similar?

Joe_Groff · March 30, 2020, 7:47pm

So in C, this is undefined behavior, because pointers of different types are not allowed to alias:

#include <stdio.h>

float bad(float *p, int *q) {
  *p = 2.0f;
  *q = 0;

  return *p;
}

int main() {
  float x = 1.0f;
  float y = bad(&x, (int*)&x);
  // With clang, this prints `0 2` with -O,
  // because inside `bad`, the optimizer assumes `p` and `q`
  // don't alias, and so assumes that `*p` is still `2.0`
  // even after writing to `q`
  printf("%g %g\n", x, y);
}

And Swift has the same problem if you use typed pointers:

func bad(p: UnsafeMutablePointer<Float>, q: UnsafeMutablePointer<Int32>) -> Float {
  p.pointee = 2
  q.pointee = 0

  return p.pointee
}

var x: Float = 1
let y = withUnsafeMutablePointer(to: &x) { p in
  return p.withMemoryRebound(to: Int32.self, capacity: 1) { q in
    return bad(p: p, q: q)
  }
}
// Could print "0.0 0.0" or "0.0 2.0" depending on the optimizer's mood
print("\(x) \(y)")

However, using UnsafeRawPointer, you're guaranteed to get defined behavior even if you load and store different types from different pointers aliasing the same memory:

func not_bad(p: UnsafeMutableRawPointer, q: UnsafeMutableRawPointer) -> Float {
  p.storeBytes(of: 2, as: Float.self)
  q.storeBytes(of: 0, as: Int32.self)

  return p.load(as: Float.self)
}

var x: Float = 1
let y = withUnsafeMutableBytes(of: &x) { p in
  return not_bad(p: p.baseAddress!, q: p.baseAddress!)
}
// Will always print "0.0 0.0"
print("\(x) \(y)")

Ben_Cohen · March 30, 2020, 8:04pm

Admin note: since this thread is long-running, I've altered the title to be less combative since it keeps popping to the top of the home screen.

audulus · March 30, 2020, 8:35pm

Ok got it. So that's a problem that needed solving?

I've been programming in C/C++ professionally for nearly two decades doing pointery graphics stuff and I can't recall ever hitting that.

This reminds me of that UX principle that every feature you add incrementally diminishes the usefulness of other features. So when you try to autocomplete on withUnsafe you really have to think about which one you need. At least I do. I'll probably get it eventually.

Joe_Groff · March 30, 2020, 9:07pm

Yes. If we're airing our grievances about Swift pointers here, mine would be that the "raw" semantics are what you really want 90% of the time you're intentionally working with pointers, so it'd be nice if the memory binding and typed pointers in their current form were deemphasized, and instead we had composable views over raw pointers to give you a typed interface without the memory binding and type aliasing restrictions.

audulus · March 30, 2020, 9:08pm

Hey, that sounds great. Whatever reduces the number of pointer types!

Andrew_Trick · March 31, 2020, 1:55am

I had always assumed we would add a ReinterpretedPointer<T>. It's straightforward to implement on top of a raw pointer. The main argument against it initially was that we already have too many pointer types.
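ReinterpretedPointer<T> is not a standard library type; what follows is only a hypothetical sketch of the idea, assuming T is a trivial type (plain integers, floats, and structs of them) and that the memory is aligned for T. Every access goes through load/storeBytes on a raw pointer, so the memory is never bound to T and two views of different types can alias the same bytes.

```swift
/// Hypothetical typed view over raw memory (not a standard library type).
/// Assumes T is trivial and the memory is suitably aligned for T.
struct ReinterpretedPointer<T> {
    let raw: UnsafeMutableRawPointer

    init(_ raw: UnsafeMutableRawPointer) {
        self.raw = raw
    }

    subscript(index: Int) -> T {
        get {
            return raw.load(fromByteOffset: index * MemoryLayout<T>.stride, as: T.self)
        }
        nonmutating set {
            raw.storeBytes(of: newValue, toByteOffset: index * MemoryLayout<T>.stride, as: T.self)
        }
    }
}

// Usage: two differently typed views over the same 8 bytes.
func demoReinterpreted() -> (UInt32, UInt32) {
    let raw = UnsafeMutableRawPointer.allocate(byteCount: 8, alignment: 4)
    defer { raw.deallocate() }
    let floats = ReinterpretedPointer<Float>(raw)
    let words = ReinterpretedPointer<UInt32>(raw)
    floats[0] = 1.0
    floats[1] = -2.0
    return (words[0], words[1])
}
```

Because nothing is bound, this avoids the "memory binding nonsense under the hood" of extending UnsafePointer<T>; the cost is that the caller, not the type system, is responsible for alignment and triviality.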

Since then, frankly, I haven't been able to make a good argument for including it in the standard library. Almost all the problems people hit with Swift pointers stem from C/ObjC interop. For native Swift memory buffers, we would get a lot more mileage out of adding a safe ByteBuffer type with a non-pointer-based API for reading/writing arbitrary types. Or, when people really want pointers, which we should discourage, it could vend a typed pointer over some limited scope.
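The safe ByteBuffer described above is likewise not an existing standard library type. A minimal hypothetical sketch, restricted to FixedWidthInteger values in native byte order for brevity, copies bytes in and out so callers never handle a pointer and unaligned offsets are harmless.

```swift
/// Hypothetical pointer-free byte buffer (not a standard library type).
/// Values are written and read by copying bytes in native byte order.
struct ByteBuffer {
    private(set) var bytes: [UInt8] = []

    mutating func write<T: FixedWidthInteger>(_ value: T) {
        // Append the value's raw bytes; no pointer escapes this call.
        withUnsafeBytes(of: value) { bytes.append(contentsOf: $0) }
    }

    func read<T: FixedWidthInteger>(_: T.Type, at offset: Int) -> T {
        let size = MemoryLayout<T>.size
        precondition(offset >= 0 && offset + size <= bytes.count, "read out of bounds")
        var value: T = 0
        // Copying into a local avoids any alignment requirement on `offset`.
        withUnsafeMutableBytes(of: &value) { dst in
            dst.copyBytes(from: bytes[offset ..< offset + size])
        }
        return value
    }
}

var buffer = ByteBuffer()
buffer.write(UInt16(0x1234))
buffer.write(UInt32(0xDEAD_BEEF))
let short = buffer.read(UInt16.self, at: 0)
let word = buffer.read(UInt32.self, at: 2)  // offset 2 is not 4-byte aligned; fine here
```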

It would still be nice to have a ReinterpretedPointer<T> package that people can reach for, just so they aren't tempted to extend UnsafePointer<T> and do memory binding nonsense under the hood... that blows a hole of undefined behavior wide open in the hull of type safety.

Currently, the biggest reason that people are forced to reach for "memory binding" is that we still haven't implemented [SR-10246] Support limited implicit pointer conversion when calling C functions (e.g. UnsafeRawPointer to UnsafePointer<CChar>?).
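To illustrate why binding comes up at C boundaries (a sketch; the string and the helper name cStringLength are made up): strlen is imported as taking an UnsafePointer<CChar>, so bytes held behind a raw pointer are typically bound to CChar before the call. Whether a raw pointer is accepted directly depends on the compiler version (relaxing exactly that is what SR-10246 asks for); the explicit bind works either way.

```swift
#if canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#endif

/// Hypothetical helper: length of a NUL-terminated string held in raw memory.
func cStringLength(ofRaw raw: UnsafeMutableRawPointer, capacity: Int) -> Int {
    // Bind the untyped bytes to CChar to obtain the typed pointer strlen expects.
    let chars = raw.bindMemory(to: CChar.self, capacity: capacity)
    return Int(strlen(chars))
}

let cBytes = UnsafeMutableRawPointer.allocate(byteCount: 4, alignment: 1)
for (offset, byte) in "hi!".utf8.enumerated() {
    cBytes.storeBytes(of: byte, toByteOffset: offset, as: UInt8.self)
}
cBytes.storeBytes(of: 0, toByteOffset: 3, as: UInt8.self)  // NUL terminator
let length = cStringLength(ofRaw: cBytes, capacity: 4)
cBytes.deallocate()
```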

gwendal.roue · March 31, 2020, 6:20am

To sum up my own difficulties with the pointers API:

They are all named "unsafe", but some are more unsafe than others.

Look no further than above for one example.
Swift pointers create an "axiomatic" system of guarantees against known traps, but both the axioms and the traps are obscure or unknown to many developers, and practical problem solving needs theorems more than axioms.

Many developers are looking for an Arithmetic Rope on their desk rather than a copy of Euclid's Elements on their bookshelf. The former is easy to use and never lies, whereas the latter requires you to derive proofs (and make mistakes along the way).
Among the traps I can recall (aliasing, wrong alignment, uninitialized memory, the difference between assumingMemoryBound and bindMemory and the rationale behind it, short-lived pointers, ...), some are "classic" (they exist in other languages and can be Googled), and some are specific to Swift (surprise!).

I totally dig that a sane and sound "axiomatic" system needed to be built first. But what's next? Who will derive the few simple and useful theorems that will make pointers easy to use?

gwendal.roue · March 31, 2020, 7:22am

On the specific issue of short-lived pointers and the pyramid of doom they create.

Such short-lived pointers are provided explicitly by withUnsafePointer(to:_:), ContiguousBytes.withUnsafeBytes(_:), String.withUTF8(_:), etc.
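A small sketch of the resulting "pyramid of doom" (the dot-product function is only an illustration): each short-lived pointer is valid only inside its closure, so code that needs several pointers at once has to nest the calls, and none of the pointers may be returned or stored for later use.

```swift
/// Illustrative: a dot product computed through three nested short-lived pointers.
func dot(_ a: [Double], _ b: [Double]) -> Double {
    precondition(a.count == b.count, "length mismatch")
    var result = 0.0
    withUnsafeMutablePointer(to: &result) { out in
        a.withUnsafeBufferPointer { pa in
            b.withUnsafeBufferPointer { pb -> Void in
                // out, pa and pb are only valid inside these closures;
                // letting any of them escape would be undefined behavior.
                var sum = 0.0
                for i in 0..<pa.count {
                    sum += pa[i] * pb[i]
                }
                out.pointee = sum
            }
        }
    }
    return result
}
```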

They are also provided implicitly with the & operator, but this is not the topic of this post.

Short-lived pointers are desirable. They allow Swift to efficiently manage the location of values until a pointer to them is requested. To name a few locations: none (a compile-time constant), in a register, encoded in a tagged pointer, actually stored in a memory slot. A location can be shared by several variables if the optimizer can prove it's ok.

Closure-based access to short-lived pointers, which means runtime management of those pointers, is necessary until the Swift compiler is able to perform static analysis of the lifetime of such pointers. Since Rust, we know that this requires a lot of work around ownership, move-only types, etc.

So I strongly believe it will get much better. But not quite tomorrow.

Yet I'd like to stress that improving the pointer API is a desired use case for the ownership manifesto.

lukasa · March 31, 2020, 8:59am

The problem is that whether something is a collection or not is not a runtime concern, it's a compile-time concern. Being a collection is inherent in the type: a type either is or is not a collection, it cannot be decided later.

It could be arranged that using the collection methods crashes, but that's pretty unfriendly, and requires larger lifetime analysis of the program to know if any operation will be safe. It's easier to have the type system encode whether you concretely know the length of a pointer or not.

If the buffer pointer types were very complex, or could be interchangeably used with the regular pointers, I'd agree with you. However, they aren't and cannot be. This does make working with them easy though: if you have an UnsafeBufferPointer<Foo> and an API that takes an UnsafePointer<Foo>, Swift will throw up a type error. Then you'll look for ways to get an UnsafePointer<Foo> and quickly find .baseAddress on UnsafeBufferPointer<Foo>. This will help solidify your mental understanding of the buffer pointer (again, pointer and length), and then you'll move on.
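The "type error, then .baseAddress" path described above looks roughly like this; sumInt32s is a hypothetical stand-in for a C-style API that takes a plain pointer and a separate count.

```swift
/// Hypothetical C-style API: takes a plain pointer plus a length.
func sumInt32s(_ p: UnsafePointer<Int32>, _ count: Int) -> Int32 {
    var total: Int32 = 0
    for i in 0..<count {
        total += p[i]
    }
    return total
}

let numbers: [Int32] = [10, 20, 30]
let numbersTotal = numbers.withUnsafeBufferPointer { buf -> Int32 in
    // sumInt32s(buf, buf.count) is a type error: a buffer pointer is not a pointer.
    // baseAddress is Optional because an empty buffer may have no start address.
    guard let base = buf.baseAddress else { return 0 }
    return sumInt32s(base, buf.count)
}
```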

My position is that the real complexity is in the raw pointers, but that the buffer pointers are an easier target because they seem so trivial. My view is this triviality is good: a simple type gives a lot of power. We would be better served dealing with the triple-distinction between typed, raw, and opaque than we would be by removing buffers.

(Apropos of nothing: in my projects I aggressively create buffer pointers from regular pointers wherever I can, expressly because they are so much more useful.)
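Going the other way, as in the aside above: whenever the length is known out of band (for example from a C API's separate count parameter), wrapping the pointer in an UnsafeBufferPointer unlocks the collection APIs. withSamples is a hypothetical stand-in for such a C API.

```swift
/// Hypothetical stand-in for a C API that vends a pointer and, separately, a count.
func withSamples<R>(_ body: (UnsafePointer<Int32>, Int) -> R) -> R {
    let storage: [Int32] = [4, 8, 15, 16, 23, 42]
    return storage.withUnsafeBufferPointer { buf in
        body(buf.baseAddress!, buf.count)
    }
}

let stats = withSamples { pointer, count -> (sum: Int32, max: Int32?) in
    // Pointer + length becomes a buffer pointer: iteration, reduce, max, etc.
    let samples = UnsafeBufferPointer(start: pointer, count: count)
    return (samples.reduce(0, +), samples.max())
}
```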

lukasa · March 31, 2020, 9:10am

They are all equally unsafe: unsafe is a term of art with a specific meaning in Swift APIs (it refers to memory safety). The above example is not memory-unsafe.

gwendal.roue · March 31, 2020, 9:19am

All right, I meant "error-prone", then. Apologies to any reader who was confused. I still hope my post made some sense for the sensible readers, especially the people in charge of this API.

cukr · March 31, 2020, 9:46am

Doesn't C++ already have lots of pointer types that do the same things as the ones in Swift, just written as separate words instead of mashing them into one word? Correct me if I'm wrong; I don't program in C++ much.

| Concept | Swift | C++ |
|---|---|---|
| immutable/mutable pointer | let / var | const / (nothing) |
| immutable/mutable pointee | (nothing) / Mutable | const / (nothing), in a different position |
| points to nonspecific type | Raw | void |
| reference-counted pointer | normal class | std::shared_ptr |
| weak pointer | weak var | std::weak_ptr |
| move-only pointer | (Swift doesn't have move-only types yet) | std::unique_ptr |
| pointer to array | Buffer | foo bar[123], or foo* with null termination, or foo* plus a separate length parameter, or remembering the length if it's constant |
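For readers mapping the Swift column back to C++, a small sketch (the value is arbitrary, and cppAnalogues is a made-up name) of the first rows: the same Int32 viewed through a mutable pointer, a read-only pointer, a raw pointer, and a one-element buffer pointer. The C++ spellings in the comments are the rough analogues from the table, not exact equivalents.

```swift
func cppAnalogues() -> (Int32, Int32, Int32, Int) {
    var value: Int32 = 10
    return withUnsafeMutablePointer(to: &value) { p -> (Int32, Int32, Int32, Int) in
        p.pointee += 1                                           // int32_t *p; *p += 1;
        let readOnly = UnsafePointer(p)                          // const int32_t *
        let untyped = UnsafeRawPointer(p)                        // const void *
        let buffer = UnsafeBufferPointer(start: readOnly, count: 1)  // pointer + length
        return (readOnly.pointee, untyped.load(as: Int32.self), buffer[0], buffer.count)
    }
}
```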