Usability of pointers in Swift

I had always assumed we would add a ReinterpretedPointer<T>. It's straightforward to implement on top of a raw pointer. The main argument against it initially was that we already have too many pointer types :slight_smile:

Since then, frankly, I haven't been able to make a good argument for including it in the standard library. Almost all the problems people hit with Swift pointers stem from C/ObjC interop. For native Swift memory buffers, we would get a lot more mileage out of adding a safe ByteBuffer type with a non-pointer-based API for reading/writing arbitrary types. Or, when people really want pointers, which we should discourage, it could vend a typed pointer over some limited scope.
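To make the ByteBuffer idea concrete, here is a minimal sketch of what a non-pointer-based read/write API could look like. Everything in it is an assumption for illustration: the `ByteBuffer` name is borrowed from the post, but the `write`/`read` methods, the little-endian encoding, and the restriction to `FixedWidthInteger` are my own choices, not a proposed standard library design.

```swift
/// Hypothetical sketch only: not a standard library type.
struct ByteBuffer {
    private(set) var storage: [UInt8] = []
    private(set) var readerIndex = 0

    /// Appends the little-endian bytes of an integer.
    mutating func write<T: FixedWidthInteger>(_ value: T) {
        withUnsafeBytes(of: value.littleEndian) { storage.append(contentsOf: $0) }
    }

    /// Reads an integer back, or returns nil if too few bytes remain.
    /// Bytes are reassembled one at a time, so alignment never matters.
    mutating func read<T: FixedWidthInteger>(_ type: T.Type = T.self) -> T? {
        let size = MemoryLayout<T>.size
        guard storage.count - readerIndex >= size else { return nil }
        var value: T = 0
        for (offset, byte) in storage[readerIndex..<readerIndex + size].enumerated() {
            value |= T(truncatingIfNeeded: byte) << (8 * offset)
        }
        readerIndex += size
        return value
    }
}

var buffer = ByteBuffer()
buffer.write(UInt16(0xBEEF))
buffer.write(Int32(-7))
let first = buffer.read(UInt16.self)   // Optional(0xBEEF)
let second = buffer.read(Int32.self)   // Optional(-7)
let third = buffer.read(UInt8.self)    // nil, the buffer is exhausted
```

The caller never sees a pointer, and running off the end is a `nil` rather than undefined behavior.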

It would still be nice to have a ReinterpretedPointer<T> package that people can reach for, just so they aren't tempted to extend UnsafePointer<T> and do memory binding nonsense under the hood... that blows a hole of undefined behavior wide open in the hull of type safety.
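As a rough idea of how small such a package could be, here is a sketch built on raw loads instead of memory binding. The `ReinterpretedPointer` name is from the post; the initializer and subscript are assumed for illustration. It relies on `UnsafeRawPointer.load(fromByteOffset:as:)`, which reads without changing the memory's bound type, but still requires the address to be suitably aligned for `T` and `T` to be a trivial type.

```swift
/// Hypothetical wrapper; the API shape is assumed, not an existing library.
struct ReinterpretedPointer<T> {
    let raw: UnsafeRawPointer

    init(_ raw: UnsafeRawPointer) { self.raw = raw }

    /// Loads element `i` with an untyped read. No memory is rebound.
    subscript(i: Int) -> T {
        raw.load(fromByteOffset: i * MemoryLayout<T>.stride, as: T.self)
    }
}

let words: [UInt32] = [0x0001_0002, 0x0003_0004]
let sumOfHalves = words.withUnsafeBytes { bytes -> UInt32 in
    // View the same bytes as four UInt16 values.
    let halves = ReinterpretedPointer<UInt16>(bytes.baseAddress!)
    return (0..<4).reduce(UInt32(0)) { $0 + UInt32(halves[$1]) }
}
```

The four halves are 1, 2, 3 and 4 in some byte-order-dependent sequence, so the sum is the same on any platform.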

Currently, the biggest reason that people are forced to reach for "memory binding" is that we still haven't implemented [SR-10246] Support limited implicit pointer conversion when calling C functions UnsafeRawPointer to UnsafePointer<CChar>?
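For readers who haven't hit this, here is a sketch of the kind of workaround people end up writing today when a C function wants `UnsafePointer<CChar>` but they hold `UInt8` bytes. The data is made up; `strlen` and `withMemoryRebound(to:capacity:_:)` are the real APIs.

```swift
import Foundation

let bytes: [UInt8] = Array("hello".utf8) + [0]   // NUL-terminated for strlen
let length = bytes.withUnsafeBufferPointer { buffer -> Int in
    // Temporarily rebind UInt8 memory to CChar just to satisfy the C signature.
    buffer.baseAddress!.withMemoryRebound(to: CChar.self, capacity: buffer.count) {
        strlen($0)
    }
}
```

This is correct, but it is a lot of ceremony for what C treats as the same bytes.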

+1

yes.

To sum up my own difficulties with the pointers API:

  1. They are all named "unsafe", but some are more unsafe than others.

    Look no further than above for one example.

  2. Swift pointers create an "axiomatic" system of guarantees against known traps, but both axioms and traps are obscure/unknown to many developers, and practical problem solving needs theorems more than axioms.

    Many developers are looking for an Arithmetic Rope on their desk, rather than a copy of Euclid's Elements on their bookshelf. The first is easy to use and never lies, whereas the latter requires you to derive proofs (and make mistakes on the way).

  3. Among the traps I can remember (aliasing, wrong alignment, uninitialized memory, the difference between assumingMemoryBound and bindMemory and the rationale behind it, short-lived pointers, ...), some are "classic" (can be found in other languages, and Googled), and some are specific to Swift (surprise!).
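To illustrate just one of those traps, uninitialized memory, here is a minimal sketch of the full lifecycle that manually allocated memory has to go through for a non-trivial type. The variable names are mine; the calls are the existing `UnsafeMutablePointer` API.

```swift
let strings = UnsafeMutablePointer<String>.allocate(capacity: 2)
// The memory is uninitialized here: assigning through strings[i] now would be a bug.
strings.initialize(repeating: "", count: 2)
strings[1] = "hi"                 // assignment is fine once initialized
let copy = strings[1]
strings.deinitialize(count: 2)    // releases the String storage
strings.deallocate()              // returns the raw memory
```

Skipping any one of the four steps compiles fine, which is rather the point.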

I totally dig that a sane and sound "axiomatic" system needed to be built first. But what's next? Who will derive the few simple and useful theorems that will make pointers easy to use?

On the specific issue of short-lived pointers and the pyramid of doom they create.

Such short-lived pointers are provided explicitly by withUnsafePointer(to:_:), ContiguousBytes.withUnsafeBytes(_:), String.withUTF8(_:), etc.
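A small sketch of the nesting I mean, with made-up values. Each pointer is only valid inside its own closure, so needing two at once means two levels of indentation:

```swift
let x = 40
var label = "ab"   // withUTF8 is mutating, so this must be a var

let total = withUnsafePointer(to: x) { px -> Int in
    label.withUTF8 { utf8 -> Int in
        // Both pointers are valid only here; neither may escape its closure.
        px.pointee + utf8.count
    }
}
```

Add a third or fourth value and the pyramid grows accordingly.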

They are also provided implicitly with the & operator, but this is not the topic of this post.

Short-lived pointers are desirable. They allow Swift to efficiently manage the location of values until a pointer to them is requested. To name a few locations: none (a compile-time constant), in a register, encoded in a tagged pointer, actually stored in a memory slot. A location can be shared by several variables if the optimizer can prove it's ok.

Closure-based access to short-lived pointers, which means runtime management of those pointers, is necessary until the Swift compiler is able to perform syntactic analysis of the lifetime of such pointers. Rust has shown that this requires a lot of work around ownership, move-only types, etc.

So I strongly believe it will get much better. But not quite tomorrow.

Yet I'd like to stress that improving the pointer API is a desired use case for the ownership manifesto.

The problem is that whether something is a collection or not is not a runtime concern, it's a compile-time concern. Being a collection is inherent in the type: a type either is or is not a collection, it cannot be decided later.

It could be arranged that using the collection methods crashes, but that's pretty unfriendly, and requires larger lifetime analysis of the program to know if any operation will be safe. It's easier to have the type system encode whether you concretely know the length of a pointer or not.

If the buffer pointer types were very complex, or could be interchangeably used with the regular pointers, I'd agree with you. However, they aren't and cannot be. This does make working with them easy though: if you have an UnsafeBufferPointer<Foo> and an API that takes an UnsafePointer<Foo>, Swift will throw up a type error. Then you'll look for ways to get an UnsafePointer<Foo> and quickly find .baseAddress on UnsafeBufferPointer<Foo>. This will help solidify your mental understanding of the buffer pointer (again, pointer and length), and then you'll move on.
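A minimal sketch of that discovery path, using a made-up `firstElement` function to stand in for the API that takes an `UnsafePointer<Foo>`:

```swift
// Hypothetical API that only accepts a plain typed pointer.
func firstElement(_ pointer: UnsafePointer<Int>) -> Int {
    pointer.pointee
}

let values = [7, 8, 9]
let first = values.withUnsafeBufferPointer { buffer -> Int? in
    // firstElement(buffer) would be a type error; baseAddress bridges the gap.
    // It is optional because an empty buffer may have no address at all.
    guard let base = buffer.baseAddress else { return nil }
    return firstElement(base)
}
```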

My position is that the real complexity is in the raw pointers, but that the buffer pointers are an easier target because they seem so trivial. My view is this triviality is good: a simple type gives a lot of power. We would be better served dealing with the triple-distinction between typed, raw, and opaque than we would be by removing buffers.

(Apropos of nothing: in my projects I aggressively create buffer pointers from regular pointers wherever I can, expressly because they are so much more useful.)
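Going the other way is just as short. Here is a sketch with a hypothetical C-style `sum(_:_:)` entry point that receives a pointer and a length and wraps them immediately:

```swift
// Hypothetical function with a C-style (pointer, count) signature.
func sum(_ start: UnsafePointer<Int32>, _ count: Int) -> Int32 {
    // Pair the pointer with its length once, then use the Collection API.
    let buffer = UnsafeBufferPointer(start: start, count: count)
    return buffer.reduce(0, +)
}

let data: [Int32] = [1, 2, 3]
let result = data.withUnsafeBufferPointer { sum($0.baseAddress!, $0.count) }
```

From that point on the length travels with the pointer, and `for`-`in`, `reduce`, slicing and friends all work.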

They are all equally unsafe: unsafe is a term of art with a specific meaning in Swift APIs (it refers to memory safety). The above example is not memory-unsafe.

All right, I meant "error-prone", then. Apologies to any reader who was confused. I still hope my post made some sense for the sensible readers, especially the people in charge of this API.

Doesn't C++ already have lots of pointer types that do the same things as the ones in Swift, just written as separate words instead of mashing them into one word? Correct me if I'm wrong, I don't program in C++ much.

| | Swift | C++ |
|---|---|---|
| immutable/mutable pointer | let/var | const/"" |
| immutable/mutable pointee | ""/Mutable | const/"" (in a different place) |
| points to nonspecific type | Raw | void |
| reference counted pointer | normal class | std::shared_ptr |
| weak pointer | weak var | std::weak_ptr |
| move-only pointer | Swift doesn't have move-only types yet | std::unique_ptr |
| pointer to array | Buffer | foo bar[123] or foo* with null termination or foo* and passing second length parameter or remembering the length if it's constant |

You're presenting this as something that has to be a particular way, but I think this seems like a subjective matter of design. For example, as we touched on above, C doesn't make a distinction: int* could refer to a single int or an array of ints. In general the compiler doesn't know.

It seems way more friendly than the current state of affairs, to me. You get cleaner easier-to-read code, I would think. The price would be some runtime checks.

As to lifetime analysis, I don't know why it would be really different. Maybe you could explain more?

It seems like your complaints boil down to "it's not how C does it". Not being like C with respect to memory management is one of Swift's design goals. You're not going to get much traction arguing in this vein.

I think that's pretty flippant and dismissive of some of the details I've been expressing, but hey if you want to be like that.

AFAIK, in C++ there's no attempt to prevent the aliasing issue mentioned by @Joe_Groff above, and there isn't a bounds-checked pointer type like UnsafeBufferPointer. I think you could build both things if you wanted to. I think it's telling that nothing like that ever caught on in C++.

Also, in C/C++, it's way easier to convert between pointer types.

But anyway, it does seem like folks have some ideas for making this stuff more user-friendly, so I should probably just wait and see what they come up with :slight_smile:

Wouldn't the strict aliasing rule be their attempt?

Handling aliasing problems is an ongoing research problem in C and C++, and there are still-unsolved semantic problems with the memory model. C++17 adds magic functions to try to paper over some classes of aliasing problems. That you haven't run into them personally is some combination of luck and the ongoing negotiation between the standards bodies, compiler implementers, and real codebases trying to keep the whole mess working. And although standard C++ has not historically included a "buffer pointer" type, nearly every codebase I've worked with has had one of its own, and the standard is finally catching up by adding string_view (C++17) and span (C++20). Swift's API definitely needs improvement, but the memory model was designed by folks deeply familiar with C and C++'s design, and I think it does a decent job of avoiding many of the fundamental problems you end up with if you stare too deeply into C's model. Hopefully the API will catch up with the model someday.

I'm making a statement about Swift. In Swift, the question of whether type T conforms to Collection is a static one: it either does or does not.

That's fair enough, and I think we just have a difference of opinion here.

What I'm getting at is that, in your world where all pointers are collections, if I write code like this:

func countTheZeroes(_ ptr: UnsafePointer<UInt8>) -> Int {
    // Hypothetical: assumes UnsafePointer conformed to Collection, which it does not today.
    return ptr.lazy.filter { $0 == 0 }.count
}

I don't know if this code will crash or not (for reasons other than SIGSEGV). The only way to know is to follow every pointer in the program that is ever passed into this code and find out where it came from. This is what static analysers do in other languages.

However, if the parameter is an UnsafeBufferPointer<UInt8> instead, I am confident that this will not crash. Furthermore, if I ever get a SIGSEGV out of the code, I know that somewhere in my code I construct an UnsafeBufferPointer with an invalid length, and so can audit only those call sites, rather than everywhere a pointer may have entered my program from C.
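For comparison, the buffer-pointer version compiles and runs today. This is only a sketch with made-up input; the length comes from the array, so there is exactly one place where it could have been wrong:

```swift
func countTheZeroes(_ buffer: UnsafeBufferPointer<UInt8>) -> Int {
    // The buffer knows its own count, so iteration stops at the right place.
    return buffer.lazy.filter { $0 == 0 }.count
}

let input: [UInt8] = [0, 1, 0, 2]
let zeroes = input.withUnsafeBufferPointer(countTheZeroes)
```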

Thanks for the example. It strikes me that using ! on an optional can crash in a similar way, and would require the same sort of static analysis, but doesn't seem to cause the same sort of concerns. Is that true? Can you help me understand why that wouldn't be analogous?

It does cause the same kind of concern. That's why you must write !, instead of it always being implicit. Also, force-unwrapping a nil optional always traps. With pointers, anything can happen, including nothing at all. That means pointers are far more unsafe than Optional and its "unsafe" affordances.

As @avi says, it absolutely is analogous, which is why you must state “I know I am doing something risky”. The SSWG’s guidance (intended as an example of a policy, not necessarily as an endorsement) on ! is that either it should be replaced with a safe alternative that handles the risk of being nil, or it should be possible to describe in a code comment why the ! is either impossible to trigger or the crash is acceptable.
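As a tiny illustration of the first option (replace ! with something that handles nil), here is a sketch. The `port(from:)` function and its fallback of 80 are made up for the example:

```swift
// Hypothetical helper: parses a port number without force-unwrapping.
func port(from text: String) -> Int {
    // `Int(text)!` would trap on bad input; guard handles the nil case instead.
    guard let value = Int(text) else { return 80 }
    return value
}

let parsed = port(from: "8080")
let fallback = port(from: "http")
```

If the ! stays, the second option is a comment next to it explaining why nil cannot occur or why a crash is acceptable.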

As to “it doesn’t share the same concerns”, you’ll find many people on this forum who consider the appearance of ! in a codebase to be entirely unacceptable in all circumstances.

I'd think that the presence of the UnsafePointer would be a good signal of doing something risky.

Anyway, I was just trying to contribute some idea based on your love of the buffer pointers rather than just complaining about the API.