Usability of pointers in Swift

audulus · March 26, 2020, 5:41pm

No worries. This is all packed/interleaved vertex data coming from ModelIO, so I don't think there are any alignment guarantees. You just get an offset and stride for the position attribute.

Joe_Groff · March 26, 2020, 5:42pm

Making raw pointer loads/stores not require alignment for trivial types is AIUI one of the things @Andrew_Trick hopes to fix in the near term.

audulus · March 26, 2020, 5:43pm

Ok, so can someone do better than my original ugly code?

Seems like my point is being proved here.

scanon · March 26, 2020, 5:50pm

Well, yes and no. This is also proving that your C code may have undefined behavior, which Swift's (verbose, unwieldy) model is protecting you from. In particular, the memcpy in your C code reads four floats each iteration, but only the first three are part of the vertexData position. This means that it may read off the end of the buffer on the final iteration unless vertexData is laid out in memory with explicit padding to allow this.

audulus · March 26, 2020, 6:01pm

Well I undermined my point by not posting my actual Swift code. The line should be:

_ = vdata.copyBytes(to: ptr, from: start..<start+12)

so I had already fixed that.

But figured "oh that won't look good if I post a hard-coded value" so I actually introduced a bug by changing the 12 to MemoryLayout<float3>.size (16). Oops!

And guess what? Swift didn't save me.

Anyway, point still being proven.

Andrew_Trick · March 26, 2020, 8:41pm

I'll completely set aside the issue of using withXXX closure-taking functions to express variable scope; that's an interesting issue on its own that should be a separate thread.

Naming: Swift pointer names are objectively hilarious. While I do not personally buy into Swift's naming philosophy (I think names are for fast recogonition and disambiguation, while descriptions should be in descriptions), I do appreciate that Swift has a consistent philosophy and there are reasons for it. At least I was able to negotiate to withUnsafeMutableBytes down from withUnsafeMutableRawBufferPointer. I would have preferred the type name to be UnsafeMutableBytes, but I can live without it. The more important design goal here is that Swift programmers rarely need to actually type or even read those names. Given the default safety of Swift, the "Unsafe" prefix does need to be there. The proper solution for that is:

use your own type aliases in places where you don't want to be reminded of the unsafety

Documentation: The main issue with Swift programmers hit when working with pointers is that they're trying cast pointer types when they should really avoid that. I want to have:

a basic Swift pointer programming guide that steers programmers away from casting pointers and toward proper use of the UnsafeRawPointer API.
much better compiler diagnostics with educational notes that point programmers directly to that guide, particularly when they unnecessarily use a "memory binding" API.

Type-casting pointers: Whether you're doing this in C or Swift, there are some serious pitfalls that most programmers are blissfully unaware of because their code will run fine, until one day when it doesn't. Swift's goal is absolutely to stop people from doing this. So, yeah it's painful, but not nearly as painful as it should be.

Memory Binding APIs: If you need to use bindMemory or assumingMemoryBound to work with regular memory that just holds one type, then that's a major usability bug. I think fixing those issues should have a fairly high priority. Here's a list of what can be fixed immediately, roughly in decreasing priority:

[SR-10246] Support limited implicit pointer conversion when calling C functions.
[SR-4649] Don't require & to pass value to UnsafeRawPointer
[SR-10273] Add an UnsafeRaw[Buffer]Pointer API for loading and storing unaligned/packed data
[SR-11082] Support withMemoryRebound(to:) with mismatched sizes
[SR-11147] Add withUnsafeElements(to: variable) where variable is any homogenous aggregate
[SR-11156] Add UnsafePointer[KeyPath] API that applies the field's offset
[SR-11087] Add a closure taking API: UnsafeRaw[Mutable][Buffer]Pointer.withMemoryRebound(to:[capacity:])
[SR-11300] Provide a RawPointer version of ManagedBuffer
API for "load from contiguous as: T" (no bug yet)
Allow comparison of raw and typed pointers (no bug filed)
Pointers to uninhabited types, OpaquePointer (no bug?)

The basic problem with your example is that Foundation.Data has erased all your element types. That's a higher-level problem with the APIs that I can't speak to. But without fixing that, we could still make straightforward additive changes to the compiler, standard library, and Foundation.Data, so the code could be written like this:

  // Transform vertexData
  vdata.withMemoryRebound(to: float3.self) { inputBuffer in
    vertexData.withMemoryRebound(to: MCVertex.self) { outputBuffer in
      for i in 0..<mesh.vertexCount {
        let v = inputBuffer[i + attr.offset]
        outputBuffer[i].position = v * scale + translation
      }
    }
  }

All we would need for that is:

[SR-11087] Add a closure taking API: UnsafeRaw[Mutable][Buffer]Pointer.withMemoryRebound(to:[capacity:])
Add an equvalent API do Data.Foundation

Of course, that only makes sense if the data is laid out correctly for the SIMD type. This is the exactly the sort of problem that Swift guards against by forcing you to explicitly rebind memory if you want to use typed pointers.

As you mentioned later in the thread, you're actually reading an unaligned bytestream. So in your case, it makes no sense to use a typed pointer for the input stream. You should explicitly load the values out of a pointer to raw bytes. Unfortunately, that means using copyBytes, until this small API addition is implemented:

[SR-10273] Add an UnsafeRaw[Buffer]Pointer API for loading and storing unaligned/packed data

Then you can do something like what @John_McCall's posted.

audulus · March 26, 2020, 9:26pm

Thanks for your thoughtful reply @Andrew_Trick.

Seems like this makes things slightly more pleasant:

extension Data {
    
    func read<T>(to: inout T, offset: Int) {
        Swift.withUnsafeMutableBytes(of: &to) { ptr in
            _ = self.copyBytes(to: ptr, from: offset..<offset+MemoryLayout<T>.size)
        }
    }
    
}

so then I can do just:

        // Transform data.
        vertexData.withUnsafeMutableBytes { buf in
            
            let ptr = buf.baseAddress!.assumingMemoryBound(to: MCVertex.self)
            
            for i in 0..<mesh.vertexCount {
                
                var v = MCVertex() // MCVertex is from a C header for compatibility with Metal.
                let start = i*stride+attr.offset
                vdata.read(to: &v, offset: start)
                
                let p = float3(v.position.0, v.position.1, v.position.2) * scale + translation
                
                ptr[i].position.0 = p.x
                ptr[i].position.1 = p.y
                ptr[i].position.2 = p.z
            }
            
        }

audulus · March 26, 2020, 9:36pm

Even better:

extension Data {
    
    func get<T>(value: inout T, offset: Int) {
        Swift.withUnsafeMutableBytes(of: &value) { ptr in
            _ = self.copyBytes(to: ptr, from: offset..<offset+MemoryLayout<T>.size)
        }
    }
    
    mutating func set<T>(value: T, index: Int) {
        
        self.withUnsafeMutableBytes { buf in
            let ptr = buf.baseAddress!.assumingMemoryBound(to: T.self)
            ptr[index] = value
        }
    }
    
}

which allows:

        // Transform data.
        for i in 0..<mesh.vertexCount {
                
            var v = MCVertex()
            let start = i*stride+attr.offset
            vdata.get(value: &v, offset: start)
            
            let p = float3(v.position.0, v.position.1, v.position.2) * scale + translation
            
            vertexData.set(value: MCVertex(position: (p.x, p.y, p.z)), index: i)
        }

Andrew_Trick · March 26, 2020, 9:40pm

Your Data.read extension works. If we actually wanted to add that to Foundation, it would need an "unsafe" prefix ;) Hiding unsafe APIs within safe-looking APIs is generally a concern.

audulus · March 26, 2020, 10:40pm

Perhaps one thing to keep in mind is if you make things too painful, developers will just flee to ObjC(++) for the unsafe code. And then what was the point? Most of my app's core data structures, and geometry processing code is written in C++ in part for that reason (and doubts about Swift performance).

Andrew_Trick · March 26, 2020, 11:02pm

Perhaps one thing to keep in mind is if you make things too painful, developers will just flee to ObjC(++) for the unsafe code. And then what was the point? Most of my app's core data structures, and geometry processing code is written in C++ in part for that reason (and doubts about Swift performance).

I misspoke. I don't want to inflict pain. I want to provide direct feedback when code patterns can't be proven safe by the type system or other diagnostics. The necessary complement is that there needs to be a painless way to do the same thing safely. Type-casting pointers is just inherently dangerous and difficult to check. But I recognize it's hard to avoid when passing data between arbitrary C libraries.

Lantua · March 27, 2020, 12:03am

I don't think that's the core of the issue. You only use assumingMemoryRebound when converting untyped memory into a typed one. And you'd rarely use bindMemory yourself.

Though there are a few conversion pain point when more and more APIs are using BufferPointers but only non-buffered version gets implicit bridging, including Data APIs.

robnik · March 27, 2020, 3:49pm

You forgot about OpaquePointer and Unmanaged<T>. :)

Andrew_Trick · March 27, 2020, 6:58pm

Right. There's an impedance mismatch between new Swift code and existing libraries where we always want Swift APIs to use UnsafeBufferPointers but we're importing standalone UnsafePointers. That adds to the name explosion and difficulty building the right pointer type. Over time, I hope we get more Swift APIs on top of libraries that at least use UnsafeBufferPointers, if not safe types.

benjamin.g · March 29, 2020, 9:22pm

If i may add to the topic, having had to deal with Unsafe Swift apis when working with CoreGraphics PDF APIs :

i like that swift makes it obvious when i'm entering the realm of "unsafe" memory manipulation.
However, i don't really understand what are the benefit the various swift memory-manipulation APIs provide me with over C ptr direct manipulation. The api probably deals with a limited number of orthogonal concerns regarding memory access (copying, read/write permission, memory alignment guarantees, casting to another type or not, etc), but those aren't easy to grasp just by looking at the Apis.
It makes it really hard to understand the tradeoff of using one method vs another. What i end up doing very often, is try to use any combination of them, until it compiles. I doesn't feel really nice, and more importantly, it doesn't increase my confidence that my code is actually doing something sane.

audulus · March 29, 2020, 10:23pm

I guess some of the pointer types are bounds checked in debug mode? Whoop-te-doo.

You'd think actually that bounds checking wouldn't need to exposed to the developer. Just make the pointer types pairs under the hood in debug mode.

But you know, everything has to be solved with more static types. That's the name of the game.

Anecdotal, but when I think of all the bugs I've fixed over the past few years, very few of them have to do with things as mundane as bounds checking on pointers. It's more like race conditions, or GPU details.

Another data point: all the Metal example code is written in ObjC, even new stuff, AFAICT. That's probably because using Swift with Metal is annoying.

lukasa · March 30, 2020, 6:38am

Which specific APIs do you have in mind here? The surface area of the pointer API is large. @audulus seems to have assumed that you were talking about the distinction between buffer and non-buffer pointers, but that was not actually clear to me here.

lukasa · March 30, 2020, 6:41am

This cannot be done. To make pointer types pairs in debug mode you have to know what the length is when the pointer is constructed. That's plenty do-able when the pointer has originated in Swift, but many pointers are vended by C APIs, and those pointers do not know their length.

The real reason buffer pointers have a length, however, is not bounds checking. As you yourself note, that bounds checking is gone in release mode. The reason buffer pointers have lengths is because they are collections. That is, you can implement generic collection algorithms against buffer pointers, as well as use their collection conformance to provide a number of helper operations. For example, creating an Array from a buffer pointer is trivial (Array(bufferPointer)), while doing so from a non-buffer pointer generally is impossible (what's the termination condition?).

audulus · March 30, 2020, 2:54pm

Why not have pointers vended by C APIs have INT_MAX length? After all, that's the best you can do.

Impossible? Really? Array(pointer, count).

Well whatever the real reason for buffer pointer length, the bounds checking in debug is more valuable than the fancy collection stuff, IMO. Bounds checking your pointers in debug is a great feature, provided it doesn't make pointers annoying to use.

lukasa · March 30, 2020, 3:04pm

Not if things are collections. It is senseless to ask for the count of a pointer of unbounded size, it is even more senseless to trivially allow map operations on them. This design works if you want to make the non-buffer pointers bounds checked, when vended from Swift, but given that buffer pointers have a separate reason for existing (i.e. that they're collections) you'd still want to keep that concept around.

What I said was generally impossible, by which I mean "impossible in the general case". That is, given an arbitrary pointer vended from C, there is no unambiguous way to determine what the size of the buffer is. Sometimes you were also given the length by the API. Sometimes you were given the length in bytes, not in elements, and need to do a conversion based on MemoryLayout. Sometimes you have to call strlen or some other length function. And some pointers aren't buffers at all and have no meaningful length, or are of conceptual length "1".

Fundamentally to get a correct implementation of Collection (or of bounds checking) you must know the actual length of the buffer, and Swift cannot know this for you: you need to tell it.

That's great, I'm glad you appreciate the feature. You can have it whenever you want by turning your pointers into buffer pointers.