Revisiting the pointer conversions

John_McCall · January 9, 2019, 7:44am

Swift currently has special-case conversions to turn various kinds of arguments into various pointer types. All of these conversions are only allowed in arguments; however, some are only allowed in ordinary call arguments, while others are also allowed as operator arguments. Some of these conversions require the & argument decoration. All of these conversions also permit the parameter type to have a single level of optionality.

In this discussion, I will use the following abbreviations:

UP == UnsafePointer<T>
UMP == UnsafeMutablePointer<T>
AUMP == AutoreleasingUnsafeMutablePointer<T>
URP == UnsafeRawPointer
UMRP == UnsafeMutableRawPointer

Here are the current rules:

A mutable T l-value can be passed as an UP, UMP, AUMP, URP, or UMRP. This requires & and is allowed for operators.
A mutable [T] l-value can be passed as an UP, UMP, URP, or UMRP. This requires & and is allowed for operators.
An UMP can be passed as an UP, UMP, URP, or UMRP. This does not allow & and is not allowed for operators.
An UP can be passed as an UP or URP.
An AUMP can be passed as an UP or URP.
A [T] can be passed as an UP or URP. This does not allow & and is not allowed for operators.
A String value can be passed as an UnsafePointer<Int8>, UnsafePointer<UInt8>, UnsafePointer<Void>, or URP. This does not allow & and is not allowed for operators.
All of the pointer-to-pointer conversions to UP or UMP allow the argument type to change to Void.
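As a rough illustration of the rules above, here is a minimal sketch of several of the conversions at call sites. The callees `readInt`, `writeInt`, and `cStringLength` are hypothetical stand-ins for imported C functions and are not part of any library.

```swift
// Hypothetical callees standing in for imported C functions.
func readInt(_ p: UnsafePointer<Int>) -> Int {
    return p.pointee
}

func writeInt(_ p: UnsafeMutablePointer<Int>, _ value: Int) {
    p.pointee = value
}

func cStringLength(_ p: UnsafePointer<UInt8>) -> Int {
    var n = 0
    while p[n] != 0 { n += 1 }   // walk to the NUL terminator
    return n
}

var x = 1
writeInt(&x, 5)                        // mutable T l-value -> UMP, requires &
let viaInout = readInt(&x)             // mutable T l-value -> UP, still requires &

var numbers = [10, 20, 30]
writeInt(&numbers, 11)                 // mutable [T] l-value -> UMP, requires &
let viaArray = readInt(numbers)        // [T] -> UP, no & allowed

let viaString = cStringLength("abc")   // String -> UnsafePointer<UInt8>, no & allowed

let m = UnsafeMutablePointer<Int>.allocate(capacity: 1)
m.initialize(to: 7)
let viaPointer = readInt(m)            // UMP -> UP, no & allowed
m.deinitialize(count: 1)
m.deallocate()
```

In every case the pointer is only meant to be used for the duration of the call; nothing at the call site marks that restriction.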

Here are some reasons why this isn't great:

Allowing an implicit conversion from Array/String to a pointer is quite out of keeping with Swift's normal strong-typing rules. We've seen program bugs and memory corruption in the wild caused by unintended use of this feature.
All of these conversions have to be hard-coded in the compiler; it is not a generalizable feature.
There are some glaring omissions from the conversions: e.g. there are no conversions from Array/String to the bounded Unsafe{Mutable}BufferPointer types, just to the unbounded pointer types.
Doing any of these conversions — or similar operations that are not in the hard-coded set — explicitly rather than relying on the compiler magic is extremely awkward.
Most of the conversions need to be "scoped" to an immediate access and so make sense as argument-only conversions, but the pointer-to-pointer conversions are pretty much just subtyping rules.
The use of & is strange:
- It's inconsistent with the use of & for inout because the resulting parameter isn't necessarily a mutable pointer.
- It's inconsistent with the legacy of & from C because there's a serious conversion going on with e.g. arrays and strings.
- The & is required to get a non-mutable pointer from a mutable l-value but not required to get a non-mutable pointer from an array.

Here are some assorted ideas that I'd love to see someone pick up:

The implicit conversion from Array/String should be deprecated in favor of something that requires an explicit syntactic element (even if it's just &).
The pointer-to-pointer conversions should be allowed in arbitrary places that permit conversions, not just arguments.
(Credit to Joe Groff) There should be some generalized way to pass a "scoped" argument: we do some operation to produce a value, then clean up after it once the call returns. This is very analogous to what happens with using statements in languages like C#, and perhaps there should be a statement version of this as well.

This feels essentially coroutine-ish and could probably be reasonably built on top of the technical infrastructure created for generalized accessors. This raises the question of whether we could actually just make these storage accesses of some sort, but I don't think that would be a good fit.

A sufficiently useful general feature here could potentially be used to deprecate all of these conversions (except maybe the pointer-to-pointer conversions).

Note that "deprecation" would of course just mean that the conversions are disallowed in some future language revision; programs compiled for Swift versions <= 5 would have to continue to work.

DevAndArtist · January 9, 2019, 10:13am

My experience with the Unsafe type family is very limited, but I would like to mention that instead of a `&` prefix I personally would prefer something different, to avoid the ambiguity with `inout` and `yield`. How about a `*` prefix for explicit pointer type conversions instead?

We could have multiple global `*` prefix functions such as:

prefix func * <T>(value: T) -> UnsafePointer<T>
// etc.

Or something that has a trailing closure instead to ensure that we use the pointer in an explicit scope. This will allow us to write something like yield &(*value) which will yield a pointer to some value.

johannesweiss · January 9, 2019, 11:19am

Hi John,

Thanks for bringing this up. I have hit most of these annoyances, and especially ...

... this point alone would be good enough for me to give a huge +1 here.

nuclearace · January 9, 2019, 12:06pm

I'm totally for this idea. Half of the time I forget that these implicit conversions exist for some types, and end up scrambling for 10 minutes trying to figure out the best way of transforming the value into the expected pointer type.

I see a lot of people bringing up the conflation of & to refer to both inout arguments and explicit pointer conversion, but I don't quite understand the issue with this. Yes, it means that from the call site you're not entirely sure what kind of conversion is happening, but is that really actively harmful? The biggest issue I see with it (which also ties into the recent thread about allowing UnsafePointer conversions to let values) is that you can't tell the difference between a conversion to a mutable pointer vs one that isn't mutable. So in that regard, I wouldn't mind a new bit of syntax to differentiate between an inout/UnsafeMutablePointer conversion and one to UnsafePointer.

But beyond that, I generally view & for inout and explicit pointer conversions as roughly meaning, "Okay, this parameter being passed in could be changed by the callee, but I don't know how exactly it'll happen". This is probably a bad mental model, and if it is, someone please call me out on it and explain yours.

benrimmington · January 9, 2019, 12:14pm

The _convertConstStringToUTF8PointerArgument function currently creates a temporary array. If the argument is a string literal, users might expect it to behave like a static string. This should be possible with the new String ABI.
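To make the static-string comparison concrete, here is a small sketch using the existing `StaticString` type. The name `greeting` is made up for illustration, and the sketch says nothing about how the compiler actually lowers string literal arguments.

```swift
// A StaticString literal is known at compile time, so its UTF-8 bytes
// can be viewed without building a temporary array first.
let greeting: StaticString = "hi there"

// withUTF8Buffer exposes the bytes as a bounded buffer for the closure's duration.
let staticLength = greeting.withUTF8Buffer { utf8 in
    utf8.count
}

// The first byte is the ASCII code for "h".
let firstByte = greeting.withUTF8Buffer { utf8 in
    utf8[0]
}
```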

Should string literal arguments be exempted from the explicit & syntax?

Torust · January 9, 2019, 12:40pm

I’d be in favour of ^ as a sufficiently pointer-like, explicit to-pointer operator subsuming all existing explicit and implicit conversions. I’d worry * would look too much like a dereference to people used to C.

In terms of the coroutine-type thing: the way that’s generally currently done in Swift is with nested blocks which have a type signature like:

func someFunc<T, R>(argA: A, argB: B, perform: (T) -> R) -> R

One possible solution might be to allow a call to a function that returns R and takes T as an argument to be spelt like:

someOtherFunc(arg: inplace someFunc(argA: a, argB: b))

as syntactic shorthand for:

someFunc(argA: a, argB: b) { return someOtherFunc(arg: $0) }

That would enable:

doSomething(ptrArg: inplace withUnsafePointer(to: someValue))

and ^ could be additional shorthand for inplace withUnsafePointer and its variants.
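A minimal sketch of the desugared form, which is all that compiles today. `withZeroedScratch` plays the role of someFunc and `fillAndSum` the role of someOtherFunc; both are hypothetical names, and the `inplace` spelling appears only in a comment because it is not valid Swift.

```swift
// Hypothetical scoped provider in the shape of someFunc:
// it produces a value, runs the closure, then cleans up.
func withZeroedScratch<R>(count: Int, perform: (UnsafeMutablePointer<Int>) -> R) -> R {
    let scratch = UnsafeMutablePointer<Int>.allocate(capacity: count)
    scratch.initialize(repeating: 0, count: count)
    defer {
        scratch.deinitialize(count: count)
        scratch.deallocate()
    }
    return perform(scratch)
}

// Hypothetical consumer in the shape of someOtherFunc.
func fillAndSum(ptrArg: UnsafeMutablePointer<Int>, count: Int) -> Int {
    var total = 0
    for i in 0..<count {
        ptrArg[i] = i + 1
        total += ptrArg[i]
    }
    return total
}

// Proposed spelling (not valid Swift):
//   fillAndSum(ptrArg: inplace withZeroedScratch(count: 4), count: 4)
// Desugared form that works today:
let scratchTotal = withZeroedScratch(count: 4) { fillAndSum(ptrArg: $0, count: 4) }
```

The cleanup in `defer` runs after the closure returns, which is the "scoped" behaviour the shorthand would need to preserve.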

xwu · January 9, 2019, 10:24pm

To push back, however, while the feature is out of keeping with a lot of Swift, I do have to say that from a pragmatic point of view it's made interacting with C APIs much nicer.

If there were some way of coupling the removal of implicit conversion with (a) a feature analogous to @autoclosure for conversions from Array or String; and (b) a way to import C APIs so that parameters automatically use that feature, then we could recover some of that pragmatic "niceness" while getting rid of an implicit special-case rule.

jrose · January 9, 2019, 10:30pm

Yeah, I have to agree with Xiaodi and register my concerns about this approach. Being able to pass string literals to C APIs is incredibly useful; being able to do it with other strings is still pretty useful in practice. For arrays, the main benefit is telling people to just use Array to manage memory in Swift. Yes, both the Array and String dynamic cases could be done using explicit closure scoping (withUnsafeBufferPointer and withCString), but that seems way heavier.

(Also, using & in particular for something declared using let seems very fishy to me.)

xwu · January 9, 2019, 10:32pm

May I propose that we postpone discussing the merits of & versus * versus ^ for the moment to discuss the larger issue? I think the spelling here is largely beside the point, although if we make any change it will be a salient topic at that time.

John_McCall · January 9, 2019, 10:32pm

Well, part of what I’d like here for sure is something that feels less cumbersome than withUnsafePointer. I’m not suggesting deprecation with no effective replacement.

Joe_Groff · January 9, 2019, 10:52pm

withUnsafePointer is also problematic going forward to a world with more coroutines, since the closure body is a separate function, and you can't yield out of it, even though it would be very useful:

extension Array {
  // in some future language version with generators…
  func generate() yields Element {
    self.withUnsafeBufferPointer {
      for i in 0..<count {
        yield $0[i] // oops, we can't yield out of the closure, it's not a coroutine
      }
    }
  }
}

It would be nice to be able to have accessor-like coroutines that work with a using-style block, or maybe some lighter-weight syntax as well, for this sort of use case.
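For contrast, the usual workaround in current Swift is to invert control and pass a callback into the scoped closure, rather than yielding out of it. This is only a sketch; `visitViaBuffer` is a hypothetical helper name, not a standard library API.

```swift
extension Array {
    // Hypothetical helper: instead of yielding each element to the caller,
    // the caller's code is passed in and run inside the scoped closure.
    func visitViaBuffer(_ visit: (Element) -> Void) {
        withUnsafeBufferPointer { buffer -> Void in
            for element in buffer {
                visit(element)
            }
        }
    }
}

var collected: [Int] = []
[3, 1, 2].visitViaBuffer { collected.append($0) }
```

This works, but the caller cannot suspend between elements the way a generator's consumer could, which is the limitation being described.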

Torust · January 9, 2019, 11:27pm

Would it be possible to extend some closures with a “you can yield out of this” attribute? Since trailing closures are the current preferred method of providing scoped access, having some lightweight affordances to prevent nesting (e.g. using in argument position with scope extended to the enclosing function call) seems like it would get us most of the way there.

From what I understand, moveonly types combined with endLifetime calls should provide scoped access in many use cases. The ones that it doesn’t work for are closures that take an argument inout or otherwise require exclusive access, and in that case I don’t personally see a better way of signifying that the inout argument cannot be accessed during the closure than the scope block already enforced by the closure.

jrose · January 9, 2019, 11:30pm

I think it's (arguably) safe to yield out of any noescape closure, as long as you don't violate exclusivity when you do it. But I don't know how hard that is to implement.

John_McCall · January 10, 2019, 12:00am

Well, we could create a new stack to call the function which takes the closure, or we could redesign the implementation of coroutines. But neither of these would allow us to make easy statements about how many times the yield occurred.