Pitch: Implicit Pointer Conversion for C Interoperability

Obviously. The key phrase you uttered in another reply seemed to be "Binding memory throws up an optimization barrier", which has moved my understanding of its operation on a bit. The "strict aliasing rule" is unfortunate in my opinion, but we have to live with it. The best description I found was this link from my little book, where even then its consequences seem to be under dispute, judging by the colourful language used.

This likely precludes implementing your pitch using any universal approach where conversions are expressed in Swift, but I'm wondering where this magic will be documented.

Thanks for taking a look @Douglas_Gregor. With respect to extrapolating benchmarks, they were also unchanged when I added the declarations for implicit conversions to the source even though they were not used. The bidirectional CGFloat <-> Double conversion already works in the prototype, as you can see from the tests, because it is implemented as a post-type-checking operation that applies only one conversion at a time. I've looked through the existing conversions you enumerate later and very few could be re-expressed in Swift. My principal interest was simple type conversions, but the prototype was able to implement the mutable to immutable conversions at a pinch.

I'd spend more time on it but you say in a later reply you'd be reluctant to put that power in the hands of developers out of bad experience from C++ which is a perfectly valid position. I only wanted to prove that fears about type checker performance shouldn't drive the decision on their own.

3 Likes

This is often suggested, but I'm not aware of a case where this worked out in practice - we usually end up with both the old thing and the new thing (e.g. lazy and property wrappers both).

The reason they don't work out is that the new general feature isn't as powerful as the special-case things, doesn't behave exactly the same way (and we can't take a source break), or no one goes back and cleans things up. Are you aware of any examples where this has worked out?

In this specific case, these types directly interact with other existing implicit conversions around inout. Understanding how these work when starting from a general model is likely to produce a principled and composable solution - implementing more special cases for this isn't likely to be subsumable into a general model later.

Sure. From my perspective, CGFloat conversions were done and justified as a special case because (just like this) there was a specific narrow need for them. Work went into solving that one problem, which introduced a lot of complexity into the compiler (see the post above) as well as compile time issues. These all have nothing to do with conversions being extensible etc, these are a result of how the CGFloat conversions were designed.

I'm observing that we have accumulated a lot of these special cases already (see Doug's summary above), and believe we are at the point where we should define a model for this to guide these things. Unrelated to the language feature, I also think we should have a set of principles for why and when we add one set of conversions but not another (e.g. why isn't String convertible to Substring?). Without that we have a collection of special cases without a unifying theory.

To answer your question, I am arguing (without data, this is just my opinion) that a "zoom out and look at the big picture" approach will make Swift a better and more consistent language, and that that view is far more important than smoothing a few weird cases in imported C code.

This is also arguably really important to figure out before Swift 6, because it is very unlikely that a new feature is going to be compatible with the existing ad-hoc conversions we have (see above). However, a language break would give us the ability to eat small incompatibilities in the new mode.

-Chris

5 Likes

Yes, this is the important discussion to be had, and this discussion is exactly what I'm encouraging. Your post makes two different arguments 1) implicit conversions are critical for some things, but 2) you don't like implicit conversions. Your conclusion is that they should be hard coded into the compiler as a defense mechanism.

We (folks working on Swift a long time + evolution) have gone around this "policy vs mechanism" wheel several times in the past, e.g. when SE-0195 Dynamic Member Lookup was added, early in the language design when we had (what became) the ExpressibleBy... protocols adoptable by user-defined types, user-inventable operators, emoji identifiers, and more recently in the discussion about SE-0302 Sendable, where we discussed locking down Sendable conformance to prevent bugs. Frequently there is a "fear of abuse" vs "generality" question that comes up - go read the DynamicMemberLookup reviews to see all the "everyone will start conforming to this and then no one will know what any code does" concerns.

Reasonable people have different opinions, but I have /consistently/ (across the last 11 years, across many language features from the very beginning) been of the opinion that:

  1. "Swift benefits from strong API design guidelines"
  2. We need composable language features with strong designs.
  3. "The API Author Knows Best".

The reason for the first is that you want consistency across the ecosystem (which isn't something that can be mechanically enforced in general). The second is hopefully obvious. The third is because domain-specific APIs have different constraints and the authors of those APIs are the best positioned to understand the needs of their customers. With #1 and #2, those API authors (who are professional software engineers, not children that need to be coddled) can weigh the tradeoffs and make informed decisions.

To make this concrete, come back to the majority of your post outlining how important it is to do bindings to C, and how this is all special. It turns out that C isn't the only language that Swift binds to - PythonKit is also a pretty important thing for a class of users, and it would also benefit from specific adoption of narrow implicit conversions.

I would much rather have a unifying theory and flush all those special cases out of the compiler. Are we going to start adding special case conversions into the compiler for other language bindings?

-Chris

EDIT: Also, your post and those of others make a specific assumption that what we end up with would be C++-like or __conversion like. I'm not sure why it always comes back to this, of course we wouldn't carry forward known mistakes into the future.

9 Likes

I like your explanation. Something along these lines should be in the API docs.

1 Like

I don't think we need to live with strict aliasing in Swift, although we might opt into it in some places for peak performance. The issue is really only forced when working with C APIs. I do think we need to stop using typed pointers in pure Swift APIs. It's easy to build nice APIs for encoding and decoding and to generally reinterpret bytes on top of raw pointers. Designing fully general utilities for those things is challenging and will take a bit more time.
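As a rough illustration of building on raw pointers, here is a minimal sketch of a decoding helper. The `decode` function and `decodeHeaderDemo` are hypothetical names made up for this example, not standard library APIs, and the sketch assumes trivial (plain data) types at correctly aligned offsets.

```swift
// Hypothetical helper (not a standard library API): read one value of a
// trivial type from raw bytes, checking bounds and alignment first.
func decode<T>(_ type: T.Type, from bytes: UnsafeRawBufferPointer, at offset: Int) -> T {
  precondition(offset >= 0 && offset + MemoryLayout<T>.size <= bytes.count, "out of bounds")
  let address = Int(bitPattern: bytes.baseAddress) + offset
  precondition(address.isMultiple(of: MemoryLayout<T>.alignment), "misaligned load")
  // `load` reinterprets the bytes without binding the memory to `T`.
  return bytes.load(fromByteOffset: offset, as: T.self)
}

// Writes a UInt32 followed by a UInt16 into raw storage, then decodes both.
func decodeHeaderDemo() -> (UInt32, UInt16) {
  let storage = UnsafeMutableRawBufferPointer.allocate(byteCount: 8, alignment: 4)
  defer { storage.deallocate() }
  storage.storeBytes(of: 0xCAFE_F00D, toByteOffset: 0, as: UInt32.self)
  storage.storeBytes(of: 0x0102, toByteOffset: 4, as: UInt16.self)
  let bytes = UnsafeRawBufferPointer(storage)
  return (decode(UInt32.self, from: bytes, at: 0), decode(UInt16.self, from: bytes, at: 4))
}

let header = decodeHeaderDemo()
```

No typed pointer appears anywhere in this sketch; every reinterpretation is an explicit `load` or `storeBytes` call on raw memory.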

As usual, the proposal itself is the authoritative documentation. Of course, we can document it wherever else is appropriate. The special rule is explained in a diagnostic message in case someone tries to do the same conversion in Swift.

1 Like

From what @hborla told me, the reason why lazy is not yet generalized is a couple of shortcomings of property wrappers, which are being addressed:

  • It's not as efficient because the property wrapper has to store the initialization closure in each instance. There is a pitch to make this more efficient (Add shared storage to property wrappers).
  • Allowing lazy as a property wrapper to access the enclosing self. This requires syntax changes which haven't been completely fleshed out.
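To make the first shortcoming concrete, here is a minimal sketch of what a lazy-style property wrapper could look like. The names `Lazy`, `CallCounter`, `expensiveTotal` and `Report` are hypothetical and chosen only for illustration; the point is that the initialization closure is stored inside every wrapper instance.

```swift
// Hypothetical wrapper, for illustration only.
@propertyWrapper
struct Lazy<Value> {
  private var cached: Value?
  // Stored per instance: this is the extra cost mentioned above.
  private let makeValue: () -> Value

  init(wrappedValue: @autoclosure @escaping () -> Value) {
    self.makeValue = wrappedValue
  }

  var wrappedValue: Value {
    mutating get {
      if let value = cached { return value }
      let value = makeValue()
      cached = value
      return value
    }
    set { cached = newValue }
  }
}

final class CallCounter { var count = 0 }

func expensiveTotal(_ counter: CallCounter) -> Int {
  counter.count += 1
  return 42
}

struct Report {
  @Lazy var total: Int
  init(counter: CallCounter) {
    // The autoclosure defers the call until `total` is first read.
    _total = Lazy(wrappedValue: expensiveTotal(counter))
  }
}

let counter = CallCounter()
var report = Report(counter: counter)
let callsBeforeAccess = counter.count
let first = report.total
let second = report.total
```

The wrapper cannot refer to the enclosing `self`, so anything the initializer needs has to be captured in the closure, which is the second limitation listed above.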

If we are talking about implementation, and not the language then - yes. There are multiple ongoing efforts in the type-checker to do just that: generalize all of the performance hacks, closure, result builder handling etc.

1 Like

Right, I was just explaining why I'm not a fan of "add more special cases with the expectation that a future generalization will subsume them" - even though it is theoretically possible, I haven't seen it work in Swift in practice. It is much better (in my opinion) to start with the general feature and avoid having all the special cases. And yes, I realize the tradeoff here - designing and building general things is more difficult than putting duct tape on a specific problem.

Beyond technical issues, there are also very real social and project management issues: there will always be a press for new features and capabilities, and often not enough time to go clean up existing debt. This is unfortunate, but also a reality of software project management. Continuing the example above, if we had property wrappers first, then there would be a specific reason to make them more efficient/powerful to enable the @Lazy modifier.

These are all reasons why Swift has said "no" to key things for years (e.g. concurrency) rather than scattering little things in over the course of the years. Taking time to do something right - even at the expense of having to go without the short-term improvements - works out better in the end, because you can design the systems together with the big picture in mind.

-Chris

9 Likes

I think we should agree to disagree here. There are examples where this approach could be considered successful, like lazy, where adding this feature as a special case improved the language in the short term and bought more time for generalized design to continue. Also, neither I nor, most likely, anyone else here disagrees about the merits of upfront generalized design, but in this particular case (as was already mentioned by multiple people) it hasn't been determined yet whether generalization of implicit conversions is something beneficial for the language.

2 Likes

This is a really interesting comment, and something I've wondered about for a long time. I don't feel it's very clear what the role of pointers in Swift actually is - are they only for C interop? If so, why are typed pointers so pervasive (even to pure Swift types which aren't exposed to C), and why is it so difficult for Swift code to reinterpret memory as a different type?

Take @Nevin's example above. I'm not really thrilled that the data passed to the function must be bound to type Float. That might be difficult if the data is provided as raw memory from a 3rd-party library (e.g. read from a file, or from the network). A function can declare that it wants data which can be interpreted as Floats, but binding is a much stronger commitment that requires you to track the provenance of the memory.
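The distinction between interpreting and binding can be sketched roughly as follows. `sumAsFloats` is a hypothetical function written for this example; it only asks that the bytes be readable as Floats and never binds the memory to Float.

```swift
// Hypothetical function: sums raw bytes interpreted as Floats.
// It does not care what type the memory is bound to.
func sumAsFloats(_ bytes: UnsafeRawBufferPointer) -> Float {
  let step = MemoryLayout<Float>.stride
  precondition(bytes.count.isMultiple(of: step))
  precondition(Int(bitPattern: bytes.baseAddress).isMultiple(of: MemoryLayout<Float>.alignment))
  var total: Float = 0
  for i in 0..<(bytes.count / step) {
    // A raw load reinterprets the bytes at each access.
    total += bytes.load(fromByteOffset: i * step, as: Float.self)
  }
  return total
}

// The storage here is an array of UInt32, not Float.
let words: [UInt32] = [Float(1.5).bitPattern, Float(2.25).bitPattern]
let sum = words.withUnsafeBytes { sumAsFloats($0) }
```

A function taking `UnsafePointer<Float>` would instead require the caller to guarantee that the memory is bound to Float for the duration of the call.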

What would be better IMO is for a type that means "raw memory interpreted as Floats". Unfortunately, that wouldn't work with existing APIs such as Collection.withContiguousStorageIfAvailable or AccelerateBuffer.withUnsafeBufferPointer, for the same reason that UnsafeRawBufferPointer doesn't implement wCSIA - "raw memory interpreted as X" is not the same thing as "raw memory bound to X". Even when X is UInt8, apparently.

Ideally, if we did introduce a new interpreted-memory type, it would also guarantee the lifetime of the memory it points to.


Also, I don't think it has been discussed whether memory rebinding is safe now that we have language support for concurrency. Imagine that a function using withMemoryRebound is running in parallel with another function, which reads from the same buffer but using its original type. As it is currently documented, I would assume that isn't safe.

If that is the case, would it be unwise to make memory rebinding implicit? Or is it safe because the implicit conversions only apply to memory disappearing into C-land?

3 Likes

Not all fears-of-abuse are equal. If the cost of abuse is incomprehensible code locally, there's a remedy for that: don't do it where it leads to a lack of clarity. Social pressure will prevent widespread abuse. I think that's the lesson from Dynamic Member Lookup: we don't need to legislate restrictions because good design sense will prevail over time.

However, if the cost of abuse is a fundamentally inefficient compilation model, everyone pays for it and there's no good remedy for that. Perhaps library A and library B both add an implicit conversion, and in isolation both are fine---small bump in compilation time that they don't notice. But a client of A and B starts to hit exponential behavior in the type checker. What's the remedy here, when we've provided a general feature that doesn't scale? The ExpressibleBy... protocols are an interesting historic example here, because their design contributes to worse type-checker performance. The generality of that feature has a cost paid by everyone in terms of build times, but there's no one thing to point to that a user did that admits a localized fix.

I think generalized implicit conversions are in both of the categories above. While we can debate whether the fear of abuse is justified for the former category, it's the latter category that's the larger problem: the presence of even well-intentioned implicit conversions can degrade the overall Swift development experience without any single point of abuse that can be corrected.

I don't know if we have that unifying theory. @johnno1962 implies that the solution he was experimenting with cannot account for many of the existing conversions:

I'm not sure; I think many of the implicit conversions that exist today could be subsumed into a general design.

To be clear, a C++- or __conversion-like design is specifically the one that's being pitched. I think something more akin to subtyping of value types would be a more reasonable design, but it still opens us up to exponential behavior.

Doug

4 Likes

Well said, I agree.

I also agree.

I hear what you're saying, but there is something huge I don't understand. The minimal version of implicit conversions is to allow the existing "conversion mapping" table in the compiler to be extensible by adding attributes. This doesn't change the algorithm at all, but allows us to use the same affordances in a Python binding as we use in the C binding. I suppose you are arguing that people could start adopting these en masse, but that is where the API standards come in - and abuses would be similar to abuses of DynamicMemberLookup.

More generally, we have an existence proof that user-defined implicit conversions can be efficiently implemented - protocol and class subclasses convert to their base classes. The reason this is efficient is because of the DAG of conversions imposed on this, as well as limitations on when the conversions kick in. Why wouldn't such a model work for implicit conversions?
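For reference, the existing conversions alluded to here look like this in practice. This is a small sketch with made-up types (`Shape`, `Circle`, `Describable`); it shows only conversions Swift already performs implicitly, not any proposed feature.

```swift
class Shape { func name() -> String { "shape" } }
final class Circle: Shape { override func name() -> String { "circle" } }

protocol Describable { var summary: String { get } }
extension Circle: Describable { var summary: String { "a " + name() } }

func nameOf(_ shape: Shape) -> String { shape.name() }
func summarize(_ value: Describable) -> String { value.summary }
func orDefault(_ value: Int?) -> Int { value ?? -1 }

let circle = Circle()
let asShape = nameOf(circle)          // Circle -> Shape (subclass to base class)
let asDescribable = summarize(circle) // Circle -> Describable (existential)
let promoted = orDefault(7)           // Int -> Int? (Optional promotion)
```

Each of these follows a fixed direction declared by the type itself, which is the kind of structure the paragraph above credits for keeping them cheap to check.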

To be clear, I'm not advocating for this design!

-Chris

2 Likes

This is a really interesting comment, and something I've wondered about for a long time. I don't feel it's very clear what the role of pointers in Swift actually is - are they only for C interop?

In my opinion, typed pointers should over time become residual types almost entirely for C interop.

If so, why are typed pointers so pervasive (even to pure Swift types which aren't exposed to C),

For pure Swift, that's a mistake that needs to be gradually corrected. It's probably because programs are verbatim ported from C, and convenient alternatives are not yet part of the Swift standard library.

and why is it so difficult for Swift code to reinterpret memory as a different type?

Swift makes it easy to reinterpret memory relative to C (or at least it will once we support misalignment). There's no need for type punning. Swift's raw pointers do force you to be explicit at each point that memory is being reinterpreted, which corrects the fundamental mistake in C. But you can easily build abstractions on top of raw pointers that make it type-safe and convenient.

The single-slide example from the WWDC talk:

struct BufferView<Element> : RandomAccessCollection {
  let rawBytes: UnsafeRawBufferPointer
  let count: Int

  // Views the raw bytes as `Element`s without binding the memory to a type.
  init(reinterpret rawBytes: UnsafeRawBufferPointer, as: Element.Type) {
    self.rawBytes = rawBytes
    self.count = rawBytes.count / MemoryLayout<Element>.stride
    // The byte count must be a whole number of elements.
    precondition(self.count * MemoryLayout<Element>.stride == rawBytes.count)
    // The base address must be suitably aligned for `Element`.
    precondition(Int(bitPattern: rawBytes.baseAddress).isMultiple(of: MemoryLayout<Element>.alignment))
  }

  public var startIndex: Int { 0 }

  public var endIndex: Int { count }

  // Loads the element at `index` directly from the raw bytes.
  subscript(index: Int) -> Element {
    rawBytes.load(fromByteOffset: index * MemoryLayout<Element>.stride, as: Element.self)
  }
}
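A possible usage sketch follows. The type is restated in compact form only so that the snippet compiles on its own; the array of Floats and the choice of UInt32 as the reinterpreted element type are assumptions made for the example.

```swift
// Compact restatement of the BufferView type so this snippet is self-contained.
struct BufferView<Element>: RandomAccessCollection {
  let rawBytes: UnsafeRawBufferPointer
  let count: Int

  init(reinterpret rawBytes: UnsafeRawBufferPointer, as: Element.Type) {
    self.rawBytes = rawBytes
    self.count = rawBytes.count / MemoryLayout<Element>.stride
    precondition(self.count * MemoryLayout<Element>.stride == rawBytes.count)
    precondition(Int(bitPattern: rawBytes.baseAddress).isMultiple(of: MemoryLayout<Element>.alignment))
  }

  var startIndex: Int { 0 }
  var endIndex: Int { count }

  subscript(index: Int) -> Element {
    rawBytes.load(fromByteOffset: index * MemoryLayout<Element>.stride, as: Element.self)
  }
}

// View the bytes of two Floats as UInt32 bit patterns, without rebinding.
let floats: [Float] = [1, 2]
let bits: [UInt32] = floats.withUnsafeBytes { raw in
  // Copy out of the view before the pointer goes out of scope.
  Array(BufferView(reinterpret: raw, as: UInt32.self))
}
```

Because the view is a RandomAccessCollection, the usual collection algorithms are available on it; the copy into an Array is needed here because the view does not keep the underlying memory alive.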

Before officially proposing such a type, we need to consider broader issues of memory ownership and its relationship with existing types and possibly new "buffer" types.

I'm not really thrilled that the data passed to the function must be bound to type Float. That might be difficult if the data is provided as raw memory from a 3rd-party library (e.g. read from a file, or from the network). A function can declare that it wants data which can be interpreted as Floats, but binding is a much stronger commitment that requires you to track the provenance of the memory.

This is a legacy C API problem. Definitely should never happen in pure Swift code.

What would be better IMO is for a type that means "raw memory interpreted as Floats". Unfortunately, that wouldn't work with existing APIs such as Collection.withContiguousStorageIfAvailable or AccelerateBuffer.withUnsafeBufferPointer, for the same reason that UnsafeRawBufferPointer doesn't implement wCSIA - "raw memory interpreted as X" is not the same thing as "raw memory bound to X". Even when X is UInt8, apparently.

The standard library's performance hooks are incompatible with untyped byte buffers. We'll propose an UnsafeRawPointer.withMemoryRebound as a workaround, although ideally future APIs will be based on a more general BufferView type.

Ideally, if we did introduce a new interpreted-memory type, it would also guarantee the lifetime of the memory it points to.

Exactly. That's one of the reasons it's not part of the standard library yet. But this is a discussion for an upcoming proposal.

Also, I don't think it has been discussed whether memory rebinding is safe now that we have language support for concurrency. Imagine that a function using withMemoryRebound is running in parallel with another function, which reads from the same buffer but using its original type. As it is currently documented, I would assume that isn't safe.

In the context of strict aliasing it is safe. It may be unsafe for other reasons of course: race conditions, exclusive access if the memory is associated with a variable, value type safety (as opposed to pointer type safety).

If that is the case, would it be unwise to make memory rebinding implicit? Or is it safe because the implicit conversions only apply to memory disappearing into C-land?

"Implicit" rebinding only happens at the end of the withMemoryRebound closure's scope. The only rule is that the same program thread can only access that memory with the pointer type that was passed to the closure. Essentially the same rule as any other closure-taking function that vends a pointer.
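The scoping rule can be sketched with the existing `withMemoryRebound(to:capacity:)` API. `reboundDemo` is a hypothetical function for this example; inside the closure the memory is accessed only through the rebound pointer, and once the closure returns it is bound to its original type again.

```swift
func reboundDemo() -> (Int32, UInt32) {
  let words = UnsafeMutablePointer<UInt32>.allocate(capacity: 1)
  words.initialize(to: UInt32.max)
  defer {
    words.deinitialize(count: 1)
    words.deallocate()
  }

  // Inside the closure the memory is temporarily bound to Int32,
  // and is accessed only through the pointer passed to the closure.
  let signed = words.withMemoryRebound(to: Int32.self, capacity: 1) { $0.pointee }

  // After the closure returns, the memory is bound to UInt32 again.
  let unsigned = words.pointee
  return (signed, unsigned)
}

let rebound = reboundDemo()
```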

7 Likes

This discussion is obviously important and productive. But I'm perplexed by why this, of all proposals, is being used as a battleground for the architecture of the type system and debate over a potential major new language feature.

This fixes a serious bug that's been known to block Swift adoption for years. I've been advocating this solution all along with no serious objection until now.

There's no valid alternative for programmers. This isn't about syntactic sugar to make regular Swift programmers happier. It's about language adoption for new Swift projects.

The implementation is obviously straightforward and self-contained. It does not introduce any measurable cost (a single check on the AST node). It does not complicate unrelated code or create any maintenance burden.

This does not introduce any new language rules. It merely exposes C language rules to the Swift compiler when invoking C.

This does not introduce any new implicit conversions in Swift. This fixes the existing UnsafePointer argument conversion to correctly apply Swift rules to Swift functions and C rules to C functions.

Unlike the proposal where this battle started, this does not propose generalizing existing implicit conversions to operators: Automatic Mutable Pointer Conversion - #32 by johnno1962

Arguments about social engineering are relevant to many things that have happened during Swift development, but they don't hold any water when comparing a real, visceral problem for Swift adopters to a hypothetical, theoretical gain for language maintainers.

17 Likes

For what it's worth, I think this pitch as it stands is definitely valuable and is distinct from the implicit conversion issue. I've been guilty of misusing assumingMemoryBound myself when calling C APIs, so this makes sense as both a quality-of-life and a correctness fix.
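As an illustration of the situation described here, the sketch below calls the C function `strlen`, which is imported as taking `UnsafePointer<CChar>`, with bytes stored as UInt8. It uses `withMemoryRebound` rather than `assumingMemoryBound`, and assumes Foundation is available to provide the C library import; the byte values are made up for the example.

```swift
import Foundation

// NUL-terminated UTF-8 bytes held as UInt8, as they often arrive in Swift code.
let utf8: [UInt8] = Array("hello".utf8) + [0]

let length = utf8.withUnsafeBufferPointer { buffer -> Int in
  guard let base = buffer.baseAddress else { return 0 }
  // Temporarily rebind UInt8 -> CChar for the duration of the C call,
  // instead of claiming the memory was already bound to CChar.
  return base.withMemoryRebound(to: CChar.self, capacity: buffer.count) { strlen($0) }
}
```

The pitch aims to make this kind of call work without the explicit rebinding when the callee is an imported C function.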

As described in the other thread, I would find use in implicit conversions via value subtyping, but I think it would be harmful to Swift code to allow these specific conversions to be implicit when not calling into C.

1 Like

While complexity is subjective, the claim that the CGFloat conversion introduced a "lot of complexity" into the compiler can reasonably be described as untrue, and the post you linked doesn't suggest it.

Pavel does mention that naively adding bidirectional conversions between CGFloat and Double led to ambiguities. In the end, what unlocked CGFloat was some general improvement to the expression checker, that meant it could handle a new narrow conversion because overall performance had improved, plus some careful thought about exactly when and how to do the CGFloat conversion.

The result was a feature that was implementable and worth the time, in a way generalized conversion may well not be (as the latter part of Pavel’s post describes). But saying that the CGFloat work “got us into trouble” is a misinterpretation of Pavel’s comment. It’s also inaccurate to describe the “design” of the feature as leading to problems. Bidirectional conversion was part of the requirement, not the design. It’s fine to say a general conversion mechanism shouldn’t fulfill this requirement – but then it is also insufficient to replace all the existing conversions. It’s also fine to think the requirement wasn’t justified… but that would be relitigating SE-0307.

A grand unifying theory of conversions feels very unlikely, because these conversions are bespoke in order to fulfill specific goals. A subtyping design wouldn't achieve the requirement for CGFloat conversion to be bidirectional. The Objective-C-motivated conversions implement specific reinterpret casts for arrays of classes that probably don’t generalize. Pointer conversions have lifetime considerations. These are all important features that we cannot ignore in pursuit of neatening up the compiler (a goal that people actively working on the compiler are not calling for).

The main upshot of a general solution would be misuse, as far as I can tell, with little evidence it would bring some big win for new language interoperability. That's very different from a language feature with a clear goal but the potential for misuse, as was the case with dynamic method invocation.

IMO the next step for this discussion is for those advocating for this kind of feature to go off and design it, demonstrating the improvements to the language it would bring and how the existing conversion mechanisms could be replaced by it (or not), and then come back to the forums and pitch it. What’s not reasonable is holding other pitches that are in line with previous direction of the language hostage to the possibility of such a design.

8 Likes

Since C pointers aren't going anywhere, and seamless support for interoperating with C is a tentpole feature of Swift, I think the pitched feature stands on its own.

I too would love to explore whether custom value subtyping, which would allow extending Optional's magic to Result, for example, can be feasibly done--but that is a separate topic. By contrast, if I understand correctly, the pointer conversions detailed here are the fullest expression of what is intended to be supported in Swift and not some temporary kludge.

12 Likes

I may have been unnecessarily glib about the prospects there. Certainly a wide range of conversions can be expressed as an "implicit initialiser". It's just a matter of how complicated the matcher for selecting the presence of the right constructor gets, which happens here. That said, with certain types of pointer conversions it also may not be possible to express the lifetime of the pointer correctly and, due to my specific implementation, conversions do not cascade, which may not match the semantics of the hard-coded versions.

Rather than clog up Andrew's thread any more I've re-iterated in one place all my conclusions from looking at a potential "custom implicit conversions" design here. I'd be prepared to look at it further but I'd want to see more in the way of buy-in from a member of the compiler team.

@Andrew_Trick, perhaps you could look at adding the mutable to immutable conversions to your proposal and implementation as it's very much in the same neighbourhood of "improvements to C interoperability" even though they are not essential. We could then kill two birds with one stone and unblock your pitch (the conversions for which can't entirely be represented in Swift) and spend more time looking at custom implicit conversions.

Yes, implicit pointer conversions have always supported this (unless I misunderstand), and the new conversion rules need to be consistent with the existing behavior.

In the Detailed Design, each row looks something like this:
UnsafeMutablePointer<T> -> Unsafe[Mutable]Pointer<[U]Int8>

That indicates that mutable pointers can always be passed as immutable arguments.
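The long-standing mutable-to-immutable argument conversion can be seen in plain Swift. This sketch covers only the same-pointee case (UnsafeMutablePointer<Int32> to UnsafePointer<Int32>), not the pitched conversion to [U]Int8 for C functions; `total` and `mutableToImmutableDemo` are hypothetical names.

```swift
// A function that only reads through its pointer argument.
func total(_ values: UnsafePointer<Int32>, count: Int) -> Int32 {
  var sum: Int32 = 0
  for i in 0..<count { sum += values[i] }
  return sum
}

func mutableToImmutableDemo() -> Int32 {
  let storage = UnsafeMutablePointer<Int32>.allocate(capacity: 3)
  storage.initialize(repeating: 2, count: 3)
  defer {
    storage.deinitialize(count: 3)
    storage.deallocate()
  }
  // UnsafeMutablePointer<Int32> is passed where UnsafePointer<Int32> is expected.
  return total(storage, count: 3)
}

let pointerSum = mutableToImmutableDemo()
```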

The test cases verify that UnsafeMutablePointers can be passed to const pointer arguments:
https://github.com/apple/swift/pull/37956/commits/c45301078a450152334a538e459b532786ef486b#diff-aa384842f629f2d3f587c514adb333fd5103d21e5c2c3e6d726d183a2051f453R154

3 Likes

I have started a topic in Evolution/Discussion to create a space to discuss generalization of implicit conversions - Generalization of Implicit Conversions.

3 Likes