Automatic Mutable Pointer Conversion

johnno1962 · June 6, 2021, 5:46pm

Hi S/E,

'Twas the night before dub dub... can't await myself but in the meantime here's a new pitch for y'all:

I tend to work a lot with Unsafe pointers in Swift and there is one rough edge I bump into frequently I feel we could round off in the compiler even if it is a comparatively niche requirement. Hence:

Automatic Mutable Pointer Conversion

Proposal: SE-NNNN
Authors: John Holdsworth
Review Manager: TBD
Status: Awaiting pitch

During the review process, add the following fields as needed:

Implementation: apple/swift#37214
Decision Notes: Rationale, Additional Commentary
Bugs: SR-14511
Previous Revision: 1
Previous Proposal: SE-XXXX

Introduction

This proposal adds automatic conversion from mutable unsafe pointers to their immutable counterparts, reducing unnecessary casting when feeding mutable memory to APIs assuming immutable access. This is most common in, but not exclusive to, C-sourced APIs.

Swift-evolution thread: Pitch: Automatic Mutable Pointer Conversion

Motivation

In C, you may pass mutable pointers (specifically, void *, Type *) to calls expecting immutable pointers (const void *, const Type *). This access is safe and conventional as immutable access to a pointer's memory can be safely assumed when you have mutable access. The same reasoning holds true for Swift but no such implicit cast to immutable counterparts exists. Instead, you must explicitly cast mutable unsafe pointer types (UnsafeMutableRawPointer and UnsafeMutablePointer<Type>) to immutable versions (UnsafeRawPointer and UnsafePointer<Type>). This adds unneeded friction when passing mutable pointers to C functions or comparing pointers of differing mutability.
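Here is a minimal sketch of the explicit conversions this paragraph describes, for contexts where Swift does not convert for you. The allocation, the values and the function name `explicitConversionDemo` are illustrative only:

```swift
/// Converts mutable pointers to their immutable counterparts explicitly and
/// reports whether the converted pointers still refer to the same memory.
func explicitConversionDemo() -> (typedSame: Bool, rawSame: Bool, firstValue: Int32) {
    let mutable = UnsafeMutablePointer<Int32>.allocate(capacity: 4)
    mutable.initialize(repeating: 7, count: 4)
    defer {
        mutable.deinitialize(count: 4)
        mutable.deallocate()
    }

    // UnsafeMutablePointer<T> -> UnsafePointer<T>
    let immutable: UnsafePointer<Int32> = UnsafePointer(mutable)

    // UnsafeMutableRawPointer -> UnsafeRawPointer
    let mutableRaw = UnsafeMutableRawPointer(mutable)
    let immutableRaw: UnsafeRawPointer = UnsafeRawPointer(mutableRaw)

    // Pointers of differing mutability: convert explicitly, then compare.
    let typedSame = immutable == UnsafePointer(mutable)
    let rawSame = immutableRaw == UnsafeRawPointer(mutable)

    return (typedSame, rawSame, immutable.pointee)
}

let demo = explicitConversionDemo()
```

Each conversion is a plain initializer call. The proposal's aim is to let the compiler insert these calls where they are unambiguous and safe.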

This friction is most commonly encountered when working with C-sourced APIs, where pointer mutability is not always consistent. Consider the following code, which segments lines in a String using the strchr() function, which accepts an immutable pointer as input and returns a mutable pointer:

import Foundation

print("""
    line one
    line two
    line three

    """.withCString {
    bytes in
    var bytes = bytes // immutable
    var out = [String]()
    while let nextNewline = // mutable
        strchr(bytes, Int32(UInt8(ascii: "\n"))) {
        out.append(String(data:
            Data(bytes: UnsafePointer<Int8>(bytes),
                 count: UnsafePointer<Int8>(
                    nextNewline)-bytes),
                          encoding: .utf8) ?? "")
        bytes = UnsafePointer<Int8>(nextNewline) + 1
    }
    return out
})

In the preceding example, an unfortunate choice of mutability on the C API's part requires a cascade of Swift conversions. While the example overstates the issue slightly (it would be better to convert the pointer once, and earlier), the conversions shouldn't really be necessary at all. Safe interaction between the two languages should be fluid, with minimal overhead for what seems to be unnecessary type safety bookkeeping. Swift has a history of allowing language tuning to reduce exactly this kind of friction.
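For comparison, here is a sketch of the same loop that converts strchr()'s mutable result exactly once per iteration. This is roughly the tidiest form available without the proposed conversion. It assumes Foundation and a C library that exposes strchr(); `splitLines` is a hypothetical helper name:

```swift
import Foundation

/// Splits a string on "\n" using strchr(), converting strchr's mutable
/// result to an immutable pointer exactly once per match.
func splitLines(_ text: String) -> [String] {
    return text.withCString { start -> [String] in
        var cursor: UnsafePointer<CChar> = start
        var lines: [String] = []
        while let found = strchr(cursor, Int32(UInt8(ascii: "\n"))) {
            // The single explicit mutable -> immutable conversion.
            let newline = UnsafePointer(found)
            let length = newline - cursor
            lines.append(String(data: Data(bytes: cursor, count: length),
                                encoding: .utf8) ?? "")
            cursor = newline + 1
        }
        return lines
    }
}
```

As in the original, any text after the final newline is not included in the result.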

Proposed solution

This proposal introduces a one-directional automatic conversion from mutable raw- and pointee-typed pointers to their immutable counterparts, allowing developers to supply a mutable pointer wherever an immutable pointer is expected as an argument. Consequently, it will also allow mixed mutability in pointer comparisons and pointer arithmetic via their overloaded operators.

Detailed design

We have prepared a small PR on the Swift compiler. This patch adds a "fix-up" applied after type checking using the existing intrinsic pointer-to-pointer conversion. This patch should not slow down type checking, the most commonly cited reservation for conversions. In our initial tests with compiler benchmarks, we've found an 8% improvement in run-time performance when initializing String from Data.

Source compatibility

This change is purely additive and will facilitate writing simpler code that would previously not compile. The change does not invalidate existing code, as tested by running the source compatibility suite.

Effect on ABI stability

Not applicable; this is a source-level change.

Effect on API resilience

Not applicable; this is a source-level change.

Alternatives considered

Continuing to require explicit conversions in code.

Acknowledgments

The Swift language.

Unfortunately, I haven't yet been able to show that adding this conversion doesn't slow the compiler down, because I can't trust the results of the compiler benchmarks you'll see run against the PR. They inflate all metrics by 100% for Debug builds and by 50% for Release builds, which is simply not credible given the scale of the change. Surprisingly, there actually seems to be a run-time benefit to certain String initialisers when the patch is applied.

Anyway, I'll float this now to see if there is support, or any reasoned objections as to why this change wouldn't be a good idea.

Cheers.

jrose · June 6, 2021, 8:14pm

Swift does have this feature for function call arguments, implemented with the same logic as inout-to-pointer and array-to-pointer conversions. It doesn't have it for operators, though, which could be useful, or for direct assignments. (You also don't need to repeat the type, but that relies on knowing that there's no initializer to convert between pointers of different types.)

import Foundation

print("""
    line one
    line two
    line three

    """.withCString {
    bytes in
    var bytes = bytes // immutable
    var out = [String]()
    while let nextNewline = // mutable
        strchr(bytes, Int32(UInt8(ascii: "\n"))) {
        out.append(String(data:
            Data(bytes: bytes,
                 count: UnsafePointer(nextNewline)-bytes),
                          encoding: .utf8) ?? "")
        bytes = UnsafePointer(nextNewline) + 1
    }
    return out
})
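Here is a small sketch of the distinction drawn above. The mutable-to-immutable conversion is applied to a function argument, but an assignment still needs the explicit initializer. `sum(_:count:)` and `argumentConversionDemo()` are hypothetical names used for illustration:

```swift
/// Hypothetical read-only API: it only needs immutable access to its input.
func sum(_ values: UnsafePointer<Int>, count: Int) -> Int {
    var total = 0
    for index in 0..<count {
        total += values[index]
    }
    return total
}

func argumentConversionDemo() -> (fromArgument: Int, fromAssignment: Int) {
    let buffer = UnsafeMutablePointer<Int>.allocate(capacity: 3)
    buffer.initialize(repeating: 0, count: 3)
    defer {
        buffer.deinitialize(count: 3)
        buffer.deallocate()
    }
    buffer[0] = 1
    buffer[1] = 2
    buffer[2] = 3

    // Argument position: the mutable pointer is converted implicitly.
    let fromArgument = sum(buffer, count: 3)

    // Assignment: `let readOnly: UnsafePointer<Int> = buffer` is rejected by
    // the compilers discussed in this thread, so the initializer is needed.
    let readOnly: UnsafePointer<Int> = UnsafePointer(buffer)
    let fromAssignment = sum(readOnly, count: 3)

    return (fromArgument, fromAssignment)
}

let result = argumentConversionDemo()
```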

johnno1962 · June 14, 2021, 7:23am

Thanks @jrose, I hadn't realised the conversion for arguments already exists. Does the patch look OK to you? Are your expert eyes able to shed any light on why the metrics of the compiler benchmarks run against the PR were so terrible?

jrose · June 14, 2021, 4:27pm

Sorry, it’s been a long time since I worked on the type checker. I don’t think DeepEquality is the place to make this change, and that could be part of the problem, but really I think you’re right that it’d be worth re-running the benchmarks to see if it’s reproducible.

johnno1962 · July 21, 2021, 12:01pm

I've been singularly unlucky testing the PR for this pitch. The first run gave results that seemed hard to believe, and the second run failed to even check out the source code, but there you go.

Is there any appetite for something that came up in comments from @Chris_Lattner3 on the evolution PR, i.e. bringing back a feature along the lines of the __conversion() functions that were supported in the Swift 1 betas but never made it into the initial release? They worked by allowing developers to define any number of __conversion() functions on a type, overloaded by return type, that provided conversions of that type to, say, Bool or CGFloat. The advantage is that this is done at the library level, empowering Swift developers rather than requiring targeted tweaks to the compiler source, which isn't really sustainable in the long term and contaminates the compiler source with knowledge of library types.

Perhaps the feature was removed out of concern that it opens the door to slower type checking, but could we avoid this by only applying these conversions in the "fix-up" phase, as in the patch I put forward? If I understand type checking correctly, fix-ups are applied after the type checker has done the time-consuming work of figuring out how most of the types in an expression fit together, when it is looking for known conversions to bridge the remaining gaps.

Any takers?

Chris_Lattner3 · July 25, 2021, 5:32pm

The __conversion feature in Swift 1 days wasn't very well considered or designed. It did make the constraint solver a lot more complicated and slow, and we had a bunch of other larger problems to deal with back then (e.g. introducing error handling and protocol extensions :-)

I think we should reopen this discussion and approach it with a first-principles design. The constraint solver is growing new special cases (most recently for CGFloat), and it would be great to generalize that logic into something that is first class and accessible to library developers. I think we should do something much more constrained than __conversion, though; e.g. we could force a DAG of type conversions instead of allowing cyclic ones.

-Chris
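To make the DAG restriction concrete, here is a hypothetical sketch (not an existing compiler API) of a registry that refuses any declared conversion that would close a cycle, using a depth-first reachability check. Types are plain strings for illustration. The bidirectional Double/CGFloat pair from SE-0307 is exactly the kind of cycle such a rule would reject:

```swift
/// Hypothetical registry of declared implicit conversions, keyed by type name.
struct ConversionGraph {
    private var edges: [String: Set<String>] = [:]

    /// True if `target` is reachable from `source` via declared conversions.
    func reaches(_ source: String, _ target: String) -> Bool {
        var stack = [source]
        var visited: Set<String> = []
        while let node = stack.popLast() {
            if node == target { return true }
            guard visited.insert(node).inserted else { continue }
            stack.append(contentsOf: edges[node, default: []])
        }
        return false
    }

    /// Adds `from -> to` unless doing so would create a cycle.
    /// Returns false, leaving the graph unchanged, if the edge is rejected.
    @discardableResult
    mutating func declare(_ from: String, to: String) -> Bool {
        // A new edge from -> to closes a cycle exactly when `from` is
        // already reachable from `to` (including the case from == to).
        if reaches(to, from) { return false }
        edges[from, default: []].insert(to)
        return true
    }
}

var graph = ConversionGraph()
let a = graph.declare("UnsafeMutablePointer", to: "UnsafePointer")
let b = graph.declare("UnsafePointer", to: "UnsafeRawPointer")
let c = graph.declare("Double", to: "CGFloat")
let d = graph.declare("CGFloat", to: "Double")                   // closes a cycle
let e = graph.declare("UnsafeRawPointer", to: "UnsafeMutablePointer") // closes a cycle
```

Note that the check itself still walks the graph on every declaration; a DAG rule constrains what can be expressed rather than removing the cost of detecting cycles.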

technogen · July 25, 2021, 6:59pm

I’d love to see this happen! Sometimes I feel the need for lightweight stateful polymorphism without having to pay for a class allocation, existential containers or, worse yet, dragging around objc runtime capabilities. These use cases are best handled by C++ for now, and I’d really love to see Swift take that niche away from C++.

Ben_Cohen · July 27, 2021, 10:38pm

The core team acceptance of the CGFloat proposal was pretty clear about future directions:

The Core Team feels that implicit conversions, as proposed in SE-0307, can be wielded sparingly to solve narrowly focused problems that arise from platform compatibility and interoperability. In this case, such a solution can be justified when the developer's cognitive load is a net positive by removing a significant point of friction when writing code for frequently used APIs. In this case, we can understand the implications of the implicit conversion on the compiler's performance. Finally, it solves an apparent standing problem for which alternative solutions cannot sufficiently address.
[...]
The Core Team still believes that generalizing support for implicit conversions would be harmful to the language. Implicit conversions, in general, would make code difficult to reason about and compromise the compiler's reliability and performance to infer types.

Applying that guidance here: a narrow fix for mutable to immutable pointers is worth considering (so long as the case made is strong) and push back in favor of a more general solution isn't appropriate.

I think the case is strong. It's not as common as CGFloat, but it's fairly common and very painful/confusing for users when it does occur, and the conversion is similarly easy to understand. Unlike CGFloat the conversion is unidirectional, and doesn't tend to involve literals, so is likely to be something that can be confirmed safe from a typechecker performance perspective.

tclementdev · July 27, 2021, 10:57pm

What about UnsafeMutableRawBufferPointer to UnsafeRawBufferPointer? Shouldn't it be part of this as well?

Torust · July 27, 2021, 11:36pm

I also think a DAG-based solution is worth considering, since implicit conversions are useful whenever we have a subtyping relationship that isn't otherwise expressible by the language – the reason this would be useful for UnsafeMutablePointer -> UnsafePointer is not exclusive to those types.

As @technogen said, there are many uses for this, but to add a more specific example: my most common use case for this would be with optimised custom existential wrappers. I've written multiple types where I've represented the protocol existential in a more optimised form as a separate type – for example, a struct containing the handle for integer-handle-based types, or a class-based type-erasing wrapper for a Hashable-constrained type that deduplicates on init in another instance. Not being able to use as? casts in particular has caused many hard-to-find bugs for me, and I think implicit conversions in those specific cases would result in fewer bugs and more legible code.

technogen · July 28, 2021, 6:08am

It would also be very useful when dealing with low-level, realtime, and/or system code, where heap allocations are prohibited, making the use of classes and protocols unviable. In these cases, the only type of polymorphism would be through an enum with associated values, which is not extensible at all. C++ provides light-weight value-type polymorphism by way of (among other things) structure inheritance. Performance-critical code would be stuck with a large number of conversion function calls everywhere, which would be a huge pain and reduce the code to a borderline-unreadable state.

spadafiva · July 28, 2021, 1:43pm

I've been following along this thread for a bit, but wasn't sure what a DAG was?

Zhu_Shengqi · July 28, 2021, 1:47pm

I think DAG stands for directed acyclic graph.

codafi · July 28, 2021, 5:58pm

I want to approach my strong opposition to the idea of generalizing support for user-defined conversions (especially chains of user-defined conversions) from three angles:

Algorithmic Complexity

The constraint solver embedded at the heart of the expression checker is bound to a worst-case exponential-ish time decision procedure precisely because of disjunctive forms like the kind introduced here. We must now consider, at each call site without a direct match, the complete set of applicable user-defined conversions, split any existing disjuncts, and solve. For chains of user-defined operators - rather than pursuing a C++-style “one conversion” rule, this problem becomes (literally) exponentially more difficult. The disjuncts, themselves, become subject to further disjunctions as chains of conversions are traversed. Disqualifying cyclic chains of conversions is not going to improve any of this (we must still detect those cycles anyways…)
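Here is a back-of-the-envelope model of that growth. It rests on simplifying assumptions and does not describe the actual solver. Suppose a call has m candidate overloads and n arguments. Each argument has c applicable user-defined conversions besides the identity, every conversion target has c further conversions of its own, and the solver naively tries every combination. The number of assignments to examine is then roughly

$$
\underbrace{m\,(1+c)^{n}}_{\text{one conversion per argument}}
\qquad\text{versus}\qquad
\underbrace{m\Bigl(\sum_{k=0}^{d} c^{k}\Bigr)^{n}}_{\text{chains of length} \,\le\, d}
$$

The two coincide at d = 1. For example, with m = 2, n = 3 and c = 2, one-step conversions give 2 · 3^3 = 54 combinations, while chains of length up to d = 2 give 2 · (1 + 2 + 4)^3 = 686.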

A language feature that serves to complicate the typing rules in this manner must justify this leap in algorithmic complexity with a similar leap in functionality and quality of life improvements. I simply do not see a path forward where that standard applies to this feature.

Ergonomics over Function

What is the savings for a user-defined conversion? At the point of definition, you are still writing as much code as you would for a (convenience) init. At the point of call, you save the use of a (set of) constructor forms. A commonly-cited example is in numeric types where value-preserving conversions to higher-bitwidths are expressed over and over again to keep the type checker happy. Such conversions have a place in the language: they’re safe, they’re pure, they’re common, they’re noisy. But user-defined conversions can be none of these in practice. I’ve seen conversion operators that allow accidental mixed comparisons of typed data, operators that execute effects as part of a DSL, unsafe operators added purely for convenience. These arbitrary effects become silently inserted at the point of use.

We have to remember, too, that the cost of convenience is steep:

Impact on Readability over Writability

Implicit conversions of all kinds absolutely destroy readability. C++, with all its restrictions and formalisms, is an exemplar here: I cannot open C++ code outside of an IDE and have any idea about how the flow of data is derived from the flow of types. Without a full picture of all user-defined conversions in scope, I have no hope of being able to try. I write type checkers as a hobby, and I cannot remember the rules backing C++’s conversions and their cross-cutting interactions with the rest of the subsystems in their type system. I do not wish for Swift users to have to become human type checkers either.

What’s worse, a number of extremely surprising edge cases in the semantics of the language WRT its interactions with conversions become not just extremely common, but sources of actual peril. Consider my favorite example - what does this program do?

#include <string>
#include <vector>

int main() {
    const char *x = "Hello world!";
    std::vector<std::string> v{{x, x}};
}

x is an iterator - not a candidate for std::string conversion, the begin/end iterator constructor is selected, and the vector is constructed to point into garbage.

Torust · July 28, 2021, 11:00pm

While it's true that full implicit conversions suffer from those pitfalls, I don't think that needs to exclude a narrower solution specifically for subtyping. In Swift, we already allow implicit conversions from a variable to an existential whose protocol it conforms to, or from a class to a superclass, or even from T to T?. For example, we allow this:

protocol UnsafePointerProtocol {}
extension UnsafeMutablePointer: UnsafePointerProtocol {}
let pointer: UnsafePointerProtocol = UnsafeMutablePointer<Int>.allocate(capacity: 1)

and equally, if UnsafePointer were a class, we'd allow:

class UnsafeMutablePointer: UnsafePointer {}
let pointer: UnsafePointer = UnsafeMutablePointer<Int>.allocate(capacity: 1)
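For contrast, the conversions this post says Swift already performs implicitly can be seen in ordinary code today. The protocol and class names below are illustrative:

```swift
protocol Shape { var area: Double { get } }

struct Square: Shape {
    var side: Double
    var area: Double { side * side }
}

class Animal { func name() -> String { "animal" } }
class Dog: Animal { override func name() -> String { "dog" } }

// T -> T? (optional promotion)
let maybeCount: Int? = 3

// Class -> superclass (upcast)
let pet: Animal = Dog()

// Value type -> existential of a protocol it conforms to
let shape: Shape = Square(side: 2)
```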

Let's say that we hypothetically allowed something like:

struct UnsafeMutablePointer<Pointee>: @subtypeConvertible UnsafePointer<Pointee> {
    public func upcast() -> UnsafePointer<Pointee> {
        return UnsafePointer(self)
    }
}

I may be missing something, but I don't see how that's anything more harmful than the subtyping relationships that are already in the language. The issue I do see is that downcasts would not work as you'd expect; e.g.:

let pointer: UnsafePointer<Int> = UnsafeMutablePointer<Int>.allocate(capacity: 1)
let downcastPointer: UnsafeMutablePointer<Int>? = pointer as? UnsafeMutablePointer<Int> // nil

I don't know how to appropriately solve that for the UnsafePointer case, since it's not safe to convert an immutable pointer to a mutable pointer in general – I think the only reasonable answer is for that cast to always fail. For other cases, though, if type-checker and runtime performance allows it, we could optionally allow something like the following:

enum GPUResource {
    case texture(GPUTexture)
}

struct GPUTexture : @subtypeConvertible GPUResource {
    // required
    public func upcast() -> GPUResource {
        return .texture(self)
    }
 
    // optional, only used for `as?` casts
    public init?(downcastingFrom superType: GPUResource) {
        switch superType {
        case .texture(let texture):
            self = texture
        default:
            return nil
        }
    }
}
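Without compiler support, the same relationship has to be spelled out and called explicitly today. Here is a runnable sketch of that status quo. The `id` property and the `buffer` case are hypothetical additions that give the example a payload and a downcast that can fail:

```swift
struct GPUTexture: Equatable {
    var id: Int  // hypothetical payload for illustration
}

enum GPUResource: Equatable {
    case texture(GPUTexture)
    case buffer(Int)  // hypothetical extra case so a downcast can fail
}

extension GPUTexture {
    /// Explicit "upcast" that callers must invoke by hand today.
    func upcast() -> GPUResource { .texture(self) }

    /// Explicit "downcast" standing in for `as?`.
    init?(downcastingFrom resource: GPUResource) {
        guard case .texture(let texture) = resource else { return nil }
        self = texture
    }
}

func describe(_ resource: GPUResource) -> String {
    switch resource {
    case .texture(let texture): return "texture \(texture.id)"
    case .buffer(let size): return "buffer \(size)"
    }
}

let texture = GPUTexture(id: 7)
let label = describe(texture.upcast())  // explicit call required today
let roundTrip = GPUTexture(downcastingFrom: texture.upcast())
let failed = GPUTexture(downcastingFrom: .buffer(3))
```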

nikitamounier · July 29, 2021, 3:40am

I agree, user-defined conversions shouldn’t be a thing. We lose our ability to exercise local reasoning, which is one of the foundational pillars of Swift and one of the reasons for which we have value types and let bindings, and instead fall into the world of global reasoning, since we wouldn’t know in which file some contributor added some conversion.

hborla · July 29, 2021, 9:52pm

I want to emphasize this. Personally, I think the additional axis of exponential search space is an unacceptable consequence of user-defined implicit conversions. It's important to realize that arbitrary implicit conversions would undermine many of the pruning heuristics that are in place in the constraint solver today to make simple cases fast.

For example, if there's no possibility that implicit conversions can add conformances to an argument (which is possible with an implicit conversion to a value type with different conformances, and is very rare today), conformance constraints on a parameter type can be transferred to the argument type directly, which often allows the constraint solver to prune search paths early. This heuristic allows the solver to fail before a generic argument type is attempted for generic overloads that "obviously" aren't satisfied by the given argument types. Generic arguments are only bound once the solver has attempted all disjunctions. So, in a case with several generic operators chained together, e.g. a + b + c + d, the solver has to first bind each + to an overload before any generic arguments are attempted, because only then does the solver have the complete set of possible bindings for those generic arguments. In the worst case, the solver attempts all permutations with repetition of the overloads. Without this pruning heuristic, expressions that use generic operators chained together (among other kinds of expressions) are subject to worst-case exponential behavior.

This is just one example. There are others. Another big issue is introducing more ambiguities. To resolve an ambiguity, you need to write explicit type information anyway.

Perhaps there are ways to make the constraint solver performance more immune to implicit conversions, but there is a lot of engineering work to be done before we get there. I don't think implicit conversions are feasible to implement with tolerable, let alone good type checker performance today.
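Here is a rough count for the a + b + c + d example. The expression contains three applications of +. If each could independently bind to any of k candidate overloads, the worst case described above as permutations with repetition is

$$
k^{3}\ \text{combinations for } a + b + c + d, \qquad k^{j}\ \text{for } j \text{ chained operator applications.}
$$

With an illustrative k = 10 (not a count of the standard library's actual + overloads), that is 10^3 = 1000 combinations, and each additional operand multiplies the total by another factor of 10. Pruning heuristics like the conformance transfer described above matter because they discard most of these combinations early.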

Torust · July 29, 2021, 10:29pm

If we were to enforce true subtyping – i.e. UnsafeMutablePointer<Pointee>: @subtypeConvertible UnsafePointer<Pointee> meaning that UnsafeMutablePointer inherits all of the conformances of UnsafePointer (and possibly conforms by default by upcast()ing to UnsafePointer) – how many of these issues still apply? Are there still more ambiguities introduced?
Likewise, would implementing integer widening by having a chain of UInt8: @subtypeConvertible UInt16, UInt16: @subtypeConvertible UInt32 etc. have a major impact on type-checker performance?
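Such a widening chain would presumably need to be coherent. Converting step by step through the chain must give the same value as converting directly, and each step must be lossless. For the standard library's existing integer initializers, this can be checked exhaustively over the 8-bit domain. The following is a sketch of that check, not a proposed rule:

```swift
/// Checks that UInt8 -> UInt16 -> UInt32 agrees with UInt8 -> UInt32
/// and that each widening step round-trips without loss.
func wideningChainIsCoherent() -> Bool {
    for value in UInt8.min...UInt8.max {
        let viaChain = UInt32(UInt16(value))
        let direct = UInt32(value)
        guard viaChain == direct else { return false }
        // Lossless: narrowing back with `exactly:` recovers the original.
        guard UInt8(exactly: UInt16(value)) == value,
              UInt8(exactly: direct) == value else { return false }
    }
    return true
}

/// The same idea for the signed case, Int8 -> Int.
func int8ToIntIsLossless() -> Bool {
    (Int8.min...Int8.max).allSatisfy { Int8(exactly: Int($0)) == $0 }
}
```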

Chris_Lattner3 · July 30, 2021, 4:54pm

Right, Robert also makes a similar argument and points to challenges with C++ implicit conversions for example. I think we can all see the dangers of a poor design, and those of us who lived through the __conversion thing early on in Swift don't want to relive that.

However, I'm not suggesting we add C++ implicit conversions or __conversion to Swift. I'm observing several things:

Swift has fully general user-defined conversions already in the form of protocol and class inheritance. These implicitly impose a DAG-based structure to the subtyping problem, among other restrictions.
Swift already has a bunch of special case implicit conversions, e.g. T to T?, the CGFloat ones, and several others. We cannot remove these for source compatibility reasons.
Because we have some of these in the language, people continue pointing out other cases where subtype relationships are useful to model, e.g. the latest is unsafe pointers.
No one (that I'm aware of) has done a systemic study of what stdlib types should be implicitly convertible, and we don't have a principle guiding this. Why should UnsafePointer be implicitly convertible, but Int8 shouldn't convert to Int? Both are subtypes, so what should our principle be?
One of the founding ideas of Swift is that much of the language is "in the library", and the reason for that is we want an expressive ecosystem of APIs that feel fluent and natural (e.g. all of "Float", "Complex" and "Quaternion" should be able to have consistent design approaches even if they are at different levels of the library stack). Continuing to privilege a few specific APIs like UnsafePointer breaks that.
There are other use-cases we should address outside direct use by familiar APIs, e.g. in bridging of other languages, C++ interop, and more.

If Swift had a huge user base and had gotten by without ad-hoc implicit conversions so far, then we'd have a strong argument that we don't need them in the language. However (for better or worse), we already have them, our users expect to not write silly casts in some cases, and we have people asking to add more special cases to the type checker. This doesn't make it simpler.

I see several possibilities looking ahead:

1. We could draw the line where it is now forever and come up with reasons why CGFloat is ok but nothing else is.
2. We could continue adding specific things like UnsafePointer in the standard library, building in more special cases into the constraint solver endlessly.
3. We could provide a more general solution so this is extensible in a controlled way, and potentially subsume a bunch of the existing complexity in the type checker (the special cases, but also possibly other things like the magic argument promotions) into a more principled framework.

I think that approach 3 has a lot of merit, and there is tremendous design room around this. The most trivial thing would be to take the hardcoded mapping the constraint solver uses and make it more extensible (something that certainly wouldn't affect type checker performance more than adding other cases). It is also possible to add one-step conversions. It is also possible to add something fancier, like DAG-based precedence rules but for conversions.

What I am suggesting is that we explore this design space. I think that it makes sense to start by discussing requirements (e.g. on type system performance, problems we want to solve, limitations we want to impose, etc) before talking about solutions.

-Chris
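The difference between one-step and chained conversions is essentially the difference between looking up a single edge and computing reachability. Here is a hypothetical sketch over a small table of pointer conversions. The table and function names are illustrative, not the constraint solver's actual data structures. Real Swift argument conversions also let a mutable typed pointer go straight to UnsafeRawPointer, but that edge is omitted here so the chain is visible:

```swift
/// Hypothetical table of direct (one-step) conversions, by type name.
let directConversions: [String: Set<String>] = [
    "UnsafeMutablePointer<T>": ["UnsafePointer<T>", "UnsafeMutableRawPointer"],
    "UnsafePointer<T>": ["UnsafeRawPointer"],
    "UnsafeMutableRawPointer": ["UnsafeRawPointer"],
]

/// One-step rule: only a single table lookup is allowed.
func convertsInOneStep(_ from: String, _ to: String) -> Bool {
    from == to || directConversions[from, default: []].contains(to)
}

/// Chained rule: any path through the table is allowed (breadth-first search).
func convertsViaChain(_ from: String, _ to: String) -> Bool {
    var frontier = [from]
    var seen: Set<String> = [from]
    while !frontier.isEmpty {
        var next: [String] = []
        for current in frontier {
            if current == to { return true }
            for target in directConversions[current, default: []] {
                if seen.insert(target).inserted { next.append(target) }
            }
        }
        frontier = next
    }
    return false
}
```

The one-step rule costs a single lookup per candidate. The chained rule has to search the graph, which is where the growth in candidate conversions comes from.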

hborla · July 30, 2021, 6:24pm

Sure, I'd be happy to participate in that exploration. I do see the value in allowing a subtype relationship to be defined for certain types when it's "clearly" safe; I'm not opposed to the change pitched here, for example.

Another way to phrase my opinion is that constraint system performance has to be a critical consideration in any discussion about extensible implicit conversions, because a model that introduces another axis of exponential search space into the constraint system is just not practical. I'm also certainly open to being proven wrong about the impact that extensible implicit conversions would have on the constraint system, either with restrictions in the design or with a clever implementation strategy.

That is clearly a big discussion, though, and I don't think it should hold up this pitch. If mutable -> immutable pointer conversions are a useful addition to the language now, I don't think we should block it from moving forward based on a feature that may not even be feasible to ever add to Swift. I haven't seen evidence that these narrow implicit conversions are harmful to users (e.g. in their understanding of the type system) besides perhaps being mildly annoyed when they have to write an explicit initialization when it seems like it shouldn't be necessary, so I don't see a reason why we shouldn't continue using option 2 of blessing very narrow implicit conversions while option 3 is explored.

EDIT: (I suggest using an alternative term instead of 'whitelist', as there are more inclusive options. If anybody would like to discuss this further, please DM me)