Bridging Foundation reference types on Linux

Hi everyone,

We're considering how to move forward on one of the stickiest remaining issues in swift-corelibs-foundation: bridging behavior on Linux (see also: known issues). I would like to solicit some feedback from the community.

Today, the bridging fundamentals are disabled in the compiler and standard library on Linux. Given an implementation that returns something like NSArray of NSString, casting to [String] using as? will succeed on Darwin but fail on Linux at runtime.

import Foundation

// This could also return 'Any' for some API
func f() -> NSArray {
    return NSArray(object: NSString(string: "hello world"))
}
let r = f() as? [String]
print(String(describing: r))
// Output on Darwin: Optional(["hello world"])
// Output on Linux : nil

One proposed idea is to make all Foundation reference types unavailable on Linux (NSArray, NSString, NSData, etc). This would turn all attempted casts into errors. We would not change anything about availability of these types on Darwin platforms.

All API in Foundation that returns an Any would be required to return the Swift value types only, so casting Any to something like [String] would succeed. We know of a few areas in our API where this would be problematic (e.g., Data(referencing: NSData) and most of the NSKeyedArchiver API), but we are looking for real world examples.

Do you use the reference types on Linux platforms? If so, why? Would removing them have a major impact on the portability of your code from Darwin?

The main pain point I've seen relates to JSON parsing giving NSNumber values on macOS and concrete Swift values on Linux.

Having seen the grief that this seems to cause (a quick Google search makes it plain), IMHO this plan to extend the same discrepancy to the rest of Foundation seems ill-advised.

Do I read correctly between the lines that actually bringing bridging behavior to Linux is off the table? This seems problematic in light of the failure of SE-0083 to remove the bridging behavior from macOS.

Required parts of this plan would be:

• Some mechanism to give you a warning when compiling for Darwin that your code is non-portable
• Bringing the bridging behavior of NSNumber (which is unlike most other bridges, in that it can be bridged to and from many types) to Linux

This latter part would resolve stuff like this PR.

Would that mean effectively implementing SE-0083 for Darwin, but with warnings instead of outright removal?

I'm not yet sure if it would be something close to the full scope of that proposal. SE-0083 suggests initializers are a fine replacement for as casts, but if you receive an Any from an API, it would be weird to have initializers on things like Array, Dictionary, etc. which took an Any argument.

The approach here would not add new initializers. It would just make the reference types unavailable outside the implementation of Foundation itself.

Instead of literally bringing NSNumber bridging over, we could provide AnyNumeric or something like it as a stopgap for supporting Numeric as a generalized protocol type. Giving numeric types special dynamic type system behavior in the core language doesn't seem totally unreasonable to me.
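
To make the idea concrete, here is a minimal sketch of what a type-erased numeric box could look like. Everything in it is an assumption for illustration: `AnyNumeric`, its storage cases, and the `exactly(_:)` accessor are hypothetical names, not an existing standard library or Foundation API, and a real design would need to cover far more (Decimal, arithmetic, hashing, and so on).

```swift
// Hypothetical sketch: `AnyNumeric` does not exist in the standard library.
struct AnyNumeric {
    // Widest representations we are willing to carry around.
    private enum Storage {
        case signed(Int64)
        case unsigned(UInt64)
        case floating(Double)
    }
    private let storage: Storage

    // Wrap any integer; values wider than 64 bits fall back to a lossy Double.
    init<T: BinaryInteger>(_ value: T) {
        if let s = Int64(exactly: value) {
            storage = .signed(s)
        } else if let u = UInt64(exactly: value) {
            storage = .unsigned(u)
        } else {
            storage = .floating(Double(value))
        }
    }

    // Wrap any floating-point value as a Double.
    init<T: BinaryFloatingPoint>(_ value: T) {
        storage = .floating(Double(value))
    }

    // Returns the value as `T` only if it is exactly representable,
    // in the spirit of the `init?(exactly:)` conversions.
    func exactly<T: BinaryInteger>(_ type: T.Type) -> T? {
        switch storage {
        case .signed(let v): return T(exactly: v)
        case .unsigned(let v): return T(exactly: v)
        case .floating(let v): return T(exactly: v)
        }
    }
}

let small = AnyNumeric(UInt8(200))
print(String(describing: small.exactly(Int.self)))   // fits in Int
print(String(describing: small.exactly(Int8.self)))  // does not fit in Int8
```

The point of the sketch is only that "one box, many numeric views" can be expressed without any Objective-C machinery; whether that is preferable to bringing NSNumber bridging over is the open question.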

If the types on the ObjC side of the bridge just weren't available at all on non-Darwin platforms (which is what I think Tony is suggesting as a possibility), that in itself would make a lot of the issues SE-0083 tried to define away less problematic, since you just wouldn't be able to mention those types on Linux.

NSNumber is basically AnyNumber, so yes we could do something like this. However, if it's a new type then we would have to make sure it's usable across all of Darwin too, and NSNumber is ubiquitous there. That may mean a lot of migration.

The primary goal here in the short term, though, is to get to as coherent a state between platforms as possible.

The shape of the proposed idea seems counterintuitive to me, however.

Currently, Linux functionality in this area is less than macOS functionality. To achieve coherence, either Linux needs to gain the missing functionality or macOS needs to lose it. However, if I understand the idea, it's to subtract even more from the Linux side.

Sure, if I can't actually use NSArray on Linux, then I won't miss bridging it. But then I can't use NSArray at all. This seems to decrease instead of increase coherence.

If the Linux functionality is a subset of the Darwin functionality, and that subset supports all of the useful functionality, then coherence comes from sticking to that portable subset. If the subset is at a level of types not being available, we could conceivably use availability attributes to alert people wanting to write portable code when they're using nonportable types.

The issue here is that NSArray is used mostly for interoperability. It carries far fewer features than struct Array<_>: it does not carry compile-time type information for its contents, it allocates on the heap, it depends on a gamut of ObjC runtime features, and it has API that is not consistent with Swift's similar API (e.g. .count vs. .length). It is, in some ways, redundant.

(The two casualties of moving, in this one case, are losing subclassability and reference semantics — for the latter any trivial Box<_> type, or idiomatic use of inout, can be used to basically the same effect. Exposing a Box could be part of this, if we feel strongly enough about it.)
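
As a rough illustration of the two replacements for reference semantics mentioned above, here is a minimal sketch. `Box` and both `appendGreeting` functions are hypothetical names chosen for this example, not existing API.

```swift
// Hypothetical sketch: a trivial reference wrapper around a value type.
final class Box<Value> {
    var value: Value
    init(_ value: Value) { self.value = value }
}

// Option 1: share mutable state through the box.
func appendGreeting(toBox box: Box<[String]>) {
    box.value.append("hello world")
}

// Option 2: mutate the caller's value in place with inout.
func appendGreeting(toArray array: inout [String]) {
    array.append("hello world")
}

let shared = Box<[String]>([])
let alias = shared              // same object, not a copy
appendGreeting(toBox: shared)
print(alias.value)              // the mutation is visible through the alias

var local: [String] = []
appendGreeting(toArray: &local)
print(local)
```

Neither option restores subclassability, which is the other casualty named above.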

I believe that a similar argument can be made for most other types that would be impacted.

I can buy that argument for NSArray and, likely, the other handful of types with standard library counterparts. But it leaves questions unanswered regarding other scenarios where the bridging story is incomplete. Off the top of my head:

  • Pairs of value and reference types, such as Decimal and NSDecimalNumber, that are both part of Foundation (and especially in cases such as that one, where there are non-congruent sets of functionality). Is the proposal to eliminate the reference types anyway?

  • Bridging to CF types. Not ever going to happen on Linux? We're talking about a cross-platform, first-party library shipping with a language that supports seamless C interop.

Would it, though? init?(bridging: Any) sounds pretty reasonable to me. It'd be like the init?(exactly:) numeric conversion initializers: you can give it a bogus value, and if it can't be done, then the result is nil.
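
A minimal sketch of what such an initializer might look like on Array follows. The `init?(bridging:)` name comes from the suggestion above, but this body is an assumption: it handles only values that are already Swift arrays, and the logic a real implementation would need for Foundation reference types is deliberately left out.

```swift
// Hypothetical sketch: `init?(bridging:)` is not an existing Array initializer.
extension Array {
    init?(bridging value: Any) {
        // The dynamic cast checks every element; any mismatch yields nil.
        // Handling of reference types such as NSArray is omitted here.
        guard let typed = value as? [Element] else { return nil }
        self = typed
    }
}

let payload: Any = ["hello", "world"] as [Any]
let strings = [String](bridging: payload)
let numbers = [Int](bridging: payload)
let wrongShape = [String](bridging: ["k": "v"] as [String: String])

print(String(describing: strings))     // the elements are all String
print(String(describing: numbers))     // wrong element type
print(String(describing: wrongShape))  // a dictionary is not an array
```

As with init?(exactly:), a value that cannot be converted produces nil at runtime rather than a compile-time diagnostic.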

From what I can see, most of the operations on NSDecimalNumber also exist as free functions on NSDecimal, which could in turn be made into methods and properties in Swift. We can still use something like NSDecimalNumber internally as a box for putting Decimals inside NSNumber, and/or make the Decimal type conform to Numeric so that it interops with an AnyNumeric container, depending on which way we go (if either).

AFAIK, supporting CF as API outside of Apple platforms has never been a goal. Its usefulness for C interop on other platforms seems to me like it's pretty limited since there aren't many portable C-only APIs that look anything like CF or build on top of it.

init?(bridging: Any) feels very weird, because it allows nonsensical inputs: for example, I could pass an NSDictionary to a hypothetical Array.init?(bridging…), perhaps due to a typo, and lose compiler assistance in finding that problem.

It is also not entirely equivalent to bridging casts. My understanding is that given let x: NSArray, it holds that ((x as Array<Any>) as NSArray) === x, which we cannot do with a hypothetical constructor on NSArray.

The most common reason I can think of on Linux to reach for one of the reference types is just to have something reference-sized. If we could get id-as-Any style boxing, I think I'd be a lot happier with forbidding those NS* types. NSNumber bridging is a hugely important thing though, and AnyNumeric seems like a lot of overhead to add to users on the Darwin side, IMHO.

Can you explain which use cases having something reference-sized solves? It hadn't come up much in our initial exploration.

C interop usually; roundtripping through a context pointer or using an atomic CAS. I also remember a team member optimizing something through T: AnyObject due to bulk deallocation or something… I don't have all the context in my head, just that I know my Linux-compatible codebases have a few Box<T> in them still or use NS* types.
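
For the context-pointer case, a minimal sketch of the pattern might look like the following. `Box` and `callWithContext` are hypothetical: the latter is a pure-Swift stand-in for a C function that takes a callback plus an opaque `void *` context, so that the example stays self-contained. `Unmanaged` is the standard library type used for the round trip.

```swift
// Hypothetical sketch: a reference-sized wrapper for arbitrary Swift state.
final class Box<Value> {
    var value: Value
    init(_ value: Value) { self.value = value }
}

// Stand-in for a C API that accepts a callback and an opaque context pointer.
func callWithContext(_ context: UnsafeMutableRawPointer,
                     _ callback: @convention(c) (UnsafeMutableRawPointer) -> Void) {
    callback(context)
}

let counter = Box(0)

// passRetained adds a retain so the object stays alive while C holds the pointer.
let context = Unmanaged.passRetained(counter).toOpaque()

// A @convention(c) closure cannot capture, so all state arrives via `raw`.
callWithContext(context) { raw in
    let box = Unmanaged<Box<Int>>.fromOpaque(raw).takeUnretainedValue()
    box.value += 1
}

// Balance the earlier retain once the C side no longer needs the pointer.
Unmanaged<Box<Int>>.fromOpaque(context).release()

print(counter.value)
```

This works because a class instance is a single pointer; a struct or an existential would first need a box of this kind.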

Having a standard Box<T> could be useful. We could potentially even specialize its implementation for things like existentials or collections that already need to hold pointers so that, e.g., an Array is "boxed" as a pointer to its buffer instead of getting double-indirected, or an Any stores the dynamically-typed value in its own object instead of putting the existential container buffer in an object and also potentially ending up with double indirection.

I'd be very interested in that! To my original point, it is a shame that there's a syntax difference (as AnyObject always succeeding on Darwin) without an easy way of #if-ing it out. But if that's your thinking, then it's definitely outside the scope of this.

A more on-topic question than my aside with Joe: where does this leave us for the Foundation collection and collection-adjacent types that don't have Swift equivalents? What is the fate of NSAttributedString, NSEnumerator, and NSOrderedSet?