Java and Swift's unsigned integers

Hello everyone,
I've been working on unsigned integer support and would like to gather some feedback on which mapping modes people would be most interested in.

The problem

  • Java, famously, does not have unsigned integer types (except the char type).
  • Swift, and other native languages, very frequently make use of UInt types.
  • It is technically not safe/correct to "just" treat a returned UInt value as an int in Java, since any value larger than Integer.MAX_VALUE would show up as negative on the Java side, potentially leading to much confusion or even bugs (e.g. imagine some checks like if value > 0 etc).
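
To make the last point concrete, here is a minimal sketch of what happens to a large UInt32 once it is read as a Java int. The rawUInt32FromSwift method is a hypothetical stand-in for a generated binding, not something the tool emits today:

```java
// Sketch: how a Swift UInt32 bit pattern looks when read as a Java int.
final class SignednessDemo {
    // Hypothetical stand-in for a binding that returns a UInt32 as a raw Java int.
    static int rawUInt32FromSwift() {
        return (int) 3_000_000_000L; // same 32 bits as UInt32(3_000_000_000)
    }

    // A naive signed check: wrong for values above Integer.MAX_VALUE.
    static boolean naiveIsPositive(int value) {
        return value > 0;
    }

    // Reinterprets the same bits as an unsigned quantity before checking.
    static boolean unsignedIsPositive(int value) {
        return Integer.toUnsignedLong(value) > 0;
    }

    public static void main(String[] args) {
        int raw = rawUInt32FromSwift();
        System.out.println(raw);                           // -1294967296
        System.out.println(Integer.toUnsignedString(raw)); // 3000000000
        System.out.println(naiveIsPositive(raw));          // false
        System.out.println(unsignedIsPositive(raw));       // true
    }
}
```

The bits are never lost; only the signed interpretation on the Java side is misleading.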

When extracting bindings from existing Swift declarations, we need to decide how to represent unsigned numbers. For example, given a signature such as:

func echoNums(a: UInt16, b: UInt64) -> UInt64

how would we prefer to represent this in Java, both by default and in other opt-in modes?

??? echoNums(??? a, ??? b) { ... }

For the sake of this discussion let us focus on plain primitives. [UInt8] is a very common type but also likely deserves a separate discussion entirely.

Type mapping modes

Signed numerics are simple: they map directly to existing Java types.

Swift Type | Bit Size           | Java Equivalent
-----------|--------------------|-------------------
Int8       | 8                  | byte
Int16      | 16                 | short
Int32      | 32                 | int
Int64      | 64                 | long
Int        | Platform-dependent | Platform-dependent

However, when we get to unsigned numbers, we need to figure out how to represent them. The problem is that you might write code against an extracted Swift library like this:

// java
var count = getCountSwift(); // UInt64
if (count < 0) throw new RuntimeException(...);

The problem: since this Swift API most likely declared the value as unsigned because it fully intends to use the high end of the type's range, we may get surprising behavior when Java code suddenly interprets that number as negative.

So, we're left to decide what the default, and what optional modes, jextract should offer.

OpenJDK's jextract just ignores the issue entirely and returns numbers "as is":

    /**
     * Getter for field:
     * {@snippet lang=c :
     * unsigned int private_flags
     * }
     */
    public static int private_flags(MemorySegment struct) {
        return struct.get(private_flags$LAYOUT, private_flags$OFFSET);
    }

Option "annotate": I'm leaning towards doing the same. However, I think we can do slightly better and annotate such types by default:

/**
 * {@snippet lang=swift :
 * func echoNums(a: UInt16, b: UInt64) -> UInt64
 * }
 */
@Unsigned // org.swift.swiftkit.core.primitives.Unsigned
long echoNums(@Unsigned short a, @Unsigned long b) { ... }
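
For illustration, a marker annotation along these lines would be enough for IDEs and static analysis to pick up. This is only a sketch: the declaration below (targets, retention) is my assumption, not the actual SwiftKit definition, and the echoNums body is a stand-in for the generated downcall:

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical marker annotation; the real SwiftKit declaration may differ.
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.METHOD, ElementType.PARAMETER})
@interface Unsigned {}

final class AnnotateDemo {
    // Stand-in for a generated binding: values travel as raw bits, unchanged.
    @Unsigned
    static long echoNums(@Unsigned short a, @Unsigned long b) {
        return b;
    }

    // Reflection helper showing the annotation is visible to tools at runtime.
    static boolean returnIsUnsigned() {
        try {
            Method m = AnnotateDemo.class.getDeclaredMethod("echoNums", short.class, long.class);
            return m.isAnnotationPresent(Unsigned.class);
        } catch (NoSuchMethodException e) {
            throw new AssertionError(e);
        }
    }

    public static void main(String[] args) {
        long result = echoNums((short) 0xFFFF, -1L);
        System.out.println(Long.toUnsignedString(result)); // 18446744073709551615
        System.out.println(returnIsUnsigned());            // true
    }
}
```

The annotation changes nothing at runtime; it is purely a hint, which is both the appeal (zero cost) and the weakness (nothing enforces it).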

Option "wrap": The other alternative I was considering is wrapper types, which would be "very safe" and have precedent in the Guava library's Unsigned support. The upside is that it is very explicit and you're unlikely to make a conversion error; the downside, however, is:

  • allocating an object for every UnsignedInteger (and friends) that crosses the boundary
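
As a rough sketch of what such a wrapper could look like (UInt32Box is a hypothetical name, only loosely inspired by the idea behind Guava's UnsignedInteger, and not a proposed API):

```java
// Hypothetical wrapper type; every instance is a heap allocation.
final class UInt32Box implements Comparable<UInt32Box> {
    private final int bits; // raw 32-bit pattern as received from Swift

    private UInt32Box(int bits) {
        this.bits = bits;
    }

    static UInt32Box fromBits(int bits) {
        return new UInt32Box(bits);
    }

    // Widens losslessly to a long in the range 0..4294967295.
    long toLong() {
        return Integer.toUnsignedLong(bits);
    }

    @Override
    public int compareTo(UInt32Box other) {
        return Integer.compareUnsigned(bits, other.bits);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof UInt32Box && ((UInt32Box) o).bits == bits;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(bits);
    }

    @Override
    public String toString() {
        return Integer.toUnsignedString(bits);
    }

    public static void main(String[] args) {
        UInt32Box max = UInt32Box.fromBits(-1); // bit pattern of UInt32.max
        System.out.println(max);                // 4294967295
        System.out.println(max.compareTo(UInt32Box.fromBits(1)) > 0); // true
    }
}
```

Because the raw int is private, callers cannot accidentally compare it as a signed value; the price is one object per value.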

Option "widen" (meh): In theory we could also consider a "widen" mode that widens imported UInt types into the next larger Java type. However, this may be too confusing, and it doesn't work for UInt64 because Java has no primitive wider than long, so we'd have to either annotate or wrap UInt64 anyway.
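
A quick sketch of what widening means in practice, using only JDK methods. The method names are made up for illustration. Note that a UInt16 needs an int, since short tops out at 32767, and that the BigInteger fallback for UInt64 is just one possible choice (and an allocating one):

```java
import java.math.BigInteger;

// Sketch: widening each unsigned bit pattern into a Java type that can hold it.
final class WidenDemo {
    static short widenUInt8(byte bits) {
        return (short) Byte.toUnsignedInt(bits); // 0..255 fits in short
    }

    static int widenUInt16(short bits) {
        return Short.toUnsignedInt(bits); // 0..65535 needs int
    }

    static long widenUInt32(int bits) {
        return Integer.toUnsignedLong(bits); // 0..4294967295 needs long
    }

    // UInt64 has no wider primitive; BigInteger is one possible fallback.
    static BigInteger widenUInt64(long bits) {
        return new BigInteger(Long.toUnsignedString(bits));
    }

    public static void main(String[] args) {
        System.out.println(widenUInt8((byte) 0xFF));    // 255
        System.out.println(widenUInt16((short) 0xFFFF)); // 65535
        System.out.println(widenUInt32(-1));             // 4294967295
        System.out.println(widenUInt64(-1L));            // 18446744073709551615
    }
}
```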

As a small summary, here are the options:

Swift Type           | annotate (proposed) | wrap            | widenOrWrap  | widenOrAnnotate
---------------------|---------------------|-----------------|--------------|----------------
UInt8                | @Unsigned byte      | UnsignedByte    | short        | short
UInt16               | @Unsigned short     | UnsignedShort   | int          | int
UInt16 (alternative) | char                | char            | char         | char
UInt32               | @Unsigned int       | UnsignedInteger | long         | long
UInt64               | @Unsigned long      | UnsignedLong    | UnsignedLong | @Unsigned long

The question

I'm curious to hear what people would prefer to be the output of jextract by default.

Since we're focused on both safety and high performance for the interoperability, each option has some merit.

Default mode? The longer I look at this, though, the more convinced I am that keeping it simple might be best: we should just do the "annotate" mode and perhaps not even offer the other ones. If anyone has strong feelings and use-cases in mind here, I'd love to hear about them though.

Keep wrap mode? The modes trade type-system safety against performance. I am wondering whether offering both modes is worthwhile.

Ship SwiftKit with Unsigned helpers? We could offer some unsigned numerics helpers (like Guava's UnsignedLongs) and have them "batteries included"... OR not offer them and just assume everyone knows Guava and will depend on it anyway.
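
Worth keeping in mind when weighing this: since Java 8 the JDK itself ships static unsigned helpers on Integer and Long, so a fair amount works without Guava or any SwiftKit additions. A small sketch (the class and method names are mine; the Long.* calls are standard JDK):

```java
// Sketch: unsigned arithmetic on raw long bits using only JDK helpers.
final class JdkUnsignedHelpers {
    // Unsigned "a < b" on two UInt64 bit patterns.
    static boolean lessThan(long a, long b) {
        return Long.compareUnsigned(a, b) < 0;
    }

    // Unsigned division by two; plain "/" would treat large values as negative.
    static long half(long bits) {
        return Long.divideUnsigned(bits, 2);
    }

    public static void main(String[] args) {
        long max = Long.parseUnsignedLong("18446744073709551615"); // UInt64.max
        System.out.println(max);                              // -1
        System.out.println(Long.toUnsignedString(max));       // 18446744073709551615
        System.out.println(half(max) == Long.MAX_VALUE);      // true
        System.out.println(Long.remainderUnsigned(max, 10));  // 5
        System.out.println(lessThan(1L, max));                // true
    }
}
```

Addition, subtraction and multiplication need no helper at all, since two's-complement results are bit-identical for signed and unsigned operands.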

Future

We could extract sources as Kotlin and support Kotlin's unsigned integer types; however, generating Kotlin is currently not something we're working on. If we did, this would be a nice and very logical way to map those types.

8 Likes

Annotate sounds most productive, but because it can be incorrect I would require the user to opt-in instead of defaulting -- preferably with specific Swift-side API permissions, but perhaps with a blanket command configuration.

My default would be to balk, i.e., skip incommensurable API. I don't think Swift's Java inter-op should quietly bridge different semantics. For infrastructure like this, it seems the priority would be correctness, and to fail fast for ergonomic reasons, and then perhaps facilitate a policy-directed solution (instead of having any default).

Both widening and annotation are possibly incorrect because Java code can use negative numbers that don't have meaning in the corresponding Swift code.

Wrapping seems like it could be correct, but might just export headaches to the Java side where the type is otherwise unused. Also performance considerations make it seem unwise to build a type for a single value before the collection representation of Array and InlineArray (and sequence?) is settled. The community is in the best position to evolve different solutions that could inform design of a currency wrapper type, depending on inter-op needs.

When skipping, API mapping would complete, but with some indication of the omitted invalid API. If the subset of valid API is not sufficient for the use-case, the user has decisions to make. At this point, even if Swift-Java inter-op offered no other help, the user could solve their own problems in the Swift API.

If you want to help, granular Swift-side annotations would indicate whether to omit or map (possibly with value validation). The Swift API seems like the right place to decide; the developer is in the best position to see and document e.g., if the information flow is one-way and values are in range, and thus map+annotate is fine. But that seems like a lot of work to start with, and generation-time flags could set the default behavior.

3 Likes

The annotate mode sounds good to me. I don't think the wrap mode is worthwhile due to the performance cost.

But whatever we do, I do think it shouldn't be expressly incompatible with Kotlin's unsigned numerics. You can get a taste for how we do it in SwiftJNI with our UInt16 to kotlin.UShort coercion.

3 Likes

That this hasn't been resolved in Java, 30 years later, saddens me. (Also that Java never got explicitly fixed-size names for integer types i.e. int32 etc.)

There's no good answer here because bitmapping a UInt32 to int risks weird math bugs that a developer won't expect, while mapping UInt32 to long allows for overflow. Thankfully, Swift prefers signed integers for most work.
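
To illustrate the overflow half of that trade-off: if UInt32 were widened to long, something has to reject out-of-range values on the way back into Swift. A minimal sketch, with hypothetical names and no claim about where such a check would actually live:

```java
// Sketch: validating a widened long before passing it to a Swift UInt32 parameter.
final class NarrowDemo {
    static final long UINT32_MAX = 0xFFFF_FFFFL; // 4294967295

    static int toUInt32Bits(long value) {
        if (value < 0 || value > UINT32_MAX) {
            throw new IllegalArgumentException("out of UInt32 range: " + value);
        }
        return (int) value; // keeps the low 32 bits, i.e. the UInt32 bit pattern
    }

    // Convenience for demonstration: true if the value would be rejected.
    static boolean isRejected(long value) {
        try {
            toUInt32Bits(value);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(toUInt32Bits(4_294_967_295L)); // -1 (bits of UInt32.max)
        System.out.println(isRejected(-1L));              // true
        System.out.println(isRejected(4_294_967_296L));   // true
    }
}
```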

If I may expand on @wes1's suggestion of balking… that may be a good default option if we also provide a marshalling attribute in Swift. Given the following Swift variable:

var x: UInt32

We could provide:

@attached macro marshalling<T: NumericJavaType>(as _: T.Type)

Thus:

@marshalling(as: jint.self) var x: UInt32 // marshals by bitmapping
@marshalling(as: jlong.self) var y: UInt32 // marshals by widening

2 Likes

Thanks for the inputs everyone.

Unsurprisingly, this is a topic that yields all kinds of responses, and it really depends on who you ask... I also asked some JDK developer friends, and some of them actually leaned towards widening, while this thread seems to prefer not extracting unsigned types at all.

Gathering the input here and from other discussions, I think we should act mostly like OpenJDK's official jextract tool, i.e. use the mode I've called "annotate" here. We're actually doing slightly better than OpenJDK, since we at least annotate the types, so this shows up in IDE autocompletion (for example in IntelliJ, the most popular Java IDE).

Also, we want to get people off C and onto Swift, so it is important that the experience doesn't start with a ton of papercuts.

This thread confirmed, though, that we should leave the "mode" selection as an option, so I'm going to leave that in, even if the wrap mode isn't likely to get a lot of use. It does however set us up for a future where we can expose kotlin.UShort and friends by just adding one more mode (and using the same infrastructure).

Extending control / options

I do like the general suggestion of allowing more fine-grained control over import behavior that @grynspan brought up.

The downside of an @JExtract(as: ...) var myInt: UInt is that it would require library developers to populate their library code with Java specifics... This may be okay for some developers, so I think we can keep this in mind for the future (I made an issue to track it).

We already have a configuration-driven approach as well, where a swift-java.config file can configure tool behavior -- and I think it may be equally interesting to allow Java developers who want to consume a Swift library to modify the extraction behavior without modifying the Swift library.

I think both are nice additions and I'll track them as future ideas to look into.

3 Likes

Guava... doesn't have this type. I see UnsignedInteger and UnsignedLong, but not UnsignedShort or UnsignedByte. UInt16 could still be char, but the UnsignedByte class you reference here straight-up doesn't exist. There's UnsignedBytes, but that's a static class with utility methods like compare, not an unsigned wrapper like UnsignedInteger and UnsignedLong are.

Another thing to consider is how to handle arrays. Things like [UInt8] are not exactly uncommon to represent a buffer of opaque data, and it would suck to have to iterate over it in an O(n) conversion to represent it in a Java UnsignedInteger[], only for the other side to convert it back to a byte[].
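
For what it's worth, a plain byte[] can already be read as unsigned on the Java side without any per-element conversion pass. A small sketch of that idea; this says nothing about how the tool will eventually bridge [UInt8]:

```java
// Sketch: treating a byte[] as a buffer of UInt8 values without copying it.
final class ByteBufferDemo {
    // Sums the buffer as if each element were a Swift UInt8 (0...255).
    static long unsignedSum(byte[] buffer) {
        long sum = 0;
        for (byte b : buffer) {
            sum += Byte.toUnsignedInt(b); // equivalent to (b & 0xFF)
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0x00, (byte) 0x7F, (byte) 0x80, (byte) 0xFF};
        System.out.println(data[3]);           // -1 when read as a signed byte
        System.out.println(unsignedSum(data)); // 510
    }
}
```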

1 Like

It's not rocket science to make such a type, so we can; the question is whether we should.

I should clarify that I was contemplating vendoring these types in SwiftKit, so that we don't force a Guava dependency, since that's fairly large and we don't need much of it. We could go either way: vend the types, or generate source that depends on Guava 🤔

Arrays of integers will get some special treatment. These conversions are about when we literally pass such a type as an argument to a Swift function; I didn't want to discuss arrays just yet, though.

We already give Data special treatment so that we can avoid unnecessary copies, and I fully expect to do additional work for [UInt8] as well as Span types, especially since we're interested in avoiding copying the data between Swift and Java.

2 Likes

But only when dealing with unsigned integer types. Since Swift prefers signed integer types, we're already pushing our developers away from the problem space. 🙂

We can also pick a default one way or the other, but allow developers to refine things with the attribute.

1 Like

The "simplest" option would probably be to make unsigned types wholesale unavailable in the inter-op (along with everything that uses them). Then eventually those who are not happy about the missing API would change their UInts to Ints.

IIRC Swift established count and array indices to be Int in anticipation of issues like these, which would help to minimise the scope.

1 Like

As for linear conversion of arrays, we already do it with Obj-C bridging, and even in Swift when casting [1, 2, 3] as [Any].

For the UInt family, providing Kotlin-like wrappers seems to me to be the right default.

I have little experience with Java, but in other languages it was a pain to deal with the absence of a correct representation for natively unsupported types:

  • a partly unavailable API can block adoption or force users to invent and maintain their own shims
  • annotations only say that negative values can appear at runtime; these are hard to handle, sometimes making a function impossible to use, or at best forcing the caller to provide meaningful error handling.
    It would be unfortunate to have problems of this kind when using modern, statically checked, type-safe Swift.
  • widening can be a good option for some users, e.g. when widened integers are fine for everything up to UInt32 across most of the code base, and only UInt64 must be treated accurately and can additionally be annotated to warn users.

For the wrapper solution, two variants of each function can be provided: one using the UInt wrapper and another using plain primitive integers.
This way the convenient and safe API is used by default, and performance-sensitive code can fall back to the unsafe variant.
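
A sketch of how such a pair might look in generated code. All names here (CounterBindings, getCountUnsafe, UInt64Box, nativeGetCount) are hypothetical, and the native call is stubbed out:

```java
// Sketch: a safe wrapper-returning variant next to a raw, allocation-free one.
final class CounterBindings {
    // Hypothetical wrapper around a UInt64 bit pattern.
    static final class UInt64Box {
        private final long bits;

        UInt64Box(long bits) {
            this.bits = bits;
        }

        boolean isGreaterThan(long otherBits) {
            return Long.compareUnsigned(bits, otherBits) > 0;
        }

        @Override
        public String toString() {
            return Long.toUnsignedString(bits);
        }
    }

    // Stub standing in for the native downcall; returns the bits of UInt64.max.
    private static long nativeGetCount() {
        return -1L;
    }

    // Safe default: the value cannot be mistaken for a signed long.
    static UInt64Box getCount() {
        return new UInt64Box(nativeGetCount());
    }

    // Opt-in fast path: raw bits, no allocation, caller must treat them as unsigned.
    static long getCountUnsafe() {
        return nativeGetCount();
    }

    public static void main(String[] args) {
        System.out.println(getCount());               // 18446744073709551615
        System.out.println(getCount().isGreaterThan(0)); // true
        System.out.println(getCountUnsafe() > 0);     // false: signed view of the same bits
    }
}
```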

3 Likes