Hello everyone,
I've been working on unsigned integer support and was curious to gather some feedback on which mapping modes people would be most interested in.
The problem
- Java famously does not have unsigned integer types (except `char`).
- Swift, like other native languages, very frequently uses `UInt` types.
- It is not strictly safe or correct to "just" treat a returned `UInt` value as an `int` in Java: any value larger than `Integer.MAX_VALUE` would show up as negative on the Java side, potentially leading to confusion or even bugs (e.g. imagine checks like `if value > 0`).
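To make the last point concrete, here is a small Java sketch where the bit pattern of a Swift `UInt32` value of 4,000,000,000 lands in a Java `int`. The class and `rawFromSwift()` are hypothetical stand-ins for an extracted binding; `Integer.toUnsignedLong` and `Integer.toUnsignedString` are standard JDK methods (Java 8+) that recover the intended value:

```java
// Sketch: a Swift UInt32 value carried across as a plain Java int.
// rawFromSwift() is a hypothetical stand-in for an extracted binding.
final class UnsignedPitfall {
    // Returns the same 32 bits a Swift UInt32 of 4_000_000_000 would have.
    static int rawFromSwift() {
        return (int) 4_000_000_000L;
    }

    public static void main(String[] args) {
        int raw = rawFromSwift();
        System.out.println(raw);                           // -294967296
        System.out.println(raw > 0);                       // false: a naive check misfires
        System.out.println(Integer.toUnsignedLong(raw));   // 4000000000
        System.out.println(Integer.toUnsignedString(raw)); // 4000000000
    }
}
```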
When extracting bindings from existing Swift declarations, we need to decide how to represent unsigned numbers. For example, given this signature:
func echoNums(a: UInt16, b: UInt64) -> UInt64
How should we represent this in Java, both by default and in other opt-in modes?
??? echoNums(??? a, ??? b) { ... }
For the sake of this discussion, let's focus on plain primitives.
`[UInt8]` is a very common type, but it likely deserves a separate discussion entirely.
Type mapping modes
Signed numerics are simple, since they map directly to existing Java types:
| Swift Type | Bit Size | Java Equivalent |
|---|---|---|
| Int8 | 8 | byte |
| Int16 | 16 | short |
| Int32 | 32 | int |
| Int64 | 64 | long |
| Int | Platform-dependent | Platform-dependent |
However, unsigned numbers are a different story, and we need to figure out how to represent them. The problem shows up as soon as you write code against an extracted Swift library, like this:
// java
var count = getCountSwift(); // UInt64
if (count < 0) throw new RuntimeException(...);
The problem: the Swift API was most likely declared unsigned precisely because it intends to use the upper half of the type's range, so Java code may behave surprisingly once it interprets such a value as negative. Here, a perfectly valid count above `Long.MAX_VALUE` would trigger the exception.
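A sketch of the same situation for `UInt64`, assuming the value simply arrives as a raw `long`. The class and helper names are hypothetical; the sketch contrasts the naive sign check with unsigned-aware JDK calls (`Long.compareUnsigned`, `Long.toUnsignedString`) and an exact `BigInteger` conversion:

```java
import java.math.BigInteger;

// Sketch: a UInt64 "count" with its top bit set, as Java sees it in a long.
final class UnsignedLongChecks {
    // The naive `count < 0` style check: rejects perfectly valid UInt64 counts.
    static boolean naiveIsValid(long count) {
        return count >= 0;
    }

    // Unsigned-aware alternative: a UInt64 is never negative, so compare it
    // (unsigned) against an upper bound instead; the bound is illustrative.
    static boolean fitsUnder(long count, long unsignedLimit) {
        return Long.compareUnsigned(count, unsignedLimit) <= 0;
    }

    // Exact conversion when the full 64-bit unsigned value is needed.
    static BigInteger toBigInteger(long count) {
        return new BigInteger(Long.toUnsignedString(count));
    }

    public static void main(String[] args) {
        long count = 1L << 63; // 9223372036854775808 as a UInt64
        System.out.println(count);                        // -9223372036854775808
        System.out.println(naiveIsValid(count));          // false
        System.out.println(Long.toUnsignedString(count)); // 9223372036854775808
        System.out.println(fitsUnder(count, -1L));        // true: -1L has UInt64.max's bits
        System.out.println(toBigInteger(count));          // 9223372036854775808
    }
}
```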
So we need to decide which default, and which optional modes, jextract should offer.
OpenJDK's jextract simply ignores the issue and returns numbers as-is:
/**
* Getter for field:
* {@snippet lang=c :
* unsigned int private_flags
* }
*/
public static int private_flags(MemorySegment struct) {
return struct.get(private_flags$LAYOUT, private_flags$OFFSET);
}
Option "annotate": I'm leaning towards doing the same, but I think we can do slightly better by annotating such types by default:
/**
 * {@snippet lang=swift :
* func echoNums(a: UInt16, b: UInt64) -> UInt64
* }
*/
@Unsigned // org.swift.swiftkit.core.primitives.Unsigned
long echoNums(@Unsigned short a, @Unsigned long b) { ... }
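For illustration, the annotation could be a plain marker annotation along these lines. Its retention, targets, and the `EchoNumsBinding` class are assumptions made for this sketch, not the actual SwiftKit declaration; `RUNTIME` retention is used only so the demo can observe the annotation via reflection:

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical Java signature for: func echoNums(a: UInt16, b: UInt64) -> UInt64
final class EchoNumsBinding {
    @Unsigned
    static long echoNums(@Unsigned short a, @Unsigned long b) {
        // Placeholder body; a real binding would call into Swift instead.
        return b;
    }

    public static void main(String[] args) throws NoSuchMethodException {
        Method m = EchoNumsBinding.class.getDeclaredMethod("echoNums", short.class, long.class);
        System.out.println(m.isAnnotationPresent(Unsigned.class));                 // true
        System.out.println(m.getParameterAnnotations()[1][0] instanceof Unsigned); // true
    }
}

// Hypothetical marker annotation; the real SwiftKit declaration may differ.
@Documented
@Retention(RetentionPolicy.RUNTIME) // RUNTIME only so the reflection demo can see it
@Target({ElementType.METHOD, ElementType.PARAMETER, ElementType.FIELD})
@interface Unsigned {}
```

Note that the annotation is purely informational: the compiler does not stop anyone from comparing an `@Unsigned long` with `< 0`, which is exactly the tradeoff against the "wrap" option.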
Option "wrap": The other alternative I was considering is wrapper types, which would be "very safe" and have precedent in the Guava library's unsigned support (`UnsignedInteger`, `UnsignedLong`). The upside is that this is very explicit, so you're unlikely to make a conversion error; the downside is:
- allocating an object for every `UnsignedInteger` (and friends) value
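To show what that allocation buys, here is a hypothetical minimal `UnsignedInteger` wrapper of the kind a "wrap" mode might generate. It is loosely modeled on Guava's class of the same name, but it is not Guava's API; all method names here are assumptions:

```java
// Hypothetical "wrap"-mode type for UInt32, loosely modeled on Guava's
// UnsignedInteger but NOT its actual API. Every instance is a heap object.
final class UnsignedInteger implements Comparable<UnsignedInteger> {
    private final int bits; // raw 32-bit pattern, identical to Swift's UInt32

    private UnsignedInteger(int bits) {
        this.bits = bits;
    }

    // Wraps raw bits as received from Swift (every bit pattern is valid).
    static UnsignedInteger fromIntBits(int bits) {
        return new UnsignedInteger(bits);
    }

    // Creates a value from a Java long, rejecting anything outside 0..2^32-1.
    static UnsignedInteger valueOf(long value) {
        if (value < 0 || value > 0xFFFF_FFFFL) {
            throw new IllegalArgumentException("not a UInt32: " + value);
        }
        return new UnsignedInteger((int) value);
    }

    int intBits() { return bits; }                            // for passing back to Swift
    long longValue() { return Integer.toUnsignedLong(bits); } // exact, never negative

    @Override public int compareTo(UnsignedInteger other) {
        return Integer.compareUnsigned(bits, other.bits);
    }
    @Override public boolean equals(Object o) {
        return o instanceof UnsignedInteger && ((UnsignedInteger) o).bits == bits;
    }
    @Override public int hashCode() { return bits; }
    @Override public String toString() { return Integer.toUnsignedString(bits); }

    public static void main(String[] args) {
        UnsignedInteger max = UnsignedInteger.fromIntBits(-1);
        System.out.println(max);                                           // 4294967295
        System.out.println(max.compareTo(UnsignedInteger.valueOf(1)) > 0); // true
    }
}
```

With this approach every value returned from Swift becomes a heap object (unless the JIT's escape analysis happens to eliminate it), which is the performance cost of the "wrap" mode.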
Option "widen" (meh): In theory we could also offer a "widen" mode that maps each imported `UInt` type to the next wider signed Java type. However, this may be too confusing, and it doesn't work for `UInt64` anyway, since Java has no primitive wider than `long`, so we'd have to either annotate or wrap `UInt64` regardless.
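A sketch of what "widen" would mean in practice, using only JDK helpers (`Byte.toUnsignedInt`, `Short.toUnsignedInt`, `Integer.toUnsignedLong`); the class and method names are illustrative. Note that `UInt16` needs `int`, since `short` only reaches 32767, and that there is no primitive to widen `UInt64` into:

```java
// Sketch of the "widen" mapping using JDK helpers (Java 8+).
final class WidenDemo {
    static short widenUInt8(byte raw)   { return (short) Byte.toUnsignedInt(raw); } // UInt8  -> short
    static int   widenUInt16(short raw) { return Short.toUnsignedInt(raw); }        // UInt16 -> int
    static long  widenUInt32(int raw)   { return Integer.toUnsignedLong(raw); }     // UInt32 -> long
    // UInt64: no wider Java primitive exists; only BigInteger (or annotate/wrap) can hold it.

    // char is Java's only unsigned primitive, covering exactly UInt16's range.
    static char uint16AsChar(short raw) { return (char) raw; }

    public static void main(String[] args) {
        System.out.println(widenUInt8((byte) 0xFF));            // 255
        System.out.println(widenUInt16((short) 0xFFFF));        // 65535
        System.out.println(widenUInt32(0xFFFF_FFFF));           // 4294967295
        System.out.println((int) uint16AsChar((short) 0xFFFF)); // 65535
    }
}
```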
In summary, here are the options:
| Swift Type | annotate (proposed) | wrap | widenOrWrap | widenOrAnnotate |
|---|---|---|---|---|
| UInt8 | @Unsigned byte | UnsignedByte | short | short |
| UInt16 | @Unsigned short | UnsignedShort | int | int |
| UInt16 (alternative) | char | char | char | char |
| UInt32 | @Unsigned int | UnsignedInteger | long | long |
| UInt64 | @Unsigned long | UnsignedLong | UnsignedLong | @Unsigned long |
The question
I'm curious to hear what people would prefer jextract to output by default.
Since we're focused on both safety and high performance for interoperability, each option has some merit.
Default mode? The longer I look at this, the more convinced I am that keeping it simple might be best: we should just do the "annotate" mode and perhaps not even offer the other ones. If anyone has strong feelings or use cases in mind here, I'd love to hear about them, though.
Keep wrap mode? The tradeoff between the modes is type-system safety versus performance. I'm wondering whether offering both modes is worthwhile.
Ship SwiftKit with unsigned helpers? We could offer some unsigned numerics helpers (like Guava's `UnsignedLongs`) and have them "batteries included"... OR not offer them, and just assume everyone knows Guava and will depend on it anyway.
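For reference, since Java 8 `java.lang.Integer` and `java.lang.Long` already ship basic unsigned helpers, so SwiftKit helpers (or Guava's `UnsignedInts`/`UnsignedLongs`) would mainly add convenience on top. A quick sketch operating on raw `long`/`int` values, with an illustrative class name:

```java
// Unsigned arithmetic on raw long/int values using only java.lang (Java 8+).
final class JdkUnsignedOps {
    public static void main(String[] args) {
        long max = Long.parseUnsignedLong("18446744073709551615"); // UInt64.max
        System.out.println(max);                              // -1 (same 64 bits)
        System.out.println(Long.toUnsignedString(max));       // 18446744073709551615
        System.out.println(Long.divideUnsigned(max, 10));     // 1844674407370955161
        System.out.println(Long.remainderUnsigned(max, 10));  // 5
        System.out.println(Long.compareUnsigned(max, 1) > 0); // true

        int u32 = Integer.parseUnsignedInt("4000000000");     // a valid UInt32
        System.out.println(Integer.divideUnsigned(u32, 2));   // 2000000000
    }
}
```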
Future
We could extract sources as Kotlin and support Kotlin's unsigned integer types (`UByte`, `UShort`, `UInt`, `ULong`); however, generating Kotlin is not something we're currently working on. If we did, this would be a natural and very logical way to map these types.
