Bytes interop between Swift and Java

Hey! I am looking for input on what people think is the best approach to passing bytes between Swift and Java when using the jextract tool in JNI mode.

The problem

At the moment, the JExtract tool extracts any Swift UInt8 as Java’s byte, and therefore [UInt8] as byte[]. Conversions between these two types currently use the range-checked init(_:) initializer, which crashes if the source value is not within the valid range of the destination integer type.

The Swift type UInt8 is an unsigned byte, whereas Java's byte is a signed byte. This means Swift will crash if you try to pass a negative byte such as -1 to Swift.
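
To make the mismatch concrete from the Java side, here is a minimal sketch. SignedByteDemo and checkedToUInt8 are hypothetical names, not part of swift-java; checkedToUInt8 only mimics a range-checked conversion, throwing where the Swift side would crash.

```java
final class SignedByteDemo {
    // Hypothetical stand-in for a range-checked conversion: rejects negative
    // bytes instead of reinterpreting their bits.
    static int checkedToUInt8(byte value) {
        if (value < 0) {
            throw new ArithmeticException("byte " + value + " is not representable as UInt8");
        }
        return value;
    }

    public static void main(String[] args) {
        byte b = (byte) 0xFF;                      // all eight bits set
        System.out.println(b);                     // -1: Java reads the bits as signed
        System.out.println(Byte.toUnsignedInt(b)); // 255: same bits read as unsigned
        System.out.println(checkedToUInt8((byte) 42));
        try {
            checkedToUInt8(b);
        } catch (ArithmeticException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Every Java byte from 0 to 127 passes the check, so the failure only shows up for payloads that happen to contain bytes with the high bit set.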

Seeing as [UInt8] is quite a common type to use in Swift to represent a bag-of-bytes, especially if you do not want to rely on Foundation, this is quite annoying for developers.

Possible solution

Instead of using the range-checked init(_:) initializer, we could use init(bitPattern:), which just reinterprets the underlying bits and performs no bounds checking. This means that if the Swift function you are calling into relies on the numeric value of the UInt8, it could lead to incorrect behaviour.

For example:

func printMyAge(_ age: UInt8) {
    print("I am \(age) years old!")
}

printMyAge(10) // I am 10 years old!
printMyAge(UInt8(bitPattern: -10)) // I am 246 years old!

This might not be that much of an issue, since UInt8 is mostly used as a byte and not for its numeric value. So in my opinion, this would be a good solution. Also, byte and byte[] are the "standard" way to interact with a list of bytes in Java. If we forced users to use something like Data instead, we would put the extra burden on them of understanding Swift's Data and instantiating it in order to call these APIs.
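
For comparison, the same bit-pattern reading can be sketched on the Java side. BitPatternDemo, asUnsigned and fromUnsigned are hypothetical helper names used only for illustration; they are not swift-java APIs.

```java
final class BitPatternDemo {
    // Reinterprets the 8 bits of a Java byte as an unsigned value (0...255).
    static int asUnsigned(byte bits) {
        return bits & 0xFF;
    }

    // Packs an unsigned value (0...255) into a Java byte without changing its bits.
    static byte fromUnsigned(int value) {
        if (value < 0 || value > 0xFF) {
            throw new IllegalArgumentException("out of UInt8 range: " + value);
        }
        return (byte) value;
    }

    public static void main(String[] args) {
        System.out.println(asUnsigned((byte) -10)); // 246, matching the Swift example
        System.out.println(fromUnsigned(246));      // -10
    }
}
```

The round trip is lossless for all 256 bit patterns; only code that does arithmetic or comparisons on the raw byte sees the "wrong" number.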

How others do it

The skip-tool JNI layer extracts Kotlin code and is therefore able to use Kotlin's unsigned types, which we cannot.

The FFM mode currently only interacts with Foundation.Data as the "bag-of-bytes" type.

The questions

  • Do you think it's a problem that we just use the bit-interpretation of UInt8?
    • Should this be an opt-in mode?
  • Any other ideas for solutions?
  • Do you prefer that the APIs are exported using standard Java types such as byte and byte[], or would you be OK with having to use a custom BagOfBytes type, such as Foundation.Data?

If the FFM mode is already using Foundation.Data as the bridge type for byte[], I think it would be fine to use it for JNI as well. In my experience, idiomatic Swift almost always uses Data to deal with bags of bytes, so I don't think there is much to worry about with additional cognitive load. The only counter-argument I can think of is that it does force a Foundation dependency where it might not be strictly required, but I assume swift-java already presumes Foundation in other places?

Alternatively (or in addition), have you considered using java.nio.ByteBuffer.allocateDirect/NewDirectByteBuffer paired with GetDirectBufferAddress/java.nio.ByteBuffer.get(int)? This could (perhaps optionally) provide a fast-track option for zero-copy heap-free migration of bytes between the Java and Swift sides. We've been wanting to implement something like this in Skip for a while.
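
On the Java side, the direct-buffer route might look like the sketch below. It uses only standard java.nio calls; the native half (NewDirectByteBuffer/GetDirectBufferAddress) is not shown, and DirectBufferDemo and fill are hypothetical names.

```java
import java.nio.ByteBuffer;

final class DirectBufferDemo {
    // Allocates off-heap memory and writes the payload into it.
    static ByteBuffer fill(byte[] payload) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(payload.length);
        buffer.put(payload);
        buffer.flip(); // switch the buffer from writing to reading
        return buffer;
    }

    public static void main(String[] args) {
        ByteBuffer buffer = fill(new byte[] {1, 2, (byte) 0xFF});
        System.out.println(buffer.isDirect()); // true
        System.out.println(buffer.get(2));     // -1: absolute read, signed view
    }
}
```

Note that filling the buffer from an existing byte[] is itself a copy; the zero-copy benefit only applies when the bytes are produced directly into the buffer.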

Sorry for the confusion, the FFM mode is not directly using Foundation.Data as the bridging type for byte[]. It just supports Foundation.Data, so if you explicitly have a Swift API such as

func encrypt(_ data: Data) 

then you can call that from Java using

// Pseudocode: arguments are shown with their Swift labels and types
Data myData = Data.init(bytes: UnsafeRawPointer, count: Int)
encrypt(myData);

I definitely think the JNI mode should support Data in a similar way, but I am not sure we should bridge [UInt8] to Foundation.Data instead. I think [UInt8] is quite a common bag-of-bytes type, often used on the server side or in other cross-platform libraries where you don't want to depend on Foundation, as you mention.

I have looked at direct buffers, but from my research, which option is best seems to depend heavily on the call site. Direct buffers are very good if you know that the data will only be used on the native side, but very slow if you need to handle that data back in Java (for example, the encrypt function from before, if you are returning the encrypted bytes back to Java).

Here is some interesting reading on this topic: Java API Performance Improvements | RocksDB
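
As a rough illustration of the "back in Java" cost: Java code that needs a plain byte[] from a direct buffer has to copy the contents onto the heap. The sketch below uses only standard java.nio calls; DirectBufferCopyDemo and toHeapArray are hypothetical names.

```java
import java.nio.ByteBuffer;

final class DirectBufferCopyDemo {
    // Copies the readable bytes of a buffer into a fresh heap array.
    static byte[] toHeapArray(ByteBuffer buffer) {
        byte[] out = new byte[buffer.remaining()];
        buffer.duplicate().get(out); // duplicate() leaves the caller's position untouched
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer direct = ByteBuffer.allocateDirect(4);
        direct.put(new byte[] {10, 20, 30, 40});
        direct.flip();
        byte[] heap = toHeapArray(direct);
        System.out.println(heap.length);        // 4
        System.out.println(direct.remaining()); // 4: the original buffer is still readable
    }
}
```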

At Readdle we are using the bit-pattern approach (we initially started with Java and now use Kotlin, but the behavior is similar). We annotate such variables/parameters with our own Java annotation, @Unsigned. It explicitly tells the developer not to perform arithmetic on these values, because it won’t work correctly.

Regarding Data, our approach is to convert it to ByteBuffer and back. We evaluated multiple options and found this to be the most ergonomic method for Java/Android developers. This conversion also helps maintain a clear separation between Data and [UInt8].

I raised a similar question with @ktoso a month or so back. Having some automatic bridging between byte[] and [UInt8] (or allowing UnsafeBufferPointer and/or Span access to Java arrays) would go a long way toward solving this general problem.

Thanks for moving the discussion to the forums, it’s good to gather some more input on these decisions.

Thanks for chiming in @andriydruk! As far as I can see, you’re also doing the cached wrappers with the UInt8Enum. We have something similar called the “wrapper” mode, which we discussed a while back. It’s an opt-in mode that wraps the unsigned types as their Guava unsigned equivalents. That said, we don’t currently have users of that mode as far as I can tell.

We also already have support for marking jextracted methods and parameters as @Unsigned, which we should continue and expand to cover arrays and generic types that contain unsigned numbers.

Today, a method like:

// swift
public func unsignedLong(first: UInt64, second: UInt32) -> UInt32

is extracted as:

@Unsigned
public static int unsignedLong(@Unsigned long first, @Unsigned int second) {

This already works in both FFM and JNI mode.

So the printMyAge example @madsodgaard provided in the OP would actually already be marked @Unsigned, giving developers that hint.
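
A caller who sees the hint can read the raw bits with Java's built-in unsigned helpers. In the sketch below, the nested Unsigned annotation is a local stand-in for the real marker, and the method body is a placeholder rather than a generated binding.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Target;

final class UnsignedHintDemo {
    // Local stand-in for the marker annotation; it documents intent only.
    @Target({ElementType.METHOD, ElementType.PARAMETER})
    @interface Unsigned {}

    // Hypothetical binding shaped like the extracted signature above.
    @Unsigned
    static int unsignedLong(@Unsigned long first, @Unsigned int second) {
        return second; // placeholder body: hands back the second argument's bits
    }

    public static void main(String[] args) {
        int raw = unsignedLong(0L, (int) 4_000_000_000L);
        System.out.println(raw < 0);                             // true: negative as a signed int
        System.out.println(Integer.toUnsignedLong(raw));         // 4000000000
        System.out.println(Integer.toUnsignedString(raw));       // 4000000000
        System.out.println(Integer.compareUnsigned(raw, 1) > 0); // true
    }
}
```

The annotation changes nothing at runtime; it only tells the reader to reach for helpers such as Integer.toUnsignedLong, Integer.compareUnsigned or Long.toUnsignedString instead of plain arithmetic.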

Side note: Technically we also have a “wrap” mode in FFM, where unsigned types are represented as their Guava UnsignedInteger equivalents. That’s of course expensive because of the objects, and more annoying to work with, but it was a direction that we explored. Currently I don’t think this mode is being used by any adopters though; the performance aspect wins people over to the “just watch out what you’re doing” side.

So to summarize the status quo of the FFM mode:

  • Foundation.Data is supported, which was driven by SwiftCrypto APIs accepting that type,
  • and UnsafeRawBufferPointer is supported as well, though of course it is “unsafe”.

And here is something I put together just now, as I wanted to remind myself of this piece of the sourcegen to have an informed opinion for this discussion…

  • [new] [UInt8] - partial impl over here
    • however, this isn’t quite good yet… as it incurs 2 copies, into a memory segment, and then into a Swift array… instead we should be directly creating a native instance of a Swift array from Java and we can do that.

Long story short, I thought about this throughout the day and I think the approach we want to take here should fall out of what we’re already doing for unsigned numbers:

  • I agree that we should just extract as the “bit-width” integer type (byte[])
  • We should also “annotate” about the unsignedness with @Unsigned
    • i.e. the same way we do @Unsigned byte param, we should get @Unsigned byte[] bytes, which represents “this was UInt8”
    • We also have source documentation stating which method a Java binding will be calling, so this also documents that the target is a UIntN-array method.
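
For an array marked this way, Java code that cares about the numeric values has to widen each element explicitly. A minimal sketch, where UnsignedBytesDemo, unsignedSum and toHex are hypothetical helpers and not part of swift-java:

```java
final class UnsignedBytesDemo {
    // Sums the elements as unsigned values; a plain `sum += b` would sign-extend.
    static int unsignedSum(byte[] bytes) {
        int sum = 0;
        for (byte b : bytes) {
            sum += Byte.toUnsignedInt(b);
        }
        return sum;
    }

    // Renders each byte as two lowercase hex digits.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", Byte.toUnsignedInt(b)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xFF, 1};
        System.out.println(unsignedSum(bytes)); // 256
        System.out.println(toHex(bytes));       // ff01
    }
}
```

Code that only moves the bytes around (hashing, encryption, I/O) never needs these helpers, which is why the bit-width mapping is usually harmless for bag-of-bytes APIs.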

Then we should keep digging here and provide direct support for Data as the FFM mode does already because it then enables less copying if the target native method already is accepting Data.

This would be also great to support in swift-java of course. Would one of you be willing to help out and contribute some of that support back upstream @marcprux @andriydruk? We’d love to get some collaboration that benefits Android but also others going here, so if you have some things already here, that seems like perhaps a good one to start with?

I’m not sure about importing Data “as” ByteBuffer, given that we can just return a Data wrapper on the Java side. However, if we could add a simple “from Data → ByteBuffer” conversion in the runtime library, I’m sure folks would love this 🙂

The actual end-game here I feel will be Span on the Swift side since it is somewhat more flexible than Data, so I think that’s the end goal we should be looking at next, unless Marc or someone can help contributing some byte buffer mappings as well.

In the end, it’s good to have a wide range of supported types, so I guess we’ll want all of them eventually…

It seems like the consensus is that we should extract [UInt8] as byte[] for convenience. For that to work, we need to just interpret the UInt8 as its underlying bits.

The questions now are:

  • Should this be done generally for UInt8 in SwiftJava, or is this a JExtract-only thing? (Making the change generally would also affect wrap-java.)
  • Should we only use the bits when we have an array of UInt8 and not just a single UInt8?
  • Currently the @Unsigned stuff is a mode in jextract. Does that mean we should only have this behaviour when that mode is enabled? This would make it a JExtract-only thing.

I think every UInt should get the same treatment, and we mark them all with @Unsigned. What else do you have in mind here?

It’s only a marker really, if someone were to manually write wrappers they could/should mark their methods using it as well if wrapping UInt types. It is only in jextract because that’s the only java source generation we do.

The questions I asked were mostly rooted in the fact that we have different unsigned numbers modes, and whether this "bit" behaviour should be used regardless of whether we use the annotate mode.

I think the approach we should take here is to always use the annotate mode and effectively kill the unsigned numbers mode, unless someone has objections. It was only added for FFM, and according to @ktoso it's not really used. Then we'll change the conversion for all the unsigned types in SwiftJava, such that it's the same across both jextract and wrap-java.

Yeah this sounds good to me, we’ve perhaps overestimated the interest in “very safe mode”. We can always add it back if we hear from folks wanting to use a type-safe-unsigned mode where we wrap types returned by Swift in UnsignedInteger types to avoid accidental misinterpretation.

Also, thanks everyone for chiming in! It’s good to hear from the community what folks are doing and what they’d prefer default behaviors to be 🙂