Why no Data.init(immutableBytesNoCopy:...)?

I'm wondering if there's a particular reason for this omission. I've run into this a couple of times, where I have a block of data from a library that I want to wrap in a Data. I don't want to copy the bytes, and I don't want to treat them as mutable.

There is Data.init(bytesNoCopy:...), but it takes a mutable pointer to the data. I work around this using CFDataCreateWithBytesNoCopy(), which does take an immutable data pointer, and passing the result to Data.init(referencing:). But that seems a bit hacky.

Can this not be hacky? What happens if the bytes passed to Data.init(immutableBytesNoCopy:) are still mutated (this can't be checked), or freed? If it's inherently unsafe, perhaps "unsafe" needs to be in the name, e.g. Data.init(unsafeImmutableBytesNoCopy:).

There are still some cases where something is statically known to be immutable via pointer, say StaticString.utf8Start (for strings that contain more than a single scalar), or literal arrays that contain only literal elements, like let x: [UInt8] = [1, 2, 3]. It's unfortunate that Data can't be initialized from these bytes embedded in the binary without copying.
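To make the cost concrete, here is a minimal sketch of what is available today for the StaticString case. The variable names are made up for illustration. Data.init(buffer:) is documented as copying the memory, so the bytes end up duplicated on the heap even though the originals live in the binary.

```swift
import Foundation

// A string literal whose UTF-8 bytes are embedded in the binary.
let greeting: StaticString = "hello, world"

// withUTF8Buffer lends us the bytes for the duration of the closure;
// Data(buffer:) then copies them into storage that Data owns.
let copied: Data = greeting.withUTF8Buffer { buffer -> Data in
    Data(buffer: buffer)
}

print(copied.count)
```

The copy is what makes it safe for `copied` to outlive the closure; a no-copy variant would need some other lifetime guarantee.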

This thread does not belong in the Development > Standard Library category, because Data is not part of the standard library.

I'd ask a question then, why don't we have [UInt8].init(immutableBytesNoCopy:…)? It doesn't have to be specifically Data.

That question would belong in either Using Swift or Swift Evolution > Discussion.

@Philippe_Hausler may be the best person to answer this question authoritatively, but IIRC, this was at least partially for compatibility with NSData, whose -initWithBytesNoCopy:length:deallocator: also takes a non-const pointer. The reason for this I'm no longer 100% sure about, so I don't want to speculate.

It should, I believe, be possible to add an overload which also takes an UnsafeRawBufferPointer to do the same (and have it copy the data on initial mutation, even if uniquely referenced), but Data.Deallocator would also need to be extended with another case which takes an UnsafeRawBufferPointer. (What happens if you pass a deallocator accepting a mutable pointer to a data created with an immutable pointer is up for debate.)
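A rough sketch of what such an overload could look like if written as an extension today. The initializer name is hypothetical (borrowed from the suggestion earlier in the thread), and this version does not implement the copy-on-first-mutation behaviour described above: it simply casts away immutability and forwards to the existing initializer with the .none deallocator, so the resulting value should be treated as read-only and must not outlive the buffer.

```swift
import Foundation

extension Data {
    /// Hypothetical convenience, not part of Foundation.
    /// Assumption: the caller never mutates the result and keeps `buffer` alive
    /// for as long as the result (or any copy sharing its storage) is in use.
    init(unsafeImmutableBytesNoCopy buffer: UnsafeRawBufferPointer) {
        guard let base = buffer.baseAddress, !buffer.isEmpty else {
            self.init()
            return
        }
        // The existing API only accepts a mutable pointer, so immutability is
        // cast away here. `.none` means Data will not try to free the bytes.
        self.init(bytesNoCopy: UnsafeMutableRawPointer(mutating: base),
                  count: buffer.count,
                  deallocator: .none)
    }
}

let source: [UInt8] = [1, 2, 3, 4]

// The Data value is created and consumed inside the closure, so it never
// outlives the memory it points at.
let total: Int = source.withUnsafeBytes { raw -> Int in
    let view = Data(unsafeImmutableBytesNoCopy: raw)
    return view.reduce(0) { $0 + Int($1) }
}

print(total)
```

A real implementation would presumably need the extra Data.Deallocator case mentioned above, plus a flag forcing a copy before the first write.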

This one is easy: Array cannot represent this. Specifically, Array stores its elements in a tail-allocation on the class structure, so it always owns its memory. Thus, it cannot be a "no copy" wrapper. As Array is frozen, changing this structure would be an ABI break. Additionally, it would make some Array operations slower.
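A small sketch showing the consequence: building an Array from a buffer pointer always copies the elements into the array's own allocation, so later writes through the pointer are not visible in the array. The names are illustrative only.

```swift
// Manually allocated storage standing in for "bytes from somewhere else".
let external = UnsafeMutableBufferPointer<UInt8>.allocate(capacity: 3)
_ = external.initialize(from: [1, 2, 3] as [UInt8])

// Array's sequence initializer copies the elements into storage it owns.
let array = Array(external)

// Writing through the original pointer does not affect the array.
external[0] = 99

print(array[0], external[0])

external.deallocate()
```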

Makes me wonder why there is no ArrayProtocol (DictionaryProtocol, SetProtocol, etc). It would be quite flexible, e.g. for arrays I would be able to define my own version of object(at: index) and back it with an arbitrary implementation / custom storage.

That would be neat; in Objective-C we can do that with subclassing. StringProtocol and DataProtocol exist but are rarely used. One reason may be that these protocols need associated types and would therefore be nightmarish to use (SE-0341 may help but still wouldn't be as nice as dealing with concrete types).

Can DataProtocol do what OP wants?

func makeData<T: DataProtocol>() -> T {
    // arbitrary storage, e.g. no copy from some bytes elsewhere or bytes generated on the fly
}

func foo_that_takes_data(_ data: Data) {
    print(data.count, data[0], data[1], data[2])
}

func test() {
    foo_that_takes_data(makeData())
}

We do; but it's called Collection.

So is something like this possible?

func makeArray<T: Collection>() -> T {
    // arbitrary storage, e.g. no copy from some bytes elsewhere or bytes generated on the fly
}

func foo_that_takes_array(_ array: [Int]) {
    print(array.count, array[0], array[1], array[2])
}

func test() {
    foo_that_takes_array(makeArray())
}

No, you cannot create an Array from a Collection without copying.

In object-oriented designs such as NSArray, subclassing is used to provide a common interface to multiple implementations. In the Swift standard library, the idea is that protocols should provide that interface, and you use them via generics or existentials.

So foo_that_takes_array should ideally become generic (existentials as function parameters are isomorphic with generics), and if you want to store the data in another type, either that containing type must become generic, or it must store the data in an existential:

struct StoresDataNoCopy<T> where T: Collection, T.Element == Int {
  var data: T
}

// or:

struct StoresDataNoCopy {
  var data: any Collection where Element == Int
}

Note that we don't currently support constrained existentials, so the second form is not valid today. Also, when we say "no copy" here, we mean that the storage will not be copied (i.e. "no deep copy"); the variable must of course be copied. Copying a variable means the storage will be retained so that its lifetime is guaranteed, but it won't allocate new storage or copy its contents into it.
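For the function side of this, a minimal sketch of what a generic foo_that_takes_array might look like. The function name and its return value are invented for the example; the point is only that one generic signature accepts an Array, a range, or a buffer pointer without any of them being converted to [Int] first.

```swift
// Generic over any collection of Int; nothing is copied into an Array.
func firstThree<C: Collection>(_ values: C) -> String where C.Element == Int {
    values.prefix(3).map { String($0) }.joined(separator: " ")
}

let fromArray = firstThree([1, 2, 3, 4])

// A range stores only its bounds and produces elements on the fly.
let fromRange = firstThree(10...1_000_000)

// A buffer pointer views memory owned by something else.
let backing = [7, 8, 9]
let fromPointer = backing.withUnsafeBufferPointer { firstThree($0) }

print(fromArray, fromRange, fromPointer, separator: " | ")
```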

Is the following worldview incompatible with protocol oriented design?

Array is a protocol

[Int], etc - is of "Array of Int elements" protocol type

I can write:

func foo_that_takes_array(array: [Int]) {
	print(array.count, array[1], etc)
}

there's a standard Array implementation, say "ArrayImplementation: Array"
provided by swift

I can create my custom Array implementation, say "ArrayCustom: Array"
and pass it everywhere "Array" is taken.

ditto for Dictionary / Set / Data / String

It seems compatible, if you replace the word “Array” with “Collection”

To bring it back to OP's question: having Data wrap storage from another value (whether that's a mutable pointer, immutable pointer, Array<UInt8>, etc) is not really the Swifty way of doing things; it's the Objective-C way of doing things, working around holes in our generics system and providing a familiar interface to Objective-C developers transitioning to Swift.

There are certainly valid reasons to wrap another type's storage. For example, String includes support for "shared strings" in its ABI, because it can apply its Unicode algorithms to provide a valuable new interpretation of some data without necessarily having to own it.

The Data type doesn't really do that; it just presents the data without interpretation. When used in this way, it acts just like a pass-through wrapper. You could argue that it gives guarantees about contiguous storage, but that was considered for the standard library and ultimately the core team decided that withContiguousStorageIfAvailable was a more appropriate way to surface that for all collections.
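As an illustration of that approach, here is a sketch of a generic function that takes the contiguous fast path when the sequence offers one and falls back to plain iteration otherwise. The checksum function is made up for the example; withContiguousStorageIfAvailable is the standard library method on Sequence and returns nil when no contiguous storage is available.

```swift
import Foundation

// Sums the bytes of any sequence of UInt8.
func checksum<S: Sequence>(_ bytes: S) -> Int where S.Element == UInt8 {
    // Fast path: the closure runs only if the sequence can expose a buffer.
    let fast: Int? = bytes.withContiguousStorageIfAvailable { buffer -> Int in
        buffer.reduce(0) { $0 + Int($1) }
    }
    if let fast = fast {
        return fast
    }
    // Fallback: element-by-element iteration gives the same result.
    return bytes.reduce(0) { $0 + Int($1) }
}

let fromArray = checksum([1, 2, 3] as [UInt8])
let fromData = checksum(Data([1, 2, 3]))
let fromLazy = checksum((1...3).lazy.map { UInt8($0) })

print(fromArray, fromData, fromLazy)
```

Callers pass whatever they already have; no wrapping in Data is needed to reach the contiguous bytes.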

In the future, hopefully the generics system will be expressive enough that you can use mutable pointers, immutable pointers, Array<UInt8>, etc everywhere directly, without having to wrap them in a Data.

I see, interesting.

Karl, I've edited my pseudo example above from:

func foo_that_takes_array(array: Array)

to

func foo_that_takes_array(array: [Int])

because that's what I actually meant, i.e. in that pseudocode the main currency type, expressible as [Int], etc, is a protocol, not a concrete type. This might or might not change your answer.

Perhaps this is not the Swift way. For example, new conformances to StringProtocol are even discouraged. And I can't pass some custom collection ("RandomAccessCollection of Int elements") to a function that has an [Int] argument (this would be cool IMHO).

[Int] is just sugar for Array<Int>, and changing that would be massively source-breaking.

I agree that generic functions are much more awkward to write, which is why the current efforts to improve generics syntax are so important. For example:

But the syntax is only part of the problem; the other part is gaps in capabilities. It seems like those are also being addressed, so the overall generics model is becoming much more usable and complete.
