Data subscript

The following fragment crashes:

import Foundation

let foo = Data([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
print(foo.indices) // 0..<10

var bar: Data = foo[5...7]
print(bar.indices) // 5..<8
assert(bar.count == 3)

let fix = false
if fix {
    bar = Data(bar) // fix
    print(bar.indices) // 0..<3
    assert(bar.count == 3)
}

assert(bar == Data([5, 6, 7]))
let baz = bar[0...1] // crash: EXC_BAD_INSTRUCTION
assert(baz == Data([5, 6]))
print("done")

It demonstrates how easy it is to shoot yourself in the foot by taking subscripts of subscripts of Data.

Although the result of "foo[5...7]" is typed as "Data", it behaves like a slice that keeps the parent's indices (5..<8 here, not 0..<3). You can convert it into a zero-based Data via the explicit "Data(xxx)" initialiser.

Shouldn't Data subscript return a different type, like Subdata (similar to Substring) or DataSlice (similar to Array)? Or is this just a bug in Data that shouldn't occur in the first place?

Data is its own SubSequence, and the behavior shown above is the consequence of that decision because of how indices are shared between subsequences and their parent sequences. It's not ergonomic, but the design decision has been made and can't be changed now, so it's what we're stuck with.
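To make the "shared indices" point concrete, here is a small sketch using the same values as the original fragment. The names "first", "second" and "head" are just for illustration; the idea is that every index of the slice is also a valid index of the parent, and that positions inside the slice should be derived from startIndex rather than assumed to start at 0.

```swift
import Foundation

let foo = Data([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
let bar = foo[5...7]

// The slice keeps the parent's indices, so the same index addresses
// the same byte in both values.
assert(bar.startIndex == 5)
assert(foo[5] == bar[5])

// Safe access: derive positions from startIndex instead of assuming 0.
let first = bar[bar.startIndex]
let second = bar[bar.index(bar.startIndex, offsetBy: 1)]

// prefix(_:) counts elements rather than indices, so it works on a slice.
let head = bar.prefix(2)

assert(first == 5 && second == 6)
assert(head == Data([5, 6]))
```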


It's important to clarify that this is more of a consequence of how slice indices work than how Data works. Data adds an extra layer of weirdness due to being self-slicing, but the basic indexing behavior described here happens with any collection type.

The only thing that really changes if you substitute, say, Array for Data is that you can't assign foo[5...7] to a variable of type Array, because the subscript returns an ArraySlice.
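For comparison, a minimal sketch of the same indexing behavior with Array and ArraySlice (the variable names are made up for the example):

```swift
let numbers = Array(0..<10)
let slice = numbers[5...7]            // ArraySlice<Int>, not [Int]

print(slice.indices)                  // 5..<8
print(slice[slice.startIndex])        // 5

// slice[0] would trap at run time here, exactly like bar[0...1] on Data.

// Converting back to Array copies the elements and rebases the indices.
let rebased = Array(slice)
print(rebased.indices)                // 0..<3
```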


I know many developers who avoid numeric indexes for that very reason. If you wrap the index in a struct, it's more awkward, but it's also harder to misuse.

As a bonus, since you already ate the ergonomic cost, you can use compact or unsigned integers for your internal indices. There can be a variety of benefits from doing that.
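A rough sketch of what that could look like. Everything here (the "Samples" type, its nested "Index", the UInt16 storage) is hypothetical and only meant to show the shape of the idea: callers can't write samples[0], they have to go through startIndex and index(after:).

```swift
struct Samples {
    // Opaque index: callers can't construct one from an integer literal.
    struct Index: Equatable {
        fileprivate let raw: UInt16   // compact internal representation
    }

    private var storage: [Double]

    init(_ values: [Double]) {
        // Assumption for this sketch: the collection fits in a UInt16 index.
        precondition(values.count <= Int(UInt16.max))
        storage = values
    }

    var startIndex: Index { Index(raw: 0) }
    var endIndex: Index { Index(raw: UInt16(storage.count)) }

    func index(after i: Index) -> Index { Index(raw: i.raw + 1) }

    subscript(i: Index) -> Double { storage[Int(i.raw)] }
}

let samples = Samples([1.5, 2.5, 3.5])
let firstSample = samples[samples.startIndex]
let secondSample = samples[samples.index(after: samples.startIndex)]
// samples[0] does not compile: an Int is not a Samples.Index.
```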


I'd say (and that's purely IMHO!) we should deprecate Data at some point, or lower it into the standard library (along with purifying the type). My Rust colleagues frequently ask me "why a special Data instead of Array<UInt8>?" and while I have some answers for them, I must admit those answers are not too convincing even to me.

BTW, I remember we had some meta guidelines in Swift like "if it wasn't in the language today would we introduce it" and perhaps "if we did it today would we do it this way" (not quite sure about the latter). Is there a list of these Swift guiding principles somewhere, historic and current?

Yep, and that's the very bit that's crucial: if I attempt to, say, pass the result of an array range subscript into a method that wants an Array, I'd be notified right there and then about the type mismatch. At that point I'd have to decide whether to convert the subscript result into an Array (and pay the corresponding conversion price) or adjust the code to use ArraySlice (perhaps more complicated but more optimal).
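To illustrate the two options, a small sketch (the functions "total" and "totalGeneric" are made up for the example):

```swift
// Wants a concrete Array.
func total(_ values: [Int]) -> Int {
    values.reduce(0, +)
}

// Accepts any collection of Int, including ArraySlice, without copying.
func totalGeneric<C: Collection>(_ values: C) -> Int where C.Element == Int {
    values.reduce(0, +)
}

let numbers = Array(0..<10)

// total(numbers[5...7])   // does not compile: ArraySlice<Int> is not [Int]

let viaCopy = total(Array(numbers[5...7]))     // option 1: convert, pay for the copy
let viaSlice = totalGeneric(numbers[5...7])    // option 2: work on the slice directly
```

With Data both calls would compile without the explicit conversion, which is exactly why the index mismatch only shows up at run time.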

Not exactly what you meant, but here's a subscript wrapper that takes zero-based integer offsets and converts them into the slice's own indices:

extension Data {
    subscript(in range: Range<Int>) -> Data {
        self[index(startIndex, offsetBy: range.lowerBound) ..< index(startIndex, offsetBy: range.upperBound)]
    }
    subscript(in range: ClosedRange<Int>) -> Data {
        self[index(startIndex, offsetBy: range.lowerBound) ... index(startIndex, offsetBy: range.upperBound)]
    }
}

let foo = Data([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
var bar: Data = foo[5...7]
assert(bar == Data([5, 6, 7]))
//let baz = bar[0..<2] // Thread 1: EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
//let baz = bar[0...1] // Fatal error: Out of bounds: index < startIndex
let baz = bar[in: 0...1] // ok
assert(baz == Data([5, 6]))
print("done")

I guess to enhance that further I could implement my own "DataSlice" and make "subscript(in:)" return that type instead of Data, then implement the relevant Data methods/properties on DataSlice (those would route the calls to the underlying Data value, converting indices appropriately). Quite some work.

BTW, there's a minor issue in the error diagnostics of "bar[0..<2]" vs "bar[0...1]" (see above). The former gives a cryptic EXC_BAD_INSTRUCTION whilst the latter gives a much nicer "out of bounds" crash.
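A very rough sketch of where that could start, assuming a hypothetical "DataSlice" wrapper that always exposes zero-based offsets. It covers only count, element access, half-open range slicing and conversion back to Data; a real version would need Collection conformance and much more.

```swift
import Foundation

// Hypothetical wrapper: not part of Foundation.
struct DataSlice {
    private let base: Data   // may itself be a slice with non-zero startIndex

    init(_ base: Data) { self.base = base }

    var count: Int { base.count }

    // Zero-based element access, translated into the base's own index space.
    subscript(offset: Int) -> UInt8 {
        base[base.index(base.startIndex, offsetBy: offset)]
    }

    // Zero-based half-open range, returning another DataSlice.
    subscript(range: Range<Int>) -> DataSlice {
        let lower = base.index(base.startIndex, offsetBy: range.lowerBound)
        let upper = base.index(base.startIndex, offsetBy: range.upperBound)
        return DataSlice(base[lower..<upper])
    }

    // Explicit conversion back to a zero-based Data value.
    var data: Data { Data(base) }
}

let source = Data([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
let middle = DataSlice(source[5...7])
let firstTwo = middle[0..<2]
assert(middle[0] == 5)
assert(firstTwo.data == Data([5, 6]))
```

Because DataSlice is a distinct type, passing it where Data is expected becomes a compile-time error, and the conversion cost is visible at the call site through ".data".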
