[Rant] Indexing into ArraySlice

paiv · June 29, 2018, 10:17pm

Here is an anecdote, can you guess the output?

let abc = Array("abcdef")
let some = abc[1...]
some[1] == abc[1]
some[1] == abc[2]
let same = some.map{ $0 }
same[1] == abc[1]
same[1] == abc[2]

Output:

["a", "b", "c", "d", "e", "f"]
["b", "c", "d", "e", "f"]
true
false
["b", "c", "d", "e", "f"]
false
true

I believe, shared indexing is performance optimization at the expense of usability. A slice looks in every way just like an array, except for indexing. I have to be very careful with indexing when there are slices involved. But since this happens not very often, it is easy to forget, and face runtime error. With inferred types, it is easy to misread.

Another exercise: take a slice from a slice and try indexing into that.

This is so confusing. Do you know any other language with this design?

Joe_Groff · June 29, 2018, 10:27pm

It wasn't intended as a performance optimization, but to improve the usability of slices in divide-and-conquer algorithms, since you can take the indexes from a slice and use them as-is on the original whole collection or an overlapping slice without having to track adjustments. @dabrahams may have more background on the design choice here.

paiv · June 29, 2018, 10:36pm

In my experience, this leads to me forgetting to wrap

abc[1...]

with

Array(abc[1...])

getting no compile error, but crashing run time.

Definitely would vote against it.

Joe_Groff · June 29, 2018, 10:49pm

The Array() call is going to cause a copy that's otherwise unnecessary, which is a lot of overhead just to get zero-based indexing. Instead of using absolute indexes with slices, you might have an easier time making your indexes startIndex-relative for the slice. This'll be easier for the future if you choose to genericize the algorithm to work with non-Array collections too.

AlexanderM · June 29, 2018, 10:51pm

I think this is the most sensible choice for a default behaviour, but I think it would be best to also provide a subscript(zeroIndexed index: Int) -> T

Try implementing this with 0 based indexing, and see how much uglier it gets:

func binarySearch<T: Equatable & Comparable>(_ array: [T], for desired: T) -> Int? {
	func binarySearch(_ slice: ArraySlice<T>, for desired: T) -> Int? {
		let midIndex = slice.index(slice.startIndex, offsetBy: slice.count / 2)
		let mid = slice[midIndex]
		if mid < desired {
			return binarySearch(slice[midIndex...], for: desired)
		} else if mid > desired {
			return binarySearch(slice[...midIndex], for: desired)
		}
		else {
			return midIndex
		}
	}
	
	return binarySearch(ArraySlice(array), for: desired)
}

print(binarySearch(Array(100...5000000), for: 150))

scanon · June 29, 2018, 11:16pm

Try implementing this with 0 based indexing, and see how much uglier it gets:

Debatably, it's less messy:

func binarySearch<T: Equatable & Comparable>(_ array: [T], for desired: T) -> Int? {
	func binarySearch(_ slice: ArraySlice<T>, for desired: T) -> Int? {
		let midIndex = slice.count / 2
		let mid = slice[midIndex]
		if mid < desired {
			return midIndex + binarySearch(slice[midIndex...], for: desired)
		}
		if mid > desired {
			return binarySearch(slice[...midIndex], for: desired)
		}
		return midIndex
	}
	return binarySearch(ArraySlice(array), for: desired)
}

What changes? We pick up an extra midIndex + on one branch, but we make the computation of midIndex itself a lot more intuitive.

paiv · June 29, 2018, 11:19pm

That would be

func binarySearch<T>(_ array: [T], for desired: T) -> Int? where T: Comparable {
    func binarySearch(_ slice: ArraySlice<T>, for desired: T) -> Int? {
        let midIndex = slice.count / 2
        let mid = slice[midIndex]
        if mid < desired {
            return binarySearch(slice[midIndex...], for: desired) .map{ $0 + midIndex }
        } else if mid > desired {
            return binarySearch(slice[...midIndex], for: desired)
        }
        else {
            return midIndex
        }
    }
    
    return binarySearch(ArraySlice(array), for: desired)
}

How much is it uglier?

paiv · June 29, 2018, 11:22pm

It has optional return, so you can't just +

sbromberger · June 29, 2018, 11:22pm

As a new user, this slice behavior puzzled me. It seems that the only benefit to a slice (besides zero allocations) is that it's a (run-time) error to index it below a certain point. This doesn't seem like a huge benefit.

scanon · June 29, 2018, 11:26pm

This behavior isn't really a huge deal to me either way, except that more generally as a mathematician it vaguely bothers me that we have a canonical map from the non-negative integers to the set of indices given by 0 --> startIndex and + --> indexAfter but for some reason we've chosen not to identify them.

It's a quirk, and somewhat weird coming from other languages I've used, but it hasn't proven to be a huge deal in my personal experience.

scanon · June 29, 2018, 11:27pm

Yeah, yeah, .map { /blah/ }. Sorry, coding in browser =)

paiv · June 29, 2018, 11:35pm

With zero-based indexing I would not have to call Array()

Instead of tracking which variable is Array, which is ArraySlice, and selectively applying .startIndex, I'd rather have zero-based indexing on slices.

Joe_Groff · June 29, 2018, 11:42pm

On the other hand, if you always use .startIndex, you also don't need to track which variables are Arrays. I would say that Collection indexes are more like pointers or C++-style iterators (Cursor would've been a more honest name IMO); they just happen to look like traditional integer indexes on Array.

dabrahams · June 29, 2018, 11:56pm

Not to improve usability in divide-and-conquer algorithms, but to make them usable at all in generic divide-and-conquer algorithms. Not all collections have the ability to measure distances between, and offset, indices in O(1).

paiv · June 29, 2018, 11:57pm

From user perspective, this is a promise not kept. Because a collection accepting integer index mentally is thought to behave as array indexing. It adds cognitive load on user to take them apart, where language should be unambiguous. Either don't allow integer indexing on slices, or be array-like on slices.

dabrahams · June 30, 2018, 12:01am

Yes, there are a set of design tradeoffs here. Having nonzero-based indexing with integers is not ideal, but it was the best compromise overall. Stop writing your algorithms specifically for arrays, and you won't have this problem.

paiv · June 30, 2018, 12:13am

When I have array on input, I work with array. I don't have time to generalize. Basically you are telling me that I'm holding it wrong.

Anyway this is a rant. I'm pointing that language breaks user expectations, and user have to learn their way through runtime errors.

palimondo · June 30, 2018, 7:21pm

Fully agreed. But it would be much easier to argue this case if standard library shipped with at least one tree collection…

jawbroken · July 1, 2018, 7:57am

Why are those things related at all? There are several collections that don't have integer indices, starting from 0, in the standard library already.

palimondo · July 2, 2018, 4:09am

The key step in learning about Swift collections is to think of indexes as abstract pointers into the collection. Pointing at a tree nodes comes naturally during their typical usage. This isn’t the case with Set and Dictionary. Array doesn’t help you make this mental leap at all, because it’s contiguously ordered in memory and its Index is Int, which leads people into demanding zero based indexing for Slices.

@paiv General point is, when you want to subscript into a collection, you should get your index from that collection. Int indexes are easy to form in your head and when you hardcode them into your subscripts, you can be surprised by the result.

When you realize that Slices are just views into the underlying collection, it makes perfect sense why they would share indices.

Remember that when you have an index from array, it’s a logical pointer to the element, not an ordinal position from the beginning of an array. For that you should use index arithmetic:
a.index(a.startIndex, offsetBy: 3)