Call for Users (and Authors): Offset indexing pitch

(Michael Ilseman) #61

This is a huge part of the motivation behind Contiguous Strings, which is now in review.

1 Like
(Tellow Krinkle) #62

Here's what I would do if I were to use an ASCII String version of my string scanner to parse:

func parse(pktLine: ASCIIString) throws {
	var scanner = Scanner(pktLine)
	guard scanner.read(count: 4).flatMap({ Int($0, radix: 16) }) == pktLine.count else {
		throw ParseException("Bad length")
	}
	guard let requestCommand = scanner.read(toNext: " "),
	      ["git-upload-pack", "git-receive-pack", "git-upload-archive"].contains(requestCommand)
	else {
		throw ParseException("Bad request-command")
	}
	print("request-command: \(requestCommand)")
	guard let pathname = scanner.read(toNext: "\0") else {
		throw ParseException("Missing pathname")
	}
	print("pathname: \(pathname)")
	if scanner.isEmpty { return }

	guard scanner.remove(prefix: "host=") else {
		throw ParseException("Bad host-parameter")
	}
	guard var hostnameAndPort = scanner.read(toNext: "\0").map(Scanner.init), !hostnameAndPort.isEmpty else {
		throw ParseException("Missing hostname")
	}
	let hostname = hostnameAndPort.read(toNext: ":")!
	let port = hostnameAndPort.rest
	print("hostname: \(hostname)")
	port.map { print("port: \($0)") }

	guard let extraPrefix = scanner.read(toNext: "\0") else { return }
	guard extraPrefix.isEmpty else {
		throw ParseException("bad extra-parameters")
	}
	while let extraParameter = scanner.read(toNext: "\0") {
		print("extra-parameter: \(extraParameter)")
	}
}
1 Like
(Chris Lattner) #63

Have you considered the potential confusion where array slices will sometimes have completely different behavior for:

A[4] // may actually be the first element of the slice if the startIndex is 4.
A[offset: 4] // Always startIndex+4

? I haven't read all the comments in detail, so I'm sorry if someone already asked this.

-Chris

2 Likes
(Michael Ilseman) #64

Yes, that's been discussed in this thread and I briefly alluded to to in the pitch and examples:

This is worth elaborating on with code examples (especially generic functions over RAC) in a new iteration of the pitch.

1 Like
(Letanyan Arumugam) #65

Sorry for bringing this back up again. It seems like the consensus was to wait and try again later. But for anyone interested, I performed some benchmarks using some of the suggested solutions mentioned in the thread. Namely the suggested subscript(offset:), using a RAC and the other approach of a caching* wrapper (referred to as offset view).

*: I'm not exactly sure what Brent meant by cached indices. But in any case the wrapper I used only keeps track of the last index it visited and the offset to that index from the startIndex. So when a new offset is requested, it moves from the cached index by the difference of the cached offset and the new offset.

I used three substring finding algorithms to benchmark. The algorithms all do the same thing in that they each find the first index of some substring. Please ignore the performance of the algorithms relative to each other. Instead focus on the comparison between the four approaches (index, offset view, random access and subscript).

I've included an image that shows the graphs of the benchmarks. The left contains all four approaches, and the right excludes the subscript approach. This addition is so that it's easier to compare the other three.

Source code for the benchmarks can be found on Github

From the results, it's quite clear to see that creating a random access collection is the fastest approach, even against using indices directly. The caveats for the random access approach are the additional storage requirements O(n) and the lack of being able to mutate the underlying collection directly.
The offset view approach works well and is by no means 'slow'; in fact; it can outperform the index model approach in some cases. These cases are where the brunt of the work can be moved away from needing to traverse indices directly.
The subscript directly on the collection is well and truly the least performant. But still is the easiest to use (by little though).


In terms of my personal experience with doing this little project, by far the hardest part was writing the algorithms using the Swift index model. The other three approaches were all about as easy to use as the other.

2 Likes