Some thoughts, if anyone is interested.
Storing the number of elements of a list or table within the meta data implies that the meta page gets updated with every insert/remove. That would conflict with concurrent transactions since each transaction would hit the meta page. So don't store the count.
The meta page stores only the head and tail pages for a list or a root page for a BTree. Note that the leafs of BTree from a double linked list. Since I de-facto have a list, I might as well add support a list type in itself.
Now how about those Swift collection protocols:
Sequence should not be a problem. Just return nil in next()
when there are no more elements.
Unfortunately the next()
function doesn't throw. No nice for-loop sugar.
Nil is fine if the end is reached but if a read error occurs in between the caller might assume the database file is still fine rather than potentially (partially) compromised.
How about a (forward) Collection?
The startIndex
is known since the head is known:
struct Index
{
let total: Int //index of element in entire collection
let page: UInt32 //index of page
let local: UInt16. //index of element on page
}
The endIndex
is encountered after walking the entire collection.
Collection
expects O(1) for its startIndex
Collection
does not require O(1) for count
Collection
expects O(1) for its endIndex
startIndex + count
would after all be O(n).
Can this be improved?
The count can be slightly improved by walking the pages. Each page can store < UInt16.max elements.
So count is reached in O(n_pages). But that's not good enough, is it.
The endIndex
is 1 greater than the last valid index. So add a marker:
struct Index
{
let total: Int //index of element in entire collection
let page: UInt32 //index of page
let local: UInt16. //index of element on page
let end: Bool //marks the end of the collection
}
Collection
expects O(1) for its endIndex
Moreover this means index can conform to Comparable
Collection
expects `index(after:). There is some overhead but Big-O wise it's O(1) (no? Overhead shouldn't count. See comment by #michelf)
A list that can't be edited is pretty useless.
However inserting elements can invalidate an existing index when:
- the inserted element has a lower total index
- the element is inserted into a same page which is full.
The full page is replaced by two newish pages over which the new and existing elements are distributed-ish. Anyway, page and local are now invalid.
That's problematic. How can I insert an element at index I and then expect to lookup the element with that index if the underlying page has been moved.
Searching the forums I saw comment that Array with its Int
index can be misleading in how the Collection protocols can be used. Yes it can be
I already had a look at String.Index
but a table is more alike to Dictionary
. Peeking at Dictionary.Index
. I see age
and HashTable.Offset
. Not sure about the latter one but the meaning of the first one is easy to infer.
I kept a dictionary index around, added a lot of elements (which led to a new age), and promptly got a runtime error when I tried reusing the index.
Likewise String
docs states that indices should only be used with the string that created them.
I shouldn't store indices longterm and thus nobody should expect me to support that either.
Conclusion: Indices are ephemeral. slight_smile:
How does that help me? I add an age to my index and list. Each time the list is updated, the age is updated. An index with a different age is invalid.
struct Index
{
let total: Int //index of element in entire collection
let page: UInt32 //index of page
let local: UInt16 //index of element on page
let end: Bool //marks the end of the collection
let age: Int //marks an age
}
Though should it be completely invalidated?
Like with an array I still want to use that same index used for insertion to lookup the element. But unlike an array, I can't be certain if the page is still valid.
Thus the total
is still viewed as valid information. In effect this creates a fast and slow path when fetching an element:
func element(at index: Index) throws -> Element
{
try index.age == self.age ?
element(at: index.local, on page: index.page : element(at: index.total))
}
//slow path: lookup the nth element by walking
func element(at index: Int) throws -> Element { ... }
//fast private path: lookup the nth element on the given page
func element(at index: Int, on page: Int) throws -> Element { ... }
However if I previously cached an index with a greater total, that element would have shifted and I would get the preceding element back. But that's the same for Array. So:
Collection
expects O(1) access to the element (within the same age
)
Also index(after:)
is still ok because it's just total + 1.
Unfortunately, given the slow path, subscript
is now:
How about BidirectionalCollection
I have an endIndex but I can't create the index before with O(1) complexity.
However I really want this because when I want to have the last n elements I don't want to go through the entire collection.
I'll add another marker to indicate where the index started from:
struct Index
{
let total: Int //index of element in entire collection
let page: UInt32 //index of page
let local: UInt16 //index of element on page
let end: Bool //marks the end of the collection
let age: Int //marks an age
let anchor: Anchor //enum: head, tail -> marks total == 0 as the head or the tail
}
I can now start at either end and walk the collection.
So index(before:)
There is stil a problem though.
With the exception of the start and end indices, I can't order two indices which originated from opposite ends unless I know the total count. Comparing such indices is undefined behaviour.
I can either throw an error or (because it's not really an issue with the file) fail with a runtime error.
How about RandomAccessCollection
It requires index(after i:.Index) -> Index
with O(1) complexity and that's just not possible (as explained by #itaiferber). And that's okay.
Summary:
Sequence:
Collection:
BidirectionalCollection:
RandomAccessCollection:
In the end it is not advisable to conform to the Collection types or expect O(1) efficiency.
However I do think that with this index type I can walk the collection from either end, find indices and elements that match some predicate and look them back up again within reasonable time.
It's only by trying to conform to the Collection types that I am considering/encountering all of these aspects.
One more thing: about that Index type.
String.Index
is just a Int with a bunch of bit masks. I assume that's more efficient because it fits a register?
The max element number is UInt48 (yes that type doesn't exist). The markers require only a bit while Bool is an entire byte.
So I am heading toward this:
struct Index
{
private let a: Int
private let b: Int
var total: Int { 48 bits of a }
var end: Bool { 1 bit of a }
var anchor: Anchor { 1 bit of a }
var age: Int { 14 bits of a }
var page: UInt32 { 32 bits of b }
var local: UInt16 { 16 bits of b }
var count: UInt16 { 16 bits of b } //number of elements on local page
}
Inserting UInt16.max elements while the age only goes up to 14 bits could be an issue.
After 14 bits, it's always the slow path. And the collection type should not escape the transaction anyway. Adding more 14 bits of elements in one transaction is gonna bi slow anyway. At which point do append not insert.
But I probably change my mind a few things more before this settles down.
Thanks for reading. Typing it out helped me.
PS. It's also interesting to go through Foundation. I never expected to see FixMe's and ToDo's in code like that. Kinda nice/reassuring to know.