Aetherealtech’s Swift Lab - Foundational Libraries

The first libraries I am sharing center on my CoreExtensions library… this is a dumping ground for any utilities or additions to the standard library or the libraries that “ship” with Swift (by that I mean in Xcode on Apple platforms; as I mentioned in the superthread, I may eventually add other platforms and factor the Apple-only stuff out)… and since that library has a few dependencies on other small libraries of mine, I have to include those in the inaugural showcase as well.

The small libraries CoreExtensions depends on are:

  • Assertions - Test assertions that throw
  • Stubbing - A library for generating random data in tests
  • Synchronization - Some synchronization primitives for thread safety (the blocking kind, not the async/await kind)

These libraries are not very “interesting”… they don’t really showcase any advanced Swift design. They’re just low-level utilities, a big chest of tools I end up reinventing every time I start a new Swift development engagement. They form the underlying support for the other libraries that do get into the advanced design space.

However, there are a few interesting tidbits I can discuss about them. Also, while tossing all these utilities together into packages wasn’t a huge effort, testing all of them was. I’ve been slogging through getting to 100% test coverage on CoreExtensions for months. Testing the async utilities in particular was very challenging. Correspondingly, the tests are pretty intriguing; anyone who is mystified by how to test not just concurrent code, but concurrency itself, should take a look at them.

The Assertions library replaces the XCTAssert… functions with functions that throw on assertion failures. There’s a setting that is supposed to make XCTest work this way, but (at least 1.5 years or so ago, when I last checked) it doesn’t work with async tests (I filed a radar for this). So I just wrote my own. I also have trouble with the XCTAssert… functions all taking autoclosures. For example, if you try XCTAssertEqual(await getFirst(), await getSecond()), the compiler thinks you’re passing async closures to XCTAssertEqual, which of course aren’t supported. That’s why I split my assertions into ones that take values and ones that take regular (not auto) closures.
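
Here’s a minimal sketch of that split, with hypothetical names (the real library’s API may differ):

```swift
struct AssertionFailure: Error, CustomStringConvertible {
    let description: String
}

// Value-taking variant: async expressions get awaited at the call site,
// so `try assertEqual(await getFirst(), await getSecond())` just works.
func assertEqual<T: Equatable>(_ lhs: T, _ rhs: T) throws {
    guard lhs == rhs else {
        throw AssertionFailure(description: "\(lhs) is not equal to \(rhs)")
    }
}

// Closure-taking variant: regular (not auto) closures, so the compiler
// never tries to turn an `await` expression into an autoclosure.
func assertEqual<T: Equatable>(
    _ lhs: () throws -> T,
    _ rhs: () throws -> T
) throws {
    try assertEqual(lhs(), rhs())
}
```

Because the first overload takes already-evaluated values, the awaits happen before the call; the second exists for when you want the expressions themselves evaluated inside the assertion.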

Stubbing contains the first macro I ever wrote, which generates random data in tests. The next logical step is support for all enums, including ones with associated values, which I’ll try to add soon.

Some interesting points about the Synchronization library:

  • First, it is not designed for optimal performance. I’m not doing any of the Builtin trickery to get stable struct addresses, so most of these types are final classes, which leads to very suboptimal memory layout (cache invalidation from jumping between a value and its associated lock). It’s just made to be logically correct. The Standard Library now has Mutex, and the only reason I haven’t switched to it is that it’s not a readers-writer lock (which is frankly almost certainly an irrelevant micro-optimization to begin with, and because of the cache thrashing probably performs much worse anyway). Eventually I may try to mimic the Standard Library’s approach and turn these all into non-copyable structs.
  • Synchronized.wrappedValue originally used get and set. But this is not just slow, it’s incorrect: something like value.wrappedValue += 1 gets split by the in-out behavior into two separate atomic operations (a get, then a set), so another thread can sneak in between them. The fix is a _modify accessor. I wasn’t sure defer works the way I wanted it to inside a _modify, but testing confirmed it does (see the sketch after this list).
  • The Event implementation was more complicated than I originally thought, because it’s incorrect to wait on a condition variable without a predicate (due to spurious wakeups).
  • It took a while to convince myself the implementation of AnyConditionVariable is correct… specifically the way the notify calls just lock and immediately unlock the private mutex, instead of locking around the calls to the “real” notify functions. I saw that this is how libc++ implements it, and had to think carefully about it to accept that it really is good enough.
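
To illustrate the _modify point, here’s a minimal sketch of a lock-guarded wrapper, assuming a pthread readers-writer lock (the names and details are illustrative, not the actual library code):

```swift
import Foundation

final class Synchronized<Value> {
    private var value: Value
    private let lock: UnsafeMutablePointer<pthread_rwlock_t> = .allocate(capacity: 1)

    init(_ value: Value) {
        self.value = value
        pthread_rwlock_init(lock, nil)
    }

    deinit {
        pthread_rwlock_destroy(lock)
        lock.deallocate()
    }

    var wrappedValue: Value {
        get {
            pthread_rwlock_rdlock(lock)
            defer { pthread_rwlock_unlock(lock) }
            return value
        }
        // _modify yields the value in place, so `counter.wrappedValue += 1`
        // is one locked operation instead of a separate get and set. The
        // defer fires only after the caller's mutation completes, i.e.
        // after control returns from `yield`.
        _modify {
            pthread_rwlock_wrlock(lock)
            defer { pthread_rwlock_unlock(lock) }
            yield &value
        }
    }
}
```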

The main library is CoreExtensions.

CoreExtensions is made up of several public targets, each of which is a set of related extensions. The CoreExtensions target itself is just an umbrella that imports all the rest (I like just throwing import CoreExtensions on files so all the extensions are at my autocomplete-y fingertips). I have mixed feelings about this all being one package file (it forces all the targets onto the same old minimum platform versions, even targets, like the Async… ones, that provide nothing on those older platforms). But since SPM doesn’t support reaching inside repos for package files, the alternative would force each one to be its own repo… not necessarily a bad thing, just rote tedium I chose not to go through yet.

Despite being essentially basic utilities, there are a few interesting things going on in there.

In CodableExtensions I made a wrapper to get a more helpful error message out of DecodingErrors. The way DecodingError is defined, you have to handle an @unknown default case. While aiming for 100% test coverage, I wondered how I could get a test to go into this case. I started by reaching inside an instance’s memory and setting the case tag value to an invalid one (you can quickly figure out which byte holds the tag by inspecting multiple cases)… but then that crashes when it tries to copy the value, since the value witness has to correctly copy the associated values and for that it has to understand the current case. I can’t think of a way to get to the function being called without at least one copy of the invalid case being made. I suppose to get past this I’d have to go find the value witness table for DecodingError and overwrite the function pointers to point to other functions. The easiest way to do that might be to write an equivalent enum that has an additional case, and at runtime go find the value witness tables for that enum and for DecodingError and swap them. I might try to do that one day mostly for educational purposes, but for now I just gave up and stuffed the switching inside an Untestable module.
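
For flavor, here’s the kind of switch the wrapper involves, with my own message wording (a sketch, not the library’s actual output):

```swift
func describe(_ error: DecodingError) -> String {
    switch error {
    case .typeMismatch(let type, let context):
        return "Type mismatch for \(type) at \(context.codingPath): \(context.debugDescription)"
    case .valueNotFound(let type, let context):
        return "Missing value of type \(type) at \(context.codingPath): \(context.debugDescription)"
    case .keyNotFound(let key, let context):
        return "Missing key '\(key.stringValue)' at \(context.codingPath): \(context.debugDescription)"
    case .dataCorrupted(let context):
        return "Data corrupted at \(context.codingPath): \(context.debugDescription)"
    @unknown default:
        // The branch that resists testing: no constructible value reaches it.
        return "Unknown decoding error: \(error)"
    }
}
```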

CompareFunctions provides comparison operators that can be used to sort by multiple criteria, so that the first criterion is the “strongest” one and they get progressively weaker. Getting this to support parameter packs was my first attempt at using them. The algorithm for comparing by multiple criteria involves short-circuiting (immediately returning the comparison result of a criterion if it isn’t orderedSame). Short-circuiting is exactly what parameter packs (at least at the time) weren’t good at in Swift. I originally handled this by throwing and catching outside the repeat, but then I realized I could simplify it with a trick SIMD compilers (like GPU shader compilers) use when there are branches in shaders: don’t actually short-circuit, run all the iterations, just run them in a way that produces the correct result. Here, that means checking whether the result is already not orderedSame, and if so, just returning that result again instead of calling the next sort function. That makes it much simpler, and repeating the unnecessary iterations is probably cheaper than throwing and catching.
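
A hypothetical sketch of that trick, written with Swift 6’s pack iteration rather than whatever the library actually does:

```swift
import Foundation

func compare<T, U: Comparable>(
    _ lhs: T, _ rhs: T, by criterion: (T) -> U
) -> ComparisonResult {
    let left = criterion(lhs)
    let right = criterion(rhs)
    if left < right { return .orderedAscending }
    if left > right { return .orderedDescending }
    return .orderedSame
}

func compare<T, each U: Comparable>(
    _ lhs: T, _ rhs: T, by criteria: repeat (T) -> each U
) -> ComparisonResult {
    var result = ComparisonResult.orderedSame
    for criterion in repeat each criteria {
        // No early exit: once decided, the result just carries itself
        // through the remaining iterations.
        result = result != .orderedSame ? result : compare(lhs, rhs, by: criterion)
    }
    return result
}
```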

I found during testing that the parameter packs sometimes crashed when used in the Collection.sort function. Best I can tell, something is going wrong with escape analysis. Internally, parameter packs seem to get stuffed into tuples, but if you did that yourself it would force the closures to be @escaping. Sure enough, marking them @escaping fixes the crashes.

In CollectionExtensions I was surprised that I managed to write a fully generic parameter-pack-based implementation of the cartesianProduct and zip operators. The trick that enabled this was to abandon the attempt to implement the full algorithm in strongly typed parameter pack code. Instead, the strongly typed function immediately erases the parameter pack of sequences into a doubly erased array of sequences (both the sequence and element types are erased), performs the algorithm on this type-erased list in non-generic code, takes the type-erased result (an array of arrays of Any), and lazily maps the inner arrays back to the strongly typed tuples.
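
Here’s a hypothetical, eager sketch of the double-erasure idea for zip (the real operator is lazier, and the names are mine):

```swift
func zipAll<each S: Sequence>(
    _ sequences: repeat each S
) -> [(repeat (each S).Element)] {
    // Erase both the sequence types and the element types.
    var erased: [[Any]] = []
    for sequence in repeat each sequences {
        erased.append(sequence.map { $0 as Any })
    }

    // Non-generic core: truncate to the shortest and gather the rows.
    let count = erased.map(\.count).min() ?? 0
    let rows = (0..<count).map { index in erased.map { $0[index] } }

    // Map each [Any] row back to a strongly typed tuple.
    return rows.map { row in
        var iterator = row.makeIterator()
        return (repeat iterator.next()! as! (each S).Element)
    }
}
```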

The cartesianProduct operator was particularly challenging. This is surely (in most cases) an unimportant micro-optimization, but I wanted to challenge myself by implementing it optimally with respect to allocations. The operator accepts any Sequences as inputs, which means it is only safe to iterate over each one once, and the count of each one is not known except by iterating through it. In a cartesian product, elements of the sequences appear more than once. The “easy” way to handle this is to copy the sequences into individual arrays before building up the final array. But you don’t need to do this. Each repeated element gets stored into the final results the first time it is encountered, so later appearances of the same element can be sourced from the results array itself, avoiding any allocations except the ones needed for the final results (however many that is, given the final size is unknown). Getting this logic correct, where the algorithm knows whether a particular element has already been encountered and, if so, where to find it in the results array as that array is being built up, took several rounds of trial and error.
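
To make the idea concrete, here’s the trick reduced to two sequences (my own reduction, not the library’s pack-generic code):

```swift
func cartesianProduct<S1: Sequence, S2: Sequence>(
    _ first: S1, _ second: S2
) -> [(S1.Element, S2.Element)] {
    var results: [(S1.Element, S2.Element)] = []
    var firstIterator = first.makeIterator()
    guard let firstElement = firstIterator.next() else { return [] }

    // First row: iterate `second` exactly once, pairing each element with
    // the first element of `first`. This stores every element of `second`
    // directly into the results.
    for secondElement in second {
        results.append((firstElement, secondElement))
    }
    let width = results.count

    // Remaining rows: source `second`'s elements from the results already
    // built, so `second` never gets buffered separately.
    while let nextFirst = firstIterator.next() {
        for column in 0..<width {
            results.append((nextFirst, results[column].1))
        }
    }
    return results
}
```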

In LazyCollectionExtension, the main challenge in implementing lazy operators is getting the types right. For a while I thought opaque types were the correct way to do that, but I now understand those are for encapsulation. Most of the time it’s just a matter of writing the correct expression and inspecting its type. But an interesting case is the appending(if:) operators, which append an element or another sequence to a sequence only if a condition is true. I created a LazyInsertedList sequence that does the lazy insertion. But the type here would be dynamic: it would be LazyInsertedList<Self, Other> if the condition is true, but just Self otherwise. Originally I dealt with this using type erasure: just erase both of these to AnySequence. But we can do better than that and follow in SwiftUI’s footsteps: create an IfElseSequence<If: Sequence, Else: Sequence> type as an enum with two cases, one for the if and one for the else. The Iterator type is a similar enum with two cases. This gives us a union of the two possible types, with the enum tag recording which one we have at runtime. This is better than full type erasure, which requires putting the result, and its iterator, in an Any box.
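
A sketch of what such a union type can look like (my own reconstruction, assuming the element types match):

```swift
enum IfElseSequence<If: Sequence, Else: Sequence>: Sequence
where If.Element == Else.Element {
    case `if`(If)
    case `else`(Else)

    enum Iterator: IteratorProtocol {
        case `if`(If.Iterator)
        case `else`(Else.Iterator)

        mutating func next() -> If.Element? {
            switch self {
            case .if(var iterator):
                let element = iterator.next()
                self = .if(iterator)
                return element
            case .else(var iterator):
                let element = iterator.next()
                self = .else(iterator)
                return element
            }
        }
    }

    func makeIterator() -> Iterator {
        switch self {
        case .if(let sequence): .if(sequence.makeIterator())
        case .else(let sequence): .else(sequence.makeIterator())
        }
    }
}
```

The enum tag plays the role of SwiftUI’s _ConditionalContent: a tag recording which branch was taken, instead of boxing the sequence and its iterator in existentials.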

The FailingSequence stuff allows lazy operators to introduce errors, which eventually surface at a terminating operator like store(in:) or forEach.

I also successfully wrote lazy versions of the fully generic parameter pack cartesianProduct and zip operators. For cartesianProduct I took the easy way out of buffering each individual sequence that isn’t a Collection, so that it can be iterated over multiple times. The way it works is kind of fascinating: it’s a recursive call to result.lazy.flatMap { … } that repeatedly erases the resulting lazy list back to AnySequence<Any>. If you inspect the resulting type at runtime, you can see it is essentially a linked list tying every step of the nested flatMaps together, finally ending on the source sequence. I made sure it does no eager evaluation by testing it on infinite sequences. Of course it never gets past the initial values of all but the last sequence, but it doesn’t hang and then crash at the operator step; it returns immediately, and then you can start iterating over it.
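
The nesting can be sketched like this (my own simplification, using [Any] rows and assuming every input is multi-pass, i.e. already buffered):

```swift
func lazyProduct(_ sequences: [AnySequence<Any>]) -> AnySequence<[Any]> {
    // Start from a single empty row; each step nests another lazy flatMap
    // and erases the result, building up the "linked list" of steps.
    let seed: [[Any]] = [[]]
    var result = AnySequence(seed)
    for sequence in sequences {
        let current = result
        result = AnySequence(current.lazy.flatMap { prefix in
            sequence.lazy.map { element in prefix + [element] }
        })
    }
    return result
}
```

Nothing here iterates anything until you iterate the final result, which is why handing it an infinite sequence returns immediately instead of hanging.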

AsyncExtensions supplies two very useful low-level utilities for async code: withTimeout and stream.

withTimeout wraps an async block and throws a TimedOut error if it takes longer than the specified time interval (or goes past the specified time). I use this all the time in tests. Its implementation is actually built on top of the other utility, stream.
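
The real implementation is built on stream, but the shape of the utility can be sketched with the familiar task-group race (hypothetical names):

```swift
struct TimedOut: Error {}

func withTimeout<R: Sendable>(
    _ timeout: Duration,
    _ work: @escaping @Sendable () async throws -> R
) async throws -> R {
    try await withThrowingTaskGroup(of: R.self) { group in
        group.addTask { try await work() }
        group.addTask {
            try await Task.sleep(for: timeout)
            throw TimedOut()
        }
        // Whichever child finishes first wins; cancel the other.
        let result = try await group.next()!
        group.cancelAll()
        return result
    }
}
```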

What stream does is convert a Sequence of async blocks into an AsyncStream that runs all the blocks with an optionally specified maxConcurrency, from which the results can be iterated over in the order they arrive. I believe this is the proper abstraction over almost every call to withTaskGroup (which is how it’s implemented). Any time you want to fire off a list of async jobs and control the max concurrency, you just gather the jobs into a normal (non-async) sequence (like an array) and then call .stream on that sequence.
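
A sketch of the underlying pattern stream abstracts (my reconstruction, not the library’s code): seed the group with up to maxConcurrency tasks, then add one more as each finishes:

```swift
func stream<T: Sendable>(
    jobs: [@Sendable () async -> T],
    maxConcurrency: Int
) -> AsyncStream<T> {
    AsyncStream { continuation in
        let task = Task {
            await withTaskGroup(of: T.self) { group in
                var remaining = jobs.makeIterator()
                // Seed up to maxConcurrency jobs.
                for _ in 0..<maxConcurrency {
                    guard let job = remaining.next() else { break }
                    group.addTask { await job() }
                }
                // Each completion yields a result and starts the next job,
                // so no more than maxConcurrency jobs ever run at once.
                while let result = await group.next() {
                    continuation.yield(result)
                    if let job = remaining.next() {
                        group.addTask { await job() }
                    }
                }
                continuation.finish()
            }
        }
        continuation.onTermination = { _ in task.cancel() }
    }
}
```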

Testing stream, including that it abides by maxConcurrency and behaves correctly under cancellation, was a doozy.

AsyncCollectionExtensions includes parallel overloads of eager operators on normal sequences, like parallelMap and parallelFilter, that take optional maxConcurrency parameters. These are all powered by the stream operator of AsyncExtensions.

I wrestled a lot with how to properly implement the awaitAll that collects results from async functions into an array in the same order as the functions (not the order in which the results arrive) in an optimal way. I wanted to avoid storing results in a dictionary, or building up an array of optionals and then unwrapping them all at the end, since that all involves copying the final results at the end. I then saw something online where someone used the array init(unsafeUninitializedCapacity:initializingWith:) and then did async DispatchQueue work inside the block that initializes the elements. At first I thought, oh, that’s definitely not safe: you’re supposed to initialize the elements before returning from that block. But then I figured that if you just make sure the outer scope where the array is declared doesn’t finish, and you don’t try to use the initialized array, before all this async work is done, the array’s buffer isn’t going to be moved, so it’s fine.
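
For contrast, here’s the straightforward order-preserving version with the optionals buffer (the compactMap at the end is exactly the extra copy the uninitialized-buffer trick avoids):

```swift
func awaitAll<T: Sendable>(
    _ jobs: [@Sendable () async -> T]
) async -> [T] {
    await withTaskGroup(of: (Int, T).self) { group in
        // Tag each job with its index so results can be placed in order.
        for (index, job) in jobs.enumerated() {
            group.addTask { (index, await job()) }
        }
        var results = [T?](repeating: nil, count: jobs.count)
        for await (index, result) in group {
            results[index] = result
        }
        // The unwrapping pass copies the results one final time.
        return results.compactMap { $0 }
    }
}
```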

CombineExtensions contains some custom Publishers. I tried to deal correctly with subscriber demand. I have never seen a custom publisher shared on the internet that tries to do this; they all just assume unlimited demand and otherwise ignore it. This is a particular problem now that converting publishers to AsyncSequences with .values is a common technique: that works by requesting a single output on each iteration of the loop, and it won’t work correctly if the publisher assumes it was asked for unlimited outputs.
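
The core of demand handling can be sketched with a collection-backed subscription (simplified: hypothetical names, no thread-safety or reentrancy guards):

```swift
import Combine

final class CollectionSubscription<S: Subscriber, C: Collection>: Subscription
where S.Input == C.Element {
    private var subscriber: S?
    private let elements: C
    private var index: C.Index
    private var pendingDemand = Subscribers.Demand.none

    init(subscriber: S, elements: C) {
        self.subscriber = subscriber
        self.elements = elements
        self.index = elements.startIndex
    }

    func request(_ demand: Subscribers.Demand) {
        pendingDemand += demand
        // Emit only while there is outstanding demand, adding in any
        // additional demand the subscriber returns from receive(_:).
        while pendingDemand > .none, index < elements.endIndex, let subscriber {
            pendingDemand -= 1
            pendingDemand += subscriber.receive(elements[index])
            index = elements.index(after: index)
        }
        if index == elements.endIndex {
            subscriber?.receive(completion: .finished)
            subscriber = nil
        }
    }

    func cancel() { subscriber = nil }
}
```

An AsyncSequence wrapper like .values requests exactly one value per loop iteration, which is why the pendingDemand bookkeeping matters.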

Since the two collection-based publishers are analogs of ones already in Combine, I tried to reverse engineer how demand is handled in those (using the ManualSubscriber). From what I gathered, combineLatest has a practically broken way of handling demand (it just broadcasts the demand to all the upstream publishers at first, and then forwards synchronous demand only to whatever upstream publisher happens to be the last one that fired). merge, on the other hand, handles it correctly, and that’s very complicated to get right.