Slightly faster small String on big-endian platforms

jrose · May 10, 2019, 4:47pm

The Swift repo supports exactly one big-endian platform right now, IBM System Z (s390x). (There's also build support for big-endian PowerPC64, but I don't know if anyone's using that.) I thought of a small improvement to small strings on big-endian platforms that I wanted to write down in case anyone wants to go implement it (@mundaym, @samding).

Swift's small string is very clever; it lays out its information like this (copied from SmallString.swift):

|0 1 2 3 4 5 6 7 8 9 A B C D E F| ← hexadecimal offset in bytes
|  _storage.0   |  _storage.1   | ← raw bits
|          code units         | | ← encoded layout
 ↑                             ↑
 first (leftmost) code unit    discriminator (incl. count)

For little-endian platforms, this lines up nicely with the representation of a large string, which looks like this (simplified):

|0 1 2 3 4 5 6 7 8 9 A B C D E F| ← hexadecimal offset in bytes
| countAndFlags |   pointer   | | ← little-endian representation
                               ↑
                               discriminator

Notice how the discriminator occupies the same byte (offset F) in both representations. This unfortunately isn't the case on a big-endian platform, because the discriminator has to be in the high byte of the pointer word, and big-endian byte order stores that byte first:

|0 1 2 3 4 5 6 7 8 9 A B C D E F|
| countAndFlags | |   pointer   | ← big-endian representation
                 ↑
                 discriminator

(You could make the code units continuous again by putting the pointer and discriminator as the first word in a String on big-endian systems, but then the small string's start address isn't aligned, and it's harder to add null termination.)
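
To make the byte offsets in the diagrams concrete, here's a minimal sketch that prints where the top byte of the second word lands in memory under each byte order. The value `0xE3` and the helper `memoryBytes(of:)` are made up for illustration; they aren't taken from the stdlib.

```swift
// Hypothetical second word of a large String: a made-up discriminator
// byte (0xE3) in the most significant byte, with payload bits below it.
let word: UInt64 = 0xE300_0000_0000_1234

// Hypothetical helper: returns the bytes of `value` in memory order.
func memoryBytes(of value: UInt64) -> [UInt8] {
    return withUnsafeBytes(of: value) { Array($0) }
}

// `.littleEndian` / `.bigEndian` produce a value whose in-memory bytes
// follow that byte order no matter which host this runs on.
let little = memoryBytes(of: word.littleEndian)
let big = memoryBytes(of: word.bigEndian)

// The second word starts at offset 8 within the 16-byte String.
let littleOffset = 8 + little.firstIndex(of: 0xE3)!
let bigOffset = 8 + big.firstIndex(of: 0xE3)!

print(littleOffset, bigOffset) // prints "15 8"
```

Offset 15 (F) matches the small-string discriminator; offset 8 lands in the middle of where the code units want to be.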

Currently, the big-endian implementation resolves this by byte-swapping. What's more, both fields are byte-swapped, even though the first field isn't a pointer. This isn't very efficient, especially since (as far as I can tell) System Z doesn't have a bswap instruction. So, my suggestion:

let pointerBits = UInt(bitPattern: pointer)
smallString._storage.0 = countAndFlags // directly, no byte swap
smallString._storage.1 = (pointerBits << 8) | (pointerBits >> 56) // rotate left by 8 bits (64-bit UInt)

Most processors, including System Z, have some form of rotation instruction, so this could make conversion of a String to and from its known-small representation much faster.
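
Here's a small, self-contained sketch of that rotation and its inverse, assuming a 64-bit word. The helper names `rotateLeft8` and `rotateRight8` and the sample bit pattern are hypothetical; nothing here is taken from the actual stdlib implementation.

```swift
// Hypothetical helpers (not stdlib API): rotate a 64-bit word by one byte.
func rotateLeft8(_ bits: UInt64) -> UInt64 {
    return (bits << 8) | (bits >> 56)
}

func rotateRight8(_ bits: UInt64) -> UInt64 {
    return (bits >> 8) | (bits << 56)
}

// Made-up large-string word: discriminator 0xE3 in the high byte,
// followed by 56 bits of payload.
let largeWord: UInt64 = 0xE312_3456_789A_BCDE

// Large -> small: the discriminator moves to the low byte, which is
// the last byte in memory on a big-endian machine.
let smallWord = rotateLeft8(largeWord)
print(String(smallWord, radix: 16)) // prints "123456789abcdee3"

// Check the in-memory layout in big-endian byte order.
let bytes = withUnsafeBytes(of: smallWord.bigEndian) { Array($0) }
let lastByte = bytes[7] // the byte at offset F of the String

// Small -> large: rotating right by 8 restores the original word.
let roundTripped = rotateRight8(smallWord)
print(lastByte == 0xE3, roundTripped == largeWord) // prints "true true"
```

The other seven bytes keep their relative order, so the code units stay contiguous and only the discriminator moves.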

Unfortunately, I don't have a big-endian system to test on, so someone else will have to pick this up if they think it's important. And we can't do this once we've declared ABI stability for the stdlib on a particular big-endian platform. But I suspect it's worth it.