Do StaticString pointers change?

hmlongco · August 21, 2023, 1:40am

A StaticString is a non-heap allocated "string" whose value is a constant known at compile time, referenced by an UnsafePointer<UInt8>.

They're often created and returned when using #function arguments for logging, etc.

So, let's look at something like...

struct StringKey: Hashable {
    let key: String
    internal init(key: StaticString = #function) {
        ...
    }
}

class Test {
    var item3: StringKey {
        StringKey()
    }
}

So, if I call item3 from somewhere in Test, the value of key provided by the default #function argument in the StringKey constructor is "item3", accessible via the utf8Start value of StaticString. The string value (item3) will always be the same for the lifetime of the program.

And will have the same value every time the variable is called and a new StringKey is returned.

Which leads to the question at hand:

Since it's a constant value, created at compile time and loaded with the rest of the linked code, will the corresponding utf8Start pointer value likewise always be the same through the lifetime of the program?
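One way to observe this empirically is sketched below. The helper callerNameAddress and the class Probe are hypothetical names made up for illustration, and whatever the program prints is only an observation about one particular build, not a documented guarantee.

```swift
// Hypothetical helper: returns the address of the #function literal
// supplied at the call site, as an integer.
func callerNameAddress(_ name: StaticString = #function) -> Int {
    // utf8Start is only valid when the value is stored as a pointer.
    precondition(name.hasPointerRepresentation, "expected a pointer-backed StaticString")
    return Int(bitPattern: name.utf8Start)
}

final class Probe {
    // #function evaluates to "item3" inside this getter.
    var item3: Int { callerNameAddress() }
}

let probe = Probe()
let first = probe.item3
let second = probe.item3

// Compares the addresses seen by two separate invocations of the same getter.
print("same address across calls:", first == second)
```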

wadetregaskis · August 21, 2023, 1:50pm

Yes.

StaticString source.

There's no explicit guarantee of this stability in the documentation, but the relevant member variable (_startPtrOrData) is never mutated after initialisation.

Also, the member variable is marked @usableFromInline and utf8Start is marked @_transparent, so the implementation can't really change (without breaking ABI compatibility). Strictly speaking this doesn't guarantee that it returns the same pointer every time - _startPtrOrData is still technically mutable, as a var - but it seems unlikely to ever change.

Karl · August 21, 2023, 3:04pm

More to the point, when hasPointerRepresentation is true, the string is stored just as a naked pointer.

There is no reference counting to ensure the memory that is pointed-to stays alive, or to deallocate it when it is no longer referenced. Therefore, the memory must live forever. This is also implied by the StaticString API - utf8Start is just a property. Unlike most Swift APIs which expose pointers, it's not scoped to a closure to bound its lifetime.
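To make the contrast concrete, here is a small sketch (the variable names are arbitrary) that puts the unscoped utf8Start property next to the closure-scoped accessors Swift offers elsewhere.

```swift
let literal: StaticString = "item3"

// Unscoped: a plain property. Nothing bounds how long the pointer is used,
// which only makes sense if the bytes stay valid for the whole program.
let start = literal.utf8Start
let unscopedCount = literal.utf8CodeUnitCount

// Scoped: StaticString also has a closure-based accessor, where the buffer
// should only be used inside the closure body.
let scopedCount = literal.withUTF8Buffer { buffer in buffer.count }

// An ordinary String exposes its UTF-8 bytes only through a scoped closure;
// the pointer must not escape the closure.
var dynamic = String("item3")
let dynamicCount = dynamic.withUTF8 { buffer in buffer.count }

print(unscopedCount, scopedCount, dynamicCount)
```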

That said, there is no guarantee that each occurrence of a StaticString literal has a unique memory location. If two StaticStrings have the same contents, they might be deduped.
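A quick way to check what a given build does with identical literals is sketched below. The contents comparison is always true; the address comparison can go either way depending on the compiler, optimisation settings and linker, so no particular result is claimed here.

```swift
let a: StaticString = "item3"
let b: StaticString = "item3"

// Byte-wise comparison of the two literals' contents.
let sameContents = a.withUTF8Buffer { x in
    b.withUTF8Buffer { y in x.elementsEqual(y) }
}

// Address comparison: true if the two literals were merged into one
// storage location, false if each literal kept its own copy.
let sameAddress = a.utf8Start == b.utf8Start

print("same contents:", sameContents)
print("same address: ", sameAddress)
```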

hmlongco · August 21, 2023, 3:07pm

Appreciate the response, but not quite the question I was asking.

If I create a StaticString from a #function as shown above, will the pointer value always be the same across multiple invocations?

The pointer is set to a constant value loaded into memory somewhere. Is it ever tossed out and reloaded elsewhere?

I'd think not, since the pointer isn't reference counted or locked, and as such the value pointed to must remain the same (and in the same place) for the lifetime of the program.

hmlongco · August 21, 2023, 3:11pm

True. But if they have the same contents, then for all intents and purposes they're the same value and I think that's fine for my use case.

Karl · August 21, 2023, 4:35pm

If you're sure. The value of #function in the example is literally item3 -- so if you had another property called item3 in some totally different class, it may be deduped to the same static string, with the same memory location.

If you're static linking, the function name strings may even get deduped across different class names and different modules.

So if you want to use static string pointers as identifiers, you should advise clients to choose very unique function/property names. And at that point, it may be better to just generate a UUID. Have a macro embed a static UUID if the cost of generating them is too high.

eskimo · August 22, 2023, 7:18am

Karl: "If you're static linking, the function name strings may even get deduped across different class names and different modules."

Yep. I just had a DTS incident where a developer was being bitten by something very similar to that (although in that case it was deduping CFString values). They had code [1] that was relying on two different NSString pointers being unique, even though the strings had the same value. That worked with their old build system, which built everything as dynamic libraries, but failed with Swift Package Manager, which favours static linking. The static linker merged the values and badness ensued.

Taking a step back…

I’ve been watching this thread go by and… well… sheeesh… I wouldn’t use this approach. It seems to me to be relying on a long chain of “as long as everything behaves exactly this way”, and my experience is that such approaches don’t end well in the long term.

Share and Enjoy

Quinn “The Eskimo!” @ DTS @ Apple

[1] Not their code, mind you. This was in a third-party library.

hmlongco · August 22, 2023, 2:33pm

Take a look at the following code.

class Test1 {
    var item3: StringKey {
        StringKey()
    }
    var item4: StringKey {
        StringKey()
    }
}
class Test2 {
    var item3: StringKey {
        StringKey()
    }
}

Basically, StringKey needs to quickly differentiate between item3 and item4 within the same class (the key is used to access cache values within that class).

If item3 is deduped across classes it doesn't matter. The pointer value of item3 is still different from that of item4 (they must point to unique strings). And the key values must be unique, since the variable names on the class must also be unique.

The key is used to index into several dictionaries and converting the StaticString pointer to an integer and using that as a key is much, much, much faster than continually creating a heap allocated String from the StaticString, passing the string around (and reference counting it each time), and using the string as an index into the dictionaries (which requires hashing/string comps each time).
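The pointer-as-integer idea could be sketched roughly as follows. AddressKey and Cache are hypothetical names invented for this illustration, and the whole thing rests on the assumption debated in this thread: that the address behind a given #function literal never changes while the program runs.

```swift
// Hypothetical key type: stores only the address of the StaticString's bytes.
struct AddressKey: Hashable {
    let address: Int

    init(_ name: StaticString = #function) {
        // utf8Start is only valid for pointer-backed values.
        precondition(name.hasPointerRepresentation, "expected a pointer-backed StaticString")
        address = Int(bitPattern: name.utf8Start)
    }
}

// Hypothetical cache that uses the address-based key for lookups.
final class Cache {
    private var storage: [AddressKey: Int] = [:]

    // #function evaluates to "item3" and "item4" respectively.
    var item3: AddressKey { AddressKey() }
    var item4: AddressKey { AddressKey() }

    func store(_ value: Int, for key: AddressKey) { storage[key] = value }
    func value(for key: AddressKey) -> Int? { storage[key] }
}

let cache = Cache()
cache.store(3, for: cache.item3)
cache.store(4, for: cache.item4)

// Each lookup builds a fresh key from the same call site.
let three = cache.value(for: cache.item3)
let four = cache.value(for: cache.item4)
print(String(describing: three), String(describing: four))
```

Hashing and comparing a single Int avoids allocating a String and walking its bytes, which is where the speed-up would come from.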

I could just do a one-time hash of the value... but hashes aren't guaranteed to be unique.

The benchmarked speed difference is about 400%, which is a pretty dramatic improvement.

Bottom line is that the value of #function must always resolve to the same text value due to the definition of that particular function. It should remain the same value during the lifetime of the app. And that value is a StaticString, whose pointed-to value must also remain constant for the lifetime of the app, since the pointer is now part of the StaticString, and the lifecycle of that string is unknown to Swift.

Those are inferences, true. But they're inferences based on the very definitions of the functions and types.

The only fly in the ointment is whether or not I could get a different pointer value across multiple invocations of the function... and given the other inferences I can't see why that would occur. Since the pointer must persist, I don't see how it could move, or why it would be reloaded/duplicated elsewhere.

I'll noodle some more, but the performance increase is hard to ignore. I appreciate the thoughts.

hmlongco · August 23, 2023, 2:28pm

After all of the comments I decided to rework things and I came up with the following approach of storing the StaticString in the key and performing my own fast hash and compare.

import Foundation // provides strcmp

public struct FactoryKey: Hashable {

    public let type: ObjectIdentifier
    public let key: StaticString

    public init(type: Any.Type, key: StaticString = #function) {
        self.type = ObjectIdentifier(type)
        self.key = key
    }

    public func hash(into hasher: inout Hasher) {
        hasher.combine(type)
        if key.hasPointerRepresentation {
            hasher.combine(bytes: UnsafeRawBufferPointer(start: key.utf8Start, count: key.utf8CodeUnitCount))
        } else {
            hasher.combine(key.unicodeScalar.value)
        }
    }

    public static func == (lhs: Self, rhs: Self) -> Bool {
        // keys for different types are never equal
        guard lhs.type == rhs.type else {
            return false
        }
        // both keys are pointer-backed strings
        if lhs.key.hasPointerRepresentation && rhs.key.hasPointerRepresentation {
            // fast path: identical addresses mean identical strings
            if lhs.key.utf8Start == rhs.key.utf8Start {
                return true
            }
            // different addresses may still hold the same contents, so compare the bytes
            return strcmp(lhs.key.utf8Start, rhs.key.utf8Start) == 0
        } else if lhs.key.hasPointerRepresentation == false && rhs.key.hasPointerRepresentation == false {
            // both keys are single Unicode scalars
            return lhs.key.unicodeScalar.value == rhs.key.unicodeScalar.value
        }
        // one key is a scalar and the other a pointer: treated as unequal in this context
        return false
    }

}

I tried precomputing the hash in the init and saving it, but the speed improvement of doing that was only about 0.2% faster, so I decided to forgo the extra size and complexity.

And settle for a 390% speed improvement.
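For reference, the precomputed-hash variant mentioned above might look something like the sketch below. This is not the code that was benchmarked; PrecomputedKey and storedHash are hypothetical names, and the equality check uses withUTF8Buffer rather than strcmp so that it covers both StaticString representations.

```swift
// Hypothetical variant that hashes once in init and stores the result.
struct PrecomputedKey: Hashable {
    let type: ObjectIdentifier
    let key: StaticString
    let storedHash: Int   // costs one extra word per key

    init(type: Any.Type, key: StaticString = #function) {
        let typeID = ObjectIdentifier(type)
        var hasher = Hasher()
        hasher.combine(typeID)
        // Feed the string's UTF-8 bytes to the hasher exactly once.
        key.withUTF8Buffer { bytes in
            hasher.combine(bytes: UnsafeRawBufferPointer(bytes))
        }
        self.type = typeID
        self.key = key
        self.storedHash = hasher.finalize()
    }

    func hash(into hasher: inout Hasher) {
        // Reuse the value computed in init instead of rehashing the bytes.
        hasher.combine(storedHash)
    }

    static func == (lhs: Self, rhs: Self) -> Bool {
        // Cheap checks first, then a byte-wise comparison of the contents.
        guard lhs.storedHash == rhs.storedHash, lhs.type == rhs.type else {
            return false
        }
        return lhs.key.withUTF8Buffer { l in
            rhs.key.withUTF8Buffer { r in l.elementsEqual(r) }
        }
    }
}

let k1 = PrecomputedKey(type: Int.self, key: "item3")
let k2 = PrecomputedKey(type: Int.self, key: "item3")
let k3 = PrecomputedKey(type: Int.self, key: "item4")
let k4 = PrecomputedKey(type: String.self, key: "item3")
print(k1 == k2, k1 == k3, k1 == k4)
```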

I'm 99% sure the original approach would work... but as I was once told, close only counts in horseshoes, hand grenades, and when your deodorant stops working.