Inline hashing

I need a unique identifier for a point in the codebase.

In ObjC this is currently handled by the following macro:

#define MakeID() [NSValue valueWithPointer:({ static void const * const ID = &ID; ID; })]

I need something equivalent for Swift: something that behaves similarly to #file and #line, but is opaque. I don't want source file names to be present in the production build.

So I was thinking that I could hash (#file, #line) into a single value and force the hash computation to be inlined, so that only the hashing result is present in the final binary. That would solve my problem.

I've tried this:

@inline(__always)
func makeID(file: StaticString, line: Int) -> Int {
    let buffer = UnsafeRawBufferPointer(start: file.utf8Start, count: file.utf8CodeUnitCount)
    var hasher = Hasher()
    hasher.combine(bytes: buffer)
    hasher.combine(line)
    return hasher.finalize()
}

Function makeID() gets inlined, but file names are still present in the output. Any idea how I can force more aggressive inlining, so that only the hash value remains in the compiler output?

Inlining does not mean "computed at compile time". What you want is something that will be constant-folded. I don't think a hash calculation is such a thing, but I'm the wrong person to answer that authoritatively.

Right, Hasher cannot do this, because Hasher is randomly seeded so that it produces different results on each program execution (this is an important mitigation for certain DoS-style attacks against data structures that use hashing). Because it produces a different value on each execution, it cannot possibly be computed at compile time.
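To make the seeding behaviour concrete, here is a minimal sketch. The helper name runtimeID and the sample file name are made up for illustration. Within one process the two calls agree, but the printed value will typically differ from launch to launch (unless the SWIFT_DETERMINISTIC_HASHING environment variable is set), so there is no single constant the compiler could bake into the binary.

```swift
// Hypothetical helper: hashes a (file, line) pair with the standard Hasher.
func runtimeID(file: String, line: Int) -> Int {
    var hasher = Hasher()      // seeded once per process execution
    hasher.combine(file)
    hasher.combine(line)
    return hasher.finalize()
}

let first = runtimeID(file: "Example.swift", line: 42)
let second = runtimeID(file: "Example.swift", line: 42)

// Stable inside a single run of the program...
print(first == second)
// ...but this value is expected to change between launches.
print(first)
```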

The approach that Nickolas takes in Objective-C doesn't work either, because Swift objects don't generally have stable addresses.

The following should mostly do what you want:

@_transparent
private func makeId(file: StaticString = #file, line: Int = #line) -> Int {
    return file.withUTF8Buffer { utf8 in
      utf8.reduce(into: line) { $0 = (31 &* $0) ^ Int($1) }
    }
}

Note that this is an extremely weak¹ "hash function", so you should absolutely not use it if hiding source file names is actually important at all; it leaks enough information that a determined attacker would be able to partially recreate source file names from the generated IDs. But, it's simple enough that the compiler will compute it entirely at compile time, and it has the basic properties you want: Compiler Explorer.

If you can provide a little more information about what you're really trying to do, we can probably provide a more useful suggestion.

¹ Complete and utter trash. Don't use this for anything serious.

Thanks, that looks like what I was looking for. I can experiment with the hash function later. Hiding source file names is not super critical; I'm more concerned about binary size and comparison speed. Collisions may also be a problem, since I actually need a unique ID.
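As one candidate for that experiment, here is a rough sketch using 64-bit FNV-1a over the file name bytes followed by the bytes of the line number. The name fnv1aID is hypothetical. FNV-1a is not cryptographic and can still collide, so it does not guarantee uniqueness, and whether the optimizer folds it down to a constant is an assumption that would need checking in the compiler output.

```swift
// Sketch only: FNV-1a (64-bit) over the file name and line number.
// Not collision-free and not a way to hide file names securely.
@_transparent
func fnv1aID(file: StaticString = #file, line: UInt = #line) -> UInt64 {
    let prime: UInt64 = 0x0000_0100_0000_01B3        // FNV 64-bit prime
    let offsetBasis: UInt64 = 0xCBF2_9CE4_8422_2325  // FNV 64-bit offset basis

    // Mix in every UTF-8 byte of the file name.
    var hash = file.withUTF8Buffer { utf8 in
        utf8.reduce(offsetBasis) { ($0 ^ UInt64($1)) &* prime }
    }

    // Mix in the line number, one byte at a time (least significant first).
    var remaining = UInt64(line)
    for _ in 0..<8 {
        hash = (hash ^ (remaining & 0xFF)) &* prime
        remaining >>= 8
    }
    return hash
}

let idA = fnv1aID()   // uses this file and this line
let idB = fnv1aID()   // same file, next line
print(idA != idB)
```

A wider result (UInt64 instead of Int on 32-bit platforms) lowers the chance of collisions but does not remove it.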

The problem itself is related to Equality of functions. While hacking on the proposal implementation, we keep using some workarounds in the codebase: by default we compare functions bitwise, but in some critical cases we use custom comparison keys. Such a key should contain a unique identifier of the function and the captured values. This post is about the former.
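To illustrate what I mean by a comparison key, here is a simplified sketch. All the names (FunctionKey, KeyedFunction, callSiteID, makeAdder) are invented for this post and are not the actual code from our codebase. The ID part uses the simple hash suggested above.

```swift
// Hypothetical call-site ID, based on the simple hash suggested earlier in the thread.
@_transparent
func callSiteID(file: StaticString = #file, line: Int = #line) -> Int {
    return file.withUTF8Buffer { utf8 in
        utf8.reduce(into: line) { $0 = (31 &* $0) ^ Int($1) }
    }
}

// Key = where the closure was written + the values it captured.
struct FunctionKey: Hashable {
    let id: Int
    let captures: [AnyHashable]
}

// A closure paired with its key; equality looks only at the key.
struct KeyedFunction<Input, Output>: Equatable {
    let key: FunctionKey
    let body: (Input) -> Output

    static func == (lhs: KeyedFunction, rhs: KeyedFunction) -> Bool {
        return lhs.key == rhs.key
    }
}

// Every closure created here shares one call-site ID and differs only by `n`.
func makeAdder(_ n: Int) -> KeyedFunction<Int, Int> {
    let key = FunctionKey(id: callSiteID(), captures: [AnyHashable(n)])
    return KeyedFunction(key: key, body: { $0 + n })
}

print(makeAdder(1) == makeAdder(1))
print(makeAdder(1) == makeAdder(2))
```

This only works if the ID really is unique per call site, which is why collisions in the hash matter to me.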

What is the difference between @_transparent and @inline(__always)? From the documentation I understand that @_transparent is inlined before data flow analysis, and @inline(__always) after. But what are the practical implications of that? What can data flow analysis catch with @_transparent that it cannot catch with @inline(__always)?