Massive mapping table

I want to embed Unicode's IDNA Mapping Table into a package.

I wrote a script to parse the text and output the result as a Swift constant, trying several different types:
[UnicodeScalar: IDNAMapping] (~1.1 million lines)
[UInt32: IDNAMapping] (~1.1 million lines)
ClosedRange<UInt32>: IDNAMapping (~10K lines)
I spelled out all the type information I could in the generated code, but the compiler still runs out of patience. When it doesn't, my MacBook runs out of RAM because swift-frontend and SourceKitService eat all of it, and then the machine freezes.

What's the solution?

I know I can load the data dynamically from a static file (e.g. parse the same txt file at runtime instead of trying to output it as a static Swift file), but I thought there must be a way to avoid that. Loading at runtime adds a bit of wait time on the first load, which is suboptimal.

Your best bet would probably be to encode it in a .c file and implement some small accessor functions in C to retrieve the data, which you would call from Swift. Unfortunately, the Swift compiler today still does a not-so-great job of dealing with huge static/constant data, and the language features needed to make it better (@_section, const-initialization, etc.) are still experimental/not fully implemented.
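As a rough sketch of what such a C file could look like: a sorted table of code point ranges plus a small binary-search accessor. Every name here (`idna_range`, `idna_status_for`, the status codes) is hypothetical, and the rows are illustrative placeholders rather than the real mapping table; a generator script would emit the full table.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes; the real table has more statuses plus mapped-to values. */
typedef enum {
    IDNA_VALID = 0,
    IDNA_MAPPED = 1,
    IDNA_DISALLOWED = 2
} idna_status;

typedef struct {
    uint32_t first;  /* first code point of the range, inclusive */
    uint32_t last;   /* last code point of the range, inclusive */
    uint8_t status;  /* one of idna_status */
} idna_range;

/* Must be sorted by `first` with no overlaps. Placeholder rows only;
   a generator script would write out the complete table. */
static const idna_range idna_ranges[] = {
    {0x0000, 0x002C, IDNA_DISALLOWED},
    {0x002D, 0x002E, IDNA_VALID},
    {0x002F, 0x002F, IDNA_DISALLOWED},
    {0x0030, 0x0039, IDNA_VALID},
    {0x0041, 0x005A, IDNA_MAPPED},
    {0x0061, 0x007A, IDNA_VALID},
};

/* Binary search for the range containing `cp`; NULL if no range covers it. */
const idna_range *idna_lookup(uint32_t cp) {
    size_t lo = 0;
    size_t hi = sizeof idna_ranges / sizeof idna_ranges[0];
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cp < idna_ranges[mid].first) {
            hi = mid;
        } else if (cp > idna_ranges[mid].last) {
            lo = mid + 1;
        } else {
            return &idna_ranges[mid];
        }
    }
    return NULL;
}

/* Accessor meant to be called from Swift: returns the status, or -1 if not found. */
int idna_status_for(uint32_t cp) {
    const idna_range *range = idna_lookup(cp);
    return range ? (int)range->status : -1;
}
```

With the accessor declared in the C target's public header, the Swift side would only need something like `idna_status_for(scalar.value)`. The table is ordinary constant data to the C compiler, so it never passes through the Swift type checker.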

I'd love to see something like C's #embed in Swift. For now we technically have .embedInCode for SwiftPM resources, but it still generates a source file, which kills the compiler on large resources.

I found this package which apparently has bumped into the same problem with IDNA mappings.
They are assigning each value to a let constant and afterwards creating a dictionary from all those lets. That's also a solution I guess.

I'll see what I'll settle on.

GitHub froze my whole M1 for a solid 15s trying to load this file haha

Alright, I ended up going the C way. I'm happy with the result, considering the C source files are not handwritten, so I won't need to eyeball incorrect array-length numbers.

The results are in this PR, "Implement IDNA and Punycode" (Pull Request #33, MahdiBM/swift-dns), in the two CSwiftDNSIDNA* targets. The generator scripts are under the utils/ directory.

Nice. Apparently I'm writing a C package now.

It might be worth checking whether your compiler supports #embed. That would remove the need for C source generation, and you'd be back to writing a Swift package. See this blog post about #embed; it's a C23 feature, but it's very likely compilers will just backport it as an extension.
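A minimal sketch of how that could look, assuming the table has been serialized to a binary file next to the C source. The file name `idna_table.bin` and the accessor names are hypothetical, and the fallback branch exists only so the file still compiles on a compiler without #embed.

```c
#include <stddef.h>

/* Use #embed only if the compiler supports it and the (hypothetical) file is found. */
#if defined(__has_embed)
#  if __has_embed("idna_table.bin") == __STDC_EMBED_FOUND__
#    define IDNA_TABLE_EMBEDDED 1
#  endif
#endif

#ifdef IDNA_TABLE_EMBEDDED
/* The preprocessor expands #embed into the file's bytes as a list of integers. */
static const unsigned char idna_table[] = {
#  embed "idna_table.bin"
};
#else
/* Placeholder so the sketch builds without #embed support or without the file. */
static const unsigned char idna_table[] = { 0x00 };
#endif

/* Small accessors that Swift can call to get at the raw bytes. */
const unsigned char *idna_table_bytes(void) {
    return idna_table;
}

size_t idna_table_size(void) {
    return sizeof idna_table;
}
```

The bytes would still need decoding on the Swift side, so the binary layout is something the generator and the reader have to agree on.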