Swift wasm binary sizes?

Interesting. My Swift Wasm binary is 13.22MB. The build time is kinda slow, but I don’t personally have a complaint. I haven’t had any clients complain about load times (yet). My staff all use the SwiftWasm web app daily. Sometimes I do have to tell them to refresh because the browsers cache old versions.

One thing I do differently is I have a version of my app that runs natively on iOS and (somewhat) on Linux. I built a cross-platform GUI framework that wraps UIKit, JavaScriptKit, and (somewhat) SwiftGTK. So I debug on iOS (Mac) but build for Swift Wasm (Ubuntu), so the build/debug cycle doesn’t bother me much.

It all actually works amazingly. I use actors everywhere and am pleasantly surprised how well things integrate.

I see that build times could be an issue, and I'd welcome a reduction in binary size (though it’s smaller than some of the media files that I have to serve), but I do think that it is very much ready for production.

Though, I do wish that the default HTML page made when bundling the Wasm app had a “loading…” message. It would make it less likely that someone on a slower device tries to refresh, and it would save me the hassle of dropping in a custom page.

As mentioned in the original post, there are some extremely memory constrained environments out there (many crypto "smart contract" environments). So if your binary weighs in above the max size of the environment you want to deploy to, well, you simply can't deploy.

@Max_Desiatov have you ever looked at the “Embedded Swift” white papers that were floating around on here at some point?

the paper

bare metal swift 5.1 implementation GitHub repo

I’m sure it’s a completely barebones implementation, and it’s no longer maintained. But they had Swift running on STM32 microcontrollers with 1-2 MB binaries. It might be worth checking out.

Thanks for managing the expectations here @Max_Desiatov . I think it's fair to say that a minimum Wasm binary size of 500 kB is unacceptable and we'll have to address this if we want Swift to be a viable alternative for Wasm in the browser.

As you pointed out, the issue is twofold:

  1. The stdlib itself adds 500+ kB (mostly Unicode data tables, it seems).
  2. Linking Foundation then adds ICU back into the binary, which means our minimum Foundation-linking binary size actually regressed.

Personally, I'm much less concerned about (2) because this can be addressed in the future and also you're not required to link Foundation. I'm not saying it's not a problem but I feel like it's not our biggest issue at this point in time.

Issue (1), i.e. the stdlib alone adding 500+ kB is in my opinion the true deal breaker here and we have to fix this one way or the other.

@Max_Desiatov do we have a fresh bug for this already? How do other Wasm-targeting languages do this? Is it fair to say that their strings aren't Unicode-correct (e.g. the equivalent of assert("🏴󠁧󠁢󠁥󠁮󠁧󠁿".count == 1) would fail) unless you add a specific other module which adds the Unicode functionality (which will also add the 500+ kB)?

Because I'd assume that with Wasm in the browser you can also decide to leverage the browser's JavaScript String implementation for anything that needs Unicode correctness. Is that accurate?

Phrased differently, in e.g. Rust you can have your cake and eat it too by leveraging the browser's own JavaScript string functionality (which has all the Unicode bits already) when you need it (through "FFI" calls). And if you don't need the Unicode functionality you can stay in Rust-land which by default treats "strings" just like a collection of UInt8s. That may be more cumbersome but would allow you to get a < 10 kB Wasm binary. Is that all correct?
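To make the distinction concrete, here is a minimal Swift sketch of the three views involved. The flag is spelled out with scalar escapes so the invisible tag characters are explicit. The comments about which view needs which data are my reading of the discussion, not a statement about the stdlib internals.

```swift
// England flag: U+1F3F4 (black flag), five tag characters spelling "gbeng",
// and a cancel tag (U+E007F).
let flag = "\u{1F3F4}\u{E0067}\u{E0062}\u{E0065}\u{E006E}\u{E0067}\u{E007F}"

// Grapheme clusters: this is the view that relies on grapheme-breaking data.
let graphemeCount = flag.count

// Unicode scalars: only requires decoding the UTF-8 storage.
let scalarCount = flag.unicodeScalars.count

// Raw UTF-8 bytes: the "collection of UInt8s" view.
let byteCount = flag.utf8.count

print(graphemeCount, scalarCount, byteCount)  // 1 7 28
```

Code that only ever touches `utf8` stays at the byte level, which is roughly the "stay in Rust-land" option described above.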

It may also be worth exploring whether we can be a bit more selective with the Unicode data. I took a brief look at the data files, and here's what I found. These are just rough calculations and guesses, so take it with a grain of salt (perhaps @Alejandro could say more about which data is required for what):

Adding up the grapheme data, we have (621*4)+(165*2)+(166*8) bytes = 4142 bytes. I think that's all we need for String's collection conformance (?) - I don't think it needs to perform normalisation, or case-folding, or access scalar properties.

Using the same process to total the normalisation data (needed for things like string comparison), I get 27422 bytes. I wonder if there are alternative ways to pack this data which prioritise compactness over performance.
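For anyone who wants to check the sums, the two figures so far can be reproduced with a few lines of Swift. The element counts and widths are the ones quoted above. The normalisation total is taken as given rather than recomputed from the headers.

```swift
// (element count, element width in bytes) for the three grapheme tables,
// as quoted in the post.
let graphemeTables: [(count: Int, width: Int)] = [(621, 4), (165, 2), (166, 8)]
let graphemeBytes = graphemeTables.reduce(0) { $0 + $1.count * $1.width }

// Normalisation total as stated in the post; not derived here.
let normalisationBytes = 27_422

print(graphemeBytes)                       // 4142
print(graphemeBytes + normalisationBytes)  // 31564
```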

Then we get to scalar properties, which are just enormous - the header file is 2.3MB and is so massive GitHub doesn't even bother to render it. Just taking a look at some of the larger tables:

  • _swift_stdlib_scalar_binProps is (4*4855) = 19420 bytes
  • _swift_stdlib_mappings_data_indices is (4*2879) = 11516 bytes
  • _swift_stdlib_words is 78151 bytes
  • _swift_stdlib_word_indices is (4*12866) = 51464 bytes
  • _swift_stdlib_names is 215884 bytes
  • _swift_stdlib_names_scalars is (4*39040) = 156160 bytes
  • _swift_stdlib_names_scalar_sets is (2*8704) = 17408 bytes
  • _swift_stdlib_ages is (8*1659) = 13272 bytes
  • _swift_stdlib_generalCategory is (8*3968) = 31744 bytes
  • Total: 595019 bytes

(And there's more in other files - stuff like script information, word-breaking, and case data)
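A small sketch that sums the tables listed above and shows how much of the total comes from scalar names. The table names and sizes are copied from the list. Nothing here reads the actual header files.

```swift
// Table sizes in bytes, copied from the list above.
let scalarTables: [(name: String, bytes: Int)] = [
    ("_swift_stdlib_scalar_binProps", 4 * 4855),
    ("_swift_stdlib_mappings_data_indices", 4 * 2879),
    ("_swift_stdlib_words", 78_151),
    ("_swift_stdlib_word_indices", 4 * 12_866),
    ("_swift_stdlib_names", 215_884),
    ("_swift_stdlib_names_scalars", 4 * 39_040),
    ("_swift_stdlib_names_scalar_sets", 2 * 8704),
    ("_swift_stdlib_ages", 8 * 1659),
    ("_swift_stdlib_generalCategory", 8 * 3968),
]

let total = scalarTables.reduce(0) { $0 + $1.bytes }

// Everything whose name starts with "_swift_stdlib_names" is scalar-name data.
let nameBytes = scalarTables
    .filter { $0.name.hasPrefix("_swift_stdlib_names") }
    .reduce(0) { $0 + $1.bytes }

print("total:", total)          // total: 595019
print("name data:", nameBytes)  // name data: 389452
```

On these numbers, the three name tables account for roughly two thirds of the listed data.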

Firstly, it would be great if this stuff could just be dead-stripped automatically. I suspect the vast majority of applications don't care about scalar names, or their ages, word-breaking, or even the general category - but we know that DCE is a bit weak in Swift currently, so perhaps the compiler can't prove these things are never used.

If that's the case, it seems to me like we could still ship an excellent String experience just with grapheme and normalisation data (31564 bytes). I think this would still give us String's collection views, with proper count and iteration behaviour as we're used to, as well as canonical equivalence for String comparison.

It's not totally minimal, but it would be a big improvement and it's enough that I expect most Swift applications and libraries would continue to work as normal.
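As an illustration of the behaviour that grapheme and normalisation data buy us, here is a short sketch using a precomposed and a decomposed "é". It only demonstrates today's `String` semantics. It says nothing about which tables the implementation touches internally.

```swift
let precomposed = "\u{E9}"    // é as a single scalar
let decomposed = "e\u{301}"   // e followed by a combining acute accent

// String comparison uses canonical equivalence, so these are equal...
print(precomposed == decomposed)  // true

// ...even though the underlying scalars differ.
print(precomposed.unicodeScalars.elementsEqual(decomposed.unicodeScalars))  // false

// Both are a single Character when counted or iterated.
print(precomposed.count, decomposed.count)  // 1 1
```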

I think properties like general category are exposed through Character so one or two availability annotations for WASI or Wasm won't be that bad IMO.

Perhaps they could be split into a separate module, similar to how Linux has Foundation and FoundationNetworking.

Some scalar properties are more widely useful than others (isWhitespace, for instance), but some of the more specialised properties could be optional. When you try to build your application/library for Wasm, if it uses those properties, you'd get a compiler error telling you to import the additional data module, then you can decide whether you want to do that, or if you want to use #if os(...) or something to avoid that API on Wasm.
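The `#if os(...)` route could look something like the sketch below. The separate data module is only an idea at this point, so the example sticks to conditional compilation that works today. `label(for:)` is a made-up helper name for illustration.

```swift
// Returns a human-readable label for a scalar. On WASI it avoids the
// scalar-name API (and, ideally, the name tables behind it).
func label(for scalar: Unicode.Scalar) -> String {
    let codePoint = "U+" + String(scalar.value, radix: 16, uppercase: true)
    #if os(WASI)
    return codePoint
    #else
    // `name` is nil for scalars without a Unicode name, so fall back.
    return scalar.properties.name ?? codePoint
    #endif
}

print(label(for: "A"))
```

Whether avoiding the API like this actually lets the linker drop the tables depends on the dead-stripping issues discussed in this thread.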

Even if we had better dead-stripping, having that import statement as an explicit marker for functionality which requires a large amount of data could be valuable by itself.

I believe @Michael_Ilseman already had some ideas about introducing a separate Unicode module.

Won’t moving these APIs into a separate module break ABI (even if it is implicitly imported)?

I used a small program to write all the arrays in those headers into a file, which came out to 668006 bytes.

Fwiw, compressing that file with zstd brings it down to 258278 bytes, so there's some redundancy in there that could maybe be reduced, too.

It wouldn't need to apply to ABI-stable systems (ABI is relevant when the standard library is distributed separately, so its binary size isn't such a concern). But even if we included ABI-stable systems in the split and implicitly imported the extra data module, I think it would be possible to use @_originallyDefinedIn(...) to prevent breakage.

Personally, I wouldn’t find 500kB unacceptable at all. If I want to make a “hello world” example I’d just use JS. If I want to actually write some code, the overhead of 500kB would be insubstantial (don’t get me wrong: I’d prefer the overhead to be zero).

That said, a “hello world”-like example (after aggressive DCE using external tools) is currently 4.5MB with Swift Wasm, not 0.5MB. I’m still curious to know what can be done to reduce this.

Another thread recently referenced the Swift bug that unused public symbols will currently never be stripped. Fixing that seems like a big potential win, especially for statically linked binaries. Maybe that would also be enough to solve the issues mentioned above with the Unicode tables?

Edit: here is the post that mentions the open issues with DCE: How to disable implicit Foundation imports? - #63 by Joakim_Hassila1
