Swift wasm binary sizes?

Reviving this thread because I’m wondering what impact removing the ICU dependency from the stdlib has made on SwiftWasm.

From what I can see, binaries built on swiftwasm.org are still ~4.5MB in size – is it using an older version of the compiler or did removing ICU just not make that much of a difference?

2 Likes

I believe swiftwasm.org has not updated to Swift 5.6 yet.

1 Like

There have been nightly snapshots. I haven't checked, but they should include removal of ICU, since upstream trunk is frequently merged into the SwiftWasm fork.

Ah yes, I believe the change is already in SwiftWasm.
I was specifically talking about the one used by the official online playground.

1 Like

In most cases removing the ICU dependency won't have a significant impact on binary size. The ICU library was used for multi-megabyte Unicode data tables, which were basically "moved" from ICU to the Swift distribution itself.

In fact, for apps and libraries depending on Foundation, binaries compiled by SwiftWasm will become bigger because of this.

This is caused by Foundation still depending on ICU. Until it's updated to rely on Unicode data tables within Swift itself, we'll have to embed both the ICU Unicode data tables and a version of those embedded in Swift runtime/stdlib.

1 Like

I think you're underestimating how much data ICU had that we don't have anymore. You mention ICU having multi-megabyte Unicode data tables, whereas the stdlib's Unicode data tables only clock in at around 500kb (less than 1 megabyte, let alone multi-megabyte).

2 Likes

I'm not debating this, and yes, I should've phrased it as "prohibitively expensive" instead of "multi-megabyte".

The main point is that people interested in reducing size of binaries produced by SwiftWasm are mainly targeting the browser environment. There we're competing with alternatives (AssemblyScript, Rust, C/C++) that can produce 2kb binaries for "Hello, World!" apps, or even less than 2kb when writing plain JavaScript.

In the past years we've received multiple requests to reduce the size of binaries produced by SwiftWasm. People are saying it's a deal breaker for them and they can't even consider using Swift in the browser in anything close to production environments because of this. For them, 500kb of Unicode data tables is 500kb more than their available size allowance.

I'm only trying to manage their expectations, as apparently there's a common misunderstanding that with the removal of ICU all Unicode data tables were somehow completely removed as well, which is not true. And when Foundation is linked, it actually makes things worse with a substantial subset of these tables duplicated.

8 Likes

Just curious: are developers looking to shrink the Swift Wasm binary size because of download (data) constraints on their servers, or because of loading time constraints?

Load time on the client is one of the concerns. Even if the .wasm binary is cached by the browser, it still takes time to load the module into a Wasm VM, run JIT on it, and then execute, and this will happen every time on every page load. While some parts of this process may utilize caching, we can't always rely on that.

Because of this, apps built with SwiftWasm may feel slow, as they can only become interactive after the full loading process is complete, which may take seconds. Compare this to JavaScript and other languages compiled to Wasm, which have binaries with comparable functionality that are many times (or even orders of magnitude) smaller and can be loaded immediately without any lag noticeable by users.

Build time is also a big concern. Larger binaries take more time to build and link. Since Swift doesn't support hot reloading in general, our only viable option is to rebuild the binary from scratch and then reload it in the browser when developing.

When building apps with JavaScript/TypeScript and other languages compiled to WebAssembly, developers can see changes they make in their code in front of them in the browser almost instantaneously. With SwiftWasm, Unicode data tables need to be copied and relinked into the final binary, and then reloaded in the browser every time, no matter how small the change the developer made.

4 Likes

Interesting. My Swift WASM binary is 13.22MB. The build time is kinda slow but I don’t personally have a complaint. I haven’t had any clients complain about load times (yet). My staff are all using the SwiftWasm web app daily. Sometimes, I do have to tell them to refresh because the browsers do cache old versions.

One thing I do differently is I have a version of my app that runs natively on iOS (and somewhat on Linux). I built a cross-platform GUI framework that wraps UIKit, JavaScriptKit, and (somewhat) SwiftGTK. So I debug on iOS (Mac) but build for Swift WASM (Ubuntu), so the build/debug cycle doesn’t bother me much.

It all actually works amazingly. I use actors everywhere and am pleasantly surprised how well things integrate.

I see that build times could be an issue and I welcome the reduction in binary size (tho it’s smaller than some of the media files that I have to serve), but I do think that it is very much ready for production.

Though, I do wish that the default html page made when bundling the wasm app had a “loading…” message; it would reduce the likelihood on slower devices that someone might try and refresh and reduce the hassle of me dropping in a custom page.

As mentioned in the original post, there are some extremely memory constrained environments out there (many crypto "smart contract" environments). So if your binary weighs in above the max size of the environment you want to deploy to, well, you simply can't deploy.

1 Like

@Max_Desiatov have you ever looked at the “Embedded Swift” white papers that were floating around on here at some point?

the paper

bare metal swift 5.1 implementation GitHub repo

I’m sure it’s a completely barebones implementation, and it’s no longer maintained. But they had swift running on STM32 microcontrollers with 1-2mb binaries. It might be worth checking out.

Thanks for managing the expectations here @Max_Desiatov . I think it's fair to say that a minimum Wasm binary size of 500 kB is unacceptable and we'll have to address this if we want Swift to be a viable alternative for Wasm in the browser.

As you pointed out, the issue is twofold:

  1. The stdlib actually adding 500+ kB (mostly unicode data tables it seems)
  2. Linking Foundation is then adding ICU back into the binary which means our minimum Foundation-linking binary size actually regressed.

Personally, I'm much less concerned about (2) because this can be addressed in the future and also you're not required to link Foundation. I'm not saying it's not a problem but I feel like it's not our biggest issue at this point in time.

Issue (1), i.e. the stdlib alone adding 500+ kB is in my opinion the true deal breaker here and we have to fix this one way or the other.

@Max_Desiatov do we have a fresh bug for this already? How do other Wasm-targeting languages do this? Is it fair to say that their strings aren't Unicode-correct (e.g. the equivalent of assert("🏴󠁧󠁢󠁥󠁮󠁧󠁿".count == 1) would fail) unless you add a specific other module which adds the Unicode functionality (which will also add the 500+ kB)?

Because I'd assume that with Wasm in the browser you can also decide to leverage the browser's JavaScript String implementation for anything that needs Unicode correctness. Is that accurate?

Phrased differently, in e.g. Rust you can have your cake and eat it too by leveraging the browser's own JavaScript string functionality (which has all the Unicode bits already) when you need it (through "FFI" calls). And if you don't need the Unicode functionality you can stay in Rust-land which by default treats "strings" just like a collection of UInt8s. That may be more cumbersome but would allow you to get a < 10 kB Wasm binary. Is that all correct?
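For reference, here is a minimal Swift sketch of the distinction being asked about: the grapheme-aware `count` versus the raw scalar and byte views. It assumes a Swift 5 toolchain; the flag is spelled with explicit scalar escapes (U+1F3F4, five tag characters, and the cancel tag) only so the invisible tag characters survive copy and paste.

```swift
// The flag of England as an emoji tag sequence, written with escapes.
let flag = "\u{1F3F4}\u{E0067}\u{E0062}\u{E0065}\u{E006E}\u{E0067}\u{E007F}"

// Grapheme-aware view: this is the one that needs grapheme-breaking data.
let graphemes = flag.count

// Raw views: these only count scalars or encoded bytes.
let scalars = flag.unicodeScalars.count
let utf8Bytes = flag.utf8.count

print(graphemes, scalars, utf8Bytes)  // prints "1 7 28"
```

A byte-oriented string type would report something like the last number; getting the first one is what requires the Unicode tables.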

4 Likes

It may also be worth exploring whether we can be a bit more selective with the Unicode data. I took a brief look at the data files, and here's what I found. These are just rough calculations and guesses, so take it with a grain of salt (perhaps @Alejandro could say more about which data is required for what):

Adding up the grapheme data, we have (621*4)+(165*2)+(166*8) bytes = 4142 bytes. I think that's all we need for String's collection conformance (?) - I don't think it needs to perform normalisation, or case-folding, or access scalar properties.

Using the same process to total the normalisation data (needed for things like string comparison), I get 27422 bytes. I wonder if there are alternative ways to pack this data which prioritise compactness over performance.

Then we get to scalar properties, which are just enormous - the header file is 2.3MB and is so massive GitHub doesn't even bother to render it. Just taking a look at some of the larger tables:

  • _swift_stdlib_scalar_binProps is (4*4855) = 19420 bytes
  • _swift_stdlib_mappings_data_indices is (4*2879) = 11516 bytes
  • _swift_stdlib_words is 78151 bytes
  • _swift_stdlib_word_indices is (4*12866) = 51464 bytes
  • _swift_stdlib_names is 215884 bytes
  • _swift_stdlib_names_scalars is (4*39040) = 156160 bytes
  • _swift_stdlib_names_scalar_sets is (2*8704) = 17408 bytes
  • _swift_stdlib_ages is (8*1659) = 13272 bytes
  • _swift_stdlib_generalCategory is (8*3968) = 31744 bytes
  • Total: 595019 bytes
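The per-table arithmetic above can be re-added with a short Swift sketch. The element sizes and counts are the ones quoted in the list; for `_swift_stdlib_words` and `_swift_stdlib_names` only a byte total is given, so treating them as byte arrays here is an assumption.

```swift
// (table name, element size in bytes, element count), as quoted in the list.
let tables: [(name: String, elementSize: Int, count: Int)] = [
    ("_swift_stdlib_scalar_binProps", 4, 4855),
    ("_swift_stdlib_mappings_data_indices", 4, 2879),
    ("_swift_stdlib_words", 1, 78151),          // assumed 1 byte per element
    ("_swift_stdlib_word_indices", 4, 12866),
    ("_swift_stdlib_names", 1, 215884),         // assumed 1 byte per element
    ("_swift_stdlib_names_scalars", 4, 39040),
    ("_swift_stdlib_names_scalar_sets", 2, 8704),
    ("_swift_stdlib_ages", 8, 1659),
    ("_swift_stdlib_generalCategory", 8, 3968),
]

// Print each table's size, then the sum over all of them.
for table in tables {
    print(table.name, table.elementSize * table.count)
}
let total = tables.reduce(0) { $0 + $1.elementSize * $1.count }
print("Total:", total)
```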

(And there's more in other files - stuff like script information, word-breaking, and case data)

Firstly, it would be great if this stuff could just be dead-stripped automatically. I suspect the vast majority of applications don't care about scalar names, or their ages, word-breaking, or even the general category - but we know that DCE is a bit weak in Swift currently, so perhaps the compiler can't prove these things are never used.

If that's the case, it seems to me like we could still ship an excellent String experience just with grapheme and normalisation data (31564 bytes). I think this would still give us String's collection views, with proper count and iteration behaviour as we're used to, as well as canonical equivalence for String comparison.
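As a quick check of that figure, a sketch using only the numbers quoted in this post (the variable names are just for illustration):

```swift
// Grapheme-breaking data: three tables with 4-, 2- and 8-byte elements.
let graphemeBytes = (621 * 4) + (165 * 2) + (166 * 8)

// Normalisation data, total as quoted above.
let normalizationBytes = 27422

// The proposed "minimal" data set: grapheme breaking plus normalisation.
let minimalBytes = graphemeBytes + normalizationBytes

print(graphemeBytes, minimalBytes)  // prints "4142 31564"
```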

It's not totally minimal, but it would be a big improvement and it's enough that I expect most Swift applications and libraries would continue to work as normal.

7 Likes

I think properties like general category are exposed through Character so one or two availability annotations for WASI or Wasm won't be that bad IMO.

2 Likes

Perhaps they could be split into a separate module, similar to how Linux has Foundation and FoundationNetworking.

Some scalar properties are more widely useful than others (isWhitespace, for instance), but some of the more specialised properties could be optional. When you try to build your application/library for Wasm, if it uses those properties, you'd get a compiler error telling you to import the additional data module, then you can decide whether you want to do that, or if you want to use #if os(...) or something to avoid that API on Wasm.
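A rough sketch of what the `#if os(...)` route could look like in user code today. The `describe` helper is hypothetical, and no separate data module exists yet, so this only shows avoiding a data-heavy property (scalar names) on WASI while still using a widely useful one (`isWhitespace`).

```swift
// Hypothetical helper: returns a short description of a scalar.
func describe(_ scalar: Unicode.Scalar) -> String {
    #if os(WASI)
    // On Wasm/WASI, stick to a cheap property and avoid the name tables.
    return scalar.properties.isWhitespace ? "whitespace" : "other"
    #else
    // Elsewhere, use the full scalar name when one is available.
    return scalar.properties.name ?? "unnamed"
    #endif
}

print(describe(" "))
```

Note that this only keeps the call out of the source; whether the tables actually leave the binary still depends on dead-stripping or on a module split like the one suggested above.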

Even if we had better dead-stripping, having that import statement as an explicit marker for functionality which requires a large amount of data could be valuable by itself.

I believe @Michael_Ilseman already had some ideas about introducing a separate Unicode module.

2 Likes

Won’t moving these APIs into a separate module break ABI (even if it is implicitly imported)?

I used a small program to write all the arrays in those headers into a file, which came out to 668006 bytes.

Fwiw, compressing that file with zstd brings it down to 258278 bytes, so there's some redundancy in there that could maybe be reduced, too.

1 Like

It wouldn't need to apply to ABI-stable systems (ABI is relevant when the standard library is distributed separately, so its binary size isn't such a concern). But even if we included ABI-stable systems in the split and implicitly imported the extra data module, I think it would be possible to use @_originallyDefinedIn(...) to prevent breakage.

2 Likes

Personally, I wouldn’t find 500kB unacceptable at all. If I want to make a “hello world” example I’d just use JS. If I want to actually write some code, the overhead of 500kB would be insubstantial (don’t get me wrong: I’d prefer the overhead to be zero).

That said, a “hello world”-like example (after aggressive DCE using external tools) is currently 4.5MB with Swift Wasm, not 0.5MB. I’m still curious to know what can be done to reduce this.

Another thread recently referenced the Swift bug that unused public symbols will currently never be stripped. Fixing that seems like a big potential win, especially for statically linked binaries. Maybe that would also be enough to solve the issues mentioned above with the Unicode tables?

Edit: here is the post that mentions the open issues with DCE: "How to disable implicit Foundation imports? - #63 by Joakim_Hassila1"

1 Like