Embedded Swift

@Max_Desiatov that resolved the problem. I need to keep an eye on it whenever I do a command line tool update.
Thank you for catching that! Also thanks to @kubamracek.

@Max_Desiatov is there any way to bring String into Embedded Swift for WASM? Playing with swift-for-wasm-examples-main and really can't imagine life without String type in web development.

There is work in progress on bringing the current Swift.String into Embedded Swift: [embedded] Port Swift.String to embedded Swift by kubamracek · Pull Request #70446 · apple/swift · GitHub

Though not everyone, and not every use case, wants (or can use) the regular String that exists in Swift today, because it's Unicode-compliant, commonly heap-allocated, dynamically sized, etc. Swift.String will only be a good fit for a subset of Embedded Swift users, and more exploration of the "embedded strings" problem space will be needed.

Amazing work @kubamracek and thank you for your quick response!
How soon do you think it could be available in a nightly development snapshot? Would love to start playing with it.

No ETA, sorry, there are still definitely unknowns to be resolved first. But we can get the Swift CI system to build a custom toolchain, with the change in, and perhaps that could let you start some experiments. Let me do that on that pull request (after a few hours, there should be a link to ci.swift.org that will have a link to the produced toolchain).

As I've clarified on that PR, if you need String in Wasm you shouldn't be using Embedded Swift in the first place. Use of String will negate binary size reduction that Embedded Swift brings, and as for performance you'll still incur UTF-8 <-> UTF-16 re-encoding overhead on every call bridged to JavaScript.
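
To make the re-encoding cost concrete, here is a minimal sketch of the work a bridge has to do every time a UTF-8 Swift string crosses into a UTF-16 JavaScript string. It is a hand-written transcoder that assumes well-formed UTF-8 input (no validation or error handling), and the name `utf8ToUTF16` is purely illustrative, not part of any library. Every byte is inspected and decoded before new code units are written out.

```swift
/// Transcodes UTF-8 bytes into UTF-16 code units.
/// Assumption: `bytes` is well-formed UTF-8; invalid input is not detected.
func utf8ToUTF16(_ bytes: [UInt8]) -> [UInt16] {
    var out: [UInt16] = []
    // A UTF-16 result never has more code units than the UTF-8 input has bytes.
    out.reserveCapacity(bytes.count)
    var i = 0
    while i < bytes.count {
        let lead = bytes[i]
        // Sequence length and payload bits, decided by the lead byte's high bits.
        let length: Int
        var value: UInt32
        if lead < 0x80 {          // 0xxxxxxx: ASCII
            length = 1
            value = UInt32(lead)
        } else if lead < 0xE0 {   // 110xxxxx + 1 continuation byte
            length = 2
            value = UInt32(lead & 0x1F)
        } else if lead < 0xF0 {   // 1110xxxx + 2 continuation bytes
            length = 3
            value = UInt32(lead & 0x0F)
        } else {                  // 11110xxx + 3 continuation bytes
            length = 4
            value = UInt32(lead & 0x07)
        }
        // Each continuation byte (10xxxxxx) contributes 6 more bits.
        for k in 1..<length {
            value = (value << 6) | UInt32(bytes[i + k] & 0x3F)
        }
        if value < 0x1_0000 {
            // Basic Multilingual Plane: one UTF-16 code unit.
            out.append(UInt16(value))
        } else {
            // Supplementary planes: split into a surrogate pair.
            let v = value - 0x1_0000
            out.append(UInt16(0xD800 + (v >> 10)))
            out.append(UInt16(0xDC00 + (v & 0x3FF)))
        }
        i += length
    }
    return out
}
```

A real bridge pays this cost (or the reverse direction) on each call, which is the overhead that storing UTF-16 up front avoids.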

Thank you @Max_Desiatov I appreciate your response and suggestions.

I will try to implement that special string type with UTF-16 under the hood.

I'm currently trying to build a minimal version of the JavaScriptKit and it seems I can't make it fully working without strings, StaticString is definitely not enough here.

In Embedded Swift for Wasm interoperating with JS, what could work is a #utf16 macro that converts string literals to [UInt16], similar to the #utf8 macro I previously suggested here: SE-0382: Expression Macros - #23 by Max_Desiatov. That way with an allocator you can get all of the array operations available, such as concatenation, addressing via integer indices, transformations with map etc.
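
As a sketch of what this could look like, assume a hypothetical `#utf16("Hello")` macro (it does not exist in the standard library) that expands at compile time into a plain `[UInt16]` array literal. The code below writes such an expansion out by hand and then uses ordinary array operations on it; nothing here needs `String` at runtime, though it does need an allocator for the arrays.

```swift
// What a hypothetical `#utf16("Hello")` could expand to: UTF-16 code units
// spelled out as an array literal, so no String is needed at runtime.
let hello: [UInt16] = [0x48, 0x65, 0x6C, 0x6C, 0x6F]               // "Hello"
let world: [UInt16] = [0x2C, 0x20, 0x77, 0x6F, 0x72, 0x6C, 0x64]   // ", world"

// Concatenation (allocates a new array).
let greeting = hello + world

// Addressing via integer indices: O(1) access to a code unit.
let firstUnit = greeting[0]   // 0x48, "H"

// Transformation with map: ASCII-only uppercasing, a simplification that
// leaves every non-ASCII code unit untouched.
let shouted = greeting.map { unit -> UInt16 in
    (unit >= 0x61 && unit <= 0x7A) ? unit - 0x20 : unit
}
```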

As for JavaScriptKit, it was designed before Embedded Swift existed and with few constraints on binary size, which looked small anyway when compared to the 5 MB minimum of non-embedded Swift binaries built for WASI. Its dynamic approach to bridging would need to be rethought for Embedded Swift. Maybe it could be built on top of macros that automatically produce @_extern and @_expose bindings, which are static, don't require passing and re-encoding a huge amount of strings around, and have much better performance as a consequence.

I'm not sure it's necessarily that absolute. There's a further cleaving of String here where we could provide the type itself, but without its Equatable, Hashable, and Comparable logic that rely on lookup tables (while still providing that functionality on UTF8View so you can explicitly opt in to code-unit-based comparisons). And now that the stdlib uses its own implementation instead of ICU, the cost of those tables is on the order of tens of kilobytes in native code; that's still too much for otherwise very small programs (and I'm not sure how it scales when encoded in a wasm module rather than a native binary), but is still a lot smaller than the hundreds of kilobytes that a well-shaken non-embedded Swift stdlib takes up.
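
The distinction matters because `String`'s `==` applies Unicode canonical equivalence, which is what the lookup tables are for. A minimal sketch of the explicit code-unit opt-in, using a hypothetical wrapper named `CodeUnitString` (not a standard library type), compares, orders, and hashes raw UTF-8 bytes only:

```swift
/// Hypothetical wrapper: equality, ordering, and hashing by UTF-8 code units only.
/// Nothing here consults Unicode normalization data.
struct CodeUnitString: Hashable, Comparable {
    var bytes: [UInt8]

    init(_ string: String) {
        bytes = Array(string.utf8)
    }

    // Byte-wise lexicographic order; for UTF-8 this matches scalar-value order.
    static func < (lhs: CodeUnitString, rhs: CodeUnitString) -> Bool {
        lhs.bytes.lexicographicallyPrecedes(rhs.bytes)
    }
    // `==` and `hash(into:)` are synthesized from `bytes`, so they are byte-wise too.
}

let precomposed = "\u{E9}"     // "é" as a single scalar
let decomposed = "e\u{301}"    // "e" followed by a combining acute accent

// String's == applies canonical equivalence, which needs Unicode tables.
let stringsEqual = precomposed == decomposed                                    // true
// Code-unit equality only sees two different byte sequences.
let codeUnitsEqual = CodeUnitString(precomposed) == CodeUnitString(decomposed)  // false
```

The trade-off is visible in the last two lines: the byte-wise version is cheap and table-free, but it treats canonically equivalent strings as different.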

I'm curious to see how that plays out in action, I've never seen it actually amounting to tens of kilobytes in a final binary.

Either way, the point about re-encoding to and from UTF-16 still stands. Without first-class UTF-16 support in String, the best approach for high-performance interop with JS is some Collection<UInt16> ([UInt16] or UnsafeBufferPointer<UInt16>) encoded from string literals via macros.
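
One way to keep such an interop API agnostic about storage is to accept any `Collection` whose elements are `UInt16` and hand contiguous memory straight to the host. In this sketch, `hostReceiveUTF16` is a hypothetical stand-in for a function imported from JavaScript (a real import would read the units out of Wasm linear memory); here it just copies them so the behavior can be checked.

```swift
/// Hypothetical stand-in for a function imported from the JavaScript host.
/// A real import would read the code units out of Wasm linear memory; this
/// one just copies them so the result can be inspected.
func hostReceiveUTF16(_ units: UnsafeBufferPointer<UInt16>) -> [UInt16] {
    Array(units)
}

/// Hands any collection of UTF-16 code units to the host with no re-encoding.
/// Contiguous storage ([UInt16], UnsafeBufferPointer<UInt16>, ...) is passed
/// in place; other collections are first copied into a temporary array.
func sendToHost<C: Collection>(_ units: C) -> [UInt16] where C.Element == UInt16 {
    if let result = units.withContiguousStorageIfAvailable(hostReceiveUTF16) {
        return result
    }
    return Array(units).withUnsafeBufferPointer(hostReceiveUTF16)
}

let hi: [UInt16] = [0x48, 0x69]                                    // "Hi"
let fromArray = sendToHost(hi)                                     // contiguous path
let fromBuffer = hi.withUnsafeBufferPointer { sendToHost($0) }     // contiguous path
let fromReversed = sendToHost(hi.reversed())                       // copy path
```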

String only needs ~26 KB of data to implement Equatable, Comparable, and Hashable. A little more is needed for grapheme breaking, but the majority (maybe 95%) of the data exists just to implement Unicode.Scalar.Properties. We can subset that portion of the data out :smile:
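
For a sense of which operations depend on that data, the snippet below (ordinary, non-embedded Swift) measures one string at several levels. The code-unit and scalar views only need decoding; `count` walks grapheme clusters and needs the breaking data, and `properties` draws on the much larger scalar-property tables.

```swift
// Written with escapes so the zero-width joiners (U+200D) are explicit.
let family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}"  // man, ZWJ, woman, ZWJ, girl

let graphemes = family.count                // 1: grapheme breaking, needs Unicode data
let scalars = family.unicodeScalars.count   // 5: plain decoding, no tables
let utf16Units = family.utf16.count         // 8: three surrogate pairs + two ZWJs
let utf8Bytes = family.utf8.count           // 18: 3 × 4 bytes + 2 × 3 bytes

// Scalar properties account for most of the data and could be subset out.
let eAcute: Unicode.Scalar = "\u{E9}"
let isLowercase = eAcute.properties.isLowercase   // true
```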

This seems like it would be immensely useful. To add a data point: WebAssembly Macros currently compile to > 30 MB binaries even with -Os and wasm-opt optimization. Unfortunately swift-syntax relies on String so we can't build in Embedded mode just yet, but if we were able to include a stripped-down String in embedded I wonder how much of that 30 MB we could trim down.

I mean, most of that code is probably Swift-Syntax's parser.

The way I understand it, Embedded mode enables forced protocol specialization and aggressive LTO code-stripping. So the hope is that this more efficient code generation will ultimately reduce the size of SwiftSyntax’s admittedly complex parser.

Are you referring to generics specialization? This only helps with reducing the need for the Swift runtime, but otherwise it actually increases binary size for every specialization. Where you had a single type or function in non-embedded mode, now you have multiple copies of those for each specialization.

For big projects that rely heavily on generics, this may offset whatever gains come from excluding the runtime code and from other optimizations. The impact on final binary size can't always be assumed to be positive for such projects.

Good to know!

However, what’s true in theory may not ultimately apply to real-world apps. So I think it’d be useful to study the differences when compiling real-world apps with either regular Swift or Embedded mode. For example, what if generic functions are relatively small, or what if the shared instructions of longer specialized generic functions can be combined (e.g. by link-time outlining)? The binary-size impact might then be lessened.

Thinking out loud:

import String.Core
import String.Hashable 
import String.Comparable 
import String.UnicodeScalarProperties

These could be imported implicitly on the current "big" platforms. On embedded platforms they could be imported explicitly and individually, on an as-needed basis, and only what's imported would add to the size of the resulting binary.

After some measurement I discovered that I hadn't stripped the binary :person_facepalming:

The baseline (binary size for StringifyMacro) is now ~10 MB, with the overhead of the non-Embedded stdlib being ~3.5 MB. Per Alejandro's comment, if we brought String to Embedded it sounds like we could save roughly 3.4 MB. That's a 34% reduction, not too shabby. (To be fair, I'm not accounting for the overhead of monomorphization, but I don't have the data to speculate on that at the moment.)

Well, maybe. The code for each specialization may be much smaller than the generic code (as with something like a fixed width integer shift, which will typically be a single instruction when specialized, but can be like 100 instructions when not specialized). So if you only have a few specializations, and can eliminate the generic version, you can still come out way ahead.
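
The shift example can be sketched with a hypothetical helper, `shifted(_:by:)`. Compiled unspecialized, its body has to go through the `FixedWidthInteger` conformance for whatever `T` it receives; Embedded Swift instead emits one specialized copy for each concrete type used, and each copy only has to handle its own width. The code shows the call sites, not the generated machine code, so the size effect itself is the assumption described above.

```swift
/// Hypothetical helper, generic over any fixed-width integer.
/// Non-embedded Swift can keep a single unspecialized body; Embedded Swift
/// emits one specialized copy per concrete type used below.
func shifted<T: FixedWidthInteger>(_ x: T, by n: Int) -> T {
    x << n   // "smart shift": overshifting yields 0 rather than trapping
}

let a = shifted(UInt8(0b0000_0011), by: 2)   // specialization for UInt8
let b = shifted(UInt32(1), by: 31)           // specialization for UInt32
let c = shifted(UInt8(1), by: 9)             // same UInt8 copy; overshift gives 0
```

Here only two specializations exist (`UInt8` and `UInt32`), which is the "few specializations" case where dropping the generic version can come out ahead.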

Is this in active development? "Latest" news for their IDE is from December 2022... :thinking:
