@Max_Desiatov that resolved the problem. I need to keep an eye on it whenever I do a command line tool update.
Thank you for catching that! Also thanks to @kubamracek.
@Max_Desiatov is there any way to bring `String` into Embedded Swift for WASM? Playing with swift-for-wasm-examples-main and really can't imagine life without the `String` type in web development.
There is work in progress on bringing the current `Swift.String` into Embedded Swift: [embedded] Port Swift.String to embedded Swift by kubamracek · Pull Request #70446 · apple/swift · GitHub
Though, not everyone and not all use cases want (or can use) the regular `String` that exists in Swift today, because it's Unicode-compliant, commonly heap-allocated, dynamically sized, etc. `Swift.String` will only be a good fit for a subset of Embedded Swift users, and more exploration in the "embedded strings problem space" will be needed.
Amazing work @kubamracek and thank you for your quick response!
How soon do you think it could be available in a nightly development snapshot? Would love to start playing with it.
No ETA, sorry, there are still definitely unknowns to be resolved first. But we can get the Swift CI system to build a custom toolchain, with the change in, and perhaps that could let you start some experiments. Let me do that on that pull request (after a few hours, there should be a link to ci.swift.org that will have a link to the produced toolchain).
As I've clarified on that PR, if you need `String` in Wasm you shouldn't be using Embedded Swift in the first place. Use of `String` will negate the binary size reduction that Embedded Swift brings, and as for performance you'll still incur UTF-8 <-> UTF-16 re-encoding overhead on every call bridged to JavaScript.
Thank you @Max_Desiatov I appreciate your response and suggestions.
I will try to implement that special string type with UTF-16 under the hood.
I'm currently trying to build a minimal version of JavaScriptKit, and it seems I can't make it fully work without strings; `StaticString` is definitely not enough here.
In Embedded Swift for Wasm interoperating with JS, what could work is a `#utf16` macro that converts string literals to `[UInt16]`, similar to the `#utf8` macro I previously suggested here: SE-0382: Expression Macros - #23 by Max_Desiatov. That way, with an allocator, you can get all of the array operations available, such as concatenation, addressing via integer indices, transformations with `map`, etc.
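To make the idea concrete, here's a rough sketch of what such a `#utf16` macro could produce. The macro doesn't exist yet, so the runtime conversion below merely emulates the compile-time expansion for illustration:

```swift
// Hypothetical: `#utf16("Hello")` would expand to this array at compile time.
let greeting: [UInt16] = Array("Hello".utf16)

// With plain arrays of code units, the usual array operations are available:
let combined = greeting + Array(", World!".utf16)    // concatenation
let first = combined[0]                              // integer indexing
let uppercased = greeting.map { unit -> UInt16 in    // transformation with map
    // ASCII a-z occupy 0x61...0x7A as UTF-16 code units, too.
    (0x61...0x7A).contains(unit) ? unit - 0x20 : unit
}

print(first)        // 72, the code unit for "H"
print(uppercased)   // [72, 69, 76, 76, 79], i.e. "HELLO"
```

These `[UInt16]` buffers could then be handed to JS without the UTF-8 <-> UTF-16 re-encoding step mentioned earlier in the thread.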
As for JavaScriptKit, it was designed before Embedded Swift existed and with few constraints on binary size, which looked small anyway when compared to the 5 MB minimum of non-embedded Swift binaries built for WASI. Its dynamic approach to bridging would need to be rethought for Embedded Swift. Maybe it could be built on top of macros that automatically produce `@_extern` and `@_expose` bindings which are static, don't require passing and re-encoding a huge amount of strings around, and have much better performance as a consequence.
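As a rough illustration of what such macro-generated static bindings might look like (note that `@_extern`/`@_expose` are underscored, unstable attributes, and the function names and the JS-side "js" module below are purely hypothetical):

```swift
// Imported: a function implemented on the JS side, in a made-up "js" module.
@_extern(wasm, module: "js", name: "consoleLogUTF16")
func consoleLogUTF16(_ pointer: UnsafePointer<UInt16>, _ count: Int32)

// Exported: a statically-known entry point the JS host can call directly,
// with no dynamic lookup or string-based dispatch in between.
@_expose(wasm, "add")
func add(_ x: Int32, _ y: Int32) -> Int32 {
    x + y
}
```

Because both directions are resolved at link time, no type or member names need to be encoded, passed, and looked up at runtime.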
I'm not sure it's necessarily that absolute. There's a further cleaving of `String` here where we could provide the type itself, but without its `Equatable`, `Hashable`, and `Comparable` logic that relies on lookup tables (while still providing that functionality on `UTF8View` so you can explicitly opt in to code-unit-based comparisons). And now that the stdlib uses its own implementation instead of ICU, the cost of those tables is on the order of tens of kilobytes in native code; that's still too much for otherwise very small programs (and I'm not sure how it scales when encoded in a wasm module rather than a native binary), but it's still a lot smaller than the hundreds of kilobytes that a well-shaken non-embedded Swift stdlib takes up.
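A small example of the opt-in code-unit comparison described above: default `String` equality is Unicode-canonical (the table-backed part), while comparing the `utf8` views is a plain byte comparison that needs no tables:

```swift
// Two spellings of "café": precomposed U+00E9 vs. "e" + combining acute accent.
let precomposed = "caf\u{E9}"
let decomposed = "cafe\u{301}"

// Default String equality performs canonical equivalence, so these match:
print(precomposed == decomposed)                        // true

// Comparing the UTF-8 code units directly needs no lookup tables; here it differs:
print(precomposed.utf8.elementsEqual(decomposed.utf8))  // false
```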
I'm curious to see how that plays out in action; I've never seen it actually amount to tens of kilobytes in a final binary.
Either way, the point of re-encoding to and from UTF-16 still stands. Without first-class UTF-16 support in `String`, the best approach for high-performance interop with JS is some `Collection<UInt16>` (`[UInt16]` or UBP) encoded from string literals via macros.
`String` only needs ~26 KB of data to implement `Equatable`, `Comparable`, and `Hashable`. A little more for grapheme breaking, but the majority (maybe 95%) of the data is there just to implement `Unicode.Scalar.Properties`. We can subset that portion of the data out.
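For context, `Unicode.Scalar.Properties` is the API that the bulk of that data backs; queries like the following read from the stdlib's built-in Unicode tables:

```swift
// A few of the table-backed queries behind Unicode.Scalar.Properties.
let scalar: Unicode.Scalar = "É"
print(scalar.properties.isAlphabetic)        // true
print(scalar.properties.lowercaseMapping)    // "é"
print(scalar.properties.canonicalCombiningClass.rawValue)  // 0, a base character
```

Subsetting this out would keep basic `String` operations while dropping the data that only these property queries need.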
This seems like it would be immensely useful. To add a data point: WebAssembly macros currently compile to > 30 MB binaries even with `-Os` and `wasm-opt` optimization. Unfortunately swift-syntax relies on `String`, so we can't build in Embedded mode just yet, but if we were able to include a stripped-down `String` in Embedded, I wonder how much of that 30 MB we could trim down.
I mean, most of that code is probably swift-syntax's parser.
The way I understand it, Embedded mode enables forced protocol specialization and aggressive LTO code stripping. So the hope is that this more efficient code generation will ultimately reduce the size of SwiftSyntax's admittedly complex parser.
Are you referring to generics specialization? This only helps with reducing the need for the Swift runtime, but otherwise it actually increases binary size for every specialization. Where you had a single type or function in non-embedded mode, now you have multiple copies of those for each specialization.
For big projects that rely heavily on generics, this may offset whatever gains come from the exclusion of the runtime code and other optimizations. The impact on final binary size cannot always be assumed to be positive for such projects.
Good to know!
However, what's true in theory may not ultimately apply to real-world apps. So I think it'd be useful to study the differences when compiling real-world apps with either regular Swift or Embedded mode. For example, what if generic functions are relatively small, or what if the shared instructions of longer specialized generic functions can be combined (e.g. by link-time outlining)? The binary-size impact might then be lessened.
Thinking out loud:
import String.Core
import String.Hashable
import String.Comparable
import String.UnicodeScalarProperties
These could be imported implicitly on the current "big" platforms. On embedded platforms they could be imported explicitly and individually on an "as needed" basis, and only what's imported would add to the size of the resulting binary.
After some measurement I discovered that I hadn't stripped the binary. The baseline (binary size for `StringifyMacro`) is now ~10 MB, with the overhead of the non-Embedded stdlib being ~3.5 MB. Per Alejandro's comment, if we brought `String` to Embedded it sounds like we could save roughly 3.4 MB. That's a 34% reduction, not too shabby. (To be fair, I'm not accounting for the overhead of monomorphization, but I don't have the data to speculate on that atm.)
Well, maybe. The code for each specialization may be much smaller than the generic code (as with something like a fixed width integer shift, which will typically be a single instruction when specialized, but can be like 100 instructions when not specialized). So if you only have a few specializations, and can eliminate the generic version, you can still come out way ahead.
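The shift example can be made concrete. A sketch of the tradeoff: under Embedded Swift, each concrete use of a generic like this forces its own specialization, whereas in regular Swift a single unspecialized version must handle any conforming type:

```swift
// Generic over any fixed-width integer: unspecialized, the compiler emits
// general code; specialized for a concrete type, this boils down to one shift.
func shiftedLeft<T: FixedWidthInteger>(_ value: T, by amount: Int) -> T {
    value << amount
}

// Each of these concrete uses drives a separate specialization in Embedded Swift:
print(shiftedLeft(UInt8(3), by: 2))    // 12
print(shiftedLeft(UInt64(3), by: 2))   // 12
```

Two specializations of a one-instruction body are still far smaller than one generic version plus the runtime machinery it needs; the balance only tips the other way when specialized bodies are large and numerous.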
Is this in active development? "Latest" news for their IDE is from December 2022...