So the `-static-executable` and `-static-stdlib` command line options of the swiftc compiler do not work?
They do work; there has been quite a bit of discussion on this previously, e.g. in the original announcement thread here.
The crux of it is: you can build a `-static-executable`, but only if it does not use anything from libc, because libc really does not want to be statically linked. So, if you want a fully static executable AND you need something from libc, you can now use musl.
`-static-stdlib` has worked for a long time and lets you statically link only the Swift runtime libraries while still dynamically linking libc (and a few others). The downside is that the binary will not be as portable: it mostly only runs reliably on the same distro, and only if the libraries are present.
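For reference, the difference between the two modes can be sketched with plain `swiftc` invocations on a Linux host (`hello.swift` is a hypothetical source file; exact behavior may vary by toolchain version):

```shell
# Statically link only the Swift runtime; glibc and friends stay dynamic:
swiftc -static-stdlib hello.swift -o hello-static-stdlib

# Ask for a fully static executable; this is the mode that needs a libc
# willing to be statically linked, which is why the SDK is based on musl:
swiftc -static-executable hello.swift -o hello-static
```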
How do I know if I need something from libc? I use Swift, not C.
To answer your question, I just tried a simple hello world:
print("hello world")
on macOS with
swift build -c release
ls -al .build/arm64-apple-macosx/release/HelloWorld
-rwxr-xr-x 1 stormacq staff 55928 Jun 20 15:20 .build/arm64-apple-macosx/release/HelloWorld
When using the Static Linux SDK:
export TOOLCHAINS=org.swift.600202406131a
PATH_TO_TOOLCHAIN=/Library/Developer/Toolchains/swift-6.0-DEVELOPMENT-SNAPSHOT-2024-06-13-a.xctoolchain
DYLD_LIBRARY_PATH=$PATH_TO_TOOLCHAIN/usr/lib/swift/macosx $PATH_TO_TOOLCHAIN/usr/bin/swift build -c release --swift-sdk aarch64-swift-linux-musl --target HelloWorld
ls -al .build/aarch64-swift-linux-musl/release/HelloWorld
-rwxr-xr-x 1 stormacq staff 42116112 Jun 20 15:24 .build/aarch64-swift-linux-musl/release/HelloWorld
It's 40 MB vs 55 KB :-)
Then I stripped it, and it's now 5.9 MB. That's an impressive 86% reduction.
Given that musl's `libc.a` is 2.4 MB and `libc++.a` is 10 MB, I find 5.9 MB for an executable that contains both libc and the Swift runtime not that bad :-)
# on Linux
strip -o stripped HelloWorld
$ ls -alh stripped
-rwxrwxr-x. 1 ec2-user ec2-user 5.9M Jun 20 13:37 stripped
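As a sanity check on those numbers (taking 5.9 MB as roughly 5,900,000 bytes, since only the human-readable size was printed above):

```shell
# Reduction from the 42,116,112-byte original to the ~5.9 MB stripped copy:
awk 'BEGIN { printf "%.0f%%\n", (1 - 5900000 / 42116112) * 100 }'
```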
You always need something from libc unless you're using the embedded mode. The Swift runtime is built on top of libc++ or libstdc++, which are built on top of a libc.
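You can see that layering with `ldd` on any dynamically linked ELF binary (shown here on `/bin/ls` purely as an illustration; a default dynamically linked Swift binary on Linux lists `libswiftCore.so`, `libstdc++`/`libc++`, and `libc.so.6` in the same way):

```shell
# Print the dynamic dependencies; the chain always bottoms out at a libc.
ldd /bin/ls
```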
If we're making binary-size measurements for "Hello, World!" printers on Linux, one could produce an executable taking 1096 bytes on arm64 with Embedded Swift.
For posterity, here's the source code for that, showing what using syscalls without any libc looks like. No allocator is included, to reduce the number of dependencies.
import LinuxSyscall

@_cdecl("_start")
func start() {
    let hello: StaticString = "Hello, World!\n"
    // write(2) is syscall 64 on aarch64 Linux: write(1 /* stdout */, buf, count)
    __syscall3(64, 1, Int(bitPattern: hello.utf8Start), hello.utf8CodeUnitCount)
    // exit(2) is syscall 93: terminate with status 0
    __syscall1(93, 0)
}
where the `LinuxSyscall` directory contains this `module.modulemap`:
module LinuxSyscall {
header "aarch64/syscall_arch.h"
export *
}
and `LinuxSyscall/aarch64/syscall_arch.h` looks like this:
// ----------------------------------------------------------------------
// Copyright © 2005-2020 Rich Felker, et al.
// Permission is hereby granted, free of charge, to any person obtaining
// a copy of this software and associated documentation files (the
// "Software"), to deal in the Software without restriction, including
// without limitation the rights to use, copy, modify, merge, publish,
// distribute, sublicense, and/or sell copies of the Software, and to
// permit persons to whom the Software is furnished to do so, subject to
// the following conditions:
// The above copyright notice and this permission notice shall be
// included in all copies or substantial portions of the Software.
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
// EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
// MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
// IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
// CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
// TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
// SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
// ----------------------------------------------------------------------
// aarch64 Linux syscall convention: the syscall number goes in x8, arguments
// in x0..x5, and the return value comes back in x0; "svc 0" traps into the kernel.
#define __asm_syscall(...) do { \
__asm__ __volatile__ ( "svc 0" \
: "=r"(x0) : __VA_ARGS__ : "memory", "cc"); \
return x0; \
} while (0)
static inline long __syscall0(long n)
{
register long x8 __asm__("x8") = n;
register long x0 __asm__("x0");
__asm_syscall("r"(x8));
}
static inline long __syscall1(long n, long a)
{
register long x8 __asm__("x8") = n;
register long x0 __asm__("x0") = a;
__asm_syscall("r"(x8), "0"(x0));
}
static inline long __syscall2(long n, long a, long b)
{
register long x8 __asm__("x8") = n;
register long x0 __asm__("x0") = a;
register long x1 __asm__("x1") = b;
__asm_syscall("r"(x8), "0"(x0), "r"(x1));
}
static inline long __syscall3(long n, long a, long b, long c)
{
register long x8 __asm__("x8") = n;
register long x0 __asm__("x0") = a;
register long x1 __asm__("x1") = b;
register long x2 __asm__("x2") = c;
__asm_syscall("r"(x8), "0"(x0), "r"(x1), "r"(x2));
}
static inline long __syscall4(long n, long a, long b, long c, long d)
{
register long x8 __asm__("x8") = n;
register long x0 __asm__("x0") = a;
register long x1 __asm__("x1") = b;
register long x2 __asm__("x2") = c;
register long x3 __asm__("x3") = d;
__asm_syscall("r"(x8), "0"(x0), "r"(x1), "r"(x2), "r"(x3));
}
static inline long __syscall5(long n, long a, long b, long c, long d, long e)
{
register long x8 __asm__("x8") = n;
register long x0 __asm__("x0") = a;
register long x1 __asm__("x1") = b;
register long x2 __asm__("x2") = c;
register long x3 __asm__("x3") = d;
register long x4 __asm__("x4") = e;
__asm_syscall("r"(x8), "0"(x0), "r"(x1), "r"(x2), "r"(x3), "r"(x4));
}
static inline long __syscall6(long n, long a, long b, long c, long d, long e, long f)
{
register long x8 __asm__("x8") = n;
register long x0 __asm__("x0") = a;
register long x1 __asm__("x1") = b;
register long x2 __asm__("x2") = c;
register long x3 __asm__("x3") = d;
register long x4 __asm__("x4") = e;
register long x5 __asm__("x5") = f;
__asm_syscall("r"(x8), "0"(x0), "r"(x1), "r"(x2), "r"(x3), "r"(x4), "r"(x5));
}
You can build it with a recent development snapshot using this command:
swiftc -Osize -enable-experimental-feature Embedded \
-wmo \
--target=aarch64-none-none-elf \
-I . \
-nostartfiles \
-Xfrontend -function-sections \
-Xlinker --gc-sections \
-c -o hello.o \
hello.swift
and link with `lld`:
ld.lld hello.o --gc-sections -Bstatic -EL -o hello
What is this magic spell? Is it documented somewhere?
So the release build does not strip debug information from the binary; we need to use the additional `strip` command.
- I could not find `org.swift.600202406131a` there;
- These instructions are for macOS, while we are talking about Amazon Linux here.
OK, now I realize that this is related to some development snapshot.
To bring the topic back on track: if I read the arguments in this thread correctly, there are good reasons (security, simplicity, acceptable overhead) to say that building and deploying a fully static executable to AWS Lambda is actually better than building a dynamic executable (with static Swift stdlib) tailored for the specific known runtime environment (AL2023)?
The doc has
export TOOLCHAINS=$(plutil -extract CFBundleIdentifier raw /Library/Developer/Toolchains/<toolchain name>.xctoolchain/Info.plist)
We're talking about cross-compiling on macOS to produce Linux binaries.
I'm still trying to figure out :-) Thank you for this discussion that helps to gather diverse opinions.
Here is what I think about the Static Linux SDK in the context of AWS Lambda functions:
PROs
- simplicity of the build (no need for Docker, but it requires installing the SDK)
- no dependency on Amazon Linux 2 / 2023 or next
Neutral
- binary size. It is clearly larger, even when stripped (and beware: stripping must happen on Linux, which makes the deployment pipeline more complex). It's not necessarily bad in the context of the Lambda execution environment. Does it affect cold start time? I don't know. Intuitively, I would answer "no", but I need to measure this.
- I don't think there is an advantage in terms of security. I understood Alastair's comment as "we can strip down the OS and remove libraries that are not used, hence reducing the attack surface". But in the context of AWS Lambda, you typically don't provide the OS image; the AWS Lambda service does. Unless you provide your full OS image as an OCI container image. But even if you do, what are the security risks you want to protect against in an AWS Lambda execution environment?
Unless it has a negative impact on cold start time, I don't have a con yet.
Would you mind trying:
swift build -c release -Xswiftc -gline-tables-only -Xcc -gline-tables-only
That should give you a release binary which still has line information embedded (very useful for debugging crashes), but is likely smaller than with `-g`.
Thank you @johannesweiss for the suggestion. If I applied them correctly, these two command line options do not move the needle much: 192 bytes less (about 0.0005%).
without
swift build -c release --swift-sdk aarch64-swift-linux-musl
ls -al .build/aarch64-swift-linux-musl/release/HelloWorld
-rwxr-xr-x 1 stormacq staff 42116112 Jun 22 14:04 .build/aarch64-swift-linux-musl/release/HelloWorld
with
swift build -c release --swift-sdk aarch64-swift-linux-musl -Xswiftc -gline-tables-only -Xcc -gline-tables-only
ls -al .build/aarch64-swift-linux-musl/release/HelloWorld
-rwxr-xr-x 1 stormacq staff 42115920 Jun 22 14:04 .build/aarch64-swift-linux-musl/release/HelloWorld
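For the record, the 192-byte delta as a percentage of the original binary size:

```shell
# (42116112 - 42115920) = 192 bytes out of 42116112:
awk 'BEGIN { printf "%.4f%%\n", (42116112 - 42115920) / 42116112 * 100 }'
```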
How about `swift build -c release -Xswiftc -gnone`? Do we still need `strip`?
That's the easiest way currently, for sure. We should probably start shipping `llvm-objcopy` et al. with the toolchain, which would let you strip it on any platform Swift runs on.
Since the libraries being linked also contain debug information, you probably do need to explicitly strip the result (`-Xswiftc -gnone` just controls debug information for your program, not for the libraries that are already built).
I don't know what the Lambda cold start time includes, but one thing that doesn't have to happen here is dynamic library loading and the associated run-time linking. So, naïvely, I would expect cold start time to be lower for the fully statically linked binary.
I agree. There are probably a few ms added to download the binary into the runtime environment, but given networks these days, it should be really minor. Then, as you mentioned, there is no need to do any symbol resolution. I also intuitively think the statically linked binary should decrease cold start time. I'll try to measure that if time permits, although I'm a bit wary of Lambda benchmarks; they are so easy to interpret incorrectly.