Thank you so much for all of this! Would it be useful for me to open up an MR into your sample benchmark repo that aggregates what we've discovered here to help others?
Sure, would be happy to take any PR improving sample code (maybe would be good with some more comments too, as always doc is lagging a bit).
I'd also suggest you simplify even one more step for memory checking, there's a shorthand:
@_dynamicReplacement(for: registerBenchmarks) // Register benchmarks
func benchmarks() {
    /* Benchmark.defaultMetrics = [
        .allocatedResidentMemory,
        .peakMemoryVirtual,
        .peakMemoryResident,
        .memoryLeaked
    ] */
    Benchmark.defaultMetrics = BenchmarkMetric.memory // use this instead to get all malloc related metrics
    Benchmark.defaultThroughputScalingFactor = .kilo
    // ... benchmark definitions go here
}
Lovely - will get that put together here shortly and ask for your review!
Oh, I forgot - if you really want to run exactly 100k iterations to be able to compare with the macOS tooling, I would use a throughputScalingFactor of .kilo and set desiredIterations to 100 (1,000 x 100 = 100,000).
Hi Joakim - I somehow missed that those memory test examples were in your package-benchmark-samples repo already; I think I don't need to create an MR then. Thanks again for the clarifications.
@Joakim_Hassila1 maybe we can close the JSONEncoder memory leak bug? See: [SR-5501] JSONEncoder leaks memory · Issue #4275 · apple/swift-corelibs-foundation · GitHub
To follow up on the linear/power of two bucket scheme currently used, this is the foundation we'll use instead as mentioned:
And now it's shipped, please see the announcement here for details:
And to follow up about running benchmarks in isolation, that is now fixed in 0.6.1 that just shipped.
So @corymosiman12 if you update to 0.6.1 your test case no longer needs to be split into separate executable targets for things to make sense.
amazing, thanks!
Now supporting regex for both benchmark suite and individual benchmark filtering/skipping (thanks Swift 5.7!)
This causes a command-line break where skip has been renamed skip-target and the new skip argument applies to benchmarks instead.
Sorry, I'm just revising this again trying to shape up the API for a 1.0 soonish, and as far as I can tell it's not technically possible to mimic that approach when using the BenchmarkSupport as a dependency.
To make a long story short, the multi-node tests depend on a static array that defines the types that should be introspected - this works fine when things are in the same module, but as far as I have found there is no way to get to any global state of the hosting application from a package module.
Basically all avenues except dynamically replacing a function seem to be closed in practice (e.g. trying to hook in by overriding a function in an extension fails with "Overriding non-@objc declarations from extensions is not supported", the same goes for dynamic dispatch of protocols, which doesn't work for this either, etc. See e.g. the discussion Joe Groff had here https://twitter.com/owensd/status/634270773000151040 with the conclusion "Is the extension in a different module? Modules can't change the behavior of other modules." - and that's basically what's needed here in some way).
The setup is basically that the Benchmark suite executable depends on the BenchmarkSupport package that provides all of the behind-the-scenes functionality (among other things, using swift argument parser to get running instructions from the hosting plugin) - and these are two different modules.
If there were any mechanism apart from the currently used dynamic replacement of a function that allowed me to hook in and register stuff, while still allowing a nice package dependency for the supporting infrastructure, I'm all ears (but I don't want to push a benchmark user to require additional glue code if at all possible).
Just wanted to put on the record why the existing benchmark registration looks like it does...
On another note, closing in on 1.0 (the primary thing remaining is the additional export functionality of the results), a source-breaking release was done now:
(additional samples on how to migrate any existing benchmark are available in the samples projects)
i started getting the error while loading shared libraries: libjemalloc.so.2 error again trying to run benchmarks with the latest version (0.8) of the package.
here’s what i get when i run sudo make install:
~/jemalloc-5.3.0$ sudo make install
/usr/bin/install -c -d /usr/local/bin
removed ‘/usr/local/bin/jemalloc-config’
‘bin/jemalloc-config’ -> ‘/usr/local/bin/jemalloc-config’
removed ‘/usr/local/bin/jemalloc.sh’
‘bin/jemalloc.sh’ -> ‘/usr/local/bin/jemalloc.sh’
removed ‘/usr/local/bin/jeprof’
‘bin/jeprof’ -> ‘/usr/local/bin/jeprof’
/usr/bin/install -c -d /usr/local/include/jemalloc
removed ‘/usr/local/include/jemalloc/jemalloc.h’
‘include/jemalloc/jemalloc.h’ -> ‘/usr/local/include/jemalloc/jemalloc.h’
/usr/bin/install -c -d /usr/local/lib
/usr/bin/install -c -v -m 755 lib/libjemalloc.so.2 /usr/local/lib
removed ‘/usr/local/lib/libjemalloc.so.2’
‘lib/libjemalloc.so.2’ -> ‘/usr/local/lib/libjemalloc.so.2’
ln -sf libjemalloc.so.2 /usr/local/lib/libjemalloc.so
/usr/bin/install -c -d /usr/local/lib
removed ‘/usr/local/lib/libjemalloc.a’
‘lib/libjemalloc.a’ -> ‘/usr/local/lib/libjemalloc.a’
removed ‘/usr/local/lib/libjemalloc_pic.a’
‘lib/libjemalloc_pic.a’ -> ‘/usr/local/lib/libjemalloc_pic.a’
/usr/bin/install -c -d /usr/local/lib/pkgconfig
removed ‘/usr/local/lib/pkgconfig/jemalloc.pc’
‘jemalloc.pc’ -> ‘/usr/local/lib/pkgconfig/jemalloc.pc’
Missing xsltproc. doc/jemalloc.html not (re)built.
/usr/bin/install -c -d /usr/local/share/doc/jemalloc
removed ‘/usr/local/share/doc/jemalloc/jemalloc.html’
‘doc/jemalloc.html’ -> ‘/usr/local/share/doc/jemalloc/jemalloc.html’
Missing xsltproc. doc/jemalloc.3 not (re)built.
/usr/bin/install -c -d /usr/local/share/man/man3
removed ‘/usr/local/share/man/man3/jemalloc.3’
‘doc/jemalloc.3’ -> ‘/usr/local/share/man/man3/jemalloc.3’
here is what is in /usr/local/lib/:
$ ls /usr/local/lib
libjemalloc.a libjemalloc_pic.a libjemalloc.so libjemalloc.so.2 pkgconfig
here’s what i get when i run the plugin:
$ swift package benchmark
Building for debugging...
Build complete! (0.59s)
Building targets in release mode for benchmark run...
Build complete! Running benchmarks...
BSONEncodingBenchmarks: error while loading shared libraries: libjemalloc.so.2: cannot open shared object file: No such file or directory
Failed to run 'run' for /swift/swift-mongodb/Benchmarks/.build/x86_64-unknown-linux-gnu/release/BSONEncodingBenchmarks, result [32512]
Likely your benchmark crahed, try running the tool in the debugger, e.g.
lldb /swift/swift-mongodb/Benchmarks/.build/x86_64-unknown-linux-gnu/release/BSONEncodingBenchmarks
Or check Console.app for a backtrace if on macOS.
here’s what’s in the binary:
$ readelf -d .build/release/BSONEncodingBenchmarks
Dynamic section at offset 0x23d698 contains 41 entries:
Tag Type Name/Value
0x0000000000000003 (PLTGOT) 0x23efe8
0x0000000000000002 (PLTRELSZ) 15264 (bytes)
0x0000000000000017 (JMPREL) 0x3e610
0x0000000000000014 (PLTREL) RELA
0x0000000000000007 (RELA) 0x17e08
0x0000000000000008 (RELASZ) 157704 (bytes)
0x0000000000000009 (RELAENT) 24 (bytes)
0x000000006ffffff9 (RELACOUNT) 4197
0x0000000000000015 (DEBUG) 0x0
0x0000000000000006 (SYMTAB) 0x278
0x000000000000000b (SYMENT) 24 (bytes)
0x0000000000000005 (STRTAB) 0x81e0
0x000000000000000a (STRSZ) 52097 (bytes)
0x0000000000000004 (HASH) 0x14d68
0x0000000000000001 (NEEDED) Shared library: [libswiftGlibc.so]
0x0000000000000001 (NEEDED) Shared library: [libm.so.6]
0x0000000000000001 (NEEDED) Shared library: [libpthread.so.0]
0x0000000000000001 (NEEDED) Shared library: [libutil.so.1]
0x0000000000000001 (NEEDED) Shared library: [libdl.so.2]
0x0000000000000001 (NEEDED) Shared library: [libFoundation.so]
0x0000000000000001 (NEEDED) Shared library: [libswiftDispatch.so]
0x0000000000000001 (NEEDED) Shared library: [libdispatch.so]
0x0000000000000001 (NEEDED) Shared library: [libBlocksRuntime.so]
0x0000000000000001 (NEEDED) Shared library: [libjemalloc.so.2]
0x0000000000000001 (NEEDED) Shared library: [libswift_Differentiation.so]
0x0000000000000001 (NEEDED) Shared library: [libswift_StringProcessing.so]
0x0000000000000001 (NEEDED) Shared library: [libswift_Concurrency.so]
0x0000000000000001 (NEEDED) Shared library: [libswift_RegexParser.so]
0x0000000000000001 (NEEDED) Shared library: [libswiftCore.so]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
0x000000000000000c (INIT) 0x421b0
0x000000000000000d (FINI) 0x1a49e4
0x000000000000001a (FINI_ARRAY) 0x22f6e0
0x000000000000001c (FINI_ARRAYSZ) 8 (bytes)
0x0000000000000019 (INIT_ARRAY) 0x22f6e8
0x000000000000001b (INIT_ARRAYSZ) 16 (bytes)
0x000000000000001d (RUNPATH) Library runpath: [/usr/lib/swift/linux:$ORIGIN]
0x000000006ffffff0 (VERSYM) 0x172c8
0x000000006ffffffe (VERNEED) 0x17d68
0x000000006fffffff (VERNEEDNUM) 3
0x0000000000000000 (NULL) 0x0
i can run the binary successfully if i do
$ sudo ldconfig /usr/local/lib
can this be added to the getting started? alternatively, can the package configure this automatically?
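The readelf output above shows a RUNPATH of /usr/lib/swift/linux:$ORIGIN, which does not include /usr/local/lib, so the loader has to find libjemalloc.so.2 through its cache or default directories. A minimal sketch for listing which libraries the loader cannot resolve, assuming glibc's ldd output format ("name => not found"); missing_libs is a hypothetical helper name, not part of the package:

```shell
# Print the names of shared libraries that the dynamic loader cannot resolve.
# Reads ldd-style output on stdin, so it also works on saved output.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Usage (binary path taken from the post above):
#   ldd .build/release/BSONEncodingBenchmarks | missing_libs
```

If this prints libjemalloc.so.2 before running ldconfig and nothing afterwards, the problem is the loader search path rather than the build.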
finally, when i do run the benchmarks, i get a debug description output:
Debug result: [Benchmark.BenchmarkResult(metric: Time (wall clock), timeUnits: μs, measurements: 19, warmupIterations: 3, thresholds: nil, percentiles: [Benchmark.BenchmarkResult.Percentile.p75: 572522, Benchmark.BenchmarkResult.Percentile.p100: 639107, Benchmark.BenchmarkResult.Percentile.p99: 639107, Benchmark.BenchmarkResult.Percentile.p90: 634913, Benchmark.BenchmarkResult.Percentile.p0: 516162, Benchmark.BenchmarkResult.Percentile.p25: 533987, Benchmark.BenchmarkResul
...
i remember the plugin once printed a table to the terminal, which was useful. is there a way to re-enable that?
Hi!
First off, the tables would be printed to terminal if swift package benchmark ran successfully, but there seems to be some platform specific issue here - I'm guessing you are not on Ubuntu as I just tried a clean build using 0.8.0:
ubuntu@linux:~/x$ gh repo clone ordo-one/package-benchmark-samples
Cloning into 'package-benchmark-samples'...
remote: Enumerating objects: 137, done.
remote: Counting objects: 100% (130/130), done.
remote: Compressing objects: 100% (57/57), done.
remote: Total 137 (delta 59), reused 103 (delta 42), pack-reused 7
Receiving objects: 100% (137/137), 29.05 KiB | 303.00 KiB/s, done.
Resolving deltas: 100% (59/59), done.
ubuntu@linux:~/x$ cd package-benchmark-samples/
ubuntu@linux:~/x/package-benchmark-samples$ swift package update
Fetching https://github.com/ordo-one/package-benchmark
Fetched https://github.com/ordo-one/package-benchmark (0.85s)
Computing version for https://github.com/ordo-one/package-benchmark
Computed https://github.com/ordo-one/package-benchmark at 0.8.0 (0.22s)
...
Creating working copy for https://github.com/ordo-one/package-benchmark
Working copy of https://github.com/ordo-one/package-benchmark resolved at 0.8.0
ubuntu@linux:~/x/package-benchmark-samples$ swift package benchmark
Building for debugging...
[182/182] Linking BenchmarkTool
Build complete! (4.45s)
Building targets in release mode for benchmark run...
Build complete! Running benchmarks...
Benchmark results
============================================================================================================================
Host 'linux' with 8 'aarch64' processors with 15 GB memory, running:
#62-Ubuntu SMP Tue Nov 22 19:56:13 UTC 2022
Foundation-Benchmark
============================================================================================================================
Foundation AttributedString()
╒══════════════════════════════════════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╕
│ Metric │ p0 │ p25 │ p50 │ p75 │ p90 │ p99 │ p100 │ Samples │
╞══════════════════════════════════════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╡
│ Malloc (total) │ 19 │ 19 │ 19 │ 19 │ 19 │ 19 │ 19 │ 100000 │
├──────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Memory (resident peak) (M) │ 20 │ 21 │ 21 │ 21 │ 21 │ 21 │ 21 │ 100000 │
├──────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Throughput (scaled / s) (K) │ 572 │ 545 │ 534 │ 522 │ 522 │ 471 │ 11 │ 100000 │
├──────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Time (total CPU) (μs) │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 10002 │ 100000 │
├──────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Time (wall clock) (μs) │ 2 │ 2 │ 2 │ 2 │ 2 │ 2 │ 89 │ 100000 │
╘══════════════════════════════════════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╛
Foundation Date()
╒══════════════════════════════════════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╕
│ Metric │ p0 │ p25 │ p50 │ p75 │ p90 │ p99 │ p100 │ Samples │
╞══════════════════════════════════════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╡
│ Throughput (scaled / s) (M) │ 43 │ 43 │ 43 │ 42 │ 42 │ 42 │ 42 │ 43 │
├──────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Time (wall clock) (μs) │ 23364 │ 23446 │ 23560 │ 23658 │ 24068 │ 24068 │ 24068 │ 43 │
╘══════════════════════════════════════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╛
Samples
============================================================================================================================
All metrics, full concurrency, async
╒══════════════════════════════════════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╕
│ Metric │ p0 │ p25 │ p50 │ p75 │ p90 │ p99 │ p100 │ Samples │
╞══════════════════════════════════════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╡
│ Bytes (read logical) │ 747 │ 750 │ 750 │ 751 │ 1077 │ 1078 │ 1078 │ 79 │
├──────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Bytes (read physical) │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 79 │
├──────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Bytes (write logical) │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 79 │
├──────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Bytes (write physical) │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 79 │
├──────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Malloc (large) │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 79 │
├──────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Malloc (small) │ 173 │ 239 │ 375 │ 423 │ 482 │ 575 │ 575 │ 79 │
├──────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Malloc (total) │ 173 │ 239 │ 375 │ 423 │ 482 │ 575 │ 575 │ 79 │
├──────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Malloc / free Δ (K) │ 0 │ 0 │ 0 │ 328 │ 787 │ 2033 │ 2033 │ 78 │
├──────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Memory (allocated) (M) │ 23 │ 34 │ 39 │ 40 │ 40 │ 40 │ 40 │ 79 │
├──────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Memory (resident peak) (M) │ 24 │ 26 │ 26 │ 26 │ 27 │ 27 │ 27 │ 79 │
├──────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Memory (virtual peak) (M) │ 214 │ 227 │ 230 │ 230 │ 230 │ 230 │ 230 │ 79 │
├──────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Syscalls (read) │ 3 │ 3 │ 3 │ 3 │ 4 │ 4 │ 4 │ 79 │
├──────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Syscalls (write) │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 79 │
├──────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Threads (peak) │ 12 │ 12 │ 12 │ 12 │ 12 │ 12 │ 12 │ 79 │
├──────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Throughput (scaled / s) │ 80 │ 80 │ 79 │ 78 │ 78 │ 73 │ 73 │ 79 │
├──────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Time (system CPU) (μs) │ 0 │ 0 │ 0 │ 0 │ 0 │ 10002 │ 10002 │ 79 │
├──────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Time (total CPU) (ms) │ 80 │ 80 │ 80 │ 90 │ 90 │ 90 │ 90 │ 79 │
├──────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Time (user CPU) (ms) │ 80 │ 80 │ 80 │ 90 │ 90 │ 90 │ 90 │ 79 │
├──────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Time (wall clock) (ms) │ 12 │ 13 │ 13 │ 13 │ 13 │ 14 │ 14 │ 79 │
╘══════════════════════════════════════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╛
...
MemoryOne
============================================================================================================================
Explicit Capture Memory
╒══════════════════════════════════════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╕
│ Metric │ p0 │ p25 │ p50 │ p75 │ p90 │ p99 │ p100 │ Samples │
╞══════════════════════════════════════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╡
│ Malloc (large) │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 22129 │
├──────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Malloc (small) │ 1016 │ 1016 │ 1016 │ 1016 │ 1016 │ 1016 │ 1021 │ 22129 │
├──────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Malloc (total) │ 1016 │ 1016 │ 1016 │ 1016 │ 1016 │ 1016 │ 1021 │ 22129 │
├──────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Malloc / free Δ (K) │ 0 │ 0 │ 0 │ 328 │ 328 │ 328 │ 918 │ 22129 │
├──────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Memory (allocated) (M) │ 13 │ 901 │ 1789 │ 2678 │ 3211 │ 3532 │ 3565 │ 22129 │
├──────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Memory (resident peak) (M) │ 22 │ 911 │ 1799 │ 2689 │ 3221 │ 3540 │ 3576 │ 22129 │
├──────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Memory (virtual peak) (M) │ 110 │ 1131 │ 2179 │ 3056 │ 3592 │ 4264 │ 4264 │ 22129 │
╘══════════════════════════════════════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╛
ubuntu@linux:~/x/package-benchmark-samples$ ls /usr/local/lib
python3.10
ubuntu@linux:~/x/package-benchmark-samples$ readelf -d .build/release/MemoryOne
Dynamic section at offset 0x1ff388 contains 38 entries:
Tag Type Name/Value
0x0000000000000003 (PLTGOT) 0x20ffe8
0x0000000000000002 (PLTRELSZ) 14160 (bytes)
0x0000000000000017 (JMPREL) 0x37988
0x0000000000000014 (PLTREL) RELA
0x0000000000000007 (RELA) 0x14ed0
0x0000000000000008 (RELASZ) 142008 (bytes)
0x0000000000000009 (RELAENT) 24 (bytes)
0x000000006ffffff9 (RELACOUNT) 3779
0x0000000000000015 (DEBUG) 0x0
0x0000000000000006 (SYMTAB) 0x2b0
0x000000000000000b (SYMENT) 24 (bytes)
0x0000000000000005 (STRTAB) 0x7b28
0x000000000000000a (STRSZ) 51430 (bytes)
0x000000006ffffef5 (GNU_HASH) 0x14410
0x0000000000000001 (NEEDED) Shared library: [libswift_StringProcessing.so]
0x0000000000000001 (NEEDED) Shared library: [libswiftGlibc.so]
0x0000000000000001 (NEEDED) Shared library: [libm.so.6]
0x0000000000000001 (NEEDED) Shared library: [libFoundation.so]
0x0000000000000001 (NEEDED) Shared library: [libswiftDispatch.so]
0x0000000000000001 (NEEDED) Shared library: [libdispatch.so]
0x0000000000000001 (NEEDED) Shared library: [libBlocksRuntime.so]
0x0000000000000001 (NEEDED) Shared library: [libswift_Differentiation.so]
0x0000000000000001 (NEEDED) Shared library: [libjemalloc.so.2]
0x0000000000000001 (NEEDED) Shared library: [libswift_Concurrency.so]
0x0000000000000001 (NEEDED) Shared library: [libswiftCore.so]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
0x000000000000000c (INIT) 0x3b0d8
0x000000000000000d (FINI) 0x17c598
0x000000000000001a (FINI_ARRAY) 0x201ed0
0x000000000000001c (FINI_ARRAYSZ) 8 (bytes)
0x0000000000000019 (INIT_ARRAY) 0x201ed8
0x000000000000001b (INIT_ARRAYSZ) 16 (bytes)
0x000000000000001d (RUNPATH) Library runpath: [/usr/lib/swift/linux:$ORIGIN]
0x000000006ffffffb (FLAGS_1) Flags: PIE
0x000000006ffffff0 (VERSYM) 0x14440
0x000000006ffffffe (VERNEED) 0x14e4c
0x000000006fffffff (VERNEEDNUM) 2
0x0000000000000000 (NULL) 0x0
....
ubuntu@linux:~/x/package-benchmark-samples$ .build/release/MemoryOne
Debug result: [Benchmark.BenchmarkResult(metric: Memory (virtual peak), timeUnits: ms, measurements: 21503, warmupIterations: 3, thresholds: nil, percentiles: [Benchmark.BenchmarkResult.Percentile.p75: 3051, Benchmark.BenchmarkResult.Percentile.p100: 3592, Benchmark.BenchmarkResult.Percentile.p99: 3592, Benchmark.BenchmarkResult.Percentile.p90: 3592, Benchmark.BenchmarkResult.Percentile.p25: 1131, Benchmark.BenchmarkResult.Percentile.p0: 110, Benchmark.BenchmarkResult.Percentile.p50: 1836]), Benchmark.BenchmarkResult(metric: Memory (resident peak), timeUnits: ms, measurements: 21503, warmupIterations: 3, thresholds: nil, percentiles: [Benchmark.BenchmarkResult.Percentile.p75: 2613, Benchmark.BenchmarkResult.Percentile.p25: 886, Benchmark.BenchmarkResult.Percentile.p0: 22, Benchmark.BenchmarkResult.Percentile.p100: 3475, Benchmark.BenchmarkResult.Percentile.p50: 1749, Benchmark.BenchmarkResult.Percentile.p99: 3441, Benchmark.BenchmarkResult.Percentile.p90: 3131]), Benchmark.BenchmarkResult(metric: Memory (allocated), timeUnits: ms, measurements: 21503, warmupIterations: 3, thresholds: nil, percentiles: [Benchmark.BenchmarkResult.Percentile.p99: 3431, Benchmark.BenchmarkResult.Percentile.p75: 2603, Benchmark.BenchmarkResult.Percentile.p100: 3467, Benchmark.BenchmarkResult.Percentile.p50: 1739, Benchmark.BenchmarkResult.Percentile.p25: 875, Benchmark.BenchmarkResult.Percentile.p90: 3121, Benchmark.BenchmarkResult.Percentile.p0: 13]), Benchmark.BenchmarkResult(metric: Malloc / free Δ, timeUnits: μs, measurements: 21503, warmupIterations: 3, thresholds: nil, percentiles: [Benchmark.BenchmarkResult.Percentile.p75: 328, Benchmark.BenchmarkResult.Percentile.p90: 328, Benchmark.BenchmarkResult.Percentile.p99: 328, Benchmark.BenchmarkResult.Percentile.p25: 0, Benchmark.BenchmarkResult.Percentile.p100: 918, Benchmark.BenchmarkResult.Percentile.p50: 0, Benchmark.BenchmarkResult.Percentile.p0: 0]), Benchmark.BenchmarkResult(metric: Malloc (total), timeUnits: ns, measurements: 
21503, warmupIterations: 3, thresholds: nil, percentiles: [Benchmark.BenchmarkResult.Percentile.p75: 1016, Benchmark.BenchmarkResult.Percentile.p90: 1016, Benchmark.BenchmarkResult.Percentile.p100: 1023, Benchmark.BenchmarkResult.Percentile.p0: 1016, Benchmark.BenchmarkResult.Percentile.p25: 1016, Benchmark.BenchmarkResult.Percentile.p99: 1016, Benchmark.BenchmarkResult.Percentile.p50: 1016]), Benchmark.BenchmarkResult(metric: Malloc (small), timeUnits: ns, measurements: 21503, warmupIterations: 3, thresholds: nil, percentiles: [Benchmark.BenchmarkResult.Percentile.p0: 1016, Benchmark.BenchmarkResult.Percentile.p90: 1016, Benchmark.BenchmarkResult.Percentile.p25: 1016, Benchmark.BenchmarkResult.Percentile.p100: 1023, Benchmark.BenchmarkResult.Percentile.p99: 1016, Benchmark.BenchmarkResult.Percentile.p50: 1016, Benchmark.BenchmarkResult.Percentile.p75: 1016]), Benchmark.BenchmarkResult(metric: Malloc (large), timeUnits: ns, measurements: 21503, warmupIterations: 3, thresholds: nil, percentiles: [Benchmark.BenchmarkResult.Percentile.p25: 0, Benchmark.BenchmarkResult.Percentile.p0: 0, Benchmark.BenchmarkResult.Percentile.p99: 0, Benchmark.BenchmarkResult.Percentile.p100: 0, Benchmark.BenchmarkResult.Percentile.p90: 0, Benchmark.BenchmarkResult.Percentile.p75: 0, Benchmark.BenchmarkResult.Percentile.p50: 0])]
ubuntu@linux:~/x/package-benchmark-samples$ uname -a
Linux linux 5.15.0-56-generic #62-Ubuntu SMP Tue Nov 22 19:56:13 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux
ubuntu@linux:~/x/package-benchmark-samples$
You shouldn't need to use ldconfig manually at all, my best guess is that the way you've installed jemalloc is not putting it in the dynamic search paths of your system (on Ubuntu just installing with sudo apt-get install -y libjemalloc-dev does the right thing).
Which distribution are you running on?
i am on the swiftlang/swift:nightly-5.8-amazonlinux2 docker image, either jemalloc installs itself to /usr/local/lib where it shouldn’t, or the docker image just doesn’t have that directory in its search path, and it should.
i don’t think this is an issue with the package, the package could just use some better documentation for the amazon linux 2 container use case.
i’m building it from source, since the version that comes with yum is too old for the plugin. i’m still using the same steps from before.
the histogram is showing for me again, so i have no idea why it was printing a debug description before. it probably had something to do with my benchmark.
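To tell those two possibilities apart, one can check whether the directory is listed in the loader configuration at all. A rough sketch, assuming the standard glibc config locations (/etc/ld.so.conf and /etc/ld.so.conf.d/*.conf); dir_in_loader_conf is a hypothetical helper name, and it does not follow include directives or account for the built-in default directories:

```shell
# Succeeds if directory $1 appears on its own line in any of the given
# loader config files (default: the standard glibc locations).
dir_in_loader_conf() {
    dir="$1"
    shift
    if [ "$#" -eq 0 ]; then
        set -- /etc/ld.so.conf /etc/ld.so.conf.d/*.conf
    fi
    # -x: whole-line match, -F: fixed string, -s: ignore missing files
    grep -qsxF -- "$dir" "$@"
}

# Usage:
#   dir_in_loader_conf /usr/local/lib || echo "/usr/local/lib is not configured"
```

If the directory is not listed, libraries installed there will not be found until it is added and ldconfig is rerun.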
Yes jemalloc-3.6 in EPEL7 is from 2015 and too old.
For the local install of jemalloc just do the following and then you can run benchmarks fine:
echo /usr/local/lib > /etc/ld.so.conf.d/local_lib.conf && ldconfig
I'm happy to announce a major release of Benchmark which includes several new features and bug fixes - but most importantly, significantly improved documentation thanks to @heckj, who's made a great set of contributions in this release, thanks a lot!
Major improvements are e.g.:
- Greatly expanded export formats (JMH, percentiles, raw samples, in tab-separated-values format for easy consumption by other tools, etc.) and better analytic tools; check out some samples on the landing page
- Significantly improved documentation (to be updated on SwiftPackageIndex later today; make sure you get the 0.9.0 documentation, or check out the repo and build it in Xcode with Build Documentation)
- A new check command which can be used to check for regressions in CI, with much improved and more understandable output compared to 0.8.0
- Progress bars with ETA while running tests
- Regex filtering of benchmarks to run/skip (see swift package benchmark help)
- Streamlined CLI interface
Release details here (a couple of small search-and-replace API breaks are outlined that need to be fixed):
Any feedback on API or output formats much appreciated as we want to wrap up a 1.0 reasonably soon.
Should be there in 0.9.0, thanks.
And a sample of what a larger benchmark suite could look like: