ah, microbenchmarks ;) i do not see the boost fiber based implementation in there, nor do i see the reference to what hardware the test was run on. just to give you an example: "In later versions of iOS, A7- and A8-based systems expose 16-kilobyte pages to the 64-bit userspace backed by 4-kilobyte physical pages, while A9 systems expose 16-kilobyte pages backed by 16-kilobyte physical pages." here the relevant parameter is either 4K or 16K depending upon the hardware.
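a minimal sketch of how a benchmark could at least report the page size it ran with, assuming a POSIX system. `user_page_size` and `round_up_to_page` are names i made up for illustration. as far as i know `sysconf(_SC_PAGESIZE)` reports the page size visible to userspace, so it would not reveal the smaller physical pages in the A7/A8 case quoted above.

```c
#include <assert.h>
#include <unistd.h>

/* Page size the OS exposes to userspace, in bytes, or -1 on error. */
long user_page_size(void) {
    return sysconf(_SC_PAGESIZE);
}

/* Smallest whole number of pages (in bytes) that can hold `bytes`. */
long round_up_to_page(long bytes) {
    long page = user_page_size();
    assert(page > 0);
    return (bytes + page - 1) / page * page;
}
```

printing `user_page_size()` next to the benchmark results would already make them easier to compare across hardware.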
the closest to C i see in there is this from "skynet.c":
/*
* Because of the way libmill schedules coroutines without
* the following yield() more of the coroutines which start
* additional coroutines will run before the ones which
* do not (size == 1). This would result in a lot of active
* coroutines each with a stack (default 256K although it
* can be changed to 16K without modifying libmill) and
* many mapped pages. Even with the yield you may find that
* you need to increase vm.max_map_count e.g.
*
* sudo sysctl -w vm.max_map_count=2000000
*/
obviously not something you'd want your users to do before they run your app.
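instead of asking users to run sysctl, an app could check the limit itself at startup and refuse to spawn more coroutines than it can map. a rough sketch, Linux only; `read_max_map_count` is a hypothetical helper, and the only real thing it relies on is the `/proc/sys/vm/max_map_count` file behind the sysctl mentioned in the comment.

```c
#include <stdio.h>

/* Reads the Linux per-process mapping limit.
   Returns -1 if it is unavailable (e.g. on macOS/iOS, where the file
   does not exist) or cannot be parsed. */
long read_max_map_count(void) {
    FILE *f = fopen("/proc/sys/vm/max_map_count", "r");
    if (!f) {
        return -1;
    }
    long value = -1;
    if (fscanf(f, "%ld", &value) != 1) {
        value = -1;
    }
    fclose(f);
    return value;
}
```

how many mappings each coroutine stack costs depends on the allocator and on whether guard pages are used, so treat the number only as an upper bound.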
microbenchmark results should be taken with a grain of salt. if you are careful not to call the OS, etc., you can construct a sample that uses a minimal amount of stack space, say 4K on a platform that uses 4K physical pages. 4K * 1M = 4GB, so if you have that amount of RAM for the app, that particular microbenchmark will run "ok".
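the arithmetic above as a small sketch, assuming memory is committed in whole pages and every stack touches at least a few bytes. `min_resident_bytes` is a made-up name, not part of any library.

```c
#include <assert.h>
#include <stdint.h>

/* Minimum resident bytes for `count` stacks that each touch `used` bytes,
   when memory is committed in whole pages of `page` bytes. */
uint64_t min_resident_bytes(uint64_t count, uint64_t used, uint64_t page) {
    assert(page > 0);
    uint64_t pages_per_stack = (used + page - 1) / page; /* round up */
    return count * pages_per_stack * page;
}
```

`min_resident_bytes(1 << 20, 4096, 4096)` gives 4 GiB, while the same 4K of stack use on 16K pages, `min_resident_bytes(1 << 20, 4096, 16384)`, gives 16 GiB.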
calling newThread(userDefinedCallback, 32K) (let alone 4K) would be a minefield on iOS/macOS... don't you think? i mean, how can you be sure that userDefinedCallback itself uses less than 32K of stack, and doesn't make another call that allocates a lot on the stack, now or in the future? that might be an OS call: one whose stack requirement is fine now but increases in the next OS update. or there's some OS interrupt that uses a non-trivial amount of the current stack. it doesn't look safe or future-proof. to play it safe i'd say: "choose at least 100K and then pray".
updated: on the current 64-bit desktop/mobile platforms (embedded platforms aside) it would be better to specify a safer amount of memory for the stack, say 1MB. the actual minimum amount of physical memory allocated per stack depends on the page size of the platform, e.g. 4K on A8 and 16K on A9. it is this number that will be the actual limiter, together with the RAM you can afford to spend on stacks: say half of RAM, which would give 1GB on iPhone 6S, hence a hard limit of 64K fibers there (1GB / 16K). before you commit to anything i'd recommend doing your own testing on the platforms you are going to support. and keep your eyes open to the recent trends: increasing physical page sizes, and more languages introducing async/await machinery instead of going in the fibers direction. you may also sidestep this whole great stackful vs stackless debate and consider a less generic event-driven approach (see the Nginx design as an example).
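the fiber limit estimate above is just a division. a tiny sketch, assuming every live fiber pins at least one physical page of stack; `max_fibers` is a name i made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound on live fibers when each one pins at least one page. */
uint64_t max_fibers(uint64_t ram_budget_bytes, uint64_t page_size) {
    assert(page_size > 0);
    return ram_budget_bytes / page_size;
}
```

with a 1GB budget and 16K pages this gives 65536 fibers, and with 4K pages 262144. real limits will be lower, since fibers that do actual work touch more than one page.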
update to update: you can do your own "fiber testing" without fibers: just malloc a bunch (say a million) of megabyte-sized blocks (one malloc per block, or one giant malloc for all blocks) and write the first 1K in each block. i won't be surprised if half of the current popular platforms/devices fail this test.
another update: the mini test app mentioned above is below. it runs fine in both modes on a 6-year-old mac that presumably uses small physical page sizes. it fails very early on iPhone XS Max in "many small blocks" mode, at about 6800 blocks, which would correspond to the maximum fiber count if these were real fibers; and it fails straight away in "one giant block" mode on this iPhone. it would be interesting to see how it runs on recent mac / iPad / iPhone hardware, which presumably uses a bigger physical page size but also has more RAM.
'to fiber or not to fiber' test app
//-------------
// to_fiber_or_not_to_fiber.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define blockCount 1000000L
#define blockSize (1024*1024L)
#define blockUsedSize (1024L)
int to_fiber_or_not_to_fiber(int mode) {
    printf("to_fiber_or_not_to_fiber started in mode: %s\n", mode ? "one giant block" : "many small blocks");
    if (mode) { // one giant block
        char* area = malloc(blockCount * blockSize);
        if (!area) {
            printf("malloc failure, exit\n");
            return -1;
        }
        // touch only the first blockUsedSize bytes of each block
        for (long i = 0; i < blockCount; i++) {
            char* blockStart = &area[i * blockSize];
            memset(blockStart, 1, blockUsedSize);
            if ((i % (blockCount/10)) == 0) {
                printf("done %.0f%% blocks so far\n", 100.0 * i / blockCount);
            }
        }
        printf("done 100%%\n");
        printf("deallocating one giant block\n");
        free(area);
    } else { // many small blocks
        char** blocks = (char**)malloc(blockCount * sizeof(char*));
        if (!blocks) {
            printf("malloc failure, exit\n");
            return -1;
        }
        for (long i = 0; i < blockCount; i++) {
            blocks[i] = malloc(blockSize);
            if (!blocks[i]) {
                // report how far we got: this is the would-be fiber limit
                printf("malloc failure at block %ld, exit\n", i);
                while (i-- > 0) free(blocks[i]); // release what we got so far
                free(blocks);
                return -1;
            }
            memset(blocks[i], 1, blockUsedSize);
            if ((i % (blockCount/10)) == 0) {
                printf("done %.0f%% blocks so far\n", 100.0 * i / blockCount);
            }
        }
        printf("done 100%%\n");
        printf("deallocating many small blocks\n");
        for (long i = 0; i < blockCount; i++) {
            free(blocks[i]);
        }
        free(blocks);
    }
    printf("to_fiber_or_not_to_fiber finished ok\n\n");
    return 0;
}
//-------------
// to_fiber_or_not_to_fiber.h
int to_fiber_or_not_to_fiber(int mode);
//-------------
// Bridging-Header.h
#include "to_fiber_or_not_to_fiber.h"
//-------------
// testApp.swift
import SwiftUI

@main
struct testApp: App {
    init() {
        DispatchQueue.global().async {
            to_fiber_or_not_to_fiber(0)
            to_fiber_or_not_to_fiber(1)
        }
    }
    var body: some Scene {
        WindowGroup {
            ContentView()
        }
    }
}

struct ContentView: View {
    var body: some View {
        Text("Hello, world!").padding()
    }
}
trust me, i don't think that swift's async/await implementation is ideal, or that any custom promise-based implementation is. it's just that the fiber way has its own gotchas and limitations and is hardly a clear winner, if a winner at all.
i hope there's yet another better way to go that we just don't see yet!