Renaming packages?

(I am not 100% sure whether this is the right place to have this discussion, if not please point me in the right direction).

So this is a feature I have always wanted in other languages, but I don't know of any that solve this problem for me. In fact I think something like a package-renaming feature would solve two problems: an obvious one and a less obvious, but arguably more important (or at least cooler), one.

Say I want to use Swift-NIO, and I want the newest version, so I add a dependency on Swift-NIO 2.50.0. But I also want to use some other library, let's call it foo, which depends on Swift-NIO 2.20, or even worse on NIO 1.0. What do I do? The solution I would propose is to allow me to rename either Swift-NIO or foo (and, transitively, its dependencies) to something else. Basically this would allow me to pull in multiple versions of the same code.

This kind of namespacing would allow me to resolve otherwise unresolvable dependency conflicts. So this is the obvious use-case. It would be even cooler if I could also depend on multiple versions of the Swift standard library and other fundamentals (like Foundation). One reason I'd like to do this is testability: I could just use the same versions of all the libraries my dependencies were tested with.

Now the cooler one (and this is a real-world problem we never really managed to solve in FoundationDB, since C/C++ don't really provide a solution for it). Let's take FoundationDB as an example (and let's pretend FDB is written in Swift):

FoundationDB uses deterministic simulation to test all kinds of real-world scenarios within a single process. One thing that is really freaking hard to test, though, is upgrades and downgrades. Basically we would like to run different versions of the same code in the same process and verify that they work together.

We believe that testing this in a real cluster is inadequate: failures are very hard to debug, tests are very expensive to run (as they require more resources than the in-process simulation FDB uses), testing becomes an operational nightmare, etc. FoundationDB works around this using two strategies:

  1. All processes within a cluster have to run at the same version. So when upgrading a cluster all processes have to be upgraded at the same time.
  2. For things that require a disk-format change we manually do what I described above. If you look at the FDB code you can find that the transaction log server is checked in at versions 4.6, 6.0, and 6.2, since these are versions where we made significant changes that require on-disk changes with upgrade and downgrade capabilities.

Both of the above come with huge problems. (1) is an operational issue and will eventually cause a scalability limitation. (2) is a maintenance nightmare and makes the code ugly and hard to understand.

We did consider using dlopen to load multiple versions of the same thing into one process, but so far we couldn't get this working.

So I am wondering whether this is something the language could solve (or at least help solve). If I have a project Bar I could simply import its targets as of a previous release and then build testing around them. For example, I could run old code against a new instance of my database. I could test that my network protocols are upwards and downwards compatible. And I could do all of these things in simple unit tests.

Maybe there are also other solutions for the two problems above (and it doesn't need to be a single one), but I really would love to have some features, either in the language or the build system, that would allow me to solve these issues.

4 Likes

Hi Markus!
Evolution is technically for proposing language direction and changes, but since this is borderline I think this category will be fine for discussion :slight_smile:

Hah, yeah, testing version upgrades can be rough. We suffered through the same in my past life with Akka and didn't find great solutions (we'd test some cases manually, document a "how to", and call it a day). For persistent-format compatibility we'd store "old" binary data raw in tests and make sure we could keep reading it, and similar tricks, but we never had great tests for big version rollouts either. And of course there are also behavioral changes, which are another beast one would like to test like this.

~~

To the point though: I think we may actually have a shot at making this possible in Swift by slightly extending two features that could be used to enable this:

  1. module aliasing for disambiguation - for "import module under different name"
  2. distributed actors - if we'd want to intercept calls between "nodes" in the same process, and handle them using one or the other "version" of an actor

Both are missing some functionality to achieve what you're truly after here, but not that much.

~~

1) The module aliasing feature allows us to depend on a module under an aliased name -- renaming the entire module and all of its symbols (since mangling includes the module name) -- so e.g. if two packages each vend a "Utils" module, depending on both would normally give you a naming clash, but with this feature we can write

     .product(name: "Game", package: "swift-game", moduleAliases: ["Utils": "GameUtils"]),

which allows the existing module "Utils" to be used as "GameUtils", with all of its symbols in the resulting binary renamed to "GameUtils", even inside the swift-game package.
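
For context, a complete manifest using this could look roughly like the sketch below (the swift-draw package and both URLs are made up, just to show two packages clashing on "Utils"):

// swift-tools-version: 5.7
import PackageDescription

let package = Package(
  name: "App",
  dependencies: [
    // both of these packages vend a module named "Utils"
    .package(url: "https://github.com/example/swift-game.git", from: "1.0.0"),
    .package(url: "https://github.com/example/swift-draw.git", from: "1.0.0"),
  ],
  targets: [
    .executableTarget(
      name: "App",
      dependencies: [
        // the "Utils" coming from swift-game is seen (and mangled) as "GameUtils" everywhere
        .product(name: "Game", package: "swift-game", moduleAliases: ["Utils": "GameUtils"]),
        // the "Utils" from swift-draw keeps its original name
        .product(name: "Draw", package: "swift-draw"),
      ]
    )
  ]
)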

So that's almost enough to handle the multiple versions of the same package: the same mechanism would be used to resolve the conflicts and we could end up getting Lib_2_20_4 and Lib_2_30_0 in the same process without clashing.

What is not solved today though is resolving the dependencies such that this would just automatically work:

// TODO: we'd need some way to express this "same lib in diff version here"
dependencies: [
  .product(name: "Alpha", package: "AlphaLib", moduleAliases: ["Alpha": "AlphaOne"]),
  .product(name: "Alpha", package: "AlphaLib", moduleAliases: ["Alpha": "AlphaTwo"]),
]


'lemma': ignoring duplicate product 'Alpha' from package 'alpha'
error: multiple aliases: ['AlphaOne', 'AlphaTwo'] found for target 'Alpha' in product 'Alpha' from package 'Alpha'

TODO: We'd need to improve dependency resolution to allow resolving this, and then apply the aliasing to it. This may be tricky, but seems possible to do -- given that all the aliasing work exists and works. There were other groups which were interested in the "multi version" solution as well, so perhaps there'd be enough of a case to drive it.

This would be enough to have simple tests which verify that "calling Lib2 with some outputs of Lib1" is compatible (see the sketch below), but it doesn't give us the "entire cluster simulation", since we would still need to somehow "weave through" which lib to call where.
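
For illustration, such a test could look roughly like this -- the Record type and its encode()/decode(_:) methods are entirely made up here; only the AlphaOne/AlphaTwo module aliases come from the example above:

import XCTest
import AlphaOne // the aliased "old" version of Alpha from the example above
import AlphaTwo // the aliased "new" version

final class CrossVersionCompatibilityTests: XCTestCase {
  func testOldWriterNewReader() throws {
    // produce bytes with the old version of the library...
    let old = AlphaOne.Record(id: 42, payload: "hello")
    let bytes = try old.encode()

    // ...and verify the new version can still read them
    let new = try AlphaTwo.Record.decode(bytes)
    XCTAssertEqual(new.id, 42)
    XCTAssertEqual(new.payload, "hello")
  }
}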

~~

This is where 2) has an interesting potential we could utilize.

A distributed actor effectively is a proxy to call "some method, somewhere" and we're very free to do whatever we want with the intercepted calls.

Obtaining a distributed actor reference is always done via MyActor.resolve(id: ..., using: someSystem), where the ID basically is an "endpoint" or "address". In your case, as in many others, this address would carry information about the node on which we're looking for the actor to be resolved or created...

You could imagine an implementation that inspects the ID (IDs are arbitrary types, and can carry extra metadata (!)) for a hint about which module it should pick the actor from. E.g. Echo.resolve(id: ID("127.0.0.1:7337", uid: .wellKnownEcho).replaceModule("Lib_20")) could allow us to pull some tricks and use the Lib_20.Echo rather than the Lib.Echo actor that the code was invoked on... We could return a "remote reference" that actually just proxies to a Lib_20.Echo implementation rather than going over the network.

To be a bit clearer about the resolve-substitution idea -- you totally could have a dictionary in the actor system that says "127.0.0.1:7337": .replaceResolveModule("Lib", "Lib_20"), and you'd just set that up on the actor system (the ActorSystem is basically like your sim2 instance -- a shared instance in this scenario) before kicking off the tests. So it doesn't even really have to be in the ID; as long as the actor system knows what to look up, we could make it work I think.
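
Very roughly, and purely as a sketch of the ID / configuration side (none of these types are existing API, and this is not a full DistributedActorSystem conformance), it could look something like:

// a hypothetical actor ID carrying an optional "use this module instead" hint
struct SimActorID: Hashable, Codable, Sendable {
  var endpoint: String        // e.g. "127.0.0.1:7337"
  var uid: String             // well-known identity of the actor on that node
  var moduleOverride: String? // e.g. "Lib_20" to resolve against the old module

  func replaceModule(_ module: String) -> SimActorID {
    var copy = self
    copy.moduleOverride = module
    return copy
  }
}

// alternatively, keep the mapping in the (shared, simulation-wide) actor system,
// which would consult it during resolve(id:as:) to decide which module's
// implementation to proxy calls to
enum ResolveOverride {
  case replaceResolveModule(String, String) // e.g. ("Lib", "Lib_20")
}

final class SimActorSystemConfig {
  // endpoint -> which module's implementation to use when resolving actors there,
  // e.g. config.resolveOverrides["127.0.0.1:7337"] = .replaceResolveModule("Lib", "Lib_20")
  var resolveOverrides: [String: ResolveOverride] = [:]
}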

That'll again need some extra capabilities so that we're able to executeDistributedTarget with a different mangled name -- but that's a) something I wanted to get to anyway, to get stable identifiers for RPC endpoints rather than relying on mangled names, and b) perhaps not even necessary; we could probably already do this with some tricks outside the language...

~~

So that's the idea I'd have for this. It feels like we have the actually hard building blocks in the language, but the dependency resolution is tricky and would have to be solved (or done manually). The distributed actor trickery I'm actually optimistic we could pull off; I'd be willing to give it a PoC shot once we're a bit less busy after WWDC :slight_smile:

I should probably mention though that 1) would require that everything is implemented in Swift, as only Swift modules get this symbol-renaming treatment -- but since we're talking in terms of "what if", it's still worthwhile exploring.

Let me know what you think and if any of that made sense or we should dive deeper into these!

PS: It is lovely hearing about all those excellent use-cases pushing the language where no language has really gone and seeing how we can solve those use-cases in an elegant manner. I really think we can do something great for distributed systems here, and use-cases like yours will help us get there!

5 Likes

Thanks for the detailed answer! I looked at swift-evolution but somehow missed the module aliasing proposal. That's already a great step in the right direction.

I wonder whether for (1) Package.Dependency would also need an alias feature? Since effectively this is what I would want to do:

let package = Package(
  ...
  dependencies: [
    .package(url: "https://github.com/alpha/alpha.git", exact: "1.0.0", packageAlias: "Alpha1"),
    .package(url: "https://github.com/alpha/alpha.git", exact: "2.0.0", packageAlias: "Alpha2"),
    ...
  ],
  ...
  targets: [
    ...
    .target(
      ...
      dependencies: [
        .product(name: "Alpha", package: "Alpha1", moduleAliases: ["Alpha": "AlphaOne"]),
        .product(name: "Alpha", package: "Alpha2", moduleAliases: ["Alpha": "AlphaTwo"]),
        ...

Optimally this would also namespace the dependencies, so that Alpha1 could depend on an older swift-nio version (for example).

Additionally (and this might be a tangent), it would be super cool if somewhere in the future we could even have different versions of libc. I know this is almost impossible to do with glibc (I have wasted way too much time fighting glibc already), but maybe when LLVM's libc matures this could become possible.

Using distributed actors to simulate clusters within a process would be really cool! If I understand them correctly, most of the FDB simulator could be implemented using distributed actors today. I think even the simulated machine concept and failure injection could be built nicely in Swift. The only thing that will be very hard to achieve today is determinism (but that's a bit off topic, and I think the problems there will be solved in the Swift 6 timeline).

Ah that would be super cool! This would allow us to simulate different machines at different versions of the code!

2 Likes

Sorry for the delayed response! I also still have to get to the other thread you started -- I will shortly!

That sounds like a reasonable direction to explore.

I don't think Swift will be any luckier/easier with these problems; it ends up being a libc problem after all. I don't have enough experience with these to comment more, but it sure sounds like a hell of a problem to try to tackle.

Yeah I think that's right. They provide enough hooks for remote calls to inject failures and delays etc.

Running "a few nodes in the same process" is basically what all of our cluster tests do (except those using multi-node tests which split it into processes, and could be extended to move to multiple nodes without changing the test code at all but I didn't get to it yet).

I'll look into what we'd need to do in distributed actors if we had the "Lib1 and Lib2" versions of some code; I suspect it has some overlap with what is necessary to allow resolve() on a protocol declaration like protocol A: DistributedActor { distributed func x() }, which is something we're interested in allowing anyway.