Internal imports and symbol visibility

Our plugins are compiled as dynamic libraries, so several of them may share dependencies, for example swift-nio or any other statically linked package.

As a result, every public symbol of every dependency is exported from each plugin. That is not great, especially when different plugins use different versions of the same library.

As an example I have:

// SharedStaticLib:
public final class PublicThing {
    public var value: Int = 0 
    public init() {}
}

// PluginA:
internal import SharedStaticLib

@_cdecl("plugin")
public func pluginA_probe() -> Int { 0 }

// PluginB:
internal import SharedStaticLib

@_cdecl("plugin")
public func pluginB_probe() -> Int { 0 }

// Loader: dlopen(path, RTLD_NOW) for both
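The loader is only described in a comment above. Below is a minimal sketch of it. It assumes each plugin exports the `plugin` entry point from the example, and the helper name `loadPluginEntry` is hypothetical. Each image is opened with `RTLD_NOW`, and `dlsym` is called on that image's own handle, which searches that image (and its dependencies) first. That is how a loader can pick up each plugin's own `plugin` symbol even though both export the same name.

```swift
#if canImport(Darwin)
import Darwin
#else
import Glibc
#endif

/// C signature of the entry point exported via `@_cdecl("plugin")`.
typealias PluginEntry = @convention(c) () -> Int

/// Hypothetical helper: opens one plugin image and resolves its `plugin` symbol.
/// Returns nil if the image cannot be loaded or the symbol is missing.
func loadPluginEntry(at path: String) -> PluginEntry? {
    guard let handle = dlopen(path, RTLD_NOW) else {
        if let err = dlerror() { print("dlopen failed: \(String(cString: err))") }
        return nil
    }
    // Look the symbol up through this image's handle rather than RTLD_DEFAULT,
    // so PluginA's and PluginB's identically named `plugin` symbols stay distinct.
    guard let symbol = dlsym(handle, "plugin") else {
        dlclose(handle)
        return nil
    }
    return unsafeBitCast(symbol, to: PluginEntry.self)
}
```

Calling it once for `libPluginA.dylib` and once for `libPluginB.dylib` reproduces the setup above. On macOS the duplicate-class warning comes from the Objective-C runtime while the second image is being loaded, not from this code.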

macOS emits a warning about this:

objc[48563]: Class _TtC9SharedStaticLib11PublicThing is implemented in both <path>/libPluginA.dylib (0x100a780f8) and <path>/libPluginB.dylib (0x100a880f8). This may cause spurious casting failures and mysterious crashes. One of the duplicates must be removed or renamed.

Judging by nm, the symbols are also exposed on Linux, but no warning is issued.

I tried internal and private imports, but they don't help. Should they affect symbol visibility?

Is there a good way to avoid exporting the symbols of internally imported libraries?

The warning you're seeing on Darwin is because all Swift classes (even those that aren't @objc) are registered with the Objective-C runtime. When the same class is linked into multiple dynamic libraries and those libraries are loaded into the same process, they collide. This is bad if any copies of the class are ever a different version.

I could have sworn there was an unsupported attribute to suppress the Obj-C class metadata but I can't seem to find it. Even if I did, I don't know if that would matter—duplicated pure Swift classes, protocol conformances, etc. loaded into the same process would probably still cause problems.

Realistically, you have two options here:

  • If you can control the build process for every plugin, don't statically link the shared class into each of them. Put the shared code in its own dylib and have each of the plugins load that, so there's only a single copy of the shared code in the process.
  • If the plugins are built at different times, potentially with different versions of the shared code, you'll need to use something like -module-alias to assign unique names to each occurrence of the shared dependency (something like renaming SharedStaticLib to PluginASharedStaticLib, etc.). That will guarantee that the ABI names of the shared code are all unique and won't collide in the loaded process.

Thank you for your reply!

Unfortunately, we do not control all the plugins being built, and there can be dozens of separate packages, some prebuilt by us and some created by others.
While I agree this approach may work on a small scale, I’m afraid it would be quite difficult to organize properly in practice.

Although my example used a class, as you correctly noticed, the issue is wider: it affects structs, actors, and protocols that share the same mangled name, even if they are otherwise unrelated.

Objective-C, C, and C++ compilers support -fvisibility=hidden, which hides a library's symbols unless they are explicitly exported.

Are there any plans to evolve Swift and/or SwiftPM in this direction?

Hiding the symbols from a linking perspective is something you can do with the right flags, but it is not enough to make this pattern safe and free from undefined behavior. The Swift runtime will still interact with the multiple copies of the same type, regardless of linkage visibility. @allevato's suggestions are the correct mitigations.


How attached are you to using dylibs for the plugins instead of making each plugin a separate executable, similar to compiler macro plugins? That would solve the problem, since each plugin would be its own completely isolated image, and it would bring other security benefits (plugins can be sandboxed independently, a plugin can't muck around in the main process's address space or crash the app, etc.).

Unfortunately, very attached. We need a lot of runtime machinery to provide the required live data to the plugins, and an extra IPC hop of any form is a no-go from a latency perspective. Performance-wise, we want it to be as close as possible to the client having written the whole process in-house. (We may have hundreds or even thousands of plugin instances in a given process, and they share fairly significant caches of certain reference data, so splitting them into separate processes would be a no-go. They are currently modeled as actor instances.)

Are you sure that factoring out the shared code into a dylib isn't an option? If you provide a stable ABI/API, you might be able to get away with something like this:

Let's say you have three components: HostApp (the app that loads the plugins), SharedCode (code used by the app and by the plugins), and SomePlugin (maybe multiples of these).

First, create SharedCode that has all the APIs you want accessible to your plugins. Build that with library evolution enabled so you get a .swiftinterface, and emit a .dylib for it.

Then, you need to get a .tbd file for libSharedCode.dylib. I tried adding -emit-tbd-path libSharedCode.tbd to the earlier swiftc invocation but that didn't seem to work the way I expected (it complained that the .tbd file was empty). I didn't investigate that any further, but tapi stubify worked to extract a .tbd file from the already built libSharedCode.dylib.

Once you have the .swiftinterface and the .tbd, that's what your plugin authors will use to compile and link against. SomePlugin should import SharedCode and have the .swiftinterface file in its module search path and the .tbd file in its linker search path. Use those to compile and link libSomePlugin.dylib.

Finally, HostApp will also import and link to SharedCode. When it dlopens the plugin dylibs, all the symbols needed in SharedCode by the plugin should be found since the host app has already loaded it. Then you can just grab the pointer to your cdecl function or whatever your entry point is and call it.
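Here is a single-file sketch of that hand-off. It assumes SharedCode exposes a class-bound `Plugin` protocol and that each plugin exports a creation function through `@_cdecl`. The names `HostServices`, `Plugin`, `some_plugin_create`, and `adoptPlugin` are hypothetical. In the real setup the three sections live in three separate images; they share one file here only so the sketch runs. The raw pointer carries a +1 retained object across the C boundary, and the host casts it back to a protocol that both sides get from the single loaded copy of SharedCode.

```swift
// --- SharedCode (one dylib, built with library evolution) ---
public protocol HostServices: AnyObject {
    func log(_ message: String)
}

public protocol Plugin: AnyObject {
    func start(host: HostServices)
}

// --- SomePlugin (compiled against SharedCode's .swiftinterface, linked against its .tbd) ---
final class SomePlugin: Plugin {
    func start(host: HostServices) { host.log("SomePlugin started") }
}

/// Hypothetical entry point: returns a +1 retained, opaque pointer to the plugin object.
@_cdecl("some_plugin_create")
public func somePluginCreate() -> UnsafeMutableRawPointer {
    Unmanaged.passRetained(SomePlugin()).toOpaque()
}

// --- HostApp ---
final class RecordingHost: HostServices {
    private(set) var messages: [String] = []
    func log(_ message: String) { messages.append(message) }
}

/// Takes ownership of the pointer returned by an entry point and casts it
/// to the shared `Plugin` protocol (nil if the object does not conform).
func adoptPlugin(_ raw: UnsafeMutableRawPointer) -> Plugin? {
    Unmanaged<AnyObject>.fromOpaque(raw).takeRetainedValue() as? Plugin
}
```

In the host, `somePluginCreate` would be the function pointer obtained with `dlsym` from the plugin image rather than a direct call.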

The resilient interface (and you promising to not break it) is the key to making it all work so you can guarantee that the plugins all use the right entry points when talking back to the dylib.

All of the above assumes macOS. I'm not sure how well this would work in a Linux environment where Swift doesn't promise ABI stability, and I don't know if Linux linkers offer anything similar to a .tbd file to link against. The lack of ABI stability means you might be in for a bad time if you expect users to be able to "bring their own .so files" to you, but I'm by no means a Linux expert.


I think on Linux you might have to do as C++ does, and resort to the hourglass model. Write a shared library in Swift that exposes a stable C API, write an unstable Swift API that wraps the stable C API, use that Swift wrapper in your app/plugin code.

___________
\         /
 \ Swift /
  \     /
   \ C /
   /   \
  /     \
 / Swift \
/         \
-----------
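Here is a compressed sketch of the hourglass. It assumes a shared library that owns a simple counter, and all names are hypothetical. The narrow waist is a set of `@_cdecl` functions that pass only C-representable types (an opaque pointer and an integer). The top half is the Swift implementation, and the bottom half is a thin Swift wrapper that apps and plugins would compile themselves. In a real build the wrapper would reach the C functions through a C header or module map. This one-file version calls them directly.

```swift
// Top: Swift implementation, private to the shared library.
final class CounterImpl {
    var value = 0
}

// Waist: stable C API. Only C-representable types cross this boundary.
@_cdecl("shared_counter_create")
public func sharedCounterCreate() -> UnsafeMutableRawPointer {
    // Hand out an opaque handle that owns one retain on the instance.
    Unmanaged.passRetained(CounterImpl()).toOpaque()
}

@_cdecl("shared_counter_increment")
public func sharedCounterIncrement(_ handle: UnsafeMutableRawPointer) -> Int {
    let impl = Unmanaged<CounterImpl>.fromOpaque(handle).takeUnretainedValue()
    impl.value += 1
    return impl.value
}

@_cdecl("shared_counter_destroy")
public func sharedCounterDestroy(_ handle: UnsafeMutableRawPointer) {
    // Balance the retain taken in sharedCounterCreate.
    Unmanaged<CounterImpl>.fromOpaque(handle).release()
}

// Bottom: unstable Swift wrapper compiled into each app/plugin.
final class Counter {
    private let handle: UnsafeMutableRawPointer
    init() { handle = sharedCounterCreate() }
    deinit { sharedCounterDestroy(handle) }
    func increment() -> Int { sharedCounterIncrement(handle) }
}
```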

It works to some extent, and we are factoring out some small libraries where we can, or where we have to because of real crashes.

One such case involves distributed actors, and it was a bit trickier.
We load two versions of the same plugin, each defining a resolvable protocol, which led to a crash in the Swift runtime because it picked up the first protocol it found.
We worked around that by moving the resolvable protocols into a separate library. This imposes a limitation: the distributed actors' protocols cannot change until the HostApp restarts, but that is a more or less acceptable limitation.

I would like to show the full picture of how we use plugins:

  1. We have some precompiled (binary-distributed) plugins.
  2. We have some plugins that are supplied as source code and have all the standard Swift dependencies (such as NIO, async collections, etc.).
  3. Several versions of a plugin can be loaded into one HostApp.
  4. There can be several dozen plugins in a small setup and up to several hundred in a large one.

Unfortunately, it would be nearly impossible to keep all the dependencies aligned in every such setup.

Additionally, the problems get worse when the same version of a plugin is used with updated dependencies (or even just an updated internal class).

For example, say v1 of a plugin has this class:

private final class PrivateClass {

}

Then we update this class to conform to some protocols and/or add new methods, and load the v2 plugin:


private final class PrivateClass: StableABILibraryProtocol {
    func foo() {} // implements StableABILibraryProtocol.foo
}

callStableABI(PrivateClass())

I suspect this might be undefined behavior.

By the way, I also checked libraries built with library evolution enabled.
Unfortunately, they also export all symbols from internal imports.

Right, library evolution isn't going to change the linker behavior with regard to dependencies that are statically linked into multiple plugins.

To be honest, even if you stripped all symbols from your dynamic libraries except for the plugin entry point, you'd still have sections of data for things like type metadata and protocol conformances that would be found by the runtime when the image is loaded. I assume that would have the potential to cause problems if you did anything like dynamic casts (if A and B are linked into multiple plugins, does A as? B refer to the copy of A and/or B from this plugin or that plugin?).


That is true: if types have identical names, it will definitely be a problem.

However, there is a mechanism for private types. I think every private type is given a unique name during compilation; I have seen this as 'unknown context at $...' for some private structures in debug printouts.

For example, if I have two plugins:

/// Sources/PluginA/PluginA.swift
private final class HelperClass {}

/// Sources/PluginB/PluginB.swift
private final class HelperClass {}

This won't lead to a collision because the two classes get different mangled names, e.g.:

$ nm .build/arm64-apple-macosx/debug/libPluginB.dylib | grep Helper
00000000000009a8 t _$s7PluginB11HelperClass33_292520EA077904A28B86327696C46FBCLLCADycfC

$ nm .build/arm64-apple-macosx/debug/libPluginA.dylib | grep Helper
00000000000009a8 t _$s7PluginA11HelperClass33_FF7395C8000560BDCF6F0284CBB380ECLLCADycfC

Perhaps that is why I intuitively expected imports to work the same way and put the entire dependency module into a randomly (or deterministically) generated namespace.

Thinking about it more, I wonder whether it is possible to add a prefix or suffix to all of a dependency's types, for example by making the product name part of every statically linked symbol (similar to private definitions in files).
That alone would help a lot, even without hiding all symbols.

@allevato I gave module aliases a try.

They seem to be mostly what we can use instead of proper symbol hiding.
However, I would love to apply them automatically to cover all dependencies.

When I tested them in Package.swift, they worked well:

.product(name: "SharedLib", package: "SharedLib", moduleAliases: ["SharedLib": "SharedLibAliasPluginAV130"]),

However, when I tried to integrate them into a build plugin the following way:

var buildParameters = PackageManager.BuildParameters(
    configuration: configuration.buildConfiguration,
    logging: .verbose
)
buildParameters.otherSwiftcFlags += ["-module-alias", "SharedLibAliasPluginAV130 = SharedLib"] // NB: the alias order here is reversed compared to Package.swift

I immediately got an error:

  3 | import SharedLib
    |        `- error: cannot refer to module as 'SharedLib' because it has been aliased; use 'SharedLibAliasPluginAV130' instead

Maybe it is just not possible to do it that way.

I think if we could use module aliases from SwiftPM plugins, we would be able to automate the solution.
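Below is a minimal sketch of the flag-generation part of that automation. It assumes the list of statically linked dependency modules is known up front, and the function name and naming scheme are hypothetical. swiftc's help describes `-module-alias` as taking `<alias_name=real_name>`: if source imports `<alias_name>`, the module `<real_name>` is used. That is the same direction as the `moduleAliases` dictionary in Package.swift. In the attempt above the two names are swapped, which is consistent with the error: `import SharedLib` now names the real module of an alias. Note also that this flag only affects the importing side. SwiftPM's `moduleAliases` additionally builds the dependency itself under the new module name, which a blanket `otherSwiftcFlags` entry probably does not do.

```swift
/// Hypothetical helper: builds `-module-alias` pairs in swiftc's
/// `<alias_name>=<real_name>` order. Source keeps `import SharedLib`,
/// while the compiled module is named `SharedLibAlias<pluginID>`.
func moduleAliasFlags(for modules: [String], pluginID: String) -> [String] {
    modules.flatMap { name in
        ["-module-alias", "\(name)=\(name)Alias\(pluginID)"]
    }
}

let flags = moduleAliasFlags(for: ["SharedLib"], pluginID: "PluginAV130")
print(flags) // ["-module-alias", "SharedLib=SharedLibAliasPluginAV130"]
```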