Dependency Mirroring and Forking

This is a draft proposal to serve as a starting point for discussing dependency mirroring and forking. We invite the community to help flesh out concrete proposal(s) for the swift-evolution process!


Package Manager Dependency Mirroring and Forking

Introduction

This is a draft proposal for adding support for dependency mirroring and forking in SwiftPM.

Terminology

Mirror: A mirror is an alternate source location for a dependency which exactly mirrors the contents of the original source.

Fork: A fork is a package dependency which may have different content than the original dependency.

Vendoring: Vendoring refers to checking in source code of the entire dependency graph so you always have your dependencies.

Motivation

Mirroring:

Dependency mirroring is highly useful for several reasons:

  • It ensures that a dependency can always be fetched, even if the original source is unavailable or has been deleted.
  • It provides an alternative when access to the original source location is slow or forbidden on the current network.
  • It lets you validate or screen upstream updates before making them available for internal use in a company.

Forking:

Most use cases for a fork boil down to one major case: using a modified version of a dependency in your package before those modifications are available in the original package. Example: You made a bug fix in SwiftNIO, but the fix is not merged in the upstream repository yet and your application requires this fix in order to work correctly.

Vendoring:

Vendoring has some overlapping motivations with mirroring. It ensures that the dependencies are always available even if the original sources disappear. It allows building without network access, and it enables "archiving" a project so it can be built in the future without worrying about whether the original dependency sources will still be available years from now.

Possible solutions

Dependency mirroring

A dependency mirror is supposed to exactly replicate the contents of the original dependency and, given its use cases, a user should be able to use the mirrors they have access to with any arbitrary package. We can introduce a new "package mirror" file to store per-dependency mirroring information that SwiftPM can use as an additional input. The mirrors will work for both direct and transitive dependencies of a package. A separate file allows using a mirror config without modifying the package.

It would be nice to have some kind of validation to make sure that the content on the mirror is actually the same as the original content. However, we don't need to tackle that problem initially, and we can add the validation in the future as an enhancement to the mirroring feature.
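One lightweight form of such validation could rely on git revisions being content hashes: if the original and the mirror report the same commit for a tag, the content for that tag matches. The sketch below is only an illustration under that assumption. The TagRevisions type and the function name are hypothetical, and a real implementation would have to obtain the tag-to-commit maps from both repositories (for example with git ls-remote --tags).

```swift
/// Hypothetical map from tag name to the commit hash it points at.
typealias TagRevisions = [String: String]

/// Returns, in sorted order, the tags from the original whose revision
/// on the mirror is missing or different.
func mismatchedTags(original: TagRevisions, mirror: TagRevisions) -> [String] {
    return original
        .filter { mirror[$0.key] != $0.value }
        .map { $0.key }
        .sorted()
}
```

An empty result would mean that every tag known to the original resolves to the same commit on the mirror.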

Example:

Consider a package graph with two packages: libCore and app where app depends on libCore. Resolving the app package currently looks something like this:

$ swift package resolve 
Fetching https://github.com/example/libCore.git
Resolving https://github.com/example/libCore.git @ 1.0.0

With support for package mirror files, we would be able to record a mirror for
libCore, perhaps with a mirror subcommand:

$ swift package mirror add \
    --package libCore \
    --url https://mygithub.com/myOrg/libCore.git \
    --mirror-file Package.mirror

Then, you could use the mirror file to resolve the dependencies:

$ swift package --mirror-file Package.mirror resolve 
Fetching https://mygithub.com/myOrg/libCore.git
Resolving https://mygithub.com/myOrg/libCore.git @ 1.0.0

We could also make the mirror file implicit if a file named Package.mirror is present in the root directory of the package.
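As a rough sketch of the behavior described above, the lookup could work as follows. MirrorConfig, effectiveURL and mirrorFilePath are hypothetical names used only for illustration, and keying mirrors by package name simply follows the --package option in the example; none of this is an existing SwiftPM API.

```swift
import Foundation

/// Hypothetical in-memory form of a mirror file: package name -> mirror URL.
struct MirrorConfig {
    var mirrors: [String: String] = [:]

    /// Returns the mirror URL if one is recorded for the package,
    /// otherwise the original URL.
    func effectiveURL(forPackage name: String, originalURL: String) -> String {
        return mirrors[name] ?? originalURL
    }
}

/// Picks the mirror file to use. An explicit --mirror-file path wins;
/// otherwise fall back to Package.mirror in the package root if it exists.
func mirrorFilePath(explicit: String?, packageRoot: String) -> String? {
    if let explicit = explicit { return explicit }
    let implicit = URL(fileURLWithPath: packageRoot)
        .appendingPathComponent("Package.mirror").path
    return FileManager.default.fileExists(atPath: implicit) ? implicit : nil
}
```

Packages without an entry keep resolving from their original URL, so a mirror file only needs to list the dependencies that are actually mirrored.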

This leaves an open question: what do we store in the Package.resolved file if a mirror file is used?

Fork Support

Things get a bit tricky with fork support. Forks may modify the contents of the original dependency, and if you depend on the contents of the fork to build, the packages that depend on your package will also need to use that fork. However, propagating forks to downstream packages can be dangerous, and it can create a new type of dependency hell where there could be two forks of a dependency in a package graph. Moreover, packages should always get the dependencies they asked for or expected, and not some modified version from a different source location because one of the dependencies forked something.

For the reasons mentioned above, we should probably support declaring forks only in the root package. This does mean that if a library package publishes a version that uses a fork, it would either fail to build when used as a dependency, or the packages that depend on it will need to declare the forks as well. Note that forking a transitive dependency should be allowed.
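The root-only rule could be checked with something like the sketch below. The Manifest type here is a hypothetical, heavily simplified stand-in for a loaded manifest, not SwiftPM's actual model.

```swift
/// Hypothetical, simplified view of a loaded manifest for this sketch.
struct Manifest {
    let name: String
    let isRoot: Bool
    /// Names of the packages this manifest declares forks for.
    let forkedPackages: [String]
}

/// Returns the names of non-root packages that declare forks,
/// which the root-only rule would reject.
func packagesWithDisallowedForks(in graph: [Manifest]) -> [String] {
    return graph
        .filter { !$0.isRoot && !$0.forkedPackages.isEmpty }
        .map { $0.name }
}
```

A non-empty result corresponds to the "fail to build when used as a dependency" case, unless the root package declares the same forks itself.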

Since forks are an inherent part of how a package is configured and built, it makes sense to declare the fork information in the Package.swift manifest file, similar to branch or git hash based dependencies. Package.resolved is too fragile to hold this information, as one might want to blow it away to get to a pristine state.

A strawman proposal for declaring forked dependencies:

let package = Package(
    name: "app",
    dependencies: [
        // App depends on the libCore dependency.
        .package(url: "https://github.com/example/libCore.git", from: "1.0.0"),

        // Override the dependency with our fork until an important patch is merged upstream.
        .fork(package: "libCore", url: "https://github.com/myName/libCore.git", .branch("CVE-5715")),
    ],
    ...
)

Vendoring

We're not planning to address the vendoring feature in this proposal but we're open to reconsideration based on feedback from the community. Vendoring has some overlap with problems mentioned in this proposal. It can be considered as a way to "archive" a project with all of its dependencies so it can be easily built without worrying about availability of the dependencies. It also solves the problem of fetching the dependencies when there is no or limited network access.

It is technically possible today to vendor packages or create a "local" fork using the local dependencies feature. However, this only works for root packages, as SwiftPM will emit an error if the local dependency feature is used in a dependency referenced by a version.

We think mirroring and forking are the right choices to solve some of the problems that vendoring aims to address. This leaves the archiving use-case which can be considered as a separate feature in the future.


Hi Ankit:

In terms of scenarios for the use cases:

In this model, I can expect that there are modes where it would be desirable for the configuration of which Packages are mirrored to be applied globally to Swift Package Manager, rather than on a per-project basis via a Package.mirror file (what syntax would this file have? Should it be .json | .yaml | .swift?).

This is certainly the approach that NPM takes: you can use a .npmrc file to configure an alternative registry, or you can use specific locations for modules inside the package.json file.

I think these are some great ideas, but I worry that there are higher priority items to address, since all of the ideas presented are currently possible with simple workarounds AFAIK. I feel that other issues, like strict SemVer validation by analyzing source API changes, would be more valuable to SPM. I apologize if this is not the right place to bring up these concerns.

I’m very interested in forking and vendoring, as it’s something I think is very common and that many dependency managers don’t handle gracefully. SwiftPM already helps here thanks to the edit command, which makes it effortless to start making changes to a dependency. How do you see edit fitting into that? What’s the ideal workflow?

Hey @Chris_Bailey!

The Package.mirror file is proposed to be package and user agnostic, and it should be possible to apply it to any package by passing its path on the command line. We have to be careful when building features that would globally modify the package manager's behavior in an implicit manner. I do agree that it would be super convenient to have an implicit per-user mirror file in some cases, especially when you're always expected to use a certain mirror config. Maybe we can have a setting that would apply a mirror file to a package for all operations after that point, similar to edit mode. Something like:

$ swift package set-config --use-mirror-file ~/package.mirror

The syntax of the mirror file should be an implementation detail as we will provide functionality to manipulate the file using commands. In practice, I expect that we will use JSON as it is easy to read/write on Darwin and Linux.
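Purely to illustrate the JSON option, a mirror file could be modeled with Codable as below. The schema (a version field plus a mirrors dictionary keyed by package name) and the function names are assumptions made for this sketch, not a proposed format.

```swift
import Foundation

/// Hypothetical schema for Package.mirror; the field names are assumptions.
struct MirrorFile: Codable, Equatable {
    var version: Int
    /// Package name -> mirror URL.
    var mirrors: [String: String]
}

/// Serializes the mirror file, as a `mirror add` command might before writing it.
func encodeMirrorFile(_ file: MirrorFile) throws -> Data {
    let encoder = JSONEncoder()
    // Stable, readable output keeps diffs small if the file is checked in.
    encoder.outputFormatting = [.prettyPrinted, .sortedKeys]
    return try encoder.encode(file)
}

/// Parses the mirror file, as SwiftPM might before resolving dependencies.
func decodeMirrorFile(_ data: Data) throws -> MirrorFile {
    return try JSONDecoder().decode(MirrorFile.self, from: data)
}
```

Because JSONEncoder and JSONDecoder are part of Foundation on both Darwin and Linux, the same code would read and write the file on either platform.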


Thanks @kdawgwilk. I believe there are several features that can be equally valuable for the package manager to add. Analyzing SemVer is definitely valuable. If you would like to propose that feature, feel free to create a draft proposal and start a new thread for discussion. However, I would prefer to keep discussion on this thread focused on the ideas being proposed.

That's a great question! I think edit and fork will work great together, especially once we get machine editing support for the manifest file. Once you're done making changes in an edited package, SwiftPM can ask whether you would like to add the edited package as a fork in your manifest file. I can imagine a workflow like this:

$ swift package edit Foo

> Make changes to Foo, test, push on your fork, create a PR, etc.

$ swift package unedit Foo
Add forked Foo[git@github.com:me/foo] at branch bugfix to the manifest? y

Thanks for this proposal @Aciid. I think it generally looks really good.

Like @Chris_Bailey, I do think that it may be useful to provide per-machine global configuration of package mirroring at some stage. This is commonly used in other package manager ecosystems to allow for caching of package installs. As an example, it's quite common to mirror the Python Package Index inside large enterprises to improve dependency resolution and install times. This is doubly useful with mirroring from git repositories, as GitHub has expressed in the past that it does not like being used as a CDN, and so it would be enormously helpful for both public and private continuous integration services to provide mirrors for git repositories. This would reduce both traffic on GitHub and the regular experience of git hosting failures causing CI breakage.

Of course, if you were going down that road it would be ideal to have a way to say "mirror everything". Whether you want to do that is a question for SwiftPM of course, but such a flag would be very useful for automated build systems that want to improve build times by reducing the cost of bootstrapping a clean build environment.


Right - being able to mirror everything is desirable because from an enterprise perspective I don't want to have to keep shipping new mirror files to my users when new packages are added to the internal git repositories.

My question about the format of the .mirror file is based on the fact that the suffix normally denotes the format (if nothing else so editors know how to open/format them).

Sorry if this opens a whole debate on naming :grinning:

Indeed.

In fact, there's another use case that I missed, but it's somewhat common in the Python community to run a partial mirror locally to your development machine using devpi. This gives two advantages:

  1. Faster dependency resolution and checkout for regularly-checked-out packages.
  2. The ability to work offline by using only locally mirrored packages.

Both of these are convenient nice-to-haves. In these cases it's much easier to manage the mirror if you can globally opt in to it.


Thanks for the feedback @Chris_Bailey and @lukasa!

I think the "mirror everything" feature will make more sense once we have the package index feature. I expect people will create their own internal index which SwiftPM can use to look up packages. Trying to do that with URL-based packages is a bit tricky, as we'll need to provide some functionality to rewrite the package URLs based on some pattern, so I think we should continue with per-package mirrors for now. Once we do have the package index feature, we should be able to naturally extend the mirror config file to allow specifying the index URL that will mirror everything.
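To make the URL rewriting idea concrete, the simplest pattern would be a prefix substitution, sketched below. RewriteRule and rewrite are hypothetical names, and a real design would likely need more than prefix matching (for example, handling SSH-style URLs).

```swift
/// Hypothetical rewrite rule: URLs starting with `originalPrefix`
/// are redirected to the same path under `mirrorPrefix`.
struct RewriteRule {
    let originalPrefix: String
    let mirrorPrefix: String
}

/// Applies the first matching rule; URLs that match no rule are returned unchanged.
func rewrite(_ url: String, using rules: [RewriteRule]) -> String {
    for rule in rules where url.hasPrefix(rule.originalPrefix) {
        return rule.mirrorPrefix + String(url.dropFirst(rule.originalPrefix.count))
    }
    return url
}
```

With a single rule mapping https://github.com/ to an internal host, every GitHub-hosted dependency would be redirected without listing packages one by one.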

From the feedback, it seems like the important points to note are:

  • There is more interest in the mirroring feature than the forking feature.
  • We should have a per-user global mirroring config that is implicitly applied to all packages.

I think we can discuss the name and format once we figure out the major design for this feature.


I think it might be a good idea to have the mirror configuration be available on different levels, such as project, per-user and global, similar to npm configuration files. That would give users the necessary flexibility of configuring mirroring based on their particular environment.
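One way to picture such layering is a merge where more specific levels override less specific ones. The sketch below assumes the precedence global < per-user < project, which is my assumption modeled on npm's behavior, and it uses a hypothetical function name.

```swift
/// Merges mirror maps (package name -> mirror URL) listed from lowest to
/// highest precedence; entries in later levels override earlier ones.
func mergeMirrorLevels(_ levels: [[String: String]]) -> [String: String] {
    return levels.reduce(into: [:]) { merged, level in
        merged.merge(level) { _, higher in higher }
    }
}

// Example: the per-user entry for libCore overrides the global one.
let globalLevel = ["libCore": "https://global.example.com/libCore.git",
                   "libNet": "https://global.example.com/libNet.git"]
let userLevel = ["libCore": "https://user.example.com/libCore.git"]
let projectLevel: [String: String] = [:]
let effectiveMirrors = mergeMirrorLevels([globalLevel, userLevel, projectLevel])
```

Packages that appear in only one level keep that level's mirror, so a project file would only need to list its own exceptions.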

I agree with @Aciid that "mirror everything" won't really be practical until there is a way to refer to packages by name instead of URL inside the package manifest.


For mirroring, I agree that different levels would be a good thing, provided there is some confidence that the repository that’s fetched is exactly the same regardless of what level the mirroring was defined at (this could be verified via hashes etc).

Fork support is different, I think: in that case it would be potentially dangerous to let a global or per-user override affect what’s specified in the package, since it can result in different code being compiled in.

Completely agreed, global forks are likely to be more troublesome than useful.
