[ SPM ] Multi Package Repositories

Sherlouk · December 20, 2020, 10:35pm

Hey there folks! First time contributing to the evolution of Swift, so let's see how this goes.

I hope this has enough detail, and follows the respected pattern from previous pitches. Please share as many thoughts and feelings as possible -- I'll also accept tips on how to write better pitches in the future!

Introduction

Many developers operate under a "mono repo" design pattern where code for multiple packages or projects exists under one repository.

Swift Package Manager currently assumes that each repository contains only one package. This is not always true, so this proposal aims to add support for repositories which contain multiple packages.

Context

When I refer to a "package definition" I'm talking about the Package.swift file.

In this pitch I'll reference Firebase a couple of times, especially in examples. This is purely because it is a relatively widely known, public package that demonstrates where this proposal might be useful.

While I am basing this proposal on my own thoughts and findings, I want to highlight @ddunbar's concept from 4 years ago(!) which is along the same lines and some components have been brought across.

Motivation

Larger projects (such as Firebase), teams or organisations might wish to have multiple packages coexist within a single repository.

Adoption of Swift Package Manager within these projects requires some sort of ability to not only define multiple projects within a repository but also not have them tied to the root directory of the repository.

Workarounds

Currently developers wishing to have multiple packages within a single repository have to adopt a single package with many products.

The exact solution may vary (examples below) but they all add noise and complexity, and require a higher level of understanding.

Massive Package Definition

Firebase's definition currently has 16 products, 40 targets, 14 test targets, and a lot of configuration. It's daunting, hard to read, lengthy, and ultimately attempts to wrap multiple SDKs all under one umbrella.

The repository has all of the different products in separate directories, frequently with their own test suite.

This repository would be a key candidate for some sort of separation allowing for each product to be defined in a separate file. This would improve readability and management.

Automatically Generated

Personally, I have created a CLI tool which automatically generates a package definition for the root of the project based on a list of packages within the repository.

This allows each product to have its own definition, as is desirable - and allows for easy referencing of other packages within the same repository. But for external clients, the automatically generated Package.swift enables them to use the subpackages.

This is problematic though as it introduces more complexity through an extra dependency, more work to keep it up-to-date with new updates of Swift Package Manager and still produces noise within the repository.

Proposal

This proposal would introduce the idea of "subpackages": a subpackage is a package within a repository which is addressed by its path relative to the root.

Every subpackage will be required to have a unique name defined in the package definition. This will be the name which is used by users of the package within their dependency definitions.

When performing dependency resolution on a package graph, we will unify all of the requirements on any individual repository across all requested subpackages, to ensure only a single revision of the multi package repository is required.

Detailed Design

Folder Structure

A repository may choose to have a Package.swift at the root level (referred to simply as a "package") and/or one or more Package.swift files within directories (referred to as a "subpackage").

Package.swift // A "package"
Foo/Package.swift // A "subpackage"
Bar/Package.swift // A "subpackage"

Dependency Declaration

(Examples may be simplified for demonstration purposes)

let package = Package(
    name: "Foo",
    dependencies: [
        .package(subpackage: "Bar"), // (1)
        .package(url: "https://github.com/example/one", subpackage: "Baz", .branch("master")), // (2)
        .package(url: "https://github.com/example/two", subpackages: [ "Baz", "path/to/Qux" ], .branch("master")), // (3)
    ]
)

(1) A developer may choose to add a dependency on a local subpackage. The path would be relative to the package definition's directory.
(2) The most common route will be adding a remote dependency with a subpackage path. This will be a path relative to the repository root.
(3) Very similar to (2), but allows the user to provide an array of subpackages. This may be preferred as it enables referencing and loading multiple subpackages with one version, ultimately simplifying updates (one place to update).

We will need to add support for scanning a package definition for multiple dependencies within the same repository. It's plausible for the same repository to be listed multiple times (with different package/subpackages) but we would need to ensure we only download it once -- and ensure every definition uses the same version.
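As a rough illustration of that unification step, here is a minimal sketch. Everything in it is hypothetical: `SubpackageDependency`, `UnifiedRepository` and `unify` are made-up names for this example and are not SwiftPM APIs. It also assumes requirements can be compared as plain strings, whereas a real resolver would have to intersect version ranges.

```swift
// Hypothetical model types for illustration only; these are not SwiftPM APIs.
struct SubpackageDependency {
    let repositoryURL: String
    let subpackagePath: String
    let requirement: String // simplified, e.g. "branch:master"
}

struct UnifiedRepository: Equatable {
    var requirement: String
    var subpackagePaths: [String]
}

enum UnificationError: Error, Equatable {
    case conflictingRequirements(repositoryURL: String, first: String, second: String)
}

/// Collapses the declared dependencies into one entry per repository URL,
/// so that each repository is fetched once and pinned to a single requirement.
func unify(_ dependencies: [SubpackageDependency]) throws -> [String: UnifiedRepository] {
    var repositories: [String: UnifiedRepository] = [:]
    for dependency in dependencies {
        if var existing = repositories[dependency.repositoryURL] {
            // Same repository listed again: the requirement must match exactly.
            guard existing.requirement == dependency.requirement else {
                throw UnificationError.conflictingRequirements(
                    repositoryURL: dependency.repositoryURL,
                    first: existing.requirement,
                    second: dependency.requirement
                )
            }
            existing.subpackagePaths.append(dependency.subpackagePath)
            repositories[dependency.repositoryURL] = existing
        } else {
            repositories[dependency.repositoryURL] = UnifiedRepository(
                requirement: dependency.requirement,
                subpackagePaths: [dependency.subpackagePath]
            )
        }
    }
    return repositories
}
```

With this shape, listing the same repository twice with the same requirement yields one entry with two subpackage paths, while listing it with two different requirements is reported as an error rather than silently downloading the repository twice.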

Target Inclusion

Once a dependency has been defined, it needs to be used by a target. This will be done slightly differently:

let package = Package(
    // ...
    targets: [
        .target(
            name: "Foo",
            dependencies: [
                "Bar", // (1)
                .product(name: "Bar", subpackage: "path/to/Bar"), // (2)
            ]
        ),
    ]
)

(1) Where it's clear and unambiguous, a target may reference the subpackage (as declared in the dependencies array) simply by its package name.
(2) Where it's unclear or ambiguous, the user may need to explicitly reference both the name of the package and the path to it.

Impact on Existing Packages

This change would not immediately break any packages and is entirely additive.

Developers can, at their own pace, choose to move their package definitions to wherever makes sense for their repository - at that point in time it would be deemed a breaking change and as such users would need to update their dependency to reference the new path.

Alternatives Considered

Import Statement

This alternative would introduce a new API to the 'products' array within the existing package definition. It would allow the Package.swift file to "import" another Swift Package at a given relative local path.

This would address the motivation for this pitch by allowing the package definition in the root of the project to simply import packages from around the repository and to not define itself any specific or extra targets - reducing the noise.

This approach has a key benefit in that it does not alter the installation method of any Swift Package by any users, instead relying solely on a change by the package author. This further reduces the impact of the change, as it would no longer be a breaking change.

Below is an example of the Package.swift which would sit in the root of the repository.

// swift-tools-version:5.3
// The swift-tools-version declares the minimum version of Swift required to build this package.
import PackageDescription

let package = Package(
    name: "Firebase",
    products: [
        .import(path: "Crashlytics"), // imports Crashlytics/Package.swift
        .import(path: "path/to/Analytics"), // imports path/to/Analytics/Package.swift
    ]
)

Importantly though there is no reason why a package definition not at the root level should not also be able to import another package.

This is to say that imports can be chained. An error would be thrown if you attempt to create a circular reference, where one package definition references another and that package references the original.
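To make the circular reference check concrete, here is a minimal sketch of one way it could work. It is not taken from SwiftPM: `resolveImports`, `ImportError` and the dictionary of imports are all hypothetical, and the package paths are just labels. The idea is a depth-first walk that remembers the chain currently being resolved and fails if a package shows up in its own chain.

```swift
// Hypothetical sketch; none of these names exist in SwiftPM.
enum ImportError: Error, Equatable {
    case circularImport(chain: [String])
}

/// Walks the import graph depth-first from `root`.
/// `imports` maps a package path to the package paths it imports.
/// Returns every reachable package, with imported packages listed before their importers.
func resolveImports(from root: String, imports: [String: [String]]) throws -> [String] {
    var resolved: [String] = []   // packages whose imports are fully resolved
    var inProgress: [String] = [] // the chain currently being resolved

    func visit(_ package: String) throws {
        // Meeting a package that is still being resolved means the chain loops back.
        if let start = inProgress.firstIndex(of: package) {
            throw ImportError.circularImport(chain: Array(inProgress[start...]) + [package])
        }
        guard !resolved.contains(package) else { return }
        inProgress.append(package)
        for imported in imports[package] ?? [] {
            try visit(imported)
        }
        inProgress.removeLast()
        resolved.append(package)
    }

    try visit(root)
    return resolved
}
```

A package imported from two different places is fine (it is simply resolved once); only a chain that loops back on itself throws, and the error carries the offending chain so it can be shown to the package author.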

I believe while this is a viable option and has its own list of benefits, it does also introduce some of its own complexity as mentioned above. I also don't think it fundamentally addresses the 1:1 relationship between package and repository -- purely focussing on reducing the noise and complexity of having a single, huge, package definition. It also requires that all packages still have a Package.swift at the root of the repository - this may be desirable for some but I believe still contributes to noise which could be avoided.

There is, however, no reason why this couldn't be implemented alongside the original proposal, giving developers and users some options. Though this would probably be best followed up in a future proposal.

Conclusion

I have not at this time been able to successfully implement the proposal within the Swift Package Manager project but I do not have any reason to believe it wouldn't be possible. I'm definitely willing to give it a go but would likely need some direction and support with it.

I hope that I've at least provided justification for why I feel a change of this nature is required and hope we can work together to move this pitch to an official proposal -- and get it implemented!

Thanks for reading, and I look forward to discussing!

SDGGiesbrecht · December 20, 2020, 11:54pm

Does that mean that (a) you tried and failed, (b) you started and ran out of time, (c) you haven’t tried yet, or (d) something else?

I ask because SE‐0226 turned out to be much more complicated than it looked like on the surface. It went unimplemented for several years. (I’m an outsider, so you would have to ask the internal team why.) Then it was implemented piecemeal by several different developers over the course of several months as individual aspects of the puzzle were solved. The final piece I submitted involved over a month of work and saw four different implementation strategies discarded when they proved to actually be impossible after a week of fleshing out. I wasn’t even certain the fifth idea was going to be possible until a few minutes before I submitted the pull request when it was finally complete. And to this day there remain unsolved bugs that prevent its release.

My intuition tells me this pitch will compound the complexity of the resolution process in comparable ways, but in yet a new dimension. I believe that it is possible, but I cannot with confidence say that it is possible without rewriting the resolver from scratch (though I am hopeful). So I would advise implementing at least the part that integrates into the resolver to prove it is possible first. It is probably not helpful to discuss the manifest API until that has been done, because it would become moot if it turns out too difficult to implement in the end.

I hope that the experience gained from SE‐0226 transfers, so that this time around it is less painful. I don’t have time to directly help with the implementation, but I can give you some pointers and provide feedback that should help you avoid retreading past mistakes.

Edit:

But that is not to say higher‐level discussion shouldn’t happen first, such as:

How badly do we want this?
Are there other motivations out there?
Which motivations are the most important?
What is the essential difference between a subpackage and a product?
Do we want to be able to use two subpackages at different versions? Would that be a feature or a vulnerability?

These sorts of things could affect the implementation strategy chosen, and so are worth evaluating up front.

Sherlouk · December 21, 2020, 12:18am

Thanks for the thoughtful response, Jeremy!

I admit that, at this time, I've simply not attempted to implement the proposal. It's a little bit daunting as I definitely expected it to be difficult (but program by the premise 'it's always possible, just might take some time')!

I'll definitely take more time to look into SE-0226 and familiarise myself with the troubles which were seen there. With a quick scan I can see how there could potentially be some overlap or attempting to solve some similar problems.

At a minimum I appreciate your thoughts on where investigations should start though - I'll do more due diligence in looking at that area of the SPM codebase.

How badly do we want this?

I do believe there is a wide audience for this sort of change, but this is obviously a gut feel and difficult to quantify. I know from an organisation perspective, we have a healthy number of teams working within a monorepo who would love this sort of separation.

Having said that I have detailed some 'workarounds' which do function and provide this capability - so it's not like it's enabling something which is impossible.

Are there other motivations out there?
Which motivations are the most important?

For me (personally) it's really about enabling flexibility which drives down complexity (from the developer and user's point of view) and increases readability, and maintainability.

As we continue to drive the sharing of code around the organisation it's important to us that we can continue to live with the benefits of a monorepo while continuing to enable external projects to benefit from existing code (without the overhead of custom scripts to work around the lack of the feature!).

What is the essential difference between a subpackage and a product?

I'd probably say a subpackage is a product. Just in a different location. I'm happy to clarify or change terminology if it becomes confusing -- fundamentally this proposal defines a custom path relative to the root of the repository where the Package.swift lives.

Do we want to be able to use two subpackages at different versions? Would that be a feature or a vulnerability?

Longer term I can see this being highly beneficial but I think this is out of scope for the initial proposal in order to keep the complexity down. Definitely worth thinking about and keeping in mind - but, in my opinion, not something to get highly distracted about.

Happy to reconsider if you think otherwise though!

SDGGiesbrecht · December 21, 2020, 12:38am

The first three questions were mostly directed at other users who come to this thread, to see if they have things to add that haven’t been noted yet. You did a good job explaining your viewpoint in the pitch. I’m sorry if they came across as a prompt for improvement.

This leads to another question: If they are fundamentally the same, does the client need to see the difference, or should it just be an implementation detail of the package using the feature?

Whether it keeps the complexity down depends on the answer to the previous question.

If a subpackage is essentially a special kind of package (and is plumbed through most of the same implementation), then it would be more work to teach the resolver to constrain their versions.
If a subpackage is essentially a special kind of product (and is plumbed through most of the same implementation), then it would be more work to teach the resolver to exceptionally allow splitting versions.
If a subpackage is fundamentally different from either of them... then I don’t really know.

saeta · December 21, 2020, 5:15am

Quick question: how would versioning of packages work? (Context: git tags are used for versioning releases; git tags are global to a repository.) Thanks for thinking about these proposals!

Sherlouk · December 21, 2020, 10:42am

I suppose this depends on the exact implementation - as per the alternatives I raised there is an approach that could be taken which would give us many of the benefits without the user/client ever really being able to tell the difference.

I do believe (and it leads onto the next question) that with the main proposal here, the resolver needs to understand that the Package.swift is not at the root of the project and needs to know where to look. With the way I have designed this implementation, that information comes from the user/client.

I believe it would go down most of the same implementation. My extremely naive mental map (based on no experience of the actual code) would have a new parameter (array of subpackage paths) routed through the resolver and where we load root/Package.swift it would instead use the new array (reverting to existing logic where it's nil/empty). Would need to change some parts of the output side to support an array of exports.
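A minimal sketch of that fallback, purely to show the shape of the idea. The function `manifestPaths` and its `subpackagePaths` parameter are hypothetical names for this example and do not exist in the SPM codebase; paths are assumed to be relative to the repository root with no trailing slash.

```swift
/// Returns the manifest locations (relative to the repository root) to load.
/// `subpackagePaths` stands in for the hypothetical new parameter routed through the resolver.
func manifestPaths(subpackagePaths: [String]?) -> [String] {
    guard let paths = subpackagePaths, !paths.isEmpty else {
        // nil or empty: revert to the existing logic and load the root manifest.
        return ["Package.swift"]
    }
    // One manifest per requested subpackage directory.
    return paths.map { $0 + "/Package.swift" }
}
```

The output is an array even in the root-only case, which matches the point about the output side needing to support more than one loaded package.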

I believe the version would be locked and the same for all subpackages within the same repository. If you have 5 packages from repo "Foo", then they'd all have to be the same version.

Depending on the complexity (as Jeremy mentions above) it might be possible for the resolver to work with different versions for subpackages though. This would essentially be the case of downloading the same repo for every subpackage you download at a different version (so a bit of extra logic to first find all of the same repos, and then group by version).

Does that make sense?

mattt · December 21, 2020, 11:40am

SE-0292, which is currently in review, proposes an alternative solution to the problem of monorepos, by allowing maintainers to specify a path when publishing package releases to a registry. Quoting from the specification:

4.5.5. Package location

A client MAY use path and url parameters together to publish a single package located in a subdirectory of a repository or to publish multiple packages from different paths in a single repository.

If a client specifies a path parameter, a server SHOULD look for a Package.swift file at that location. Otherwise, it SHOULD look at the root directory. If a Package.swift file doesn't exist at that location, a server SHOULD respond with a status code of 404 (Not Found).

Would this be a reasonable solution to the problems you've identified here?

Sherlouk · December 21, 2020, 12:11pm

That's really interesting, thanks for sharing it @mattt. I did a search but didn't think this sort of change would be deep inside of the package registry specification!

If I'm reading this correctly, it would allow for a path to be optionally specified but only when publishing it on a registry? To continue the common example in my pitch, that would mean Firebase would only be downloadable through a registry which has context about its path?

Is there any plan (in this or any other registry related proposal) to support that capability directly through the Package.swift package definition without using a registry?

Furthermore it's unclear to me when looking at the examples of what is and isn't included in the proposal how you would reference a package with a path. It gives an example where you simply use the repository's URL, I assume that this would then query the registry which knows about the path. This doesn't demonstrate the ability for a single repository to have multiple packages though, as how would you know which package in the repository you'd be looking for?

With my initial understanding it seems to resolve one part of this proposal which is allowing packages to live at a non-root location, but not supporting multiple packages in the same repository. Please advise if I've completely misunderstood the proposal though - apologies if so.

mattt · December 21, 2020, 12:43pm

That'd be one option, yes. Another would be for Firebase to provide a pre-built .xcframework binary, as described in Apple's documentation.

Not to my knowledge, no.

That all depends on whether GitHub (or other code hosts) adds support for this. For repositories that contain a single Swift package at their root directory, it's expected that a code host with registry support would redirect to the corresponding registry endpoint for that package via HTTP content negotiation.

.package(url: "https://github.com/mona/umbrella", from: "1.0.0") 
// Redirects to https://swift.pkg.github.com/mona/umbrella

A host might also add support for subpaths to be routed to packages within that project — it all depends on how the host responds to the initial HEAD request.

.package(url: "https://github.com/mona/umbrella/part", from: "1.0.0") 
// Redirects to https://swift.pkg.github.com/mona/umbrella/part

Otherwise, you could still declare a dependency on the package directly from the registry URL.

.package(url: "https://swift.pkg.github.com/mona/umbrella/part", from: "1.0.0")
// No redirects

Sherlouk · December 21, 2020, 12:54pm

Thanks for those extra examples, that's clarified a huge amount for me.

Really I think this is the pivotal part then.

If GitHub (et al) get on board and adopt this early, then this would immediately resolve much of the motivation, both mine and that of the people I've spoken to, for a change such as this pitch represents.

With this new information, I'm going to take some time to read the specification, proposal and other docs to fully understand what's happening and if this completely negates the need for this pitch or whether there are still some voids that could do with filling. I am suddenly a lot more hopeful though!

Thank you once again @mattt.

Sherlouk · December 21, 2020, 8:26pm

So at its core this pitch aimed to allow package creators to supply multiple packages from one repository. Fundamentally, SE-0292 gives us this capability.

The biggest question mark is simply "what is the adoption of package registries going to be like?" and none of us have that sort of magic 8 ball.

As such I don't think I'd push hard on this pitch until we have a better picture (after it's been implemented) of its adoption.

I do believe there is a delta between the two pitches. The required use of a registry is the most obvious (and potentially disruptive) difference, but this may or may not be a problem based on the above question.

Another quick thanks to Mattt for bringing this quiet part of the registry proposal to my attention!

tl;dr - Pitch can be closed until we see the results of SE-0292, in my opinion.

Anthony_Miller · February 26, 2021, 11:53pm

While Package Registries do provide a possible solution to this, I would really like to see a concept of sub-packages without the need for registry usage. It would be great to be able to provide a path on any package dependency.

.package(name: "MyLib-SubPackage1",
         url: "https://github.com/me/lib.git",
         path: "SubPackage1")