Proposal to enable SPM to use a local repository mirror

(Sorry for double posting. At first, I wasn't sure where to put my question, but now I do.)

Just wondering... When developing packages whose dependencies are large in size or number, or where there are nested dependencies, what would be the workflow (or Package.swift changes) to allow retrieving dependencies from local storage (or an internal server)?
Using github.com based repositories could lead to a rather large load on both github and local network if developers are cleaning/building often.
This issue is further compounded by the fact that revision tags need to be jigged to give the right results considering the Package dependency requirements in the graph. Seems like working with a local mirror is what you might call your worst nightmare.

Cheers,
Nap

Hi Napoleon,

One of the features that's been discussed in the past is "fork support", for easily overriding the canonical URL for a package with a URL you want to replace it with everywhere it's referenced in the dependency graph. You might want to do this to use a local mirror, or because you actually want to fork a package and plug the fork into your package graph.

The mirror case sounds relatively straightforward, but to support true forks we'll also have to figure out how to map version tags from the canonical repository to version tags in your fork (which will have a different hash, if you've made any changes vs origin).

No one's put forth a proposal for this yet, but this is on my list of features I'll eventually plan to propose if no one in the community does so first!

Thanks Rick!

It appears to me that when the SPM runs, it actually performs a fork operation first (repositories folder), which is followed by a clone operation (checkouts folder). If the SPM had mod_rewrite-like functionality (as in Apache HTTPD), the URLs could be updated on the fly without having to change the package's Package.swift or the repository's history. The 'mod_rewrite' extension would, of course, need to be able to handle protocols, domains, paths, and users, and provide a means of authentication (I tend to use public key) where the local repository server requires it.

Thus, the SPM (swiftenv) configuration would be set up to specify how to perform the URL alterations (regex, mappings, or otherwise) (1). The SPM, during the fork phase, would initially check the local mirror to see if the package repository exists locally. If not, then the package would be retrieved from its original source and placed directly into the location determined by the rewritten URL. The git hook: pre-receive (2) trigger handler would be used to create the repository .git folder in the correct location and call git init --bare --shared (or whatever is appropriate) within it.
The clone would then be performed using the rewritten URL (again on-the-fly) and the checked out version placed into usual place inside the .build folder. If the correct checkout cannot be resolved in the cloned repository, then either an error would be issued, or some other appropriate action taken (maybe the package could be retrieved from its original location and replace the locally stored URL rewritten version).
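
To make the mapping idea above a little more concrete, here is a minimal shell sketch of prefix-based URL rewriting. It is not SPM code: the MIRROR_RULES table, the rewrite_url function, and the pi.local mirror host are all hypothetical names chosen for illustration, and a real implementation would presumably live inside the package manager's configuration handling.

```shell
# Hypothetical mapping table: one "original-prefix replacement-prefix" pair per line.
# The mirror host pi.local is an example only.
MIRROR_RULES='https://github.com/rhx/ ssh://git@pi.local/git/
https://github.com/johnsundell/ ssh://git@pi.local/git/'

# Print the rewritten URL, or the URL unchanged when no rule matches.
rewrite_url() {
    url=$1
    while read -r from to; do
        [ -n "$from" ] || continue
        case $url in
            "$from"*)
                # Swap the matching prefix for the mirror prefix.
                printf '%s%s\n' "$to" "${url#"$from"}"
                return 0
                ;;
        esac
    done <<EOF
$MIRROR_RULES
EOF
    printf '%s\n' "$url"
}
```

Calling rewrite_url with a URL under one of the listed prefixes prints the mirror URL; any other URL is printed unchanged, which corresponds to falling back to the original source.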

Because the contents of the locally checked out repository will not have changed from that of the pristine original, the local package repository would still track the authentic origin, thus SPM could have a feature that enables it to indicate if there have been noteworthy changes upstream.

If the developer wishes to make changes to a dependency, then they would need to consider whether to perform a real fork (and think of how it would be distributed) or issue a pull request. I believe, on principle that, once forked, you have a new product, and usage of the new product in place of the original is a risk.

I don't know anything about SPM's design, nor whether it is reliant on libgit2, so I can't say how practical or feasible the above procedure would be. Also, I'm not sure how this suggestion would fare in scenarios where build scripts run within the package and access external resources. Nevertheless, perhaps this thread could be treated as an initial brainstorming on how to resolve this problem.

Cheers,
Nap.

(1) This could be done on a per project basis or globally for the user (perhaps similar to how git manages configuration settings).
(2) Git hook: pre-receive itself cannot be used for this, unfortunately, because it isn't triggered when a repository does not already exist. Personally I think this trigger should not depend on the existence of a repository, and that a feature request to the libgit2 team should be made to provide it. I feel this would almost be the ideal mechanism, and would require the least amount of work all round to implement.

I think some users are using git's insteadOf configuration to work around the lack of fork support in SwiftPM. I think it is somewhat similar to the mod_rewrite functionality you mentioned.
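
For reference, a minimal sketch of what such an insteadOf setup might look like. The add_mirror_rules function name, the owner list, and the ssh://git@pi.local/git/ mirror base are assumptions for illustration; only the url.<base>.insteadOf setting itself is standard git configuration, and whether it covers every way SwiftPM invokes git is not something this sketch verifies.

```shell
# Add insteadOf rewrite rules to the given git config file.
# The mirror base and the owner list are hypothetical examples.
add_mirror_rules() {
    config_file=$1
    mirror_base="ssh://git@pi.local/git/"
    for owner in rhx johnsundell; do
        # --add is needed because several insteadOf values share one url.<base> section.
        git config --file "$config_file" --add \
            "url.${mirror_base}.insteadOf" "https://github.com/${owner}/" || return 1
    done
}
```

Running add_mirror_rules ~/.gitconfig would apply the rules for the current user. The effect can be checked without any network access via git ls-remote --get-url followed by an upstream URL, which prints the URL after insteadOf expansion.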

Ok, I've done it using some hacked up code that only handles the specific cases I am testing here.
Basically, knowing all the recursively dependent repos your project uses:

Create/prepare the empty repositories on the server (local mirror):

for repo in $repos; do
    mkdir -p "/git/${repo}.git"
    cd "/git/${repo}.git" || exit 1
    git init --bare --shared      # or whatever parameters are necessary
done

On the developer's machine, clone from the source, change the upstream URL, then push to it:

cd /tempDir
for repo in $repos; do
    # Replace <repoOwner> with the owner of each repository.
    git clone "https://github.com/<repoOwner>/${repo}.git"
    cd "$repo" || exit 1
    # I'm using ssh, but this could be any URL you need, and you may need to authenticate.
    git remote set-url origin "ssh://git@pi.local/git/${repo}.git"
    git push
    # Pushing the tags after the initial push seems to work better, because in
    # some cases the repo gets corrupted if you push the tags with the initial push.
    git push --tags
    cd ..
done
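
A possible alternative to the two-step push in the script above, offered as a sketch rather than a tested replacement for it: git can copy all refs (branches and tags together) with a mirror clone followed by a mirror push. The mirror_repo function name is hypothetical; the bare target repository is assumed to exist already, as prepared in the server-side step.

```shell
# Copy every ref from src_url into an existing bare repository at dst_url.
# mirror_repo is a hypothetical helper name.
mirror_repo() {
    src_url=$1
    dst_url=$2
    work=$(mktemp -d) || return 1
    # --mirror fetches all refs into a bare clone; push --mirror sends them all on.
    git clone --quiet --mirror "$src_url" "$work/repo.git" &&
        git -C "$work/repo.git" push --quiet --mirror "$dst_url"
    status=$?
    rm -rf "$work"
    return $status
}
```

Note that a mirror clone copies every ref the source advertises, not only branches and tags, so the local mirror may end up with more refs than the original script would push.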

Then, in the SPM code (note I'm referring to the swift-4.0.3-RELEASE tag here) in Sources/SourceControl/Repository.swift after line 19, add the test cases and the rewrites:

    /// Create a specifier.
    public init(url: String) {  // <= line 19
        // Rewrite URLs under known GitHub owner prefixes to the local mirror;
        // leave every other URL untouched.
        let mirror = "ssh://git@pi.local/git/"
        let owners = ["https://github.com/rhx/", "https://github.com/johnsundell/"]
        if let prefix = owners.first(where: { url.hasPrefix($0) }) {
            self.url = mirror + String(url.dropFirst(prefix.count))
        } else {
            self.url = url
        }
    }

Now, clearly the above hack can be greatly improved by making the init method use some more sophisticated pattern matching, but I'm not that well versed in Swift nor the SPM to provide the code that implements such functionality.

This will work recursively for all the dependencies of your project! The nice thing here is that there are no git history (hash) related problems!

The server-side work would ideally be handled using an improved version of the pre-receive trigger, but until such time as the git guys agree to implement this, it can be achieved using ssh commands.

The only issue this suggestion doesn't cover is the case where custom build scripts clone other repositories that aren't specified in the Package.swift dependencies. I have such a script in the example project I'm working with and edited the location the script clones from. However this could be partially mitigated by not placing these repositories inside the .build folder, where our distribution clean function simply deletes the .build folder.

Enjoy,
Nap

I'm not familiar with the insteadOf functionality. I only became interested in this whilst working out how to solve the mirror problem.

I've briefly read what insteadOf does, and I don't think it will help in this situation.
The problem isn't the protocol, the problem is the location (which could include the protocol).
The hacked solution above deals with this in a general manner that will handle all cases.

The only problem is that without an improved pre-receive hook, the server-side initialisation/preparation needs to be either done through ssh commands (yuk), or manually (even worse).

@rballard : there's some sample code above that does the trick, albeit in a very simple and simplistic way. I believe it solves all the issues that are within the control of SPM in a clean and elegant manner.

If the main objective is to lighten the network load, rather than to swap in a different fork, then being able to use git clone's --reference functionality would also be useful in certain situations.

--dissociate and git repack would be complementary to the --reference flag.
https://git-scm.com/docs/git-clone#git-clone---dissociate
https://git-scm.com/docs/git-repack
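
A minimal sketch of how these flags combine, assuming a local cache repository already exists on the developer's machine. The clone_with_cache function name and its arguments are hypothetical; this shows plain git usage, not anything SPM does today.

```shell
# Clone url into dest, borrowing objects from a local cache repository.
# --reference reuses objects already present in the cache, so fewer are fetched
# over the network; --dissociate then copies the borrowed objects into the new
# clone so that it no longer depends on the cache.
clone_with_cache() {
    url=$1
    cache=$2
    dest=$3
    git clone --quiet --reference "$cache" --dissociate "$url" "$dest"
}
```

After a dissociated clone the cache can be deleted or updated without affecting the checkout. Without --dissociate the clone keeps pointing at the cache's objects, which saves disk space but breaks if the cache is removed.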

I haven't used any of these before.