Shallow git clone for CI purposes

We have an SPM package for building the Firebase SDKs for zip distribution, and an internal CI that pulls the project and all of its dependencies on every build.

I did some digging and saw that the SPM git clone intentionally does not do a shallow clone due to the cost of iterative updates.

Do other folks think it would be helpful to have a command line flag to do a shallow clone during an SPM build? I assume it could be a parameter of the GitRepositoryProvider initializer and used in the fetch(repository:to:) function, but I'd have to try to implement it first to confirm.

I did a trivial test (outside of SPM) for time and bandwidth required to clone one of our dependencies (swift-protobuf in this case) and the difference seems non-trivial to me (compounding across multiple builds and a growing number of dependencies):

Deep clone:

$ time (git clone https://github.com/apple/swift-protobuf.git deep-clone)
Cloning into 'deep-clone'...
remote: Enumerating objects: 10, done.
remote: Counting objects: 100% (10/10), done.
remote: Compressing objects: 100% (9/9), done.
remote: Total 18239 (delta 1), reused 4 (delta 1), pack-reused 18229
Receiving objects: 100% (18239/18239), 17.88 MiB | 4.67 MiB/s, done.
Resolving deltas: 100% (15037/15037), done.

real	0m7.132s
user	0m5.793s
sys	0m0.379s

Shallow clone:

$ time (git clone --depth 1 https://github.com/apple/swift-protobuf.git shallow)
Cloning into 'shallow'...
remote: Enumerating objects: 416, done.
remote: Counting objects: 100% (416/416), done.
remote: Compressing objects: 100% (380/380), done.
remote: Total 416 (delta 151), reused 76 (delta 25), pack-reused 0
Receiving objects: 100% (416/416), 1.01 MiB | 2.78 MiB/s, done.
Resolving deltas: 100% (151/151), done.

real	0m1.298s
user	0m0.207s
sys	0m0.159s

Emphasizing two important parts:

Deep

real 0m7.132s
17.88 MiB

Shallow

real 0m1.298s
1.01 MiB

The shorter clone time and smaller download are a nice benefit for one-off builds, and the CI system would also save bandwidth by not pulling in data it won't use.

If this is something that other folks have interest in (or there's no pushback on it), I'm happy to investigate building the feature locally and doing some proper benchmarks.

Thanks!

If you really want peak efficiency, then never re-clone a project (shallow or not). Here is a "git + CI" cheat sheet:

Dedicate a subdirectory to CI clones/worktrees (i.e. one where you don't do any development), then run git pull --ff-only per project and, if you're feeling paranoid, git clean -fdx per project as well. Presto! The same result as re-cloning, but as efficient as possible.
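The cheat sheet above can be sketched as a small shell function. This is only an illustration: the function name refresh_checkout is made up for this example, and it assumes the directory already holds a clone whose current branch tracks a remote branch (which is what a plain git clone sets up).

```shell
#!/bin/sh
# Hypothetical helper (not part of git or SPM): refresh an existing CI
# checkout in place instead of re-cloning it.
# Usage: refresh_checkout <path-to-existing-clone>
refresh_checkout() {
    repo_dir="$1"

    # Fast-forward only: this fails instead of creating a merge commit
    # if the local branch has diverged from its upstream.
    git -C "$repo_dir" pull --ff-only || return 1

    # The "paranoid" step: delete untracked files and directories,
    # including ignored ones such as build products.
    git -C "$repo_dir" clean -fdx
}
```

A CI job would call something like refresh_checkout /path/to/ci-clones/swift-protobuf once per project. Note that git clean -fdx removes untracked files but does not revert modifications to tracked files.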

If it actually is faster, it would be nice to improve efficiency without the need for a special flag.

But the comparison you show is just the timing of a single isolated Git command. Unless every package in the dependency tree only uses .exact versions, the package manager still must check out manifests from multiple revisions in order to resolve the tree.

Maybe when fetching according to a valid Package.resolved, the clones could reasonably be shallow, and the deep fetching delayed (git fetch --unshallow) until the root manifest changes and it needs to resolve anew? That might theoretically speed cloning up for simple checkout and build operations.
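In plain git terms (outside of SPM), the "shallow first, deepen later" idea might look roughly like the sketch below. The function names are invented for illustration, and when SPM would actually trigger the deepening is exactly the open question in this thread.

```shell
#!/bin/sh
# Hypothetical helpers illustrating "clone shallow now, deepen later".
# Neither function is part of git or SPM.

# Usage: shallow_clone <url> <dir>
# Fetches only the most recent commit of the default branch.
shallow_clone() {
    git clone --quiet --depth 1 "$1" "$2"
}

# Usage: deepen_if_shallow <dir>
# Converts a shallow clone into a full one. The guard matters because
# "git fetch --unshallow" reports an error on an already complete repository.
deepen_if_shallow() {
    if [ "$(git -C "$1" rev-parse --is-shallow-repository)" = "true" ]; then
        git -C "$1" fetch --quiet --unshallow
    fi
}
```

One caveat: --depth implies --single-branch, so after deepening, the clone has full history for the cloned branch but still does not track the remote's other branches. Whether that is enough for dependency resolution would need to be checked as part of any real implementation.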

I suspect it just needs someone to do the empirical testing to find out what actually is fastest over a range of sizes for the repository, version list and dependency tree. If a new strategy can be proven to be generally better, I doubt anyone would object to switching.

(Any such testing should probably use the master branch of the package manager, since a lot of work has been done on the resolution logic since the latest releases branched.)

Unfortunately, I don't think that will work for our setup. It's essentially an internal Jenkins instance (called Kokoro, more information here), and there is no dedicated machine for our builds; we get a random machine from a pool. Did you have something else in mind that would work in this situation?

Based on the comment in the code, it seems this was tested originally, and for general-purpose development doing a full pull is more efficient during iteration. But maybe some more testing is needed.

That is a great point that I certainly hadn't thought of! In our case we're only using .exact, but great point about having to resolve the version. Perhaps once a valid Package.resolved is available, this could be used as you mentioned.
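For the .exact-only case, plain git can already fetch a single pinned tag without any other history. The sketch below shows that with a made-up helper name; it is not how SPM fetches dependencies today, only a way to try the idea by hand.

```shell
#!/bin/sh
# Hypothetical helper (not part of git or SPM): shallow-clone exactly one
# tagged version of a dependency.
# Usage: clone_exact_version <url> <tag> <dir>
clone_exact_version() {
    # --branch accepts a tag name as well as a branch name; with a tag,
    # the resulting checkout has a detached HEAD at that tag's commit.
    git clone --quiet --depth 1 --branch "$2" "$1" "$3"
}
```

Usage would be along the lines of clone_exact_version https://github.com/apple/swift-protobuf.git <tag> swift-protobuf, where <tag> is whichever version the manifest pins.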

Thanks for the replies folks!

That's a good idea if well implemented. I am also fine with starting with a flag to always perform shallow clones, which should be much easier to implement.

I think this is something everyone should get by default so they don't have to know about this flag, assuming we don't know of any downsides (unless you just meant it would have the flag for a short migration window to this becoming the default?).

I meant it seems reasonable to add a flag so users can opt into shallow clones where the clones are short-lived (like the CI use case), until we can implement some sort of feature to automatically manage shallow vs. full clones.
