Improving manifest loading performance for declarative package manifests

This is not correct: JSON pod specs in Trunk are closer to a cache for released versions than anything else. Programmatic pod specs remained available.

1 Like

I think this risk can mostly be mitigated by adding a validation subcommand to SwiftPM that double-checks both modes work, which package authors could run before releasing versions of their package.
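To make the idea concrete, here is a minimal sketch of what such a check could boil down to: load the manifest both ways and report the fields on which the two results disagree. `ManifestSummary` and `mismatches` are hypothetical names invented for this illustration, not SwiftPM types or an existing subcommand.

```swift
// Hypothetical, heavily simplified model of what a manifest loader produces.
struct ManifestSummary: Equatable {
    var name: String
    var targets: [String]
    var dependencies: [String]
}

// Returns the names of the fields on which the two loading modes disagree.
func mismatches(parsed: ManifestSummary, executed: ManifestSummary) -> [String] {
    var fields: [String] = []
    if parsed.name != executed.name { fields.append("name") }
    if parsed.targets != executed.targets { fields.append("targets") }
    if parsed.dependencies != executed.dependencies { fields.append("dependencies") }
    return fields
}

let parsed = ManifestSummary(name: "Demo", targets: ["Demo"], dependencies: ["swift-log"])
var executed = parsed
// Simulates a dependency that only shows up when the manifest is actually run.
executed.dependencies.append("swift-nio")

let report = mismatches(parsed: parsed, executed: executed)
print(report.isEmpty ? "modes agree" : "modes differ in: \(report.joined(separator: ", "))")
```

A check like this only covers the environment it runs in, so it can confirm agreement on the author's machine without proving it elsewhere.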

Sure, "switched" was the wrong word there, since most pods continued to use Ruby podspecs for local development, and just adopted JSON podspecs in the registry. But FWIW, I was suggesting that we consider the same path for SPM — generate Package.json by evaluating Package.swift, in the case that that's not environment-dependent, as a higher-performance cache of the same information.

I am concerned that relying on developers to either generate a declarative manifest or validate the equivalence of their declarative and executable manifests is very fragile. The risk of “works on my machine, ship it!” is very high.

5 Likes

I've never had problems with reading or writing json, especially when my IDE knows the schema and tells me what keys are valid. It's also the best format for people who want to write DevOps tools in random programming languages and parse this file.

I also think it's more readable than toml. Toml is only good when your IDE can't do automatic indentation.

1 Like

While I applaud the thought and effort going into this, parsing swift as declarative syntax seems like a win only if we couldn't migrate to a purely declarative approach. That is the real win IMO.

Declarative files are much easier for surrounding tooling to work with (cargo add is a great example). And it seems possible to migrate to a declarative format without breaking anything that exists today.

I envision something like:

Package.swift → Package.toml

If a Package.swift exists, execute it (or parse it as described above), then generate a Package.toml and use that.

If Package.swift doesn't exist, but Package.toml does, use that.
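The two rules above can be sketched as a small decision function. All names here (`ManifestPlan`, `planManifestLoading`) are hypothetical and only illustrate the lookup order being proposed; this is not how SwiftPM's loader is structured.

```swift
// Hypothetical names throughout; this is not SwiftPM's actual loader.
enum ManifestPlan: Equatable {
    case generateTOMLFromSwift  // Package.swift exists: evaluate it, emit Package.toml, use that
    case useExistingTOML        // only Package.toml exists: read it directly
    case noManifest             // neither file is present
}

func planManifestLoading(hasPackageSwift: Bool, hasPackageTOML: Bool) -> ManifestPlan {
    // Package.swift takes priority, so existing packages keep working unchanged.
    if hasPackageSwift { return .generateTOMLFromSwift }
    if hasPackageTOML { return .useExistingTOML }
    return .noManifest
}

let existingPackage = planManifestLoading(hasPackageSwift: true, hasPackageTOML: true)
let tomlOnlyPackage = planManifestLoading(hasPackageSwift: false, hasPackageTOML: true)
print(existingPackage, tomlOnlyPackage)
```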

I'm using TOML here because I like it, but maybe pkl would also make sense. I would strongly prefer not using JSON though ;)

5 Likes

This is amazing, thanks a lot for the work :clap:. SPM resolution performance is the top concern for our engineers in our DevEx surveys and this has helped a lot: resolution went from 2:40 with warm caches to around 0:30 (which we are in the process of reducing further by moving our ~100 remote dependencies over to a Swift Package Registry).

We have over 200 local packages, and one benefit from this work is that the GitInformation structure is only populated if it's accessed (unlike in the executing path, where it always is). In our monorepo, over half of the resolution time goes into these redundant git status calls by SPM. I even opened a PR to address this, but I believe Doug's work makes the fix more structural (rather than throwing in another cache), plus of course there is the speedup from avoiding "running" the packages.
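As a rough illustration of the "only populated if it's accessed" behaviour, a lazy property gives the same effect in miniature. `GitInfo`, `GitClient`, and `PackageContext` are hypothetical stand-ins made up for this sketch; SwiftPM's real GitInformation is wired up differently.

```swift
// Hypothetical stand-in types; not SwiftPM's real GitInformation plumbing.
struct GitInfo {
    var hasUncommittedChanges: Bool
}

final class GitClient {
    private(set) var queries = 0

    // Stand-in for an expensive `git status` invocation.
    func status(at path: String) -> GitInfo {
        queries += 1
        return GitInfo(hasUncommittedChanges: false)
    }
}

final class PackageContext {
    let path: String
    private let git: GitClient

    // Computed on first access only, then cached for later reads.
    lazy var gitInfo: GitInfo = git.status(at: path)

    init(path: String, git: GitClient) {
        self.path = path
        self.git = git
    }
}

let git = GitClient()
let contexts = (1...200).map { PackageContext(path: "Packages/Local\($0)", git: git) }

// Creating 200 contexts that never read gitInfo triggers no git queries.
let queriesBeforeAccess = git.queries

// Only the one package that reads the value pays for it, and only once.
_ = contexts[0].gitInfo.hasUncommittedChanges
_ = contexts[0].gitInfo.hasUncommittedChanges
print("queries before access: \(queriesBeforeAccess), after: \(git.queries)")
```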

I'm excited for this and very glad that there seems to be a bunch of ongoing efforts to address SPM performance.

If the future is to have another mode or syntax for declarative packages I wouldn't be opposed to it, but I don't think that's a reason this work shouldn't land in the meantime, as it solves real problems we have today.

6 Likes

Was gonna ask the same question.

Is it not possible to invest all this effort into a proper Swift interpreter/JIT? That would have outsized returns across the ecosystem and every other use case for Swift. Is everyone who cares about time-to-results for Swift code supposed to write a similar special-purpose interpreter for their chosen subset of the language?

1 Like

Swift already has a JIT, the point is that doing the parse is the cheap bit of that, and skipping all the rest of it is a big performance improvement.

Could be funny to tie package evaluation to constant evaluation though — if the Package initializer can be folded, you can skip code generation & execution & just pull the values out of the compiler front-end.

Maybe the word proper is doing too much heavy lifting, but in essence if it isn’t fast enough for this use case, then it isn’t good enough to be a proper interpreter/JIT.

Explaining my thinking:

Consider the effort that will go into this proposal:

Swift PM contributors

  • Building the swift-syntax based interpreter (already have a POC)
  • Evolving it to support more constructs / removing common “limitations”. Endlessly debating which to support.
  • Evolving it as the package description evolves
  • Evolving the package description with consideration for what’s easy to support with this special-purpose tightly coupled interpreter
  • Evolving it as the language evolves

Package contributors

  • Updating packages for compatibility
  • Triaging problems when what seem like straightforward package changes suddenly trigger limitations, coming up with alternatives
  • Endlessly debating whether it’s important or not and triaging requests from users to just do it

Everyone else

  • Because this sets a precedent for “swift is too slow to compile so you should write a special-purpose swift-syntax-based interpreter for your use case”, anyone else who wants to use swift for configuration or lightweight extensibility (e.g. imagine a spreadsheet app with swift for formulas), will be encouraged to do the same thing, and tackle all the same problems above
  • Some of these use cases will take the opportunity to accept special syntax or interpret existing syntax in non-conforming ways which make sense for their use case / application, leading to more confusion

When you add it up, it’s a ton of energy spent over the years that would have been better spent making a general-purpose interpreter/JIT that’s good enough for this use case.

I think there are other benefits of a declarative model, like hermeticity, but it’s an orthogonal topic. We want those benefits for Swift code too without having to parse for a subset of the language.

5 Likes

A declarative Package manifest (that is explicit about being so in the // package-description line) is a good idea IMO. If I was building this feature from scratch I probably wouldn't do it that way, which is a bit of a yellow flag, but I do see the point about backwards compatibility, especially given that ~87% of packages "just work" as described.

I have certainly felt the pain of non-declarative manifests. Indeed, as we move more and more to SwiftPM the problem described here gets worse and worse (more and more deps, including our own, leading to lots of empty headscratching between swift build and Nothing to rebuild in the base case).

The other upsides: Swift code (and with it the manifests) is quite readable. And I don't see huge issues regarding incompatibilities with the right test suite that ties the two implementations together.

What we will need however is better handling of "don't download these external dependencies at all if they're unused in the targets being built for the current target platform", and "don't even think about producing this Product unless you're building for a certain platform", and the like. These are the only reasons we have conditional non-declarative code in our manifests today.

2 Likes

We can check argument order easily enough. We're not going to get everything, though, and we can't really tighten restrictions after this ships because it would break existing declarative manifests.

I don't agree: I think array concatenation and factoring common groups of settings into global lets is a reasonable way to express reuse, and it is declarative. If we went with some non-Swift declarative format, I agree that we would need a way to express these.
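For readers who haven't used this style, a manifest along the following lines shows both techniques: shared settings in global lets, and array concatenation to layer extra settings on top. The package and target names are made up, and whether the parsing loader accepts exactly this shape is an assumption rather than something confirmed here.

```swift
// swift-tools-version:5.9
import PackageDescription

// Shared settings factored into global lets.
let commonSettings: [SwiftSetting] = [
    .enableUpcomingFeature("ExistentialAny"),
]
let strictSettings: [SwiftSetting] = [
    .define("STRICT_CHECKS"),
]

let package = Package(
    name: "Demo",
    targets: [
        .target(name: "Core", swiftSettings: commonSettings),
        // Array concatenation adds the strict group on top of the shared one.
        .target(
            name: "App",
            dependencies: ["Core"],
            swiftSettings: commonSettings + strictSettings
        ),
        .testTarget(name: "CoreTests", dependencies: ["Core"], swiftSettings: commonSettings),
    ]
)
```

Nothing here depends on the host or the environment, so every value can in principle be read off the syntax tree without running the code.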

I've observed it to be an issue, yes. The cost stacks up linearly with the depth of the package dependency graph. This later post shows some real performance improvements from this change because they have the repositories locally available. I've also noticed the need for this when editing package manifests in my IDE, where the dependencies are there or mostly there, and the bottleneck is re-processing the manifest.

This cost blocks package resolution from starting, and at each package encountered in the graph. Other improvements to package resolution, such as fetching an archive of exactly the sources you need rather than doing a full git clone for each repo, are likely to make this issue stand out more over time.

We've had tooling to add to mostly-declarative package manifests since SE-0301 was implemented. It's not as easy to build as a fully-declarative file, but it's doable.

That's fine for producing a cached version of the manifest for the specific host environment and target configuration. But that cached version isn't necessarily usable outside of that specific build because you don't know what the manifest did when it was executing that might be host-specific: did it look for some files in the local build directory? check some environment variables?
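A small sketch of the problem, under stated assumptions: the function below stands in for manifest logic that consults the environment (a real manifest would typically read `ProcessInfo.processInfo.environment`), and the variable name `USE_LOCAL_DEPS` is invented for this example.

```swift
// Hypothetical stand-in for manifest logic; USE_LOCAL_DEPS is a made-up variable name.
func dependencyLocations(environment: [String: String]) -> [String] {
    if environment["USE_LOCAL_DEPS"] == "1" {
        // Sibling checkout that only exists on the author's machine.
        return ["../swift-log"]
    }
    return ["https://github.com/apple/swift-log.git"]
}

let onAuthorsMachine = dependencyLocations(environment: ["USE_LOCAL_DEPS": "1"])
let onAnotherHost = dependencyLocations(environment: [:])

// A cache generated from the first result would be wrong on the second host.
print(onAuthorsMachine == onAnotherHost ? "cache is portable" : "cache is host-specific")
```

The evaluated output alone carries no trace of the environment check, which is why a cached result cannot be assumed valid elsewhere.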

... and ...

The amount of effort for these two ideas is very, very different. Building a proper Swift interpreter/JIT is at least 100x as much work as the parsing manifest loader, or even moving the ecosystem to a new declarative format. If we did all that work, perhaps we could make executing the manifest 10x faster. That's nice, but I suspect it's still probably not enough. It's really hard to compete with something that's

Doug

4 Likes

A middle ground between "manifests are arbitrary Swift code" and "manifests are inert declarative JSON-like data" might be to take inspiration from semi-programmable configuration languages like Dhall, which provide things like let bindings, basic conditionals, arithmetic, and structural composition operations, and even function definitions, but limited in such a way that the evaluation is guaranteed to terminate and not have arbitrary side effects. With the parser you've already established, maybe we can gradually expand it to a "Dhall-in-Swift" subset that allows for useful abstraction without going all the way to full unrestricted code execution.
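To give a feel for what such a restricted subset might look like, here is a toy evaluator with let bindings, a platform conditional, and list concatenation. The `Expr` language is invented purely for illustration; it is neither Dhall nor anything the existing parser implements.

```swift
// Hypothetical mini-language; not Dhall and not the proposed manifest loader.
indirect enum Expr {
    case list([String])                                       // literal list of settings
    case variable(String)                                     // reference to a let binding
    case concat(Expr, Expr)                                   // structural composition
    case ifPlatform(String, onMatch: Expr, otherwise: Expr)   // basic conditional
    case bind(name: String, value: Expr, body: Expr)          // let name = value in body
}

// Evaluation only recurses into strictly smaller subexpressions. Bindings cannot
// refer to themselves, and there are no loops or I/O, so it always terminates.
func evaluate(_ expr: Expr, platform: String, scope: [String: [String]] = [:]) -> [String] {
    switch expr {
    case .list(let items):
        return items
    case .variable(let name):
        // Simplification: an unbound name evaluates to an empty list.
        // A stricter design would reject it before evaluation.
        return scope[name] ?? []
    case .concat(let lhs, let rhs):
        return evaluate(lhs, platform: platform, scope: scope)
            + evaluate(rhs, platform: platform, scope: scope)
    case .ifPlatform(let name, let onMatch, let otherwise):
        return evaluate(name == platform ? onMatch : otherwise, platform: platform, scope: scope)
    case .bind(let name, let value, let body):
        var inner = scope
        // The value is evaluated in the outer scope, so a binding cannot be recursive.
        inner[name] = evaluate(value, platform: platform, scope: scope)
        return evaluate(body, platform: platform, scope: inner)
    }
}

let settings = Expr.bind(
    name: "common",
    value: .list(["-warnings-as-errors"]),
    body: .concat(
        .variable("common"),
        .ifPlatform("linux", onMatch: .list(["-DLINUX"]), otherwise: .list([]))
    )
)

let linuxSettings = evaluate(settings, platform: "linux")
let macSettings = evaluate(settings, platform: "macos")
print(linuxSettings, macSettings)
```

Because the platform is an explicit input rather than something discovered by running code, the same expression can be evaluated for any target from any host.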

5 Likes

Do we have evidence to this effect? As far as I can tell, TOML doesn’t support macros, variables, concatenation, or any other form of metaconfiguration. Has this actually been a problem for the Python and Rust package ecosystems?

Completely anecdotal, but I personally use it and would be sad to see it go.

Would you be willing to share an example of how you are using it?

Sure!

The most common use-case I have is let swiftSettings: [SwiftSetting] = […] and re-use this in all my targets.

I also have an example of a more complicated project with heavy use of variables: officectl/Package.swift at b366e3195fc086512347f434cb5a2a00f85af403 · Frizlab/officectl · GitHub

1 Like

As one data point, yaml has a basic version of this. It'd be enough for the "reuse my list of deps" case at least.

I would strongly hate yaml manifests though :upside_down_face:

2 Likes