Extend SwiftPM `PackageDescription` to introduce metadata

Erica_Sadun · June 18, 2020, 7:56pm

A Swift Package defines the sources and dependencies for successful compilation. The PackageDescription specifies items like the supported Swift version, linker settings, and so forth.

What it does not do is offer metadata. You won’t find an email address for the active project manager, a list of major authors, descriptive tags, an abstract or discussion of the package, a link to documentation, deprecation information, or links to superseding packages upon deprecation.

Here's a first approximation of how this might exist natively within the package.

let package = Package(
    name: "now",
    platforms: [
      .macOS(.v10_12)
    ],
    metadata: [
      .maintainer("erica@ericasadun.com"),
      .tags(["dates", "calendar", "scheduling", "time", "appointments"]),
      .abstract("Times around the world because no brain should have to work out what time it is in NYC or what time it is here when it's 4PM in London or what time 4PM in London is here..."),
    ],
...
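To make the idea a little more concrete, here is a minimal sketch of how the entries passed to `metadata:` could be modeled. `PackageMetadata` and its cases are hypothetical names used for illustration; nothing like this exists in `PackageDescription` today.

```swift
// Hypothetical sketch: `PackageMetadata` is NOT part of PackageDescription.
enum PackageMetadata: Equatable {
    case maintainer(String)
    case tags([String])
    case abstract(String)
}

// Entries can appear in any order because they are array elements.
let metadata: [PackageMetadata] = [
    .maintainer("erica@ericasadun.com"),
    .tags(["dates", "calendar", "scheduling", "time", "appointments"]),
    .abstract("Times around the world."),
]

// Collect every tag declared across the entries.
let allTags = metadata.flatMap { entry -> [String] in
    if case .tags(let tags) = entry { return tags }
    return []
}
print(allTags)
```

An array of enum cases is only one possible shape; labeled initializer arguments or a dedicated struct would work too, with different trade-offs.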

What are your thoughts about updating the Package standard to incorporate metadata, and what kinds of fields do you believe would have the highest priority for inclusion?

SDGGiesbrecht · June 18, 2020, 8:04pm

Some prior discussion exists here:

Jon_Shier · June 18, 2020, 8:15pm

Yes, please! Especially with the approaching standardization of a package repo standard, it would be great to have these values integrated as part of that standard, rather than tacked on later. Generally, supporting the same metadata the community already uses as part of existing tools, like CocoaPods, would be a good start.

xwu · June 18, 2020, 10:28pm

I tend to agree with @Aciid's point in that thread that metadata seems ill-served by Swift's strong typing. A lot of the advantages that might make it nice to define a package's configuration in Swift (such as the possibility of having arbitrary imperative logic) are inapplicable to metadata. Moreover, the approach includes some unduly restrictive limitations: for instance, the order in which metadata fields are named cannot be varied.

I would be curious to explore alternative formats, such as a metadata YAML file. (YAML, notably, having been proposed for use in at least one other pitch, for overlays.)

Erica_Sadun · June 18, 2020, 10:45pm

My other suggestion would be to put an optional markup-language file (I suggested JSON) in repos at the same level as Package.swift (for example, Package.metadata, PackageMetadata.json, or some such), where the data would be in an expected location with a community-guided standard but without touching directly on the package manager. However, my completely unscientific survey suggests more people would prefer the package to offer at least a basic set of metadata.
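As a rough sketch of what reading such a file could look like, assuming a JSON file with a handful of fields: the type name `PackageMetadataFile`, the field names, and the helper `loadMetadata(from:)` are all made up for illustration and are not any agreed standard.

```swift
import Foundation

// Hypothetical schema: these field names are illustrative, not a standard.
struct PackageMetadataFile: Codable {
    let maintainer: String?
    let tags: [String]
    let abstract: String?
}

// Decode metadata from JSON text using Foundation's JSONDecoder.
func loadMetadata(from json: String) throws -> PackageMetadataFile {
    try JSONDecoder().decode(PackageMetadataFile.self, from: Data(json.utf8))
}

// Stand-in for the contents of a hypothetical PackageMetadata.json file.
let sampleJSON = """
{
  "maintainer": "erica@ericasadun.com",
  "tags": ["dates", "calendar", "time"],
  "abstract": "Times around the world."
}
"""

do {
    let decoded = try loadMetadata(from: sampleJSON)
    print(decoded.tags)
} catch {
    print("Invalid metadata: \(error)")
}
```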

Max_Desiatov · June 18, 2020, 10:55pm

I strongly support using JSON instead of YAML if this metadata ever ends up in a separate file. JSON parsers are available everywhere (including Foundation), while if any 3rd-party tool would like to parse some YAML package metadata, it needs to pull a YAML parser as an additional dependency. There are also enough problems with YAML that make it much more confusing to use than JSON.

xwu · June 19, 2020, 12:01am

You've mentioned this before and have received a number of replies, so there's little point to belaboring it:

YAML is a superset of JSON, so every valid JSON file is a valid YAML file, which means that you can write your own files in pure JSON wherever YAML is accepted.

API notes are already written in YAML, cross-import overlays have been proposed where the declaration files are to be written in YAML, and one of the GSOC projects will localize diagnostic messages using YAML. At this point, it's hard to justify reaching for a text serialization format other than YAML when LLVM/Clang/Swift has converged on the format.

But that is beside the main point, which is that writing metadata in a separate text serialization format seems to be a more congenial design overall.

Max_Desiatov · June 19, 2020, 12:06am

Please note that I'm not arguing much against writing YAML, apart from the fact that people can write ambiguous YAML that they expect to parse one way but that parses differently (and these errors will only be uncovered at run time). The bigger problem here is with reading YAML. It's hard to say that Swift has converged on the format if it isn't supported (for both writing and parsing, by the way) in either the standard library or even Foundation.

xwu · June 19, 2020, 12:42am

?
The standard library doesn't support reading or writing files, period, and the Foundation API isn't under the control of the Swift project. Anyway, this is neither here nor there for this topic.

Max_Desiatov · June 19, 2020, 12:55am

I've never disputed any of these facts anywhere, so I'm not sure what your exact point is here.

I'm only trying to compare two formats from two perspectives:

1. The convenience of use for developer tools and the whole ecosystem, i.e. how easy it is for someone to parse it with the libraries bundled in every Swift SDK. One of these formats is supported by Foundation, the other one isn't.
2. Maintainability, i.e. how easy it is to write an invalid file that won't be correctly parsed.

Both of these points are important if an external metadata file is chosen to use one of these formats. The first one would have an impact on how easy it would be to maintain developer tools that parse the metadata file, while the second has an impact on how likely it is that package authors (especially beginners) could provide an invalid file.

What else would you like to be clarified with regards to the comparison?

xwu · June 19, 2020, 1:29am

Naturally, if a part of SwiftPM uses a particular text serialization format, then there will be utilities shipped with SwiftPM that make such use possible. We can sink that into whatever level of the project is most appropriate. What Foundation APIs happen to exist should be irrelevant to the question of which text serialization format is most appropriate for use in the Swift project itself; that's the tail wagging the dog.

YAML's wide adoption argues against its corner cases being meaningful barriers to correct usage; I don't buy that "yes" or "no" parsing as "true" or "false" is going to stop users from writing metadata files correctly. It is hard to see why YAML would have been selected over other formats for the components of the LLVM/Clang/Swift infrastructure that make use of it already if correctness were a major issue for YAML as compared to those alternatives.

Again, I don't think this discussion is relevant for the main thrust of this topic.

Karl · June 19, 2020, 6:17am

It sounds like a nice feature, but I don't believe it's necessary.

- Git commits already contain author email addresses.
- I don't think tags or a description of the package can be in the manifest because of localisation. I'm pretty sure we don't want localised variants of Package.swift files, so we'd need a separate file which can be localised (or have its information keyed by locale).
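For illustration, "information keyed by locale" could look something like the sketch below. The type `LocalizedAbstracts`, its fields, and the fallback order are assumptions made up for this example, not part of any proposal.

```swift
// Hypothetical shape for locale-keyed descriptions; names are illustrative.
struct LocalizedAbstracts {
    let defaultLocale: String
    let abstracts: [String: String]  // locale identifier -> abstract text

    // Try the exact locale, then its bare language code, then the default.
    func abstract(for locale: String) -> String? {
        if let exact = abstracts[locale] { return exact }
        if let language = locale.split(separator: "-").first,
           let match = abstracts[String(language)] {
            return match
        }
        return abstracts[defaultLocale]
    }
}

let abstracts = LocalizedAbstracts(
    defaultLocale: "en",
    abstracts: [
        "en": "Times around the world.",
        "fr": "L'heure partout dans le monde.",
    ]
)
print(abstracts.abstract(for: "fr-CA") ?? "none")
```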

In general I just feel like this is a poor man's README.

Why move information from the README into machine-processable parts? Is there any tooling which cares about the list of "major" authors (and can't use Git)? Is any user better served by a one-line description than a README file (which these days are often in Markdown, including sections, tables and code snippets which illustrate the API)?

Tags are possibly interesting for aiding search algorithms, but I feel that any half-decent README indexer should be able to recognise a line like "Tags: dates, time, calendar", and the recent package repo spec explicitly does not include search.
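As a small sketch of what such an indexer step might do, assuming READMEs carry a line starting with "Tags:" (a convention assumed here purely for illustration; the function name `extractTags(fromReadme:)` is hypothetical):

```swift
import Foundation

// Illustrative only: the "Tags:" line is an assumed convention, not a standard.
func extractTags(fromReadme readme: String) -> [String] {
    for line in readme.components(separatedBy: .newlines) {
        let trimmed = line.trimmingCharacters(in: .whitespaces)
        // Match the prefix case-insensitively, e.g. "Tags:" or "tags:".
        guard trimmed.lowercased().hasPrefix("tags:") else { continue }
        return trimmed.dropFirst("tags:".count)
            .split(separator: ",")
            .map { $0.trimmingCharacters(in: .whitespaces) }
            .filter { !$0.isEmpty }
    }
    return []  // No tags line found.
}

let readme = """
# now

Tags: dates, time, calendar
"""
print(extractTags(fromReadme: readme))
```

This only handles the first matching line and does no validation, which is roughly the level of effort the point above has in mind.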

Max_Desiatov · June 19, 2020, 10:21am

These aren't the only pitfalls with YAML. Consider, for example, someone adding a tag that indicates support for Swift 5.3 (I could perfectly see someone doing that, maybe hoping that it will boost their library in a search index):

tags:
  - 5.3

This won't parse correctly, but there's literally no type checking whatsoever, no requirements to quote list items etc. A package author writing this could literally never discover that this is an error, while a package index ingesting this metadata will fail, most probably failing to parse the rest of the metadata.
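To show the kind of failure a consumer would hit, here is a sketch using JSON and Foundation's `JSONDecoder` rather than YAML, since no YAML parser ships with Foundation. The `TagList` schema and the `decodeTags` helper are hypothetical. A bare `5.3` is a number, so decoding it into a list of strings fails, while the quoted form succeeds.

```swift
import Foundation

// Hypothetical schema for illustration: tags must be strings.
struct TagList: Codable {
    let tags: [String]
}

// Decode the tags, capturing any decoding error instead of throwing.
func decodeTags(_ json: String) -> Result<[String], Error> {
    Result { try JSONDecoder().decode(TagList.self, from: Data(json.utf8)).tags }
}

let quoted = decodeTags(#"{"tags": ["5.3"]}"#)  // a string tag
let bare = decodeTags(#"{"tags": [5.3]}"#)      // a number where a string is expected

for (label, result) in [("quoted", quoted), ("bare", bare)] {
    switch result {
    case .success(let tags): print(label, "decoded:", tags)
    case .failure(let error): print(label, "failed:", error)
    }
}
```

Whether the author or only the downstream index sees that error depends on who runs the validation, which is the crux of the disagreement here.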

What indication do you see that points to YAML being selected for those use cases purely on technical merits? YAML is more concise than JSON, and it's been "hyped" recently, being promoted by products such as Docker, Kubernetes etc. It doesn't look like everyone in that ecosystem is happy with how it turned out. Based on their experience, it's a good opportunity to be more careful and conservative by reviewing all the possible consequences.

I would also argue that Clang/LLVM and Swift overlays didn't stumble upon these issues because they work either with purely machine-generated YAML or mostly machine-generated and then manually edited files. Their YAML list items don't need to be quoted as those are mostly valid symbols in their target language, where an item like 5.3 is not a valid symbol. Here though, we're discussing a much wider use case. We're giving everyone, not just compiler engineers, but also beginner package authors, an opportunity to describe their package. It would be great to follow the principle of least surprise and prevent errors that could occur in this much wider setting.

I would also argue against bundling yet another dependency with the Swift toolchain just for the sake of adopting "yet another" (see what I did there) format, where JSON support is already included with it and could solve the issue at hand just as well (while hopefully the metadata will be added in Package.swift and we'll forget about the risk of using YAML altogether). The Swift toolchain already takes plenty of time to download and install, as it takes up hundreds of megabytes. It compares unfavourably to toolchains of other languages (e.g. Go about 100MB and Rust about 150MB). I hope Swift could avoid adding more required dependencies and tools that would have only limited application in the ecosystem.

xwu · June 19, 2020, 11:57am

Max_Desiatov:

These aren't the only pitfalls with YAML. Consider, for example, someone adding a tag that indicates support for Swift 5.3 (I could perfectly see someone doing that, maybe hoping that it will boost their library in a search index):
tags:
  - 5.3

Do you mean semantic versioning tags? By convention, those begin with "v", and SwiftPM expects that format. Do you mean something closer to GitHub topics? I don't see common use of topics that are numbers.

Obviously, a metadata file that fails to parse would be called out by SwiftPM. We would expect the same thing to happen if JSON were the format for serialized text.

I'd call that a technical merit, wouldn't you? YAML describes itself as "human-friendly," so what you're arguing is that the format isn't fit for purpose. I disagree.

Again, what APIs happen to exist in Foundation should be irrelevant to the question of which text serialization format is most appropriate for the Swift project. There is no need to think that it would be a dependency; the functionality will either be part of SwiftPM or it can be sunk into a more appropriate library. It's silly to be arguing about the file size for a feature that would not add meaningfully to it, and which is already used widely in third-party tooling such as SwiftLint and SourceKitten.

But again, let's stop talking about this; it's not germane to the main point at hand.

SDGGiesbrecht · June 19, 2020, 4:48pm

From semver.org:

Is “v1.2.3” a semantic version?

No, “v1.2.3” is not a semantic version. However, prefixing a semantic version with a “v” is a common way (in English) to indicate it is a version number.

Outside that FAQ comment, there is no v attached to any version number in the document.

SwiftPM does permit v‐ prefixed tags, but its documentation and examples use raw semantic versions as tags, i.e. 5.3. None of the packages in the swift- namespace prefix their versions with v‐ either.
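For illustration, a tool that wants to accept both spellings could strip an optional leading "v" before parsing. The helper below is a simplified sketch and not SwiftPM's actual implementation: `SimpleVersion` and `parseVersionTag` are hypothetical names, and pre-release and build metadata are ignored.

```swift
// Illustrative helper, not SwiftPM's implementation.
struct SimpleVersion: Equatable {
    let major: Int
    let minor: Int
    let patch: Int
}

func parseVersionTag(_ tag: String) -> SimpleVersion? {
    // Accept an optional leading "v", as in "v1.2.3".
    let core = tag.hasPrefix("v") ? String(tag.dropFirst()) : tag
    let parts = core.split(separator: ".", omittingEmptySubsequences: false)
    // Require exactly three non-negative integer components.
    guard parts.count == 3,
          let major = Int(parts[0]),
          let minor = Int(parts[1]),
          let patch = Int(parts[2]),
          major >= 0, minor >= 0, patch >= 0
    else { return nil }
    return SimpleVersion(major: major, minor: minor, patch: patch)
}

print(parseVersionTag("v1.2.3") == parseVersionTag("1.2.3"))
```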

Jon_Shier · June 19, 2020, 6:04pm

Technically true, but not useful, given that random contributors are not the maintainers of the project. Users could guess, but the git history is hardly authoritative. Additionally, many projects would want a project email rather than personal emails for users to contact.

Karl · June 19, 2020, 8:46pm

It certainly is useful! It has line-level information about who actually did what and when. AFAIK, users aren't struggling to discover who maintains projects today, so there is no problem to solve.

Also, you can configure the git email address to be whatever you want (and some providers such as GitHub provide facade addresses), so there is no reason for personal email addresses to go there if you don't want them to.

I don't see any reason for Swift to do its own thing here. It is typically better to follow established conventions unless you have clear and specific use-cases that require you to customise it. Today, we have commits for the actual authorship information, and CODE-OWNERS/MAINTAINERS files to list gatekeepers for particular components. It seems to work fine.

daveverwer · June 21, 2020, 1:49pm

Great thread so far. As some of you are aware, we're also thinking about this over at the Swift Package Index project and I thought I'd chime in here.

Sven and I have discussed back and forth the issue of whether this belongs in the manifest, or in another file and we've come down quite firmly on the side of a separate file. I think there's a fundamental difference between the technical details in a package manifest and metadata like package description, tags, authors, etc…

For example, if the Package.swift manifest changes, say by adding a new dependency or supporting a new platform, those changes would normally be prompted by code changes in the project, and I'd expect a new version release of the package as a result. If someone adds a sentence to a package description, or adds a tag, I wouldn't. It's mixing two different types of metadata.

Mixing metadata isn't bad in itself, but given that an auxiliary metadata file would also be easier to update and easier to propose changes to, I'd be inclined to support that over changes to Package.swift.

That said, some of the metadata that we're discussing over at the SPI project absolutely belongs in the manifest. For example, support for Linux.

I just did some work on our proposal for this this morning, consolidating and aggregating the feedback we've had so far. The issue I've made this morning is here. At the moment, we're concentrating on getting a comprehensive list of what people might want to find in a metadata file, before talking about formats, or even the final set of data that we'll support. I'd love to get any input over there, and of course will also be monitoring this thread.

I'm very excited to get something like this going, it's definitely needed.

daveverwer · June 21, 2020, 2:40pm

I just had another thought. There's one more point in favour of making this a separate metadata file rather than part of Package.swift.

Metadata like this is useful in many contexts, and having it as a data format rather than Swift code that needs running through dump-package or similar is much more accessible to sites and other tools that are interested in reading that metadata.

daveverwer · June 24, 2020, 1:35pm

There has been some good discussion over on this thread around this in the last week and I'm getting ready to propose a draft format for a file external to the Package.swift to the community for final feedback before we make the Swift Package Index able to parse it.

I've also just opened up two new threads on what format the file should use and what the file should be called. I'd love to get your feedback on that too.

Thanks!