GSoC semver project idea

Hello, I was looking through the project ideas for Google Summer of Code, and the semantic versioning suggestion sounded like a pretty cool idea!

I was wondering what capabilities might need to be added to SourceKit? I’ve found some documentation on the module interface API, but I’m not too sure how exactly to use the framework to query it. Are there any examples of how SourceKit is used from C++ or Swift?

Also, will this tool need to communicate with git to figure out what the interface of the previous version was?

Hey Luke!

Awesome, we think this will be a fun project for students and an awesome addition to SwiftPM!

The SourceKit framework is written in C++, but it has C APIs that can be used from Swift. My hope is that we will need to do something minimal in SourceKit (if even that). There is a very good community project called SourceKitten that is written in Swift and built on top of the C API.

Right. We will need to generate the module interface of the input version and then diff it against the current module interface. We would not want to modify the current state of the repository, so we can create a checkout of the input version in a temporary location (maybe using git checkout-index) and then generate an interface for that.

I think for the initial version, it may be acceptable if SwiftPM takes a serialized version of the module interface as an input (instead of a version). This will be annoying for users but will make a good minimum viable feature.

Hi Aciid!
I checked out SourceKitten and used it to get the module interface for Foundation:

      "key.kind" : "source.lang.swift.decl.struct",
      "key.length" : 658,
      "key.name" : "Measurement",
      "key.namelength" : 11,
      "key.nameoffset" : 265460,
      "key.offset" : 265453,
      "key.substructure" : [
        {
          "key.accessibility" : "source.lang.swift.accessibility.public",
          "key.kind" : "source.lang.swift.decl.var.instance",
          "key.length" : 17,
          "key.name" : "value",
          "key.namelength" : 5,
          "key.nameoffset" : 265750,
          "key.offset" : 265746,
          "key.setter_accessibility" : "source.lang.swift.accessibility.public",
          "key.typename" : "Double"
        },
        {
          "key.accessibility" : "source.lang.swift.accessibility.public",
          "key.kind" : "source.lang.swift.decl.function.method.instance",
          "key.length" : 35,
          "key.name" : "init(value:unit:)",
          "key.namelength" : 35,
          "key.nameoffset" : 265841,
          "key.offset" : 265841,
          "key.substructure" : [
            {
              "key.kind" : "source.lang.swift.decl.var.parameter",
              "key.length" : 13,
              "key.name" : "value",
              "key.namelength" : 5,
              "key.nameoffset" : 265846,
              "key.offset" : 265846,
              "key.typename" : "Double"
            },
            {
              "key.kind" : "source.lang.swift.decl.var.parameter",
              "key.length" : 14,
              "key.name" : "unit",
              "key.namelength" : 4,
              "key.nameoffset" : 265861,
              "key.offset" : 265861,
              "key.typename" : "UnitType"
            }
          ]
        }, ...

I feel like there are a lot of useful fields here:

  • accessibility
  • name
  • kind
  • substructure
  • typename

Would comparing these be a good starting point?
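
A minimal sketch of how such a comparison could start, assuming the JSON has the shape shown above. The `Declaration` type and the `publicSignatures` helper are hypothetical names introduced for illustration, and only the public accessibility level is considered here:

```swift
import Foundation

// Hypothetical model of one SourceKit declaration; only the fields listed above are decoded.
struct Declaration: Decodable {
    let accessibility: String?
    let kind: String
    let name: String?
    let typeName: String?
    let substructure: [Declaration]?

    enum CodingKeys: String, CodingKey {
        case accessibility = "key.accessibility"
        case kind = "key.kind"
        case name = "key.name"
        case typeName = "key.typename"
        case substructure = "key.substructure"
    }
}

// Collects one string per public declaration, walking the substructure recursively.
func publicSignatures(of decl: Declaration, parent: String = "") -> Set<String> {
    let name = decl.name ?? "<unnamed>"
    let path = parent.isEmpty ? name : "\(parent).\(name)"
    var result = Set<String>()
    if decl.accessibility == "source.lang.swift.accessibility.public" {
        let suffix = decl.typeName.map { ": \($0)" } ?? ""
        result.insert("\(decl.kind) \(path)\(suffix)")
    }
    for child in decl.substructure ?? [] {
        result.formUnion(publicSignatures(of: child, parent: path))
    }
    return result
}

// Trimmed-down sample in the same shape as the response above.
let json = """
{
  "key.accessibility": "source.lang.swift.accessibility.public",
  "key.kind": "source.lang.swift.decl.struct",
  "key.name": "Measurement",
  "key.substructure": [
    {
      "key.accessibility": "source.lang.swift.accessibility.public",
      "key.kind": "source.lang.swift.decl.var.instance",
      "key.name": "value",
      "key.typename": "Double"
    }
  ]
}
"""
let root = try! JSONDecoder().decode(Declaration.self, from: Data(json.utf8))
let signatures = publicSignatures(of: root)
```

Flattening each declaration into a string keeps the later comparison a plain set operation; offsets and lengths are deliberately ignored because they change whenever the source is reformatted.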

I think for the initial version, it may be acceptable if SwiftPM takes a serialized version of the module interface as an input (instead of a version). This will be annoying for users but will make a good minimum viable feature.

Would this serialized version be in a human format and maintained by package authors? Or would it be more automated, like maybe the JSON response from SourceKit or something that a command in SwiftPM would output?

Yep, looks like the right direction!

Storing the JSON response from SourceKit (or a stripped-down version of it) sounds good. I expect we will add a command in SwiftPM that writes this file. For example:

$ swift package generate-serialized-module-interface --output /tmp/v1.0.0.json

Then this output can be compared with a different version or checkout by generating this file again and then diffing it against the input file. Some examples:

$ swift package suggest-next-version --from-version 1.0.0 --input-file /tmp/v1.0.0.json 

No API change; suggested version is 1.0.1.

or

$ swift package suggest-next-version --from-version 1.0.0 --input-file /tmp/v1.0.0.json 

New API(s) added; suggested version is 1.1.0.

Diff:

+ public func doMoreThings()
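
The rule behind those two outputs can be sketched roughly as follows, assuming each interface has already been reduced to a set of declaration strings. `Version`, `APIChange`, `classify` and `suggestNextVersion` are hypothetical names, and `doThings()` is an invented existing API used only for the example:

```swift
// Hypothetical semantic version triple.
struct Version: Equatable, CustomStringConvertible {
    var major: Int
    var minor: Int
    var patch: Int
    var description: String { return "\(major).\(minor).\(patch)" }
}

enum APIChange { case none, additive, breaking }

// Compares two sets of public declarations.
func classify(old: Set<String>, new: Set<String>) -> APIChange {
    if !old.isSubset(of: new) { return .breaking }  // something was removed or changed
    if new.count > old.count { return .additive }   // everything old is kept, plus new API
    return .none
}

// Applies the usual major/minor/patch bump for the detected change.
func suggestNextVersion(from v: Version, change: APIChange) -> Version {
    switch change {
    case .breaking: return Version(major: v.major + 1, minor: 0, patch: 0)
    case .additive: return Version(major: v.major, minor: v.minor + 1, patch: 0)
    case .none:     return Version(major: v.major, minor: v.minor, patch: v.patch + 1)
    }
}

let oldAPI: Set<String> = ["public func doThings()"]
let newAPI: Set<String> = ["public func doThings()", "public func doMoreThings()"]
let suggestion = suggestNextVersion(from: Version(major: 1, minor: 0, patch: 0),
                                    change: classify(old: oldAPI, new: newAPI))
print("Suggested version is \(suggestion).")
```

A changed signature shows up as one removal plus one addition, so it is classified as breaking without any special handling.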

@Luke_Lau The following project generates API diffs thanks to SourceKit (through SourceKitten). It might be a good starting point to research the GSoC semver project:


Thanks, this will definitely be useful!
It looks like it uses sourcekitten doc, which in turn uses the editor.open request from SourceKit. I used the editor.open.interface request, and I'm wondering if one is a superset of the other? They both seem to return the fields that would be useful for semver, like accessibility and kind etc.

One difference though is that for editor.open.interface I needed to compile the source file to a module first, but SourceKitten is just passing the file along like so:

return [
    "key.request": UID("source.request.editor.open"),
    "key.name": path,
    "key.sourcefile": path
]

I presume it would be more desirable to be able to generate the interface without having to build it beforehand.
Would there be any caveats with SwiftPM like build flags or dependencies that would force the tool to build it with SwiftPM first, or could we just leave the building to SourceKit?

The apidiff project that @hartbit linked above seems to do something akin to this: it clones the project twice into tmp directories and then checks out the specified refs.
I'm not familiar with git checkout-index though, what did you have in mind with it? I played with it a bit and was able to copy files from the index to a given directory, but only from the current revision.

Ah right, checkout-index won't work for this. I think we can do a shared clone (using the option --shared). That should make the operation less expensive than a regular clone.
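
A rough sketch of what the shared clone could look like when driven from Swift. The helper names, the tag and the temporary path are assumptions for illustration; the git options used (`clone --shared`, `--no-checkout`, `-C`) are standard git:

```swift
import Foundation

// Builds the git invocations for a shared clone checked out at `ref`.
func sharedCloneCommands(repository: String, ref: String, destination: String) -> [[String]] {
    return [
        ["git", "clone", "--shared", "--no-checkout", repository, destination],
        ["git", "-C", destination, "checkout", ref],
    ]
}

// Runs a command through /usr/bin/env and returns its exit status.
func run(_ arguments: [String]) throws -> Int32 {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: "/usr/bin/env")
    process.arguments = arguments
    try process.run()
    process.waitUntilExit()
    return process.terminationStatus
}

// Hypothetical inputs: the package in the current directory, compared against tag 1.0.0.
let commands = sharedCloneCommands(repository: ".",
                                   ref: "1.0.0",
                                   destination: NSTemporaryDirectory() + "package-1.0.0")
// To actually perform the clone:
// for command in commands { _ = try run(command) }
```

A shared clone borrows objects from the original repository instead of copying them, so the temporary checkout is only usable while the original repository stays intact.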

Since Swift has no explicit header files, we will have to pay the cost of compiling to generate the public interface. We need to decide if we want to compile ourselves or let SourceKit do it. I don't know which one is faster (or better) for our use-case but we can figure that out as part of the implementation. But either way, I believe we will need to pass the correct build flags to SourceKit.
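
For reference, a sketch of how the build flags might be handed to SourceKit with the editor.open.interface request. It uses a plain dictionary rather than SourceKitten's UID wrapper; the module name and the `-I .build/debug` search path are assumptions about a typical SwiftPM layout, not something the tool is known to need:

```swift
// Plain-dictionary sketch of a module interface request.
func interfaceRequest(module: String, compilerArguments: [String]) -> [String: Any] {
    return [
        "key.request": "source.request.editor.open.interface",
        "key.name": "\(module)-interface",      // arbitrary name for the generated document
        "key.modulename": module,
        "key.compilerargs": compilerArguments,  // flags SourceKit needs to locate the module
    ]
}

// Hypothetical module built by SwiftPM into .build/debug.
let request = interfaceRequest(module: "MyLibrary",
                               compilerArguments: ["-I", ".build/debug"])
```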

I noticed SwiftPM builds a .swiftmodule in the .build directory for library targets when running swift build. I think it would be quite common to check the API diff after running that command, especially just before a release/tag is created. swift build seems to only trigger a build when needed, i.e. running it twice in a row only builds it once, so could we use it to prevent unnecessary rebuilding?

I'm also curious as to what should happen to executable type packages. I guess it doesn't really make sense to generate an API diff for them, although I've seen .swiftmodules generated for them in the .build directory too.

It would be difficult to predict if the build directory has the updated build contents. I think we should have separate build directories for the versions that are being compared. This will ensure that we have the correct swiftmodule files.
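
One way to keep the builds apart, sketched with SwiftPM's `--package-path` and `--build-path` options. The helper name and the scratch locations are hypothetical:

```swift
import Foundation

// Hypothetical helper: gives each compared version its own build directory.
func buildArguments(checkout: URL, version: String, scratchRoot: URL) -> [String] {
    let buildDirectory = scratchRoot.appendingPathComponent("build-\(version)")
    return ["swift", "build",
            "--package-path", checkout.path,
            "--build-path", buildDirectory.path]
}

let scratch = URL(fileURLWithPath: "/tmp/semver-scratch")
let oldArgs = buildArguments(checkout: scratch.appendingPathComponent("checkout-1.0.0"),
                             version: "1.0.0",
                             scratchRoot: scratch)
let newArgs = buildArguments(checkout: URL(fileURLWithPath: "."),
                             version: "working-copy",
                             scratchRoot: scratch)
```

Because neither invocation touches the package's regular .build directory, the user's own incremental build state is left alone.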

Yeah, this feature is mostly aimed towards library authors.

I've submitted a draft proposal on Google Docs, feedback is welcome!