[Review] SE-0019 Swift Testing (Package Manager)

As I've taken some more time to evaluate the criticisms people have raised (several off-list), my view of the proposal has changed. There is enough substantive dissent here that we should send this back for more cooking.

A summary of the criticisms that convinced me are:

1. Whether testing and SwiftPM should be coupled at all. This isn't obviously clear to everybody that SwiftPM is the right place to solve the problem and a reasoned justification in the document is necessary. The only treatment of this coupling at all is that it "will help ensure a stable and reliable packaging ecosystem", but there is no discussion of why this is so. IMO this is a critical omission.

2. Whether privileging XCTest is in the best interest of the ecosystem. My informal poll reveals much more debate than I had expected about whether people even want to use XCTest for their programs, and so I think an approach that starts with defining the interface first and wiring XCTest into it second is better for the long-term health of the ecosystem than an approach that starts with XCTest first and invents a separate system for third-party later which we may or may not retrofit XCTest back into. I think a proposal that takes the latter position should include a reasoned technical justification for that decision.

3. The proposal's treatment of various topics "in passing" is frustrating. For example, the proposal says

We expect that such an implementation would take the form of a Swift protocol

If we are right now deciding to use a protocol (over say a UNIX process model) then this is not the kind of detailed discussion we need to justify that decision–and I say that as a person in favor of doing it.

Or if we are not right now deciding that, why is it in the proposal? So I think the proposal needs to be a lot clearer about what is an essential part of what is being proposed and what is nonessential, and right now it tries to cut both ways on several points. The protocol system, JUnit XML, Testability for debug builds, opting out of additional compilation, all exist in this grey area where I am not sure if they are passing ideas or binding precedent that we aren't defending well (or at all in some cases).

4. I think all reviewers supporting the proposal have found "irregularities" in the document. Paul Cantrell said it best that this was a "vision and first draft" but "not a releasable product". I have argued that we should use a different standard of evidence for greenfield proposals.

The fact is, though, there is no "official" explanation for why we should +1 a document that we all agree has deficiencies. I have tried to argue that it is a stopgap solution that is necessary but I find my own argument poor.

In another context of SwiftPM, Daniel Dunbar said:

the approach we want to take for developing and adding features is to be in a good place *at the time of release*... With that mindset, it doesn't usually make sense to add features that we consider actively/potentially harmful, even if that solves some current short term pain.

(emphasis in original)

It seems like the same principle should apply here. If we have consensus that this proposal is potentially harmful (and there are some details that I would characterize that way) then let's send it back and get a better document.

5. The "Alternatives considered" section admits that it considers no alternatives, but I think there are clear alternatives to each of the 4 numbered criticisms above. Specifically each of these are potential alternatives to the proposal:

    1. Do testing from a completely separate tool that isn't SwiftPM (probably frontended through swift-multitool)
    1a. Or building the test frontend inside the XCTest project instead
    2. Start with defining interface for the testing framework first, instead of potentially having one codepath for XCTest and another codepath for third party
    3. UNIX process model instead of protocol, JSON format instead of JUnit XML, compiling without testability for debug builds by default, etc.
    4. Doing nothing, until we have a "final draft" proposal that reviewers can endorse with fewer reservations.

To be clear, I'm not particularly advocating in favor of doing any of these, but I do think we can do a lot better than not imagining any alternatives, which is the present text.

While I don't advance these as arguments to reject, I think we should take the opportunity to fix them:

1. Being more careful about not assuming compilation times are free, see Paul Cantrell and David Owens's excellent arguments
2. There is some kind of documentation snafu in the swift-evolution process as it relates to this proposal. The process document <https://github.com/apple/swift-evolution/blob/master/process.md#review-process&gt; says the Review Manager is a member of the core team, but our RM is not listed there <Swift.org - Community Overview. I'm sure there is a reasonable explanation, but we should fix the docs.

I want testing as bad as all of you, but I see enough substance in the criticism that I now believe we should reject the proposal as it presently appears before us.

2. Whether privileging XCTest is in the best interest of the ecosystem. My informal poll reveals much more debate than I had expected about whether people even want to use XCTest for their programs, and so I think an approach that starts with defining the interface first and wiring XCTest into it second is better for the long-term health of the ecosystem than an approach that starts with XCTest first and invents a separate system for third-party later which we may or may not retrofit XCTest back into. I think a proposal that takes the latter position should include a reasoned technical justification for that decision.

I have no opinion about this proposal (to be honest, I haven't even used the package manager yet), but let's be careful not to make the perfect the enemy of the good here. If we want testing to be part of the package manager, it's perfectly acceptable for the initial version to rely on XCTest. The interface between SwiftPM and XCTest can then be gradually abstracted and refined until it becomes a generic test harness; then a separate proposal could formalize that interface and add support for alternative test frameworks.

···

--
Brent Royal-Gordon
Architechies

but let's be careful not to make the perfect the enemy of the good here.

This ship has sailed: it is the present policy to make perfect the enemy of the good, or, as others word it <https://github.com/apple/swift-package-manager/pull/107#issuecomment-168847823&gt;:

it doesn't usually make sense to add features that we consider actively/potentially harmful, even if that solves some current short term pain.

I don't necessarily agree with the policy but I do think it should be consistently applied. If we know we're going to refactor the XCTest interface later we shouldn't implement it the wrong way now, or else we should implement everything the wrong way now if it solves a present short-term pain, but there is nothing unique about the testing feature that does not equally apply to other situations.

This ship has sailed: it is the present policy to make perfect the enemy of the good, or, as others word it:

it doesn't usually make sense to add features that we consider actively/potentially harmful, even if that solves some current short term pain.

I don't necessarily agree with the policy but I do think it should be consistently applied. If we know we're going to refactor the XCTest interface later we shouldn't implement it the wrong way now, or else we should implement everything the wrong way now if it solves a present short-term pain, but there is nothing unique about the testing feature that does not equally apply to other situations.

I don't think the situations are the same. Quickly skimming that thread, it seems like people are arguing that the proposed feature is a step in the wrong direction. (I don't know enough to say if I agree with them; I'm simply characterizing their position.) But if you think testing support in SwiftPM is a good idea, then going from "no testing" to "XCTest-only testing" is not a step in the wrong direction, it's a step in the *right* one. Some of the design and code from the XCTest-only solution will live on in a more general one; in fact, generalizing the test harness is a straightforward extension of it.

Remember, the Swift project favors incremental development. <Swift.org - Contributing; A tightly-coupled test harness is an incremental step towards a loosely-coupled test harness. That's a very different case from a workaround that does not bring us closer to a permanent solution.

···

--
Brent Royal-Gordon
Architechies

I don't think the situations are the same. Quickly skimming that thread, it seems like people are arguing that the proposed feature is a step in the wrong direction. (I don't know enough to say if I agree with them; I'm simply characterizing their position.)

I argue that this proposal is a step in the wrong direction. I suspect David and Brian might argue that as well (I don't want to put words in their mouth though). So that is the sense in which the situations are similar: people are arguing the proposal is a step in the wrong direction.

A tightly-coupled test harness is an incremental step towards a

loosely-coupled test harness.

While going from specific to generic is usually the right approach, in this
case I think going specific is going to create a lot of work that could be
avoided until there's a more finalized version of what the long-term goal
is going to be. Beyond that I'm not even sure how the build tool would
integrate with XCTest in a way that amounts to more than "build it as an
executable" and "run the executable." The language in the proposal raises a
coupling concern, but technically speaking, it's not really going to be
much of a concern.

The more big-picture point I want to make clear is that the build tool,
IMO, shouldn't take an opinion on how tests are run, what their output is,
etc. By just running a binary testing target with a main.swift and checking
the exit status, we can push a lot of the specifics down into those tools
which enables the community to develop testing tools more freely without
having to wait for this discussion to be 100% finished (or for any sort of
proposal/review process from the language/tooling itself, for that matter).
Adopting more detailed standards, at this point, from a build-tool level,
feels extremely premature.

So, between the facts that:
- there's a perfectly acceptable intermediate step for almost all user
experiences (including the "default" one of using XCTest),
- this proposal is very prescriptive in some parts, and very broad and
open-ended in others,
- the proposal for integrating third-party testing tools is still under
discussion,
- some of the details here might be better off living in XCTest

I just think this idea needs some more time to bake, from a technical
perspective... and I think swift-build-dev is a good place to continue that
discussion. But as others mentioned the proposal is open-ended enough that
I don't think doubling down on talking implementation details invalidates
any of it -- the proposal as worded feels more like a declaration of
direction than anything, and in general I like the direction! But I'd
prefer some of the specifics (e.g, around testing output) be brought up to
those working on XCTest instead of integrated with the build tool itself.

So: proposal seems okay from a big-picture perspective, except for a few
things, but I don't think this is ready to have work begin on it yet in
earnest. And because of that, I'd say a simple, working version of testing
should be implemented first, just to make the build tool more usable for
many developers (including me).

- Brian

···

On Mon, Jan 11, 2016 at 5:14 AM, Drew Crawford <drew@sealedabstract.com> wrote:

> I don't think the situations are the same. Quickly skimming that thread,
it seems like people are arguing that the proposed feature is a step in the
wrong direction. (I don't know enough to say if I agree with them; I'm
simply characterizing their position.)

I argue that this proposal is a step in the wrong direction. I suspect
David and Brian might argue that as well (I don't want to put words in
their mouth though). So that is the sense in which the situations are
similar: people are arguing the proposal is a step in the wrong direction.

Hi all,

I'd like to thank everyone for their participation in this discussion. I originally scheduled this review for Jan 5th-7th, but left the review running as active discussion was still ongoing. At this point I am going to mark this review as "Under revision" and have Max incorporate your feedback before bringing it back for a subsequent review.

Specifically, here are the outstanding points I'm aware of and what I'll expect we'll do with each:

– There is currently no way to conditionally compile code for testing (a la "-D TESTING").

This is not going to be addressed by this initial proposal, but we should revise the proposal to mention that this needs to be solved in the future.

– The current proposal does not say that we'll support a test-without-build action.

We should revise the proposal to call out that this should be supported. This is important for workflows that involve building products on a machine other than the one which will run the tests.

– The current proposal is for a --test subcommand to 'swift build' and leaves 'swift test' for the future, without good justification.

We should revise the proposal to say that we'll implement this using 'swift test' now instead of a --test flag to 'swift build'.

– The current proposal says that we will automatically build test code when building your package code, even when you're not explicitly running tests. This has raised concerns about the performance implications of always building test code and whether this is the right default.

We still think that for most packages, it will not be overly costly to automatically build the tests as well, and that doing so makes it easier to avoid accidentally breaking your test code when you change your package's code. As such, we still plan to leave this as the default, but we are prepared to be proven wrong, and to reverse the default behavior here in the future, if community feedback indicates that we've gotten this wrong once SwiftPM is in wider use.

We do think that it would be nice to be able to turn this off on a per-package basis for all users of a package, so that packages with slow-to-compile tests can be built without needing to explicitly pass "--without-tests", but we don't want to clutter up Package.swift with that sort of configuration and don't have another great place to put it right now. We should mention this in the proposal and also explicitly address that this behavior is controversial. We should also explicitly call out that it is only the top-level package's tests that will be built automatically, not the tests of depended-upon packages.

– The current proposal does not mention test-only dependencies.

This is already implemented with via testDependencies specified in Package.swift, but this proposal should be revised to mention this mechanism.

– The current proposal does not provide a mechanism for supplying shared utility code that multiple test modules in a package wish to share. The only way to do this currently is to put that utility code in a separate package which is specified as a testDependency of this package.

This initial proposal is not going to address this need, but it should be revised to mention the problem and call out that we need to fix this in the future.

– The current proposal does not specify how test modules can be explicitly specified in the Package.swift or how those test modules can establish dependencies.

The proposal does already say that it will be possible to specify test modules in Package.swift in the future and just not in this initial implementation. The proposal should be revised to mention that this will also include the ability to specify dependencies of a test module. This should probably not just be listed as a "Backwards Compatibility" feature as that is not the only purpose of needing to explicitly specify test modules.

– The current proposal does not specify whether we will automatically run all tests or only specific tests, and whether it would be all tests in the top-level package or all tests in the entire package tree.

We think the right behavior is for 'swift test' to run all tests in the top-level package only, and for there to be a flag that runs all tests in the recursive package tree when desired; the proposal should be revised to reflect this. A concern was raised about how this works when there are different types of tests, e.g. unit tests vs performance tests; since this initial implementation of the feature doesn't have a notion of different types of tests, we think that this isn't an issue yet, and when we do support different test types we may want to revisit this behavior.

– The current proposal does not specify how to run specific tests.

The proposal should be revised to call out that it is possible to invoke specific test modules (and maybe specific tests?) by name.

– The evolution process document says that proposal review managers should be a member of the swift core team, but the review manager for this proposal is not listed as a member of the core team.

The evolution process documentation is geared towards proposals for the swift language; we're using the same process for SwiftPM, but there are a few discrepancies. For SwiftPM, the review manager for each proposal will likely be either myself, Max Howell, or Daniel Dunbar for the foreseeable future; this is essentially the "SwiftPM core team", though that's not currently formally defined. We should update the swift evolution process document to mention how SwiftPM differs here.

– The proposal says no alternatives were considered, but there are alternatives to explicitly mention.

The alternatives Drew mentioned were:

    1. Do testing from a completely separate tool that isn't SwiftPM (probably frontended through swift-multitool)
    1a. Or building the test frontend inside the XCTest project instead
    2. Start with defining interface for the testing framework first, instead of potentially having one codepath for XCTest and another codepath for third party
    3. UNIX process model instead of protocol, JSON format instead of JUnit XML, compiling without testability for debug builds by default, etc.
    4. Doing nothing, until we have a "final draft" proposal that reviewers can endorse with fewer reservations.

We should revise the proposal to explain why we are taking this approach instead of those.

– Should we be doing this at all, or should testing support be completely decoupled from SwiftPM?

To some degree testing _is_ decoupled; SwiftPM will know how to invoke the XCTest test runner, and it will define the protocol a test framework / runner needs to implement to work with SwiftPM, but it doesn't know any details beyond that for how test execution works; that's left up to the test framework. That said, we believe that this level of knowledge of testing is appropriate. SwiftPM provides the build system, and the build system needs to know how to build testing targets if we're to support testing, so some level of knowledge is necessary. The advantage of separating out the remaining knowledge (how to invoke the test runner) is minimal, and there's significant cost to requiring a completely separate tool which would need to integrate with SwiftPM just to invoke the test runner. Knowing how to build tests and how to run them are related tasks, and we don't see a real benefit to completely isolating that knowledge.

– Should other test frameworks be supported via a basic function with a simple success or failure return code which can call out to whatever test system the package wants, instead of needing to define a real protocol for test result reporting and requiring someone to write a SwiftPM adaptor for that protocol for each test framework someone wants to use with SwiftPM?

This is something we should discuss in the separate third-party testing support proposal, not in this proposal. That said, I believe there is a strong advantage to defining a real protocol, as it allows all packages to have uniform test result reporting, regardless of what test framework the package uses. This will be important if we ever want to have a package index that can understand the health of submitted packages (or do things like let you test the effect of changes in a widely-depended upon library on clients of that library that the index knows about). It's also important for clients of a package that want to be able to hook up the tests of their dependencies to a CI system in a standard way.

– Should SwiftPM have special knowledge of XCTest baked in at all, or should test support be completely generic?

Most of the details in this proposal (such as how to define test modules, how to determine dependencies, and how to run the tests) are actually completely independent of XCTest. That said, we do believe that there is a benefit to supplying a zero-configuration-required (given convention-based layout), out-of-the-box ecosystem default testing solution, and that XCTest is the best candidate to be that solution. Having a standard makes it easy for new package authors to get off the ground and running quickly and easily. It also lets us go ahead and implement this test support now with something real before we've defined the generic protocol for other testing solutions. When we do review the proposal for third party testing frameworks, we should make sure that it requires minimal configuration to use such frameworks – that XCTest's "default" nature doesn't make other solutions cumbersome to use.

Thank you all for your feedback!

  - Rick