RFC: Policies for Swift Platform Development

Hello fellow developers,

To make it easier to collaborate on adding platforms, it would be useful to formalise some expectations around platform development so that work on all the other goals can continue effectively. I would like to request comments on the following attempt at such a document. This is effectively a continuation of the previous thread about how we enable platform proliferation for Swift and encourage contributors to work on different ports.

I hope that we can get a productive discussion from this and create a strong basis for continued efforts for new targets.

Saleem

Proposal: Policies for Swift Platform Development

Core Principles

In order to create a stable ecosystem for Swift, it is important that we maintain a single coherent ecosystem across platforms. Whenever technically feasible, the project should aim to provide the same interfaces, behaviours, and capabilities on every platform. For example, the compiler and build system should support both static and dynamic linking of libraries on all platforms.

This not only makes it easier for users of the language to target different environments, but also helps the language implementers by reducing the complexity of the implementation through fewer divergent code paths. By sharing the code paths, the implementation can be vetted more thoroughly by multiple toolchains (e.g. MSVC, gcc, clang). This gives higher reliability in the Swift compiler itself and helps catch issues before users hit them. It also opens additional avenues for isolating and debugging defects when they are found.

We should also exploit temporal locality in our approach to system development. It is easier to resolve issues with the context fresh in mind than to recover the details after time has passed. As such, it is important that we carefully time-bound regressions so that they are addressed quickly and efficiently.

Because future features often build upon existing functionality, removal of functionality from a specific target endangers future functionality on that platform. Code that is not actively maintained quickly bitrots, and platforms regress rapidly as a result.

Enabling Collaboration and Incubation of Features

Software engineering is a collaborative process. As such, we often need to merge features that are not yet fully ready so that people can collaborate on them. This requires a maturation process for features that allows developers to make progress quickly.

One approach is to allow features under incubation to be an explicit opt-in, unsupported option: there is no driver-level control, the feature can only be accessed through the frontend, and it is marked as experimental. That is, features can be gated as -Xfrontend -enable-experimental-feature, which restricts the feature and explicitly marks it as incomplete. When the feature is formally introduced with driver-level options and control, it would be enabled on the major platforms (provided that the feature is not inherently platform specific).

Features graduating from experimental to production would be evaluated across the major platform environments (i.e. macOS, Linux, Windows, Android). The new feature work should be enabled across these platforms, with the possibility of exceptions being granted in certain circumstances (e.g. the feature does not make sense on a particular platform).

Platform Support

To support multiple platforms, we need platform owners who are the points of contact for developers when platform specific issues arise. Although many day-to-day issues are easily resolved by builds and tests, platform specific issues can still come up. Having an explicitly identified group of platform owners to help resolve them will ensure that the different platforms help keep the overall project healthy.

Additionally, a tracking mechanism that identifies each feature and each individual test that is disabled for any reason, with a time bound for the fix, will ensure that the platforms remain in a healthy state. Feature authors should be responsible for working with the platform owners to ensure that features are enabled on other platforms before the next release.

As platform support improves, it would be possible to expand the major platform list. To be considered a major platform, a platform should offer a comparable feature set to the other platforms. Under certain circumstances, exceptions may be granted to the platform after discussion with the core team and the engineering operations team. Failure to maintain the platform’s feature parity may result in the platform being removed, with input from the engineering operations team, from consideration as a major platform.

We should create a table of functionalities that are deemed part of platform compatibility. This would begin as a forward-facing document, as enumerating the existing features is deemed too onerous. This document should reside in the Swift repository. Changes by the community to backfill past features are acceptable; however, no guarantees would be made about the completeness of the document for features prior to Swift 6. Issues tracking the implementation of a feature could be linked to JIRA if appropriate.

Feature implementers should be encouraged to reach out to the platform port owners early in the design phase so that problem areas of integration are addressed before the feature lands, rather than causing problems for the ports because it was introduced without consideration for other platforms. This would help ensure that platforms do not regress on functionality as new features are introduced into the project.

Build Improvements

In order to ease the builds with different C++ toolchains, we should enable any additional diagnostics which increase the likelihood of clang catching a compile issue that other compilers may see. This includes things like enabling -Werror=gnu and using -std=c++14 rather than -std=gnu++14. Whenever possible, we should enhance the diagnostics in clang to identify the issues that other compilers identify (assuming that they are not actual issues in the other compiler).
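As a hedged illustration of the kind of issue these flags surface: GNU C++ accepts a conditional with an omitted middle operand (`a ?: b`), which clang diagnoses as part of its -Wgnu group and which, to my knowledge, MSVC does not accept. The helper name below is hypothetical and not taken from the Swift sources; the point is only the portable spelling.

```cpp
#include <cstddef>

// Hypothetical helper for illustration only.
//
// GNU spelling, diagnosed by clang's -Wgnu group (so an error under
// -Werror=gnu):
//     return requested ?: fallback;
//
// Standard C++ spelling that every conforming compiler accepts:
std::size_t pickBufferSize(std::size_t requested, std::size_t fallback) {
  return requested != 0 ? requested : fallback;
}
```

Calling `pickBufferSize(0, 64)` falls back to 64, while any non-zero request is returned unchanged.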

In order to make it easier for collaboration in maintaining ports, we should introduce a new document to centralize documentation on how to address common pitfalls when targeting all major platforms. This document would be built up incrementally. Since a large source of these issues tends to be undefined behaviour in the (C++) language, it would be best to enable a UBSAN build of the compiler.
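To make the undefined behaviour point concrete, here is a minimal sketch of one common case: signed integer overflow, which a build with -fsanitize=undefined reports at run time. The function name is hypothetical; it only shows a way to write the operation so that the overflow never happens.

```cpp
#include <climits>

// Hypothetical helper for illustration only.
//
// Evaluating `a + b` on two ints is undefined behaviour when the
// mathematical result does not fit in an int. This version checks the
// range first and reports failure instead of overflowing.
bool checkedAdd(int a, int b, int &out) {
  if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
    return false; // the sum would overflow; `out` is left untouched
  out = a + b;
  return true;
}
```

Because the range check uses only subtractions that cannot themselves overflow, the function is well defined for every pair of inputs.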

Evolving Testing

Testing software in an automated fashion is critical to ensuring that things do not regress. Oftentimes when tests fail on a platform they are indicating that something is not being handled completely and that we are relying on specific behaviours. Simply marking the test as XFAIL largely leaves the tests disabled indefinitely. Having test expectations diverging on different platforms means that issues on the platform remain unnoticed. Filing a defect report on the issue does not guarantee that the issue will be resolved (particularly if it is deep within the original implementation and not something that the platform owners are familiar with).

One option to reduce the impact of XFAIL on a particular platform is to universally mark the test as XFAIL unless there is a fundamental platform specific behaviour that is being tested (which indicates that the test may be better to mark as UNSUPPORTED). This would help ensure that platforms are mutually evolving.

It may be beneficial to apply some code coverage metrics to the tests to ensure that platforms are being tested equally. Some form of this already exists but is not currently highlighted: lit provides a testing summary, and perhaps we need to highlight the summary more effectively to indicate the quality of testing for a change. Better visibility into the characteristics of the tests being disabled should allow us to make more informed decisions.

Continuous Integration

To ensure that developers can quickly identify problems, we need the ability to quickly test changes against different platforms. The Windows support has had post-commit testing for nearly a year. Currently, the Windows platform can be tested in CI on an opt-in basis. It would be beneficial to make this a non-blocking pre-commit test to establish both the stability and scalability of the testing infrastructure. As we gain confidence in the ability to support development in a pre-commit form, we should consider making it a required pre-commit test.

Escalation Policy

We should create a collaborative environment which encourages developers to try to keep all the ports working and progressing together. However, it is important that we have a process in place to escalate issues.

If a specific port fails, the change author should attempt to resolve the issue based on their own knowledge and any documentation that is available. If they are still unable to resolve it after doing so, they could reach out to the platform owners. Note that we do not want this to degrade into “the compile failed on this platform, hand it off to the platform owners”; the change author should make an effort to resolve the issue.

The platform owners should be given ample time (~24 hours excluding holidays/weekends) to respond to the issue. If the platform owners do not respond or are unable to resolve the issue, it is acceptable to mark test failures as XFAIL and file a release blocker bug to ensure that the issue will be properly addressed.

If possible, it is best to incorporate the changes for the platform into the change itself. However, if the fix is too large or completely orthogonal to the change, it may become necessary to file an issue for the platform owner to resolve separately. In such a case, the offending tests would be marked as XFAIL and an issue filed for the port, and the change could be merged while the port maintainers resolve the issue asynchronously.
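The steps in this section could be sketched as a small decision function. This is a hedged model only: the type, field, and function names are hypothetical, and the 24-hour default simply mirrors the figure suggested above rather than a settled value.

```cpp
// Hypothetical model of the escalation steps; nothing here is project API.
enum class Action {
  AttemptFix,                 // author tries to resolve the failure first
  LandFixWithChange,          // a fix exists: incorporate it into the change
  ContactPlatformOwners,      // author tried and is stuck: ask the owners
  WaitForOwners,              // owners contacted, response window still open
  XfailAndFileReleaseBlocker, // mark tests XFAIL and file a release blocker
};

struct PortFailure {
  bool authorAttemptedFix;
  bool fixAvailable;            // found by the author or the owners
  bool ownersContacted;
  bool ownersUnableToResolve;
  int workingHoursSinceContact; // excludes holidays and weekends
};

Action nextAction(const PortFailure &f, int responseWindowHours = 24) {
  if (f.fixAvailable)
    return Action::LandFixWithChange;
  if (!f.authorAttemptedFix)
    return Action::AttemptFix;
  if (!f.ownersContacted)
    return Action::ContactPlatformOwners;
  if (f.ownersUnableToResolve ||
      f.workingHoursSinceContact >= responseWindowHours)
    return Action::XfailAndFileReleaseBlocker;
  return Action::WaitForOwners;
}
```

Keeping the response window as a parameter makes it easy to revisit the number without changing the shape of the process.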

Release Management

To better track tests that we disable temporarily, whenever a test is marked as XFAIL to allow a change to merge, we should file a JIRA issue tracking the disabled test and mark it as a release blocker, so that the problem is addressed before the release. The release manager would be responsible for ensuring that these issues are scrubbed before a release and, should a disabled test not be re-enabled before the release, for identifying why.
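One way to picture the bookkeeping this implies is a record per disabled test plus a check the release manager could run over the list. This is a sketch under stated assumptions: the record layout, the names, and the three categories are hypothetical and not part of any existing tooling.

```cpp
#include <string>

// Hypothetical record for a test that was marked XFAIL to let a change merge.
struct DisabledTest {
  std::string testPath;  // path of the disabled test
  std::string platform;  // platform on which it is disabled
  std::string issueId;   // tracking issue; empty if none was filed
  bool issueResolved;    // whether the tracking issue has been closed
};

// What the release manager would need to chase for a given entry.
enum class Attention {
  MissingIssue,       // disabled without a tracking issue
  OpenReleaseBlocker, // issue filed and still open
  StaleXfail,         // issue closed but the test is still disabled
};

Attention classifyDisabledTest(const DisabledTest &t) {
  if (t.issueId.empty())
    return Attention::MissingIssue;
  return t.issueResolved ? Attention::StaleXfail
                         : Attention::OpenReleaseBlocker;
}
```

Every category needs action before a release, which is why the sketch has no "fine as is" state for a test that remains disabled.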

The release manager would periodically review the state of the ports (identifying tests which have been disabled either because they could not be solved in time or the port maintainers were not available), and ensure that they are brought to the attention of the port maintainers. This would allow them to make recommendations on priorities. Trends should be identified and used to make suggestions for improvements to policies.

Compiler specific workarounds should be isolated and placed under a macro that identifies the workaround. This would allow us to remove the workarounds quickly when the compiler requirements are bumped.
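A minimal sketch of what such a macro could look like, assuming an older host compiler that does not implicitly move a local variable when the return statement also performs a conversion. The macro name and the version cutoff are hypothetical and chosen only for illustration.

```cpp
#include <memory>
#include <utility>

// Hypothetical workaround macro. The version test is an assumption made for
// this example; Apple clang uses a different version scheme, so it is
// excluded here.
#if defined(__clang__) && !defined(__apple_build_version__) &&                 \
    (__clang_major__ < 3 || (__clang_major__ == 3 && __clang_minor__ <= 8))
#define EXAMPLE_WORKAROUND_CONVERTING_RETURN_NEEDS_MOVE 1
#else
#define EXAMPLE_WORKAROUND_CONVERTING_RETURN_NEEDS_MOVE 0
#endif

struct Base {
  virtual ~Base() = default;
  virtual int id() const { return 0; }
};

struct Derived : Base {
  int id() const override { return 1; }
};

std::unique_ptr<Base> makeNode() {
  std::unique_ptr<Derived> node(new Derived());
#if EXAMPLE_WORKAROUND_CONVERTING_RETURN_NEEDS_MOVE
  // Older compiler: the converting return needs an explicit move.
  return std::move(node);
#else
  // Newer compilers treat `node` as an rvalue here.
  return node;
#endif
}
```

Searching for the macro name finds every use of the workaround, so removing it after a compiler bump is a mechanical change.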

Footnotes

  • CI runs indicate that Linux has ~1389 unsupported tests, Windows has ~1397 unsupported tests, macOS has ~190 unsupported tests. Experiences with the Windows port show that some of these tests are inappropriately marked as unsupported and a set of them were enabled during the Windows port. It could be useful to periodically evaluate the current state of the disabled/unsupported tests.
  • The test failures, at least for Windows, tend to be cases where compiler differences require a small change (e.g. a move assignment operator is not synthesized without an explicit request) or path separator issues where the compiler is simply not normalizing the path as it should.
  • This document is written with macOS, iOS, Linux, Windows, and Android as the current major platforms of interest, but should keep things open for other platforms.
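The path separator footnote can be illustrated with a very small sketch. The helper below is hypothetical and is not how the compiler normalizes paths; it only shows the kind of normalization whose absence makes a test's expected output differ between Windows and other platforms.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: rewrite Windows-style separators to '/' so that
// paths compare equal regardless of the host platform.
std::string normalizeSeparators(std::string path) {
  std::replace(path.begin(), path.end(), '\\', '/');
  return path;
}
```

A path that already uses '/' passes through unchanged, so the helper is safe to apply on every platform.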

Really happy to see movement on platform support!

Is the set of current major platforms in scope for this RFC, or is the list of platforms an example? If we're defining the set of major platforms I would like it to be called out more obviously so that no one can miss it among the more policy-oriented questions.

Are you speaking narrowly about Swift language features, or does this also include tooling (e.g. lldb, swiftpm, sourcekit-lsp)?

Are these already enabled in llvm/clang builds? That seems like a prerequisite for keeping them clean in Swift.

Are you suggesting adding new CI jobs with ubsan, or enabling ubsan in existing builds?

If the XFAIL'd test succeeds, it results in XPASS (an unexpected pass) and the overall test suite is considered to have failed. Are you suggesting changing that? If not, we can't universally XFAIL unless the test really does fail on all platforms.
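To spell out the concern, here is a simplified model of how a test's marking and its actual result combine. It is a sketch, not lit's implementation, and the names are hypothetical; it only covers the four outcomes relevant to this point.

```cpp
// Simplified model of the four outcomes relevant here.
enum class Outcome { Pass, Fail, XFail, XPass };

Outcome classifyLitResult(bool markedXfail, bool testPassed) {
  if (markedXfail)
    return testPassed ? Outcome::XPass : Outcome::XFail;
  return testPassed ? Outcome::Pass : Outcome::Fail;
}

// An unexpected pass counts against the run just like an ordinary failure.
bool failsSuite(Outcome o) {
  return o == Outcome::Fail || o == Outcome::XPass;
}
```

So a test marked XFAIL everywhere breaks the run on any platform where it actually passes, which is exactly the problem with a universal XFAIL.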

Is this not the status quo? I thought we have been doing this successfully for a few weeks.

Can you elaborate? We have many (dozens? hundreds?) of places where we are working around clang 3.8 (host compiler on Ubuntu 16.04) deficiencies around moving from return values. It doesn't seem like it would help to artificially macro these.

Thanks for the detailed feedback, and sorry about the delay, been dealing with some bugs that were really pressing.

The list was meant to be an example, but I think that you bring up a valid point that there can be value in making it explicit. The documentation is meant to be more comprehensive than the set of platforms that are currently supported. Does it perhaps make sense to make that:

i.e. Apple platforms, Linux, Windows, and Android

to make it clear that all the Apple platforms are being included?

For the moment, I think that it is narrowly the Swift language features, though my personal preference would be for this to be broader with the tooling as well over time.

These are already enabled for the Swift builds; it is actually stating the status quo.

I think that nightly runs of UBSAN would be sufficient to help flush out issues with undefined behaviour, though we could do that as a pre-commit check if there are enough resources. IIRC, LLVM actually does it nightly.

Ah, right. I think that we could mark the test as REQUIRES: SR-???? as a way to disable the test though. Thanks for pointing that out.

It is indeed the status quo :). This document was in the works for a while, so some of the things have come to pass.

I don't expect to be able to retroactively do this. This is a suggestion for approaches moving forward to help identify these issues and make it easier to switch to better forms later. This also allows platforms where the compiler may be able to do better to take advantage of it.

Please note that I won't be updating the document in place, but will make the changes once we come to a consensus.


You touched a little on toolchains like MSVC and GCC. Is there a desire for all three major toolchains to produce a Swift toolchain similar to the one Clang produces? Is there tooling already in place that can verify this comparability during merge validation?

I think that it depends on the definition of producing a toolchain. The Swift toolchain needs a copy of clang as well for the C/C++ interoperability. As such, clang is a hard requirement. The recommendation is to try to maintain the ability to build swiftc with GCC, MSVC, and clang as LLVM and clang already build themselves with all three.

Note that the runtime and even the core libraries really do need clang to build and will remain that way (either due to ABI support requirements or because of use of GNU extensions).

No, there is not, but that is relatively easy to do. The Windows builds actually already build with MSVC. The Apple platforms already build with clang. So, we could switch the latest Ubuntu release to gcc while having the older Ubuntu releases use clang. This will allow us to have the coverage and keep up with the newer language features. This split is purely to allow us to stick to the latest GCC on the LTS release. If the older releases have a new enough GCC, then it may make sense to stay with the same compiler.


Nice!

Sounds reasonable. Maybe we should just add ubsan to our asan bots since we can use them together.

That sounds great to me. It sounds like we just need to figure out how to adjust the flags for those builds.

One of the issues that may be encountered is that corelibs testing uses XCTest, which does not have an XFAIL notion built in. Foundation has a hacked-up internal API for marking tests that are known to fail (testExpectedToFailOn…(…)) but it does not run those tests once XFAIL'd.

(XCTest has a bevy of other limitations that are not directly relevant but may become so if we were to pursue improvements here — for example, its inability to do crash testing.)


If you're not running the XFAIL'd test anyway (i.e. it's not verifying that the test actually does fail), could you use the XCTSkipIf functionality added in apple/swift-corelibs-xctest pull request #297, “Introduce XCTSkip and related APIs for skipping tests” by stmontgomery?

I wasn't aware, and I'm happy it's there now that I am. I agree that we should adopt it for the current implementation.

It would be better to run the test and verify that it fails, so that if a policy assumes that XFAIL tests are run, it is consistent across the board.

Edit: Also, as above, this may require things like proper crash testing without Xcode as a runner.


Thank you for this thoughtful proposal for the support of multiple platforms. It would be fantastic to have Swift available wherever one wants it. Swift is a great language and millions of people create with it. But is it perfect today? Well, no, there are still some improvements that its users would love to have. And there's the rub: How does the support for additional platforms affect our ability to make those improvements?

The tradeoff I see, which I would like to see made more explicit, is: How much cost do we want to pay in productivity for each platform we support? How can we measure how much we do pay? I don't mean to propose a number here; rather, I don't want us to die the death of a thousand cuts. That is, we might adopt a number of process proposals, each of which seems reasonable at the time, only to find ourselves in a place where we feel weighed down by the cumulative effect of many small burdens.

It takes additional time for both engineering and testing to make anything work across such disparate platforms as Unix and Windows. The skill set an engineer must have to navigate the tools is not the same across platforms.

I would like to see an inventory of the costs, an agreement of how much cost we are willing to pay, and some means of measuring the ongoing costs. The size of the existing code base, the long time it takes to build a compiler, the pitfalls involved with debugging build failures caused by inconsistencies in libraries, LLVM, etc., the difficulties in recreating crashes, the CI testing time, the PR review time, already smother the flames of our ingenuity. Can we support other platforms without snuffing them out? How? Let's strike the balance, consciously, deliberately, and with data.

I see this tradeoff differently -- in some sense, it is not a tradeoff at all, nor is it a new one. Spending engineering time on supporting diverse platforms is an inherent cost of developing a successful programming language. I think the Swift project committed to these costs when Apple decided to make Swift a portable programming language, relevant outside of the Apple ecosystem.

If it is a tradeoff, it is not about choosing how much time to spend supporting platforms that one is not immediately interested in, it is about whether Swift should be a successful programming language relevant outside of Apple's operating systems.

For example, PRs that my coworkers and I submit these days are primarily motivated by improving Swift on Linux. However, it would be unthinkable that in these changes we would knowingly break Swift on Apple's operating systems -- and indeed, this is enforced by presubmit checks on PRs. If my PR passes presubmit checks and then some post-submit tests discover breakage on Apple operating systems, I expect that my PR would be rolled back as soon as the issue is noticed. I see my costs of supporting Apple operating systems as an integral part of the cost of making the changes that I need to make -- and I hope others take the same point of view when they have to spend extra time on Linux support.

It is understandable that a given contributor could be interested in only some of the platforms that Swift supports, or even only one. However, the open source collaboration would only work if we understand this conflict of interest and collaborate.

If developers who are interested only in a certain platform would ignore the concerns of people who support other platforms, we would have an adversarial environment, and that's not a good thing.

I think that developers who are interested in improving Swift on a certain platform X, regardless of whether the interest is personal or funded by some company, should be prepared to spend at least the same amount of effort on every other supported platform as they spend on implementing platform-specific parts for their platform X.

@compnerd I generally support the direction of the RFC, but I'd prefer to separate the general principles from the specifics more clearly. For example, the general principles section should say that we want Swift to support multiple platforms with the same set of features; however, it should not mention any specific compiler flags, compilers, operating systems, or ISAs. A separate section (or a separate document?) should detail exactly which platforms, which compilers, etc. are supported, and to what extent, something like Redirecting....

I think 24 hours is definitely not enough, especially if you factor in the timezone difference: 24 hours is not enough for 1 roundtrip of communication between west coast of the US and EMEA.

A carte blanche to disable tests is not the current practice, I think -- nor do I think it would be a healthy policy for the project. Imagine that someone is landing a change that improves support for a non-Apple OS, but breaks a bunch of macOS overlay tests. I don't think it is acceptable today to just disable those tests 24 hours after sending an FYI notice to an engineer from Apple. In fact, I don't think it would be acceptable to disable those tests after any implicit communication timeout -- I'd expect it to require explicit approval from an Apple engineer.


The current references to specifics are examples, not an exhaustive list. I agree that such a document would be well outside the scope of this RFC; this is meant to get agreement on the principles for the development.

That is purely a line-in-the-sand value, but I think that going below it could be a problem. I must confess I did not consider the round-trip time between the US and EMEA. Could I ask you to suggest a time frame? The important thing here is that it is something that everyone feels comfortable with.

The platform owners are involved - the platform owner for Darwin would likely be someone from Apple. The idea is to encourage developers to actually test and validate their changes. The platform owners are meant to help advise on how to solve the problem, not to do all the work themselves. That last point is crucial - 1-2 engineers cannot be responsible for the entire platform, but they are meant to help advise the other engineers on the specifics of a platform. The decision to XFAIL a test should be taken jointly, not unilaterally.

Hey Saleem. I have a few thoughts/concerns here. I also find overall that this document is a bit fuzzy on the details which are /really/ important for this sort of policy document. It would be helpful if you could provide a document with a set of explicit policies/actions (preferably with flow/state diagrams) that show how you imagine this working. I have some idea of what you are trying to express, but I am also sometimes shooting in the dark.

One last note: below, you do not say in what context it is acceptable to revert a change that is broken on other platforms instead of marking tests XFAIL.

I may do another run through later.

Comments inline below:

compnerd (Saleem Abdulrasool), May 7:

Proposal: Policies for Swift Platform Development

Core Principles

In order to create a stable ecosystem for Swift, it is important that we maintain a single coherent ecosystem across platforms. Whenever technically feasible, the project should aim to provide the same interfaces, behaviours, and capabilities on every platform. For example, the compiler and build system should support both static and dynamic linking of libraries on all platforms.

This not only makes it easier for users of the language to target different environments, but also helps the language implementers by reducing the complexity of the implementation through fewer divergent code paths. By sharing the code paths, the implementation can be vetted more thoroughly by multiple toolchains (e.g. MSVC, gcc, clang). This gives higher reliability in the Swift compiler itself and helps catch issues before users hit them. It also opens additional avenues for isolating and debugging defects when they are found.

We should also exploit temporal locality in our approach to system development. It is easier to resolve issues with the context fresh in mind than to recover the details after time has passed. As such, it is important that we carefully time-bound regressions so that they are addressed quickly and efficiently.

While I agree that it is important to fix issues as soon as possible, it is important that we have an escape hatch/flexibility here. I go into that more below when you talk about this. Every time we want to allow for an escape hatch, we should be explicit.

Because future features often build upon existing functionality, removal of functionality from a specific target endangers future functionality on that platform. Code that is not actively maintained quickly bitrots, and platforms regress rapidly as a result.

I would add here: when the code is not actively maintained and tested. The testing part of this is really important from a CI/overall correctness/maintenance point of view.

Enabling Collaboration and Incubation of Features

Software engineering is a collaborative process. As such, we often need to merge features that are not yet fully ready so that people can collaborate on them. This requires a maturation process for features that allows developers to make progress quickly.

One approach is to allow features under incubation to be an explicit opt-in, unsupported option: there is no driver-level control, the feature can only be accessed through the frontend, and it is marked as experimental. That is, features can be gated as -Xfrontend -enable-experimental-feature, which restricts the feature and explicitly marks it as incomplete. When the feature is formally introduced with driver-level options and control, it would be enabled on the major platforms (provided that the feature is not inherently platform specific).

Features graduating from experimental to production would be evaluated across the major platform environments (i.e. macOS, Linux, Windows, Android). The new feature work should be enabled across these platforms, with the possibility of exceptions being granted in certain circumstances (e.g. the feature does not make sense on a particular platform).

Who makes the decision here? The core team? Does it happen as part of the evolution process? You do not elaborate. Can you explain?

Also, there are many places in this document that are important to back reference to this escape hatch. I’ll mark it inline when I see it.

One last thing to add is that it may be useful to have a notion here of an incubated feature that goes through the evolution process but is not considered a truly production feature (and in the language) until it is available on all platforms.

Platform Support

To support multiple platforms, we need platform owners who are the points of contact for developers when platform specific issues arise. Although many day-to-day issues are easily resolved by builds and tests, platform specific issues can still come up. Having an explicitly identified group of platform owners to help resolve them will ensure that the different platforms help keep the overall project healthy.

Additionally, a tracking mechanism that identifies each feature and each individual test that is disabled for any reason, with a time bound for the fix, will ensure that the platforms remain in a healthy state. Feature authors should be responsible for working with the platform owners to ensure that features are enabled on other platforms before the next release.

I do not think that it is reasonable to expect all features to be on all platforms before the next release. Consider a late landing feature towards the end of a release. Dealing with regressions on other platforms before the release goes out may not be an acceptable outcome given scheduling constraints. That being said, a few thoughts:

  1. I /do/ think though that it would be reasonable to expect a feature to be either in the next release or release +1 (noting that release +1 would be for late landing features). That will guarantee that there is some “down time” for engineers in their schedule (when the schedule calms down) when the engineers can invest time in fixing these issues. This would help avoid the inherent friction between individual contributors' schedules and platform support. In my mind, the feature would inherently be considered either experimental or in incubation.

  2. This to me is a place where we could use the feature flag+incubation sort of thing to work around this problem. The document should have a back reference IMO to the feature flag thing.

As platform support improves, it would be possible to expand the major platform list. To be considered a major platform, a platform should offer a comparable feature set to the other platforms.

What is a “comparable feature set”? I guess this is defined by the platform compatibility “table of functionalities” you mention below?

Under certain circumstances, exceptions may be granted to the platform after discussion with the core team and the engineering operations team. Failure to maintain the platform’s feature parity may result in the platform being removed, with input from the engineering operations team, from consideration as a major platform.

Explicitly who makes this decision? This is the type of fuzzy language I spoke about above. We need to be precise in our wording here.

We should create a table of functionalities that are deemed as part of the platform compatibility. This would begin as a forward facing document, as enumerating the existing features is deemed too onerous. This document should reside in the Swift repository. Changes by the community to backfill past features are acceptable, however, no guarantees would be made about the completeness of the document for features prior to Swift 6. Issues tracking the implementation of the feature could be linked to JIRA if appropriate.

Who makes this table and decides what should be in it?

Feature implementers should be encouraged to reach out to the platform port owners early in the design phase, so that integration problems are addressed before a feature is introduced without consideration for other platforms. This would help ensure that platforms do not regress on functionality as new features are introduced into the project.

Sometimes this is not possible. Consider a feature that is heavily being worked on and is not ready until late in the development cycle. This is the sort of friction that we need to avoid in the project. I /do/ think though that if we allow experimental feature flags + a requirement around release N+1 we can avoid this friction.

Build Improvements

In order to ease the builds with different C++ toolchains, we should enable any additional diagnostics which increase the likelihood of clang catching a compile issue that other compilers may see. This includes things like enabling -Werror=gnu and using -std=c++14 rather than -std=gnu++14. Whenever possible, we should enhance the diagnostics in clang to identify the issues that other compilers identify (assuming that they are not actual issues in the other compiler).

In order to make it easier for collaboration in maintaining ports, we should introduce a new document to centralize documentation on how to address common pitfalls when targeting all major platforms. This document would be built up incrementally. Since a large source of these issues tends to be undefined behaviour in the (C++) language, it would be best to enable a UBSAN build of the compiler.

I would add here that we should add static analysis and things like clang-tidy. I think having bots like that could avoid the majority of these issues. Any discussion like this should be tied in with a suggested overall minimum CI.

Evolving Testing

Testing software in an automated fashion is critical to ensuring that things do not regress. Oftentimes when tests fail on a platform they are indicating that something is not being handled completely and that we are relying on specific behaviours. Simply marking the test as XFAIL largely leaves the tests disabled indefinitely. Having test expectations diverging on different platforms means that issues on the platform remain unnoticed. Filing a defect report on the issue does not guarantee that the issue will be resolved (particularly if it is deep within the original implementation and not something that the platform owners are familiar with).

Isn’t this just asking for better tooling around tracking XFAILs?

Also, I think it is important to make a reference back to the experimental feature flag section and say that XFAILing for experimental feature flags is inherently ok.

One option to reduce the impact of XFAIL on a particular platform is to universally mark the test as XFAIL unless there is a fundamental platform specific behaviour that is being tested (which indicates that the test may be better to mark as UNSUPPORTED). This would help ensure that platforms are mutually evolving.

It may be beneficial to apply some code coverage metrics to the tests to ensure that platforms are being tested equally. There is some form of this which is not being highlighted currently: lit provides a testing summary, and perhaps we need to highlight the summary more effectively to indicate the quality of testing for a change. Better visibility in the characteristics of the tests being disabled should allow us to make more informed decisions.
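To illustrate what better visibility into disabled tests could look like, here is a minimal sketch that counts lit `XFAIL:` and `UNSUPPORTED:` directives per condition. The function name `count_disabled` and the in-memory `tests` mapping are assumptions for illustration, and the regular expression only approximates how lit itself parses directives.

```python
import re
from collections import Counter
from typing import Dict

# lit directives live in comments, for example "// XFAIL: <condition>".
DIRECTIVE = re.compile(r"\b(XFAIL|UNSUPPORTED):\s*(.+)$")


def count_disabled(tests: Dict[str, str]) -> Counter:
    """Count (directive, condition) pairs across test sources.

    `tests` maps a test path to the contents of that file.
    """
    counts: Counter = Counter()
    for source in tests.values():
        for line in source.splitlines():
            match = DIRECTIVE.search(line)
            if match is None:
                continue
            kind, conditions = match.groups()
            # A directive may list several comma-separated conditions.
            for condition in conditions.split(","):
                condition = condition.strip()
                if condition:
                    counts[(kind, condition)] += 1
    return counts
```

Sorting the result by count would show at a glance which platform conditions account for most of the disabled tests.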

Continuous Integration

In order to ensure that developers are able to quickly identify problems, we need to have the ability to quickly test changes against different platforms. The Windows support has had post-commit testing for nearly a year. Currently, it is possible to test the Windows platform in CI on an opt-in basis. It would be beneficial to make this a non-blocking pre-commit test to ensure both the stability and scalability of the testing infrastructure. As we gain better confidence in the ability to support development in a pre-commit form, we should consider moving it to a required pre-commit test.

Why are we mentioning Windows specifically here? This is a forward thinking policy document. Specific platforms are irrelevant since we are talking about general properties. Can you rephrase this in terms of general platform support and minimum CI requirements for platforms? That is not mentioned here. We should be explicit about the minimum CI requirements of testing.

Escalation Policy

This section needs to be significantly more explicit. I would like an explicit flow/state diagram.

We should create a collaborative environment which encourages developers to try to keep all the ports working and progressing together. However, it is important that we have a process in place to escalate issues.

If a specific port fails, the change author should attempt to resolve the issue based on their knowledge and any documentation that is available. If they are still unable to resolve it after consulting the documentation and applying their own knowledge, they could reach out to the platform owners. Note that we do not want this to degrade into “the compile failed on this platform, hand it off to the platform owners”; the change author should make an effort to resolve the issue.

The platform owners should be given ample time (~24 hours excluding holidays/weekends) to respond to the issue.

IMO 24 hours isn’t enough. I think 2-3 business days is more appropriate.
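Whatever window is chosen, "excluding holidays/weekends" is easy to get wrong by hand. Here is a minimal sketch of the calculation, assuming a Monday to Friday working week and a caller-supplied holiday set. The function name `response_deadline` and its parameters are illustrative assumptions.

```python
from datetime import date, timedelta
from typing import FrozenSet


def response_deadline(reported: date, business_days: int,
                      holidays: FrozenSet[date] = frozenset()) -> date:
    """Return the date on which the owners' response window closes.

    Counts forward `business_days` working days from `reported`,
    skipping Saturdays, Sundays, and any date in `holidays`.
    """
    if business_days < 0:
        raise ValueError("business_days must be non-negative")
    current = reported
    remaining = business_days
    while remaining > 0:
        current += timedelta(days=1)
        # weekday() is 0 for Monday through 6 for Sunday.
        if current.weekday() < 5 and current not in holidays:
            remaining -= 1
    return current
```

With this definition, an issue reported on a Friday with a window of one business day closes on the following Monday, not on Saturday.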

If the platform owners do not respond or are unable to resolve the issue, it is acceptable to mark test failures as XFAIL and file a release blocker bug to ensure that the issue will be properly addressed.

I imagine tracking release blocker bugs will mean we need to improve our JIRA workflow or create tooling. Do you have any thoughts on this, or know where we are?

If possible, it is best to incorporate the changes for the platform into the change itself. However, if the change is too large or completely orthogonal to the change, it may become necessary to file an issue for the platform owner to resolve separately. In such a case, the offending tests would be marked as XFAIL, an issue filed for the port, and the change could continue to be merged while the port maintainers asynchronously resolve the issue.
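As a starting point for the explicit flow requested above, the escalation steps as currently written can be encoded as a small state machine. This is only a sketch of one reading of the proposal text: the state and event names are assumptions, and the replies in this thread that prefer reverting over XFAILing would change the transitions.

```python
from enum import Enum, auto


class State(Enum):
    PORT_FAILING = auto()
    AUTHOR_INVESTIGATING = auto()
    OWNERS_CONTACTED = auto()
    XFAIL_WITH_RELEASE_BLOCKER = auto()
    RESOLVED = auto()


class Event(Enum):
    AUTHOR_STARTS = auto()             # change author picks up the failure
    AUTHOR_STUCK = auto()              # documentation and own knowledge exhausted
    OWNERS_TIMEOUT_OR_UNABLE = auto()  # owners did not respond or cannot fix it
    FIXED = auto()


# Allowed transitions, read off the escalation paragraphs of the proposal.
TRANSITIONS = {
    (State.PORT_FAILING, Event.AUTHOR_STARTS): State.AUTHOR_INVESTIGATING,
    (State.AUTHOR_INVESTIGATING, Event.FIXED): State.RESOLVED,
    (State.AUTHOR_INVESTIGATING, Event.AUTHOR_STUCK): State.OWNERS_CONTACTED,
    (State.OWNERS_CONTACTED, Event.FIXED): State.RESOLVED,
    (State.OWNERS_CONTACTED, Event.OWNERS_TIMEOUT_OR_UNABLE):
        State.XFAIL_WITH_RELEASE_BLOCKER,
    (State.XFAIL_WITH_RELEASE_BLOCKER, Event.FIXED): State.RESOLVED,
}


def step(state: State, event: Event) -> State:
    """Advance the escalation by one event, rejecting undefined transitions."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state.name} on {event.name}") from None
```

Writing the flow down this way makes the gaps visible. For example, there is deliberately no path from `PORT_FAILING` straight to `XFAIL_WITH_RELEASE_BLOCKER`, which matches the requirement that the change author first make an effort to resolve the issue.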

Release Management

In order to better track tests that we disable temporarily, when tests are marked as XFAIL to allow us to merge a change, we should file a JIRA issue to track the disabled test and mark it as a release blocker so that we can ensure that the problem is addressed before the release. The release manager would be responsible for ensuring that the issues are scrubbed before a release, identifying why a disabled test is not enabled again before a release if that should occur.

The release manager would periodically review the state of the ports (identifying tests which have been disabled either because they could not be solved in time or the port maintainers were not available), and ensure that they are brought to the attention of the port maintainers. This would allow them to make recommendations on priorities. Trends should be identified and used to make suggestions for improvements to policies.

I would be clear that this means that we need to invest in tooling here.
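One small piece of such tooling could be a release gate that flags XFAILed tests with no tracking issue. Here is a minimal sketch, assuming a hypothetical convention that every test carrying an `XFAIL:` directive also mentions its JIRA key (of the form `SR-<number>`) somewhere in the file. Both the convention and the name `untracked_xfails` are assumptions for illustration.

```python
import re
from typing import Dict, List

XFAIL = re.compile(r"\bXFAIL:")
# Assumed issue key format; adjust to whatever the tracker actually uses.
ISSUE_KEY = re.compile(r"\bSR-\d+\b")


def untracked_xfails(tests: Dict[str, str]) -> List[str]:
    """Return paths of tests that are XFAILed but cite no tracking issue.

    `tests` maps a test path to the contents of that file.
    """
    return sorted(
        path
        for path, source in tests.items()
        if XFAIL.search(source) and not ISSUE_KEY.search(source)
    )
```

The release manager could run a check like this before scrubbing issues, so that a disabled test without a release blocker bug is caught by tooling and not by memory.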

To understand what the general principles and workflow should be, and to evaluate them, I think it would be helpful to see the specifics as well. For example, when we're talking about acceptable XFAIL'ing of tests, it would be reasonable to have a different policy depending on the platform (stable and supported platforms vs. experimental and upcoming), or depending on the features that are broken, or depending on the amount or severity of breakage.

For any sort of communication timeout with specific people, I think 3-5 business days would be a better time period. Specific people can be sick, take vacation etc. If we have a rotation that is always staffed with people who are working on that day (like a buildcop rotation), then 2-3 business days.

However, I don't think that a communication timeout really works as a tie breaker. We should rather ensure that we don't have communication timeouts, that people remain responsive.

I don't think it is acceptable to just involve platform owners (which in practice means sending an FYI message about the breakage) and then XFAIL tests if they fail to respond ("If the platform owners do not respond or are unable to resolve the issue, it is acceptable to mark test failures as XFAIL and file a release blocker bug to ensure that the issue will be properly addressed.") Maybe the policy should be different for different platform support tiers, but at least for some platforms I don't think any regression would be tolerated (today, it would be Apple operating systems for example).

I think it depends on the nature of the feature. Features that are by their nature cross-platform should be available uniformly on all platforms -- otherwise, we are fragmenting the language. It is no longer "Swift version X.Y", it is "Swift version X.Y as implemented on platform Z".

Imagine that we would want to add a new API to Array, but due to implementation difficulties (say, interaction with the memory allocator) engineers who are contributing the feature can only implement the API on Windows within the couple of weeks that are left before the release. I don't think we should ship such an API in a release.

A feature landing late is not an excuse to skip the evolution process, fragment the language by introducing differences across platforms, or cut corners in any other way. If a feature can't pass the quality bar, it should not make it into the release. We can set a different quality bar for different platforms, different aspects of the ecosystem, etc., but "people implementing the feature are short on time" absolutely must never be a factor in determining the quality bar.

In that case, would it not be better to delay the release on the failing platform? Having certain features be available from language version 1 on platform X and from language version 2 on platform Y would be a really poor user experience.

The decision to XFAIL a test should be taken jointly, not as a unilateral decision.

Yes, this is an important point: I believe that XFAILing a test should be the last resort and done in agreement with the platform maintainers and not the common practice.

It should be the burden of the patch author to make sure that the patch works on all supported platforms. If a patch breaks a platform and we need to get the bots blue, the patch should be reverted instead of pushing technical debt onto the platform maintainers in the form of an XFAIL. It should be the patch author's responsibility to work with the platform maintainers on a solution. The solution may be an XFAIL if it is decided that the patch isn't going to be supported on the platform, but that should be decided ahead of time and together with the platform maintainers.
