Hello Swift community!
I would like to pitch the creation of a package containing APIs that provide a unified approach to publishing and consuming results data generated by developer activities, including (but not limited to) building and testing, both at desk and in continuous integration services. A few examples of this data are build logs, test outcomes, issues (build, test, runtime, etc.), retry details, CI job orchestration, package resolution, and VM provisioning steps.
This proposal represents the work of my team of Xcode engineers at Apple who are responsible for test and build results. There is functioning prototype code for many aspects of the proposal below, but those implementations are in no way assumed to be foregone conclusions about how the pitched project will actually unfold. Rather, we hope to use them to guide discussions and welcome alternative approaches.
We hope to break ground on this project soon and look forward to your feedback.
Problem
Developer activities such as building and testing produce considerable amounts of data designed to answer two key questions:
- Did it work?
- If not, why?
This data takes many forms, ranging from raw text logging to domain-specific customized structures to arbitrary assets such as screen recordings. Processing the data to answer the key questions - especially the "why" - is challenging for both humans and machines. The biggest cause of this difficulty is the lack of consistent structuring. As a result, tools cannot efficiently and reliably direct a user to the relevant bits that can help them triage and diagnose problems in their work.
Goals
- Organize data from build and test actions using a consistent structure.
- Facilitate viewing of results information by humans.
- Allow efficient machine-parsing of results by tools and IDEs.
- Collect results data in a single place which can be inspected or transferred as one artifact.
- Permit multiple entities (e.g. processes, hosts/devices, services) to contribute to a single stream of events.
- Support extensibility by allowing publishers to represent custom data when necessary.
- Enable sophisticated UI tools to “stream” events and display live updates efficiently.
- Simplify analysis by modern AI/LLM tools with limited context windows.
- Align the experience of working with results data from local and CI operations.
- Provide the data foundations for historical metrics tracking (build performance, test results/durations).
An approach to structured results
This project proposes several API modules to provide a structured results foundation for test frameworks, build systems, CI infrastructure, IDEs, and other tooling.
A new Package
These APIs would be implemented in a new package, housed in a new repository under the swiftlang GitHub organization. Initially, we would expose the libraries in this package for other toolchain projects to import and link, but not for regular users to import directly in their own projects. The CLI tool would eventually be included in open source toolchains and made available to end users.
Publishing APIs
The publisher modules are meant to be adopted by entities that produce results data, such as a test framework or build system. These modules will have an extremely minimal set of dependencies so that adopters such as a test framework can adopt them without introducing a cycle or other conflict with projects that use that framework. The publisher approach is also designed as a distributed system, where results data can be emitted from multiple processes on multiple hosts using a "channel" abstraction.
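To make the channel idea concrete, here is a minimal sketch - in Python for brevity, since the real publisher APIs would be Swift and are not yet designed - of two independent producers emitting events into their own channels as JSON Lines. Every type, method, and field name here (`Channel`, `span.start`, `parent`, etc.) is an illustrative assumption, not the proposed schema:

```python
import io
import json
import uuid


class Channel:
    """A toy event channel: each producer owns one and appends JSONL events."""

    def __init__(self, name, stream, parent=None):
        self.id = str(uuid.uuid4())
        self.stream = stream
        # A channel is itself a span with an optional parent reference,
        # so it nests inside the larger "session" of results.
        self._emit({"type": "span.start", "span": self.id,
                    "name": name, "parent": parent})

    def record(self, payload):
        """Emit a leaf event (a 'record') belonging to this channel."""
        self._emit({"type": "record", "parent": self.id, "payload": payload})

    def finish(self):
        self._emit({"type": "span.finish", "span": self.id})

    def _emit(self, event):
        # One event per line: streams from different producers can later be
        # combined without any parsing or post-processing.
        self.stream.write(json.dumps(event) + "\n")


# Two producers (e.g. an IDE and a test runner) writing to separate streams.
ide_stream, runner_stream = io.StringIO(), io.StringIO()
ide = Channel("ide", ide_stream)
runner = Channel("test-runner", runner_stream, parent=ide.id)
runner.record({"testCase": "testExample", "outcome": "passed"})
runner.finish()
ide.finish()

# "Merging" the distributed streams is just line concatenation.
merged = ide_stream.getvalue() + runner_stream.getvalue()
events = [json.loads(line) for line in merged.splitlines()]
```

The key property this sketch illustrates is that each channel is a self-describing stream: because parent references are encoded in the events themselves, streams can be transported, stored, and merged independently.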
Consumption APIs
To process the structured results data, the Consumer modules will provide APIs that can be used to read, filter, and transform the data, both "live" - while it is actively being produced - and with completed results. While the publisher is meant to be minimal in its dependencies, the consumer does not have the same constraints.
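As an illustration of the kind of consumer-side processing intended - again sketched in Python rather than the eventual Swift APIs, with all event fields assumed for the example - filtering and aggregating a completed JSONL stream might look like:

```python
import json

# A completed results stream: one JSON event per line (fields are illustrative).
jsonl = """\
{"type": "span.start", "span": "s1", "name": "MyTests"}
{"type": "record", "parent": "s1", "payload": {"testCase": "testA", "outcome": "passed"}}
{"type": "record", "parent": "s1", "payload": {"testCase": "testB", "outcome": "failed"}}
{"type": "span.finish", "span": "s1", "durationSeconds": 2.5}
"""

events = [json.loads(line) for line in jsonl.splitlines()]

# Filter: keep only leaf records describing test case outcomes.
test_records = [e["payload"] for e in events if e["type"] == "record"]

# Aggregate: count failures - the kind of query a CLI or IDE would run.
failures = [r["testCase"] for r in test_records if r["outcome"] == "failed"]
```

The same filter/transform operations would apply to a live stream, with the consumer processing lines as they arrive rather than after completion.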
CLI
A command line tool that wraps the Consumer module is also included for convenience. It will support actions such as:
- merging results from different jobs
- exporting results data to other standard formats
- exporting attached assets to files
- retrieving aggregate information from a result, such as the number of test failures, a list of build issues, or the duration of the job (or portions of it)
- etc.
Data format/specification
The data will be emitted as JSON Lines ("JSONL", https://jsonlines.org), where each line represents a distinct telemetry event - a span start, a "record" (leaf node), or a span finish. JSONL makes it easy to manage multiple discrete streams from different (distributed) producers, supports flexible batch management, and enables cheap, high-performance combination of data - merging streams amounts to concatenating lines, with no parsing required.
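For example, a hypothetical event stream for a single test run might consist of the following three lines - a span start, a leaf record, and a span finish. Every field name shown here is illustrative, not the final schema:

```json
{"type": "span.start", "span": "A1", "parent": null, "name": "MyLibraryTests", "kind": "testRun", "time": "2025-01-01T12:00:00Z"}
{"type": "record", "parent": "A1", "kind": "testCaseResult", "payload": {"testCase": "testExample()", "outcome": "passed"}}
{"type": "span.finish", "span": "A1", "time": "2025-01-01T12:00:02Z"}
```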
Each event contains a structured "payload" with a core schema expressed in the project for things such as test runs, compilation tasks, build/test diagnostics, user-defined activities, and more. This schema also supports extension in clients and composition of other, arbitrary data. Payload extensibility will make this generalized system for structured results function as a complement (rather than a competitor) to the adoption of domain-specific schemas, such as the SARIF support proposed in https://forums.swift.org/t/pitch-sarif-support-for-swift-diagnostics/85513.
The structured results schema (including its core types) will be published using OpenAPI or similar approaches to enable generation of interfaces in other languages, as needed by third party tooling.
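As a sketch of what such a published schema might contain - assuming OpenAPI is the chosen vehicle, and with all names illustrative only - an excerpt could look like:

```yaml
# Hypothetical excerpt of a published event schema; names are illustrative.
components:
  schemas:
    Event:
      type: object
      required: [type]
      properties:
        type:
          type: string
          enum: [span.start, record, span.finish]
        span:
          type: string
        parent:
          type: string
          nullable: true
        payload:
          type: object
          additionalProperties: true
```

Third-party tooling in other languages could then generate native types for events from the published schema.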
Example usage scenario
A user initiates a Test action from an IDE on a connected mobile device
In this scenario there are at least two processes involved, executing on different hosts - the IDE (e.g. Xcode) and the test runner (e.g. Swift Testing). Each generates data relevant to the results, so each is considered a producer with its own channel for publishing. Channels - while distinct streams from each other - are also spans that encode parent references like every other event, providing context within the larger "session" of results.

The IDE channel begins emitting events as soon as the user action commences, providing a high-level organization of the result data, perhaps including details about the code being tested, versions of languages/compilers/other tools, and any other pertinent context. The test runner on the device creates its own channel when it launches, with some results metadata forwarded from the IDE, and begins to emit its own events, including any initialization/preparation data and then the tree of its test executions and their individual results. The output of this channel may be written to a file on the device and retrieved on completion, or streamed back over a socket to the IDE so that it can show live results as tests execute.

The consumer module would be used for capturing, aggregating, and extracting data for display to the user.
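Concretely, the two channels in this scenario could appear in the merged stream as something like the following (all identifiers and field names are hypothetical): the device channel declares the IDE channel as its parent, so consumers can reassemble the session tree regardless of how the two streams were transported or merged.

```json
{"type": "span.start", "span": "ide-1", "parent": null, "kind": "channel", "name": "Xcode"}
{"type": "span.start", "span": "dev-1", "parent": "ide-1", "kind": "channel", "name": "device-test-runner"}
{"type": "record", "parent": "dev-1", "kind": "testCaseResult", "payload": {"testCase": "testExample()", "outcome": "passed"}}
{"type": "span.finish", "span": "dev-1"}
{"type": "span.finish", "span": "ide-1"}
```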
Candidate Clients
The success of this effort is predicated on buy-in and adoption by projects which generate and consume the kind of data defined here as “developer results”. We’ve identified the following as ideal adopters and are either actively engaged with them now or planning to reach out shortly:
- Swift Testing
- Swift Build
- Swift Package Manager
- Corelibs XCTest
Alternatives considered
Using an “off the shelf” telemetry solution
The proposed direction for this project is strongly influenced by telemetry solutions, so it's a reasonable question whether those systems would be sufficient in themselves. There are two considerations that led to the decision to develop a solution that borrows from telemetry concepts but is not itself a pure telemetry solution. The first of these is the value of arbitrary (and potentially large) attachments as diagnostic tools. The second is the importance of a strong schema of types for the various actions (build, test, etc.) that goes beyond the attribute dictionary level of user data provided by telemetry systems both in terms of its structure and its extensibility.
Use Protobuf or another serialization format
The advantages of JSONL are covered in the data format section above, but there are alternative serialization formats, such as Protobuf, which offer their own advantages. This is an area where the technology choice will continue to be evaluated; tradeoffs such as dependency requirements, storage characteristics, and parsing efficiency will be considered. It's also possible that the serialization approach will be "pluggable", with clients able to select from a set of options or provide their own solution.
Open source the .xcresult bundle format
A predecessor technology in this problem space is Apple's Result Kit/result bundles (".xcresult") produced by Xcode during building and testing actions. Result bundles generated by Xcode may only be inspected using its included xcresulttool CLI tool. The Result Kit approach differs considerably from the one proposed in this project because much of the data is post-processed before being written, rather than preserving the original "raw" sequence of events. In addition, for CI systems and other services, it's preferable to decouple the storage of large attachment files from the event data. Another consideration is that formulating the result as a stream of events, rather than a finalized artifact, makes it possible to incrementally deliver results back to users. Finally, the bundle structure itself is an Apple-platform concept, which makes it less appealing for a solution aimed at a cross-platform audience. That said, it is still possible that we would define a "bundle-like" directory structure for managing combined event and attachment data in local and peer-sharing scenarios.
Implementing a storage service
Storage of results data in cloud services is a closely related problem but one we think should remain decoupled, at least in the initial phases of the project. The focus here is to design APIs and protocols that efficiently support cloud storage services, without implementing those services themselves. Storage requirements vary considerably between organizations and use cases, in terms of:
- scale of the data produced
- analysis goals for the data
- duration of storage