Proposal: External Telemetry Service Integration for swift-build

This proposal introduces a mechanism for swift-build to emit build telemetry events to an external service, enabling observability without adding dependencies to the swift-build binary itself.

Context

We are building Tuist, a platform that helps teams optimize their builds, test runs, and project maintenance. A core part of our mission is providing actionable insights based on build telemetry data.

Historically, extracting meaningful data from Xcode builds has required parsing proprietary, undocumented formats and reverse-engineering build artifacts. This approach is fragile, limited in scope, and breaks across Xcode versions. Teams building developer tooling have long struggled with this opacity.

With swift-build now being open source, we see an opportunity to address this at the source. Rather than continuing to work around closed systems, we would like to propose a first-class mechanism for exposing build telemetry in a standard, extensible format. This would benefit not only Tuist but the entire ecosystem of developer tools, CI/CD platforms, and observability solutions that serve Swift developers.

Motivation

As Swift adoption grows in larger codebases and CI/CD environments, understanding build performance becomes increasingly important. Teams need insights into:

  • Build duration breakdowns: Which targets and tasks consume the most time?
  • Incremental build efficiency: How often are tasks rebuilt unnecessarily?
  • CI/CD optimization: Where are the bottlenecks across many builds?
  • Regression detection: Did a recent change slow down builds?
  • AI-assisted architecture decisions: Coding agents and AI assistants can query historical build data to provide actionable insights, such as identifying modules that should be split, detecting circular dependencies that slow builds, or recommending parallelization opportunities based on the dependency graph.
  • Cross-environment and historical analysis: Exporting telemetry to a centralized backend enables teams to collect build data across developer machines, CI runners, and different configurations. This longitudinal view reveals trends over time: Are builds getting slower? Which changes caused regressions? How do build times compare across the team?
  • Build archaeology: With accumulated telemetry data, teams can perform forensic analysis of their build history. When did this target start taking twice as long? Which commit introduced the dependency that broke incremental builds? Why did CI times spike last quarter? This "archaeology" of build data turns tribal knowledge into queryable facts, helping teams understand the evolution of their codebase and make informed decisions about technical debt.
  • Rebuild causality: Understanding what rebuilt is only half the picture; teams also need to know why. Was it a source file change? A transitive dependency update? Modified build settings? Cache eviction? Without causality information, diagnosing false cache invalidations or tracing which commit broke incremental builds becomes guesswork. Exposing rebuild reasons transforms debugging from "something changed" to "this specific input changed, triggering this rebuild chain."

While swift-build has extensive internal observability infrastructure (delegates, protocol messages, activity tracking), the data it captures is not easily accessible to external tools. The existing SWIFTBUILD_TRACE_FILE provides post-build analysis but lacks real-time streaming and structured telemetry semantics.

Industry-standard observability platforms like OpenTelemetry, Jaeger, Zipkin, and Prometheus have become the backbone of modern infrastructure monitoring. Integrating swift-build with these ecosystems would unlock powerful analysis capabilities without reinventing tooling.

Prior Art

Other build systems have already embraced OpenTelemetry integration. Notably, the Gradle ecosystem has the opentelemetry-gradle-plugin, which exports build traces to any OpenTelemetry-compatible backend. This plugin has seen adoption in the Android and JVM communities, demonstrating the value of build observability. Swift developers deserve similar capabilities.

However, directly coupling swift-build to OpenTelemetry (or any specific telemetry framework) would:

  • Add significant dependencies to a core piece of Apple's build infrastructure
  • Force a particular telemetry choice on all users
  • Increase binary size and compilation time
  • Create maintenance burden for dependency updates

Proposed Solution

We propose a minimal, decoupled design where swift-build emits telemetry events to an optional external service over a Unix socket. A separate telemetry service (outside the swift-build repository) receives these events and can forward them to any backend.

Architecture Overview

+----------------------------------------------------------+
|                      swift-build                          |
|  (no new dependencies)                                    |
|                                                           |
|   +---------------------------------------------------+   |
|   |  TelemetryEmitter                                 |   |
|   |  - Enabled via SWIFTBUILD_TELEMETRY_SOCKET        |   |
|   |  - Connects to Unix socket                        |   |
|   |  - Emits MsgPack-serialized events                |   |
|   |  - Non-blocking, fire-and-forget                  |   |
|   +------------------------+--------------------------+   |
+----------------------------|------------------------------+
                             | Unix socket
                             v
+----------------------------------------------------------+
|              swift-build-telemetry                        |
|  (separate repository/binary)                             |
|                                                           |
|   - Listens on Unix socket                                |
|   - Receives and decodes events                           |
|   - Translates to OpenTelemetry spans/metrics             |
|   - Exports via OTLP to any backend                       |
+----------------------------------------------------------+
                             |
                             v
                  +--------------------+
                  | Jaeger / Zipkin /  |
                  | Prometheus / etc.  |
                  +--------------------+

Design Principles

  1. Zero coupling: swift-build only knows about a socket path, not what consumes events
  2. Opt-in activation: Disabled by default, enabled via environment variable
  3. No new dependencies: Reuses existing MsgPack serialization already in swift-build
  4. Failure tolerance: Telemetry failures never affect build execution
  5. Extensibility: Event schema can evolve without breaking the telemetry service

Detailed Design

Activation

Telemetry emission is controlled by an environment variable:

export SWIFTBUILD_TELEMETRY_SOCKET=/tmp/swiftbuild-telemetry.sock

When set, swift-build connects to the specified Unix socket and emits events. If the socket is unavailable or the connection fails, swift-build continues normally without telemetry.

Event Protocol

Events are serialized using the MsgPack format that swift-build already uses for IPC. Each event is a self-contained message with a common header:

protocol TelemetryEvent: Serializable {
    static var eventType: String { get }
    var timestamp: UInt64 { get }  // Nanoseconds since epoch
    var sessionID: String { get }   // Links events to a build session
}
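
For illustration, a concrete event could look like the following sketch (field names mirror the tables below and are not final; the Serializable conformance is elided):

struct TaskCompletedEvent: TelemetryEvent {
    static let eventType = "TaskCompleted"
    let timestamp: UInt64      // Nanoseconds since epoch
    let sessionID: String      // Links the event to its build session
    let taskSignature: String  // Stable identifier for the task
    let result: String         // "success", "failure", or "cancelled"
    let duration: UInt64       // Task wall-clock time in nanoseconds
    // serialize(to:) via the existing MsgPack machinery is elided here
}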

Proposed Events

Build Lifecycle Events

Event           Description             Key Fields
BuildStarted    Build operation begins  sessionID, configurationName, targetCount
BuildCompleted  Build operation ends    sessionID, result (success/failure/cancelled), duration

Target Lifecycle Events

Event            Description          Key Fields
TargetStarted    Target build begins  sessionID, targetName, targetType, configuration
TargetCompleted  Target build ends    sessionID, targetName, result, duration

Task Lifecycle Events

Event          Description             Key Fields
TaskStarted    Individual task begins  sessionID, targetName, taskType, signature
TaskCompleted  Individual task ends    sessionID, taskSignature, result, duration
TaskUpToDate   Task skipped (cached)   sessionID, taskSignature

Diagnostic Events

Event              Description             Key Fields
DiagnosticEmitted  Warning/error produced  sessionID, severity, message, location

Causality Events

Event            Description         Key Fields
TaskInvalidated  Task needs rebuild  sessionID, taskSignature, reason, changedInputs

The reason field would indicate why the task was invalidated:

  • sourceChanged - A source file in the task's inputs was modified
  • dependencyChanged - A transitive dependency was rebuilt
  • buildSettingChanged - Build settings affecting this task changed
  • outputMissing - Expected output file does not exist
  • signatureMismatch - Task signature differs from cached version
  • cacheEvicted - Task was evicted from the build cache
  • forcedRebuild - Clean build or explicit rebuild requested

The changedInputs field would list the specific files or settings that triggered the invalidation, enabling precise root cause analysis.
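
For illustration, the reason field could be modeled as a simple enum (a sketch; the case names mirror the list above):

enum InvalidationReason: String {
    case sourceChanged, dependencyChanged, buildSettingChanged
    case outputMissing, signatureMismatch, cacheEvicted, forcedRebuild
}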

OpenTelemetry Mapping

The external telemetry service translates events to OpenTelemetry semantics:

swift-build Event        OpenTelemetry Concept
BuildStarted/Completed   Root span for the build
TargetStarted/Completed  Child span under build
TaskStarted/Completed    Child span under target
TaskUpToDate             Span with cached: true attribute
TaskInvalidated          Span event with invalidation.reason and invalidation.inputs attributes
DiagnosticEmitted        Span event or log record
Task durations           Histogram metrics
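
As a sketch of how the service might perform this mapping, using the swift-distributed-tracing API that swift-otel implements (DecodedEvent and its payload types stand in for hypothetical decoder output):

import Tracing

// Pair TaskStarted/TaskCompleted events into spans, keyed by task signature.
var openSpans: [String: any Span] = [:]

func handle(_ event: DecodedEvent) {
    switch event {
    case .taskStarted(let e):
        // Open a span when the task begins.
        openSpans[e.signature] = startSpan("task \(e.taskType)")
    case .taskCompleted(let e):
        // Close the matching span and attach the result as an attribute.
        if let span = openSpans.removeValue(forKey: e.taskSignature) {
            span.attributes["swiftbuild.result"] = .string(e.result)
            span.end()
        }
    default:
        break
    }
}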

Wire Format

Each message on the socket follows this format:

[4 bytes: payload length (little-endian uint32)]
[N bytes: MsgPack-encoded event]

Example encoded event:

{
  "type": "TaskCompleted",
  "timestamp": 1703123456789000000,
  "sessionID": "abc123",
  "targetName": "MyApp",
  "taskSignature": "swift-compile-MyApp-ViewController.swift",
  "result": "success",
  "duration": 1234567890
}
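
For illustration, producing a framed message could look like this minimal sketch (a hypothetical helper, not part of swift-build):

import Foundation

// Prefix a MsgPack-encoded payload with its 4-byte little-endian length.
func frame(_ payload: Data) -> Data {
    let length = UInt32(payload.count).littleEndian
    var message = withUnsafeBytes(of: length) { Data($0) }
    message.append(payload)
    return message
}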

Implementation in swift-build

The implementation is minimal and confined to a single new component:

// New file: Sources/SWBBuildSystem/TelemetryEmitter.swift

final class TelemetryEmitter: Sendable {
    static let shared: TelemetryEmitter? = {
        guard let path = ProcessInfo.processInfo
            .environment["SWIFTBUILD_TELEMETRY_SOCKET"] else {
            return nil
        }
        return TelemetryEmitter(socketPath: path)
    }()

    private let socket: DispatchIO?
    private let queue = DispatchQueue(label: "swiftbuild.telemetry")

    private init(socketPath: String) {
        // Connect to the Unix socket; on any failure, telemetry is
        // silently disabled and the build proceeds unaffected.
        // ... create and connect the socket, wrap it in DispatchIO
        self.socket = nil  // sketch: nil when the socket is unavailable
    }

    func emit(_ event: some TelemetryEvent) {
        guard let socket else { return }
        queue.async {
            // Serialize and send off the build's critical path,
            // ignoring errors (fire-and-forget)
            let serializer = MsgPackSerializer()
            event.serialize(to: serializer)
            // ... frame with the 4-byte length prefix and write to socket
        }
    }
}

Integration points in existing code:

// In BuildOperation.swift or OperationDelegate
func buildStarted(...) {
    TelemetryEmitter.shared?.emit(BuildStartedEvent(...))
    // ... existing code
}

func taskCompleted(...) {
    TelemetryEmitter.shared?.emit(TaskCompletedEvent(...))
    // ... existing code
}

External Telemetry Service

A reference implementation (swift-build-telemetry) would be provided in a separate repository:

github.com/swiftlang/swift-build-telemetry  (or community-maintained)

This service:

  • Depends on swift-otel for OpenTelemetry integration
  • Listens on a Unix socket
  • Decodes MsgPack events
  • Creates OpenTelemetry spans with proper parent-child relationships
  • Exports via OTLP to configured backends
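
For illustration, the service's receive loop could follow the wire format above (decodeEvent is a hypothetical MsgPack decoding helper):

import Foundation

func receiveLoop(on connection: FileHandle) throws {
    while true {
        // Read the 4-byte little-endian length prefix.
        guard let header = try connection.read(upToCount: 4), header.count == 4 else { break }
        let length = UInt32(littleEndian: header.withUnsafeBytes { $0.loadUnaligned(as: UInt32.self) })
        // Read the MsgPack payload and decode it into an event.
        guard let payload = try connection.read(upToCount: Int(length)), payload.count == length else { break }
        let event = try decodeEvent(payload)  // hypothetical decoder
        handle(event)  // translate to OpenTelemetry spans (see mapping above)
    }
}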

Example usage:

# Terminal 1: Start telemetry service
swift-build-telemetry --socket /tmp/swiftbuild-telemetry.sock \
                      --otlp-endpoint http://localhost:4317

# Terminal 2: Build with telemetry
SWIFTBUILD_TELEMETRY_SOCKET=/tmp/swiftbuild-telemetry.sock \
  xcodebuild build -project MyApp.xcodeproj

Alternatives Considered

1. Direct OpenTelemetry Integration

Embedding OpenTelemetry directly in swift-build would provide the richest integration but:

  • Adds ~50+ transitive dependencies
  • Increases binary size significantly
  • Couples swift-build to a specific telemetry framework
  • Creates ongoing maintenance burden

2. Callback/Plugin Architecture

A dynamic library plugin system where users provide a telemetry dylib:

  • More complex to implement and document
  • Security concerns with loading arbitrary code
  • Platform-specific considerations (dylib vs framework vs dll)

3. Extended Trace File Format

Enhancing the existing SWIFTBUILD_TRACE_FILE with more structured data:

  • Only provides post-build analysis, not real-time streaming
  • File I/O overhead for every event
  • Requires polling or file watching

4. stdout/stderr Event Stream

Printing JSON events to a dedicated file descriptor:

  • Simpler than sockets but less flexible
  • Harder to manage in complex build scenarios
  • Potential interference with build output parsing

Security Considerations

  • The Unix socket is local-only, limiting exposure
  • Socket path is user-controlled via environment variable
  • No sensitive build data beyond what's already in build logs
  • Telemetry service runs with user privileges

Future Directions

This proposal intentionally starts minimal. Future enhancements could include:

  • Metrics aggregation: Emit summary statistics, not just events
  • Sampling: Reduce overhead by sampling frequent events
  • Filtering: Allow configuration of which events to emit
  • TCP/UDP support: For remote telemetry collection
  • Structured logging integration: Align with Swift's logging ecosystem

Open Questions

  1. Event granularity: Should we emit file-level events (e.g., per-source-file compilation)?
  2. Backward compatibility: How do we version the event schema?
  3. Buffer behavior: Should swift-build buffer events if the socket is slow?
  4. Metrics vs traces: Should we also emit aggregated metrics, or only trace events?
  5. Session correlation: How should we correlate events across incremental builds?

Acknowledgments

This proposal builds on the excellent existing observability infrastructure in swift-build, particularly the delegate-based architecture and MsgPack serialization.

One thing I’d love to see added: rebuild causality. The current schema captures what rebuilt and how long, but not why. Was it a source file change? A transitive dependency? Build settings? Cache eviction?

This context seems essential for the “build archaeology” use case you mentioned. Without it, diagnosing false cache invalidations or tracing “which commit broke incremental builds” becomes much harder.

That’s a very good idea @calube. I updated the proposal to reflect your suggestion. Let me know what you think about it.
