Proposal: External Telemetry Service Integration for swift-build

This proposal introduces a mechanism for swift-build to emit build telemetry events to an external service, enabling observability without adding dependencies to the swift-build binary itself.

Context

We are building Tuist, a platform that helps teams optimize their builds, test runs, and project maintenance. A core part of our mission is providing actionable insights based on build telemetry data.

Historically, extracting meaningful data from Xcode builds has required parsing proprietary, undocumented formats and reverse-engineering build artifacts. This approach is fragile, limited in scope, and breaks across Xcode versions. Teams building developer tooling have long struggled with this opacity.

With swift-build now being open source, we see an opportunity to address this at the source. Rather than continuing to work around closed systems, we would like to propose a first-class mechanism for exposing build telemetry in a standard, extensible format. This would benefit not only Tuist but the entire ecosystem of developer tools, CI/CD platforms, and observability solutions that serve Swift developers.

Motivation

As Swift adoption grows in larger codebases and CI/CD environments, understanding build performance becomes increasingly important. Teams need insights into:

  • Build duration breakdowns: Which targets and tasks consume the most time?
  • Incremental build efficiency: How often are tasks rebuilt unnecessarily?
  • CI/CD optimization: Where are the bottlenecks across many builds?
  • Regression detection: Did a recent change slow down builds?
  • AI-assisted architecture decisions: Coding agents and AI assistants can query historical build data to provide actionable insights, such as identifying modules that should be split, detecting circular dependencies that slow builds, or recommending parallelization opportunities based on the dependency graph.
  • Cross-environment and historical analysis: Exporting telemetry to a centralized backend enables teams to collect build data across developer machines, CI runners, and different configurations. This longitudinal view reveals trends over time: Are builds getting slower? Which changes caused regressions? How do build times compare across the team?
  • Build archaeology: With accumulated telemetry data, teams can perform forensic analysis of their build history. When did this target start taking twice as long? Which commit introduced the dependency that broke incremental builds? Why did CI times spike last quarter? This "archaeology" of build data turns tribal knowledge into queryable facts, helping teams understand the evolution of their codebase and make informed decisions about technical debt.
  • Rebuild causality: Understanding what rebuilt is only half the picture; teams also need to know why. Was it a source file change? A transitive dependency update? Modified build settings? Cache eviction? Without causality information, diagnosing false cache invalidations or tracing which commit broke incremental builds becomes guesswork. Exposing rebuild reasons transforms debugging from "something changed" to "this specific input changed, triggering this rebuild chain."

While swift-build has extensive internal observability infrastructure (delegates, protocol messages, activity tracking), the data it captures is not easily accessible to external tools. The existing SWIFTBUILD_TRACE_FILE provides post-build analysis but lacks real-time streaming and structured telemetry semantics.

Industry-standard observability platforms like OpenTelemetry, Jaeger, Zipkin, and Prometheus have become the backbone of modern infrastructure monitoring. Integrating swift-build with these ecosystems would unlock powerful analysis capabilities without reinventing tooling.

Prior Art

Other build systems have already embraced OpenTelemetry integration. Notably, the Gradle ecosystem has the opentelemetry-gradle-plugin, which exports build traces to any OpenTelemetry-compatible backend. This plugin has seen adoption in the Android and JVM communities, demonstrating the value of build observability. Swift developers deserve similar capabilities.

However, directly coupling swift-build to OpenTelemetry (or any specific telemetry framework) would:

  • Add significant dependencies to a core piece of Apple's build infrastructure
  • Force a particular telemetry choice on all users
  • Increase binary size and compilation time
  • Create maintenance burden for dependency updates

Proposed Solution

We propose a minimal, decoupled design where swift-build emits telemetry events to an optional external service over a Unix socket. A separate telemetry service (outside the swift-build repository) receives these events and can forward them to any backend.

Architecture Overview

+----------------------------------------------------------+
|                      swift-build                          |
|  (no new dependencies)                                    |
|                                                           |
|   +---------------------------------------------------+   |
|   |  TelemetryEmitter                                 |   |
|   |  - Enabled via SWIFTBUILD_TELEMETRY_SOCKET        |   |
|   |  - Connects to Unix socket                        |   |
|   |  - Emits MsgPack-serialized events                |   |
|   |  - Non-blocking, fire-and-forget                  |   |
|   +------------------------+--------------------------+   |
+----------------------------|------------------------------+
                             | Unix socket
                             v
+----------------------------------------------------------+
|              swift-build-telemetry                        |
|  (separate repository/binary)                             |
|                                                           |
|   - Listens on Unix socket                                |
|   - Receives and decodes events                           |
|   - Translates to OpenTelemetry spans/metrics             |
|   - Exports via OTLP to any backend                       |
+----------------------------------------------------------+
                             |
                             v
                  +--------------------+
                  | Jaeger / Zipkin /  |
                  | Prometheus / etc.  |
                  +--------------------+

Design Principles

  1. Zero coupling: swift-build only knows about a socket path, not what consumes events
  2. Opt-in activation: Disabled by default, enabled via environment variable
  3. No new dependencies: Reuses existing MsgPack serialization already in swift-build
  4. Failure tolerance: Telemetry failures never affect build execution
  5. Extensibility: Event schema can evolve without breaking the telemetry service

Detailed Design

Activation

Telemetry emission is controlled by an environment variable:

export SWIFTBUILD_TELEMETRY_SOCKET=/tmp/swiftbuild-telemetry.sock

When set, swift-build connects to the specified Unix socket and emits events. If the socket is unavailable or the connection fails, swift-build continues normally without telemetry.

Event Protocol

Events are serialized using the MsgPack format that swift-build already uses for IPC. Each event is a self-contained message with a common header:

protocol TelemetryEvent: Serializable {
    static var eventType: String { get }
    var timestamp: UInt64 { get }  // Nanoseconds since epoch
    var sessionID: String { get }   // Links events to a build session
}
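
For illustration, a concrete event could look like the following sketch (field names mirror the tables below and are not final; the Serializable conformance is elided):

struct TaskCompletedEvent: TelemetryEvent {
    static let eventType = "TaskCompleted"
    let timestamp: UInt64      // Nanoseconds since epoch
    let sessionID: String      // Links the event to its build session
    let taskSignature: String  // Stable identifier for the task
    let result: String         // "success", "failure", or "cancelled"
    let duration: UInt64       // Task wall-clock time in nanoseconds
    // serialize(to:) via the existing MsgPack machinery is elided here
}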

Proposed Events

Build Lifecycle Events

Event           Description             Key Fields
BuildStarted    Build operation begins  sessionID, configurationName, targetCount
BuildCompleted  Build operation ends    sessionID, result (success/failure/cancelled), duration

Target Lifecycle Events

Event            Description          Key Fields
TargetStarted    Target build begins  sessionID, targetName, targetType, configuration
TargetCompleted  Target build ends    sessionID, targetName, result, duration

Task Lifecycle Events

Event          Description             Key Fields
TaskStarted    Individual task begins  sessionID, targetName, taskType, signature
TaskCompleted  Individual task ends    sessionID, taskSignature, result, duration
TaskUpToDate   Task skipped (cached)   sessionID, taskSignature

Diagnostic Events

Event              Description             Key Fields
DiagnosticEmitted  Warning/error produced  sessionID, severity, message, location

Causality Events

Event            Description         Key Fields
TaskInvalidated  Task needs rebuild  sessionID, taskSignature, reason, changedInputs

The reason field would indicate why the task was invalidated:

  • sourceChanged - A source file in the task's inputs was modified
  • dependencyChanged - A transitive dependency was rebuilt
  • buildSettingChanged - Build settings affecting this task changed
  • outputMissing - Expected output file does not exist
  • signatureMismatch - Task signature differs from cached version
  • cacheEvicted - Task was evicted from the build cache
  • forcedRebuild - Clean build or explicit rebuild requested

The changedInputs field would list the specific files or settings that triggered the invalidation, enabling precise root cause analysis.
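
For illustration, the reason field could be modeled as a simple enum (a sketch; the case names mirror the list above):

enum InvalidationReason: String {
    case sourceChanged, dependencyChanged, buildSettingChanged
    case outputMissing, signatureMismatch, cacheEvicted, forcedRebuild
}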

OpenTelemetry Mapping

The external telemetry service translates events to OpenTelemetry semantics:

swift-build Event        OpenTelemetry Concept
BuildStarted/Completed   Root span for the build
TargetStarted/Completed  Child span under build
TaskStarted/Completed    Child span under target
TaskUpToDate             Span with cached: true attribute
TaskInvalidated          Span event with invalidation.reason and invalidation.inputs attributes
DiagnosticEmitted        Span event or log record
Task durations           Histogram metrics
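
As a sketch of how the service might perform this mapping, using the swift-distributed-tracing API that swift-otel implements (DecodedEvent and its payload types stand in for hypothetical decoder output):

import Tracing

// Pair TaskStarted/TaskCompleted events into spans, keyed by task signature.
var openSpans: [String: any Span] = [:]

func handle(_ event: DecodedEvent) {
    switch event {
    case .taskStarted(let e):
        // Open a span when the task begins.
        openSpans[e.signature] = startSpan("task \(e.taskType)")
    case .taskCompleted(let e):
        // Close the matching span and attach the result as an attribute.
        if let span = openSpans.removeValue(forKey: e.taskSignature) {
            span.attributes["swiftbuild.result"] = .string(e.result)
            span.end()
        }
    default:
        break
    }
}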

Wire Format

Each message on the socket follows this format:

[4 bytes: payload length (little-endian uint32)]
[N bytes: MsgPack-encoded event]

Example encoded event:

{
  "type": "TaskCompleted",
  "timestamp": 1703123456789000000,
  "sessionID": "abc123",
  "targetName": "MyApp",
  "taskSignature": "swift-compile-MyApp-ViewController.swift",
  "result": "success",
  "duration": 1234567890
}
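
For illustration, producing a framed message could look like this minimal sketch (a hypothetical helper, not part of swift-build):

import Foundation

// Prefix a MsgPack-encoded payload with its 4-byte little-endian length.
func frame(_ payload: Data) -> Data {
    let length = UInt32(payload.count).littleEndian
    var message = withUnsafeBytes(of: length) { Data($0) }
    message.append(payload)
    return message
}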

Implementation in swift-build

The implementation is minimal and confined to a single new component:

// New file: Sources/SWBBuildSystem/TelemetryEmitter.swift

final class TelemetryEmitter: Sendable {
    static let shared: TelemetryEmitter? = {
        guard let path = ProcessInfo.processInfo
            .environment["SWIFTBUILD_TELEMETRY_SOCKET"] else {
            return nil
        }
        return TelemetryEmitter(socketPath: path)
    }()

    private let socket: DispatchIO?
    private let queue = DispatchQueue(label: "swiftbuild.telemetry")

    private init(socketPath: String) {
        // Connect to the Unix socket; on any failure, telemetry is
        // silently disabled and the build proceeds unaffected.
        // ... create and connect the socket, wrap it in DispatchIO
        self.socket = nil  // sketch: nil when the socket is unavailable
    }

    func emit(_ event: some TelemetryEvent) {
        guard let socket else { return }
        queue.async {
            // Serialize and send off the build's critical path,
            // ignoring errors (fire-and-forget)
            let serializer = MsgPackSerializer()
            event.serialize(to: serializer)
            // ... frame with the 4-byte length prefix and write to socket
        }
    }
}

Integration points in existing code:

// In BuildOperation.swift or OperationDelegate
func buildStarted(...) {
    TelemetryEmitter.shared?.emit(BuildStartedEvent(...))
    // ... existing code
}

func taskCompleted(...) {
    TelemetryEmitter.shared?.emit(TaskCompletedEvent(...))
    // ... existing code
}

External Telemetry Service

A reference implementation (swift-build-telemetry) would be provided in a separate repository:

github.com/swiftlang/swift-build-telemetry  (or community-maintained)

This service:

  • Depends on swift-otel for OpenTelemetry integration
  • Listens on a Unix socket
  • Decodes MsgPack events
  • Creates OpenTelemetry spans with proper parent-child relationships
  • Exports via OTLP to configured backends
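
For illustration, the service's receive loop could follow the wire format above (decodeEvent is a hypothetical MsgPack decoding helper):

import Foundation

func receiveLoop(on connection: FileHandle) throws {
    while true {
        // Read the 4-byte little-endian length prefix.
        guard let header = try connection.read(upToCount: 4), header.count == 4 else { break }
        let length = UInt32(littleEndian: header.withUnsafeBytes { $0.loadUnaligned(as: UInt32.self) })
        // Read the MsgPack payload and decode it into an event.
        guard let payload = try connection.read(upToCount: Int(length)), payload.count == length else { break }
        let event = try decodeEvent(payload)  // hypothetical decoder
        handle(event)  // translate to OpenTelemetry spans (see mapping above)
    }
}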

Example usage:

# Terminal 1: Start telemetry service
swift-build-telemetry --socket /tmp/swiftbuild-telemetry.sock \
                      --otlp-endpoint http://localhost:4317

# Terminal 2: Build with telemetry
SWIFTBUILD_TELEMETRY_SOCKET=/tmp/swiftbuild-telemetry.sock \
  xcodebuild build -project MyApp.xcodeproj

Alternatives Considered

1. Direct OpenTelemetry Integration

Embedding OpenTelemetry directly in swift-build would provide the richest integration but:

  • Adds ~50+ transitive dependencies
  • Increases binary size significantly
  • Couples swift-build to a specific telemetry framework
  • Creates ongoing maintenance burden

2. Callback/Plugin Architecture

A dynamic library plugin system where users provide a telemetry dylib:

  • More complex to implement and document
  • Security concerns with loading arbitrary code
  • Platform-specific considerations (dylib vs framework vs dll)

3. Extended Trace File Format

Enhancing the existing SWIFTBUILD_TRACE_FILE with more structured data:

  • Only provides post-build analysis, not real-time streaming
  • File I/O overhead for every event
  • Requires polling or file watching

4. stdout/stderr Event Stream

Printing JSON events to a dedicated file descriptor:

  • Simpler than sockets but less flexible
  • Harder to manage in complex build scenarios
  • Potential interference with build output parsing

Security Considerations

  • The Unix socket is local-only, limiting exposure
  • Socket path is user-controlled via environment variable
  • No sensitive build data beyond what's already in build logs
  • Telemetry service runs with user privileges

Future Directions

This proposal intentionally starts minimal. Future enhancements could include:

  • Metrics aggregation: Emit summary statistics, not just events
  • Sampling: Reduce overhead by sampling frequent events
  • Filtering: Allow configuration of which events to emit
  • TCP/UDP support: For remote telemetry collection
  • Structured logging integration: Align with Swift's logging ecosystem

Open Questions

  1. Event granularity: Should we emit file-level events (e.g., per-source-file compilation)?
  2. Backward compatibility: How do we version the event schema?
  3. Buffer behavior: Should swift-build buffer events if the socket is slow?
  4. Metrics vs traces: Should we also emit aggregated metrics, or only trace events?
  5. Session correlation: How should we correlate events across incremental builds?

Acknowledgments

This proposal builds on the excellent existing observability infrastructure in swift-build, particularly the delegate-based architecture and MsgPack serialization.

One thing I’d love to see added: rebuild causality. The current schema captures what rebuilt and how long, but not why. Was it a source file change? A transitive dependency? Build settings? Cache eviction?

This context seems essential for the “build archaeology” use case you mentioned. Without it, diagnosing false cache invalidations or tracing “which commit broke incremental builds” becomes much harder.

That’s a very good idea @calube. I updated the proposal to reflect your suggestion. Let me know what you think about it.
