[Feedback] Server Metrics API

Proposal Review: SSWG-0002 (Server Metrics API)

After the discussion thread, we are proposing this as the final revision of this proposal and entering the proposal review phase, which will run until the 22nd of March 2019.

We have integrated most of the feedback from the discussion thread so even if you have read the previous version, you will find some changes that you hopefully agree with. To highlight a few of the major changes:

  • use constructors instead of factory
  • remove caching module

The feedback model will be very similar to the one known from Swift Evolution. The community is asked to provide feedback in the way outlined below and after the review period finishes, the SSWG will -- based on the community feedback -- decide whether to promote the proposal to the Sandbox maturity level or not.

What goes into a review of a proposal?

The goal of the review process is to improve the proposal under review through constructive criticism and, eventually, determine the evolution of the server-side Swift ecosystem.

When reviewing a proposal, here are some questions to consider:

  • What is your evaluation of the proposal?

  • Is the problem being addressed significant enough?

  • Does this proposal fit well with the feel and direction of Swift on Server?

  • If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?

  • How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

Thank you for contributing to the Swift Server Work Group!

What happens if the proposal gets accepted?

If this proposal gets accepted, the official repository will be created and the code (minus examples, the proposal text, etc) will be submitted. The repository will then become usable as a SwiftPM package and a version (likely 0.1.0) will be tagged. The development (in form of pull requests) will continue as a regular open-source project.


Server Metrics API

  • Proposal: SSWG-0002
  • Authors: Tom Doron
  • Preferred initial maturity level: Sandbox
  • Name: swift-metrics
  • Sponsor: Apple
  • Status: Active review (7th...23rd March, 2019)
  • Implementation: https://github.com/tomerd/swift-server-metrics-api-proposal/, if accepted, a fresh repository will be created under Apple · GitHub
  • External dependencies: none
  • License: if accepted, it will be released under the Apache 2 license
  • Pitch: Server: Pitches/Metrics
  • Description: A flexible API package that aims to become the standard metrics API which Swift packages can use to emit metrics. The delivery, aggregation and persistence of the events is handled by other packages and configurable by the individual applications without requiring users of the API package to change.

Introduction

Almost all production server software needs to emit metrics information for observability. The SSWG aims to provide a number of packages that can be shared across the whole Swift Server ecosystem so we need some amount of standardization. Because it's unlikely that all parties can agree on one full metrics implementation, this proposal is attempting to establish a metrics API that can be implemented by various metrics backends which then post the metrics data to backends like prometheus, graphite, publish over statsd, write to disk, etc.

Motivation

As outlined above, we should standardize on an API that if well adopted would allow application owners to mix and match libraries from different parties with a consistent metrics collection solution.

Proposed solution

The proposed solution is to introduce the following types that encapsulate metrics data:

Counter: A counter is a cumulative metric that represents a single monotonically increasing counter whose value can only increase or be reset to zero on restart. For example, you can use a counter to represent the number of requests served, tasks completed, or errors.

counter.increment(100)

Recorder: A recorder collects observations within a time window (usually things like response sizes) and can provide aggregated information about the data sample, for example count, sum, min, max and various quantiles.

recorder.record(100)

Gauge: A Gauge is a metric that represents a single numerical value that can arbitrarily go up and down. Gauges are typically used for measured values like temperatures or current memory usage, but also "counts" that can go up and down, like the number of active threads. Gauges are modeled as a Recorder with a sample size of 1 that does not perform any aggregation.

gauge.record(100)

Timer: A timer collects observations within a time window (usually things like request durations) and provides aggregated information about the data sample, for example min, max and various quantiles. It is similar to a Recorder but specialized for values that represent durations.

timer.recordMilliseconds(100)

How would you use counter, recorder, gauge and timer in your application or library? Here is a contrived example of request processing code that emits metrics for total request count per URL, request size, request duration and response size:

    func processRequest(request: Request) -> Response {
      let requestCounter = Counter("request.count", ["url": request.url])
      let requestTimer = Timer("request.duration", ["url": request.url])
      let requestSizeRecorder = Recorder("request.size", ["url": request.url])
      let responseSizeRecorder = Recorder("response.size", ["url": request.url])

      requestCounter.increment()
      requestSizeRecorder.record(request.size)

      let start = Date()
      let response = ...
      requestTimer.record(Date().timeIntervalSince(start))
      responseSizeRecorder.record(response.size)
      return response
    }

Detailed design

As seen above, the constructor functions Counter, Timer, Gauge and Recorder provide a concrete metric object. This raises the question of which metrics backend you will actually get. The answer is that it's configurable per application. The application sets up the metrics backend it wishes the whole application to use when it first starts. Libraries should never change the metrics implementation as that is something owned by the application. Configuring the metrics backend is straightforward:

    MetricsSystem.bootstrap(MyFavoriteMetricsImplementation())

This instructs the MetricsSystem to install MyFavoriteMetricsImplementation as the metrics backend to use. This can only be done once at the beginning of the program.
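To illustrate the "only once" rule, here is a minimal sketch of how a bootstrap call could guard the installed backend with a lock. Everything in it (`BootstrapSketchBackend`, `SketchMetricsSystem`, the `Bool` return value) is a hypothetical stand-in, not the proposal's actual implementation, which may well trap on a second call instead of returning `false`.

```swift
import Foundation

// Hypothetical stand-ins for a metrics backend; not the proposal's types.
protocol BootstrapSketchBackend {
    var name: String { get }
}

struct NoOpBackend: BootstrapSketchBackend {
    let name = "noop"
}

struct MyFavoriteBackend: BootstrapSketchBackend {
    let name = "favorite"
}

enum SketchMetricsSystem {
    private static let lock = NSLock()
    private static var _factory: BootstrapSketchBackend = NoOpBackend()
    private static var initialized = false

    /// Installs the backend. Returns false if a backend was already installed.
    /// (Assumption: reporting failure via Bool is only for illustration.)
    @discardableResult
    static func bootstrap(_ backend: BootstrapSketchBackend) -> Bool {
        lock.lock()
        defer { lock.unlock() }
        guard !initialized else { return false }
        _factory = backend
        initialized = true
        return true
    }

    /// The currently installed backend, read under the same lock.
    static var factory: BootstrapSketchBackend {
        lock.lock()
        defer { lock.unlock() }
        return _factory
    }
}

let firstAttempt = SketchMetricsSystem.bootstrap(MyFavoriteBackend())
let secondAttempt = SketchMetricsSystem.bootstrap(NoOpBackend())
```

In this sketch the second call is ignored, so a library that tried to bootstrap after the application would not replace the application's choice.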

Metrics Types

Counter

Following is the user facing Counter API. It must have reference semantics, and its behavior depends on the CounterHandler implementation.

public class Counter: CounterHandler {
    @usableFromInline
    var handler: CounterHandler
    public let label: String
    public let dimensions: [(String, String)]

    public init(label: String, dimensions: [(String, String)], handler: CounterHandler) {
        self.label = label
        self.dimensions = dimensions
        self.handler = handler
    }

    @inlinable
    public func increment<DataType: BinaryInteger>(_ value: DataType) {
        self.handler.increment(value)
    }

    @inlinable
    public func increment() {
        self.increment(1)
    }
}

Recorder

Following is the user facing Recorder API. It must have reference semantics, and its behavior depends on the RecorderHandler implementation.

public class Recorder: RecorderHandler {
    @usableFromInline
    var handler: RecorderHandler
    public let label: String
    public let dimensions: [(String, String)]
    public let aggregate: Bool

    public init(label: String, dimensions: [(String, String)], aggregate: Bool, handler: RecorderHandler) {
        self.label = label
        self.dimensions = dimensions
        self.aggregate = aggregate
        self.handler = handler
    }

    @inlinable
    public func record<DataType: BinaryInteger>(_ value: DataType) {
        self.handler.record(value)
    }

    @inlinable
    public func record<DataType: BinaryFloatingPoint>(_ value: DataType) {
        self.handler.record(value)
    }
}

Gauge

Gauge is a specialized Recorder that does not perform aggregation.

public class Gauge: Recorder {
    public convenience init(label: String, dimensions: [(String, String)] = []) {
        self.init(label: label, dimensions: dimensions, aggregate: false)
    }
}

Timer

Following is the user facing Timer API. It must have reference semantics, and its behavior depends on the TimerHandler implementation.

public class Timer: TimerHandler {
    @usableFromInline
    var handler: TimerHandler
    public let label: String
    public let dimensions: [(String, String)]

    public init(label: String, dimensions: [(String, String)], handler: TimerHandler) {
        self.label = label
        self.dimensions = dimensions
        self.handler = handler
    }

    @inlinable
    public func recordNanoseconds(_ duration: Int64) {
        self.handler.recordNanoseconds(duration)
    }
}
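Since the Timer API above takes a duration in nanoseconds, here is a minimal sketch of how a caller might measure a block of work and report it. `CollectingTimer` and `measure` are hypothetical names used only for this illustration, and the use of `DispatchTime` as the clock is an assumption rather than something the proposal prescribes.

```swift
import Dispatch

// Hypothetical stand-in that mirrors the recordNanoseconds requirement shown above.
final class CollectingTimer {
    private(set) var durations: [Int64] = []

    func recordNanoseconds(_ duration: Int64) {
        durations.append(duration)
    }
}

// Hypothetical helper: runs `body`, then records the elapsed time in nanoseconds.
func measure<T>(_ timer: CollectingTimer, _ body: () throws -> T) rethrows -> T {
    let start = DispatchTime.now().uptimeNanoseconds
    defer {
        // The defer block runs after `body` has finished, even if it throws.
        let end = DispatchTime.now().uptimeNanoseconds
        timer.recordNanoseconds(Int64(end - start))
    }
    return try body()
}

let sketchTimer = CollectingTimer()
let sketchResult = measure(sketchTimer) { (1...1000).reduce(0, +) }
```

A monotonic clock is used here instead of `Date()` so that wall-clock adjustments cannot produce negative durations.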

Implementing a metrics backend (eg prometheus client library)

An implementation of a metric backend needs to conform to the MetricsFactory protocol:

public protocol MetricsFactory {
    func makeCounter(label: String, dimensions: [(String, String)]) -> CounterHandler
    func makeRecorder(label: String, dimensions: [(String, String)], aggregate: Bool) -> RecorderHandler
    func makeTimer(label: String, dimensions: [(String, String)]) -> TimerHandler
}

The CounterHandler, TimerHandler and RecorderHandler protocols define the metric capturing API:

public protocol CounterHandler: AnyObject {
    func increment<DataType: BinaryInteger>(_ value: DataType)
}
public protocol TimerHandler: AnyObject {
    func recordNanoseconds(_ duration: Int64)
}
public protocol RecorderHandler: AnyObject {
    func record<DataType: BinaryInteger>(_ value: DataType)
    func record<DataType: BinaryFloatingPoint>(_ value: DataType)
}
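To connect the factory and the handlers, here is a minimal sketch of how a user-facing type might obtain its handler from a factory. All names (`SketchCounterHandler`, `SketchCounterFactory`, `InMemoryCounterHandler`, `InMemoryCounterFactory`, `FactoryBackedCounter`) are hypothetical and simplified: only `Int64` increments, no locking, and the factory is passed in explicitly rather than looked up from a bootstrapped system.

```swift
// Hypothetical, simplified handler protocol (non-generic for brevity).
protocol SketchCounterHandler: AnyObject {
    func increment(_ value: Int64)
}

// Hypothetical, simplified factory protocol.
protocol SketchCounterFactory {
    func makeCounter(label: String, dimensions: [(String, String)]) -> SketchCounterHandler
}

final class InMemoryCounterHandler: SketchCounterHandler {
    private(set) var total: Int64 = 0

    func increment(_ value: Int64) {
        total += value
    }
}

// Hands out one handler per label. Not thread-safe; illustration only.
final class InMemoryCounterFactory: SketchCounterFactory {
    private(set) var counters: [String: InMemoryCounterHandler] = [:]

    func makeCounter(label: String, dimensions: [(String, String)]) -> SketchCounterHandler {
        if let existing = counters[label] {
            return existing
        }
        let created = InMemoryCounterHandler()
        counters[label] = created
        return created
    }
}

// User-facing type: asks the factory for a handler and forwards calls to it.
final class FactoryBackedCounter {
    let label: String
    private let handler: SketchCounterHandler

    init(label: String, dimensions: [(String, String)] = [], factory: SketchCounterFactory) {
        self.label = label
        self.handler = factory.makeCounter(label: label, dimensions: dimensions)
    }

    func increment(_ value: Int64 = 1) {
        handler.increment(value)
    }
}

let sketchFactory = InMemoryCounterFactory()
let counterA = FactoryBackedCounter(label: "request.count", factory: sketchFactory)
let counterB = FactoryBackedCounter(label: "request.count", factory: sketchFactory)
counterA.increment()
counterB.increment(2)
```

Both user-facing objects end up sharing one handler here, which is one possible policy; a real backend is free to decide differently.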

Here is a contrived example of an in-memory implementation:

class SimpleMetrics: MetricsFactory {
    init() {}

    func makeCounter(label: String, dimensions: [(String, String)]) -> CounterHandler {
        return ExampleCounter(label, dimensions)
    }

    func makeRecorder(label: String, dimensions: [(String, String)], aggregate: Bool) -> RecorderHandler {
        // Both initializers produce a RecorderHandler, so that is the common return type.
        let maker: (String, [(String, String)]) -> RecorderHandler = aggregate ? ExampleRecorder.init : ExampleGauge.init
        return maker(label, dimensions)
    }

    func makeTimer(label: String, dimensions: [(String, String)]) -> TimerHandler {
        return ExampleTimer(label, dimensions)
    }

    private class ExampleCounter: CounterHandler {
        init(_: String, _: [(String, String)]) {}

        let lock = NSLock()
        var value: Int64 = 0
        func increment<DataType: BinaryInteger>(_ value: DataType) {
            self.lock.withLock {
                self.value += Int64(value)
            }
        }
    }

    private class ExampleRecorder: RecorderHandler {
        init(_: String, _: [(String, String)]) {}

        private let lock = NSLock()
        var values = [(Int64, Double)]()
        func record<DataType: BinaryInteger>(_ value: DataType) {
            self.record(Double(value))
        }

        func record<DataType: BinaryFloatingPoint>(_ value: DataType) {
            // this may lose precision, but good enough as an example
            let v = Double(value)
            // TODO: sliding window
            lock.withLock {
                values.append((Date().nanoSince1970, v))
                self._count += 1
                self._sum += v
                // Swift.min/Swift.max: the unqualified names would resolve to the properties below
                self._min = Swift.min(self._min, v)
                self._max = Swift.max(self._max, v)
            }
        }

        var _sum: Double = 0
        var sum: Double {
            return self.lock.withLock { _sum }
        }

        private var _count: Int = 0
        var count: Int {
            return self.lock.withLock { _count }
        }

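        // start at +infinity so the first recorded value becomes the minimum
        private var _min: Double = .infinity
        var min: Double {
            return self.lock.withLock { _min }
        }

        // start at -infinity so the first recorded value becomes the maximum
        private var _max: Double = -.infinity
        var max: Double {
            return self.lock.withLock { _max }
        }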
    }

    private class ExampleGauge: RecorderHandler {
        init(_: String, _: [(String, String)]) {}

        let lock = NSLock()
        var _value: Double = 0
        func record<DataType: BinaryInteger>(_ value: DataType) {
            self.record(Double(value))
        }

        func record<DataType: BinaryFloatingPoint>(_ value: DataType) {
            // this may lose precision but good enough as an example
            self.lock.withLock { _value = Double(value) }
        }
    }

    private class ExampleTimer: ExampleRecorder, TimerHandler {
        func recordNanoseconds(_ duration: Int64) {
            super.record(duration)
        }
    }
}

Very excited to see this hit the formal proposal milestone!

A request for people reviewing the proposal:

There's an open change proposal for adding some level of lifecycle awareness for metrics implementation libraries which might need them, details explained in Consider a form of "unregister"/"destroy" for metrics #6, and the implementation PR Add MetricsSystem.release() to allow metric lifetime management #6 #11 (names can change, the general feature matters).

If someone has opinions on this or ideas how to solve this in a different way, please chime in!


thanks @ktoso +1. the concept and goals of lifecycle awareness made in the PR make a lot of sense. hopefully we can come up with a system that does not require explicit "release" since it is error prone, but if we can't i am in favor of taking it in as is


Overall the proposal looks and feels great! I have 2 points I'd like to discuss though:

First off, the naming/labeling of the metrics. Right now, it's just a string, from the example in the proposal:

let requestCounter = Counter("request.count", ["url": request.url])

However, and I'm talking from a Prometheus standpoint here since it's the only backend I have experience with, Prometheus metric labels have to conform to a specific naming format; in this case the dot syntax is not allowed.
My proposal for this would be to, instead of using 1 string, use a variadic list of strings so you'd create the counter like this:

let requestCounter = Counter("request", "count", ["url": request.url])
// Or this
let requestCounter = Counter(["request", "count"], ["url": request.url])

where the bootstrapped metrics implementation can use those parts to create a metric label. In case of prometheus, this would end up as request_count but in case of other libraries it might be a plain request.count
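As a rough illustration of that idea, here is a minimal sketch in which each backend decides how to join the label parts. `LabelRenderer`, `UnderscoreRenderer` and `DotRenderer` are hypothetical names invented for this example; they are not part of the proposal or of any Prometheus client.

```swift
// Hypothetical: a backend-side hook that turns label parts into a metric name.
protocol LabelRenderer {
    func render(_ parts: [String]) -> String
}

// A Prometheus-style backend might join with underscores, since dots are not allowed there.
struct UnderscoreRenderer: LabelRenderer {
    func render(_ parts: [String]) -> String {
        return parts.joined(separator: "_")
    }
}

// Other backends might keep the dotted form.
struct DotRenderer: LabelRenderer {
    func render(_ parts: [String]) -> String {
        return parts.joined(separator: ".")
    }
}

let labelParts = ["request", "count"]
let underscoreName = UnderscoreRenderer().render(labelParts)
let dottedName = DotRenderer().render(labelParts)
```

The library emitting the metric only supplies the parts, so it does not need to know which backend the application bootstrapped.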


The other thing I'm missing in the Proposal that was discussed in the discussion phase is that, where possible & feasible, backend implementations should allow for a user to talk to backend specific implementations by adding an extension to MetricsSystem something like this:

extension MetricsSystem {
    static func Prometheus() throws -> PrometheusClient {
        // Get a hold of the bootstrapped provider
        // Note: Pseudo code
        guard let provider = self.provider as? PrometheusClient else {
            throw MetricsError.requestedTypeNotBootstrapped
        }
        return provider
    }
}

In the end, allowing users to use, in this case, prometheus specific properties, instead of just the generic bits.
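A generic variant of the same idea could look like the sketch below, which downcasts whatever backend is installed to the requested concrete type. All types here (`BackendSketch`, `PrometheusSketch`, `StatsdSketch`, `BackendLookupError`) are hypothetical stand-ins, and the installed backend is passed in explicitly because the proposal does not show how `MetricsSystem` stores it.

```swift
// Hypothetical stand-ins; none of these names come from a real library.
protocol BackendSketch {}

final class PrometheusSketch: BackendSketch {
    // A backend-specific capability that the generic API does not expose.
    func scrape() -> String {
        return "# metrics in a backend-specific text format"
    }
}

final class StatsdSketch: BackendSketch {}

enum BackendLookupError: Error {
    case requestedTypeNotBootstrapped
}

// Returns the installed backend as the requested concrete type, or throws.
func backend<T>(as type: T.Type, from installed: BackendSketch) throws -> T {
    guard let typed = installed as? T else {
        throw BackendLookupError.requestedTypeNotBootstrapped
    }
    return typed
}

let installedBackend: BackendSketch = PrometheusSketch()
let asPrometheus = try? backend(as: PrometheusSketch.self, from: installedBackend)
let asStatsd = try? backend(as: StatsdSketch.self, from: installedBackend)
```

With a single generic function, each backend library would not need its own extension, though a per-backend extension as shown above reads more naturally at the call site.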

Also interested in other opinions on the whole getting hold of a specific backend part :smile:


Thank you for putting forward this proposal, that's really cool!

  • What is your evaluation of the proposal?

:+1:, +1

  • Is the problem being addressed significant enough?

for sure

  • Does this proposal fit well with the feel and direction of Swift on Server?

absolutely

  • If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?

no

  • How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

another quick read right now, bunch of chats in the past but did not try out the real code yet

Interesting point, thanks for raising. I agree that it's a bit tricky indeed... esp. when a library wants to emit metrics and does not know about such requirements (and shouldn't know about them). I'm wondering whether raw arrays are the way to go though. I experimented a bit now with introducing a specific type for the labels, which seems right, though there are open questions there:

//     func makeCounter(label: MetricLabel, dimensions: [(String, String)]) -> CounterHandler

public struct MetricLabel: Hashable, 
        ExpressibleByArrayLiteral, ExpressibleByStringLiteral {
    public typealias ArrayLiteralElement = String

    let elements: [String]

    public init(arrayLiteral elements: String...) {
        self.elements = elements
    }
    public init(_ element: String) {
        self.elements = [element]
    }

    var defaultLabel: String {
        return self.label(separator: ".")
    }

    func label(separator: String) -> String {
        return self.elements.joined(separator: separator)
    }

    public init(stringLiteral value: StringLiteralType) {
        // option A:
        // upside: more correct, the only way to get many elements is via []
        // downside: is blowing up in runtime... though could be okay?
        precondition(!value.contains("."), "Single element labels MUST NOT contain `.`, instead please use ")
        self.elements = [value]
        // end of option A

        // option B:
        // upside: simple to use
        // downside: is that it's "heavy"... and only considers the `.` and not other ones (e.g. `/`)...
        self.elements = value.split(separator: ".").map { String($0) } // TODO meh, a bit heavy
        // end of option B
    }

    public init(extendedGraphemeClusterLiteral value: ExtendedGraphemeClusterLiteralType) {
        self.init(stringLiteral: "\(value)") // TODO how to properly impl this one?
    }

    public init(unicodeScalarLiteral value: UnicodeScalarLiteralType) {
        self.init(stringLiteral: "\(value)") // TODO how to properly impl this one?
    }
}

Questions being:

  • should it conform to ExpressibleByStringLiteral?
    • that makes API surface a bit smaller since we then can for many cases perhaps express it as just "label"
    • how to properly implement all inits then?
  • when single string passed in and it contains ., should it:
    • blow up -- option A (I'm in favor of this one)
    • split on the . -- option B

Doing this allows the label to more naturally be used as key for things I think, rather than passing the raw [String] around. Though I'm not sure about the ExpressibleBy... things, WDYT? If we don't do them then surface API would have to deal with it by doing the String... or overloads...

Would doing that, but always storing the label as a MetricLabel and passing that into the metrics libraries, perhaps be a good common ground? So the user API would take String... and make the label inside, before passing it to the handlers.
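A minimal sketch of that common ground, assuming a simplified label type: the user-facing initializer takes `String...`, builds the label, and the rest of the system only ever sees the label value. `SketchMetricLabel` and `LabeledCounter` are hypothetical names for this illustration only.

```swift
// Hypothetical label type, loosely following the MetricLabel idea above.
struct SketchMetricLabel: Hashable {
    let elements: [String]

    func label(separator: String) -> String {
        return elements.joined(separator: separator)
    }
}

// Hypothetical user-facing type: takes `String...` and builds the label internally.
final class LabeledCounter {
    let label: SketchMetricLabel
    let dimensions: [(String, String)]

    init(_ parts: String..., dimensions: [(String, String)] = []) {
        self.label = SketchMetricLabel(elements: parts)
        self.dimensions = dimensions
    }
}

let labeledCounter = LabeledCounter("request", "count", dimensions: [("url", "/index")])

// Because the label is Hashable, a backend could use it directly as a dictionary key.
var labelRegistry: [SketchMetricLabel: Int] = [:]
labelRegistry[labeledCounter.label, default: 0] += 1
```

This avoids the string-literal conformance questions entirely, at the cost of an extra overload or variadic parameter on every user-facing initializer.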


Sounds alright; would allow users to make sure they always hit the same instance of the metrics lib and don't accidentally create or use another one... It'd always be "the one". We'd have to make the factory computed property public, like shown below, but I think that's likely fine, as we already protect access to the underlying _factory with locks and the bootstrap takes care of init only once...

public enum MetricsSystem {
    public static var factory: MetricsFactory {

Would you want to send in a PR for this one?

I might as well post my answers to the main questions while I'm here now :slight_smile:

What is your evaluation of the proposal?

:+1: it is a great start and will enable various libraries to start offering metrics into a shared ecosystem. Can't wait for backend implementations to pick it up as well.

Is the problem being addressed significant enough?

Yes, from a server side perspective this is key to enabling a mature ecosystem of production ready apps which can report their metrics to backends, as is best practice (or rather... requirement) for serious server side systems.

Does this proposal fit well with the feel and direction of Swift on Server?

Yes, it aligns naming and feel wise with the already accepted Logging APIs.

If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?

Yes, used codahale metrics most of the time in JVM apps, as well as built libraries emitting metrics into various backends.

How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

Substantial effort, including considerations for library authors and lifecycle of metrics. See also the addition of lifecycle proposal [Feedback] Server Metrics API - #2 by ktoso (link to post).


By the way, from a Prometheus perspective, @MrLotU could you have a look at the lifecycle things? I know prometheus allows "removing" a metric, though for it the contract is that a metric must never "change its type" right? Is everything needed to implement this offered in my proposal for lifecycle hooks?

For this, I think we should try to achieve ExpressibleByStringLiteral conformance. One possibility is to go with a more generic version of option B, splitting not only on dots, but on any non-alphanumeric character.
Example implementation could look like this:

struct MetricLabel: ExpressibleByStringLiteral {
    let elements: [String]
    
    init(stringLiteral value: StringLiteralType) {
        var e = [String]()
        var v = value
        while let char = v.first(where: { (c) in
            !"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890".contains(c)
        }), let index = v.firstIndex(of: char) {
            let str = String(v[..<index])
            e.append(str)
            v.removeFirst(str.count + 1)
        }
        if !v.isEmpty {
            e.append(v)
        }
        self.elements = e
    }
}

It's not the prettiest code and still runs into the fact that it's quite "heavy", but it takes away any other formatting issues and the risk of blowing up at runtime. As for the other 2 initializers, with the above snippet, I didn't have to provide specific implementations.
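For comparison, a shorter way to get a similar split is the standard library's `split(whereSeparator:)`. This is only a sketch under one assumption that differs from the snippet above: consecutive or leading separators do not produce empty elements here. `labelElements` is a hypothetical helper name.

```swift
// Hypothetical helper: splits a label on any character outside A-Z, a-z, 0-9.
func labelElements(_ value: String) -> [String] {
    let allowed = Set("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789")
    // split(whereSeparator:) omits empty subsequences by default.
    return value
        .split(whereSeparator: { !allowed.contains($0) })
        .map { String($0) }
}

let dottedElements = labelElements("request.count")
let mixedElements = labelElements("http/request-size")
let plainElements = labelElements("requests")
```

It still walks the whole string on every label creation, so the "heavy" concern remains; it just moves the loop into the standard library.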

I do also like the String... user facing API solution, where the internals would communicate either using the MetricLabel struct or just plain Array<String>s


PR Created: This makes the MetricsFactory property public by MrLotU · Pull Request #12 · tomerd/swift-server-metrics-api-proposal · GitHub


I submitted a review to your PR, but had just some minor nitpicks. Everything required for Prometheus to function is in the PR/Proposal :smile:

  • What is your evaluation of the proposal?

:+1:.

  • Is the problem being addressed significant enough?

Absolutely.

  • Does this proposal fit well with the feel and direction of Swift on Server?

Yes.

  • If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?

Yes. Codahale for the JVM (and another similar internal system).

  • How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

Read through this thread, and the linked implementation. Did not try to use it myself, though.



Also, my formal answers before they're too late :slight_smile:


What is your evaluation of the proposal?

+1 Really like the way this is going.

Is the problem being addressed significant enough?

Yes, it's a good building block for the community to have a centralized way to expose metrics, leaving the implementation details up to the end user.

Does this proposal fit well with the feel and direction of Swift on Server?

Yes, as said above, metrics are a big part of server-side infrastructure and a must-have in many professional environments.

If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?

I've never worked with one centralized API like this and have always worked with a specific implementation directly, but compared to that, the API looks really nice and clean, and seems well suited for the task.

How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

Decent effort, both reading this post and the discussion post, as well as working on my own backend implementation.


Apologies that I am a day late on this. I am +1 on taking this to the next stage of the SSWG process so we can start to build with it. Thanks to all who have contributed.


thanks everyone for the constructive feedback. the SSWG will vote on this proposal on 4/4, taking into account all feedback provided by then


@tomerd Sorry to only ask this now. Usually, the best and most performant way is to have the customisation point use a concrete type and make other types available via protocol extensions. So for example instead of

public protocol CounterHandler: AnyObject {
    func increment<DataType: BinaryInteger>(_ value: DataType)
}

You would usually write

public protocol CounterHandler: AnyObject {
    func increment(_ value: Int64)
}

extension CounterHandler {
    @inlinable
    public func increment<DataType: BinaryInteger>(_ value: DataType) {
        self.increment(Int64(value))
    }
}

Why? For the best performance, it's very beneficial for the protocols to use functions with concrete types because then there's no need for specialisation. At compile time, the compiler can often not know what concrete type will be used so it needs to create a 'generic at runtime' version of the function. If however all functions on the protocol do not use generics, there's no need for a 'generic at runtime' version of the function.

You can add the generic functions in a protocol extension because protocol extensions cannot be 'overridden' by protocol implementations and therefore there's no question for the compiler which implementation is used. So it knows the implementation and because of the @inlinable it can even inline the code and specialise it.

The same applies to RecorderHandler. If possible, the best strategy is usually to have the protocol have only non-generic customisation points and to then add the generic convenience functions as protocol extensions.
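Applied to the recorder, the same pattern might look like the following sketch. `ConcreteRecorderHandler` and `SummingRecorder` are hypothetical names, and the choice of `Int64` and `Double` as the two concrete customisation points is an assumption made for illustration.

```swift
// Hypothetical variant of the recorder handler: only concrete types are requirements.
protocol ConcreteRecorderHandler: AnyObject {
    func record(_ value: Int64)
    func record(_ value: Double)
}

extension ConcreteRecorderHandler {
    // Generic conveniences funnel into the concrete requirements above.
    func record<DataType: BinaryInteger>(_ value: DataType) {
        self.record(Int64(value))
    }

    func record<DataType: BinaryFloatingPoint>(_ value: DataType) {
        self.record(Double(value))
    }
}

// Hypothetical backend handler that only implements the two concrete methods.
final class SummingRecorder: ConcreteRecorderHandler {
    private(set) var sum: Double = 0
    private(set) var count = 0

    func record(_ value: Int64) {
        self.record(Double(value))
    }

    func record(_ value: Double) {
        sum += value
        count += 1
    }
}

let summingRecorder = SummingRecorder()
let smallValue: UInt8 = 7
let ratioValue: Float = 1.5
summingRecorder.record(smallValue) // goes through the BinaryInteger convenience
summingRecorder.record(ratioValue) // goes through the BinaryFloatingPoint convenience
```

A backend author then has exactly two methods to implement, and callers can still pass any integer or floating-point type.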

Is there any reason why we couldn't do this here?

Why's the user-facing class Counter itself a CounterHandler?


With this in mind, we'd always need the default to be the most precise value possible, in this case Double, because converting everything to anything with less precision would hurt in a metrics library.
Other than that, I'm all for this :smile:
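One caveat when picking a single storage type, shown as a small sketch: `Double` keeps fractional values that `Int64` would truncate, but it cannot represent every `Int64` exactly once values exceed 2^53. The variable names below are only for illustration.

```swift
// 2^53 + 1 is the first positive integer a Double cannot represent exactly.
let bigInteger: Int64 = (1 << 53) + 1
let roundTripped = Int64(Double(bigInteger))
let exactConversion = Double(exactly: bigInteger)

// Going the other way, converting to Int64 drops the fractional part.
let truncated = Int64(1.5)
```

For typical metrics such as sizes and durations the values stay far below 2^53, so this mostly matters for very large counters.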


@johannesweiss no problem, can do. cc @lukasa who was involved with this originally

Why's the user-facing class Counter itself a CounterHandler ?

it's actually not required technically. i chose to do so since the API is atm identical and it was a good way to make sure we implement it

feedback addressed in:

and


I know the review period is technically over, but while starting to implement my Prometheus library, I ran into an issue. I've opened an issue on the GH, but thought I'd cross-post here too: How to handle generic handlers · Issue #15 · tomerd/swift-server-metrics-api-proposal · GitHub


I'll give it a look today, hopefully we can figure something out :-)

The project is going to land in "sandbox", so that's exactly the time to hack on implementations and keep adjusting until the API is rock solid, thanks for your work -- it is invaluable for polishing the design! :+1:


the SSWG accepted this proposal to the 'sandbox' stage, and with the repository now open, we can officially kick off the open-source swift-metrics project :sweat_smile:

the repository got seeded from the pitch/proposal one which was under my personal github alongside a few changes:

  • name: swift-metrics
  • improved readme and API docs
  • removed the proposal examples
  • a bunch of smaller fixes and non-functional changes

Thank you so much for all the contributions! Now more than ever: please keep your awesome contributions coming, this is only the start of swift-metrics and swift-metrics is only the start of a metrics ecosystem in swift :love_you_gesture: :peace_symbol:
