[Discussion] Swift Prometheus Implementation

MrLotU · June 10, 2019, 5:30pm

SwiftPrometheus - Prometheus Metrics in Swift

Proposal: SSWG-xxxx
Authors: Jari Koopman / LotU
Review Manager: TBD
Status: Implemented
Pitch: Pitches/Prometheus

Package Description

Prometheus client side implementation.


Package Name	`SwiftPrometheus`
Module Name	`Prometheus` & `PrometheusMetrics`
Proposed Maturity Level	Sandbox
License	Apache 2.0
Dependencies	swift-nio > 1.0.0, swift-metrics > 1.0.0

Introduction

For background on metrics, see the metrics proposal discussion and feedback thread.

Prometheus is one of the most widely used metrics systems in the server-side world. SwiftPrometheus is a client-side implementation in Swift that can be used both connected to swift-metrics and separately from it.

Motivation

With Prometheus being one of the most widely used metric reporting tools, it is a building block that a server-side ecosystem cannot leave out. This package is created for everyone to use & build upon for their metric reporting.

Detailed design

SwiftPrometheus is built around one base class, PrometheusClient, and the metric types created through it. The Prometheus metric types are (descriptions from the Prometheus docs):

Counter - A counter is a cumulative metric that represents a single monotonically increasing counter whose value can only increase or be reset to zero on restart.
Gauge - A gauge is a metric that represents a single numerical value that can arbitrarily go up and down.
Histogram - A histogram samples observations (usually things like request durations or response sizes) and counts them in configurable buckets. It also provides a sum of all observed values.
Summary - Similar to a histogram, a summary samples observations (usually things like request durations and response sizes). While it also provides a total count of observations and a sum of all observed values, it calculates configurable quantiles over a sliding time window.

SwiftPrometheus provides fully featured implementations for all of them, including a thin wrapper around them for integration with swift-metrics.
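To make the definitions above a bit more concrete, here is a minimal in-memory sketch of how counters, gauges and histograms behave. ToyCounter, ToyGauge and ToyHistogram are hypothetical illustration types, not SwiftPrometheus types. The histogram follows the Prometheus convention of cumulative buckets labelled `le` (less than or equal) plus a final `+Inf` bucket; summaries are left out, since sliding-window quantiles need considerably more machinery.

```swift
// Hypothetical toy models of Prometheus metric semantics; not the SwiftPrometheus API.

/// A counter only goes up (it resets to zero when the process restarts).
struct ToyCounter {
    private(set) var value: Double = 0

    mutating func inc(_ amount: Double = 1) {
        precondition(amount >= 0, "counters can only increase")
        value += amount
    }
}

/// A gauge can move freely in both directions.
struct ToyGauge {
    private(set) var value: Double = 0

    mutating func inc(_ amount: Double = 1) { value += amount }
    mutating func dec(_ amount: Double = 1) { value -= amount }
    mutating func set(_ newValue: Double) { value = newValue }
}

/// A histogram counts observations into cumulative buckets and tracks count and sum.
struct ToyHistogram {
    let upperBounds: [Double]          // e.g. [0.1, 0.5, 1.0]; the +Inf bucket is implicit
    private(set) var bucketCounts: [Int]
    private(set) var count = 0
    private(set) var sum = 0.0

    init(upperBounds: [Double]) {
        self.upperBounds = upperBounds.sorted()
        self.bucketCounts = Array(repeating: 0, count: upperBounds.count)
    }

    mutating func observe(_ value: Double) {
        count += 1
        sum += value
        // Buckets are cumulative: every bucket whose bound is >= value is incremented.
        for (i, bound) in upperBounds.enumerated() where value <= bound {
            bucketCounts[i] += 1
        }
    }

    /// Cumulative counts keyed by their `le` label, including the implicit "+Inf" bucket.
    var buckets: [(le: String, count: Int)] {
        var result: [(le: String, count: Int)] = []
        for (bound, n) in zip(upperBounds, bucketCounts) {
            result.append((le: String(bound), count: n))
        }
        result.append((le: "+Inf", count: count))
        return result
    }
}

var latency = ToyHistogram(upperBounds: [0.1, 0.5, 1.0])
for seconds in [0.05, 0.3, 0.7, 2.0] { latency.observe(seconds) }
// latency.buckets: le="0.1" -> 1, le="0.5" -> 2, le="1.0" -> 3, le="+Inf" -> 4
```

Because the buckets are cumulative, the `+Inf` bucket always equals the total observation count.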

API Layout

The section below lays out the public API of this package. For the internal APIs, I suggest reading through the code on GitHub. This section is split into two parts: using this library standalone, and using it integrated with the swift-metrics package.

Without swift-metrics

To get started, initialise an instance of PrometheusClient

let myProm = PrometheusClient()

Once done, you can use the create* APIs to create any of the metric types described above:

// MetricLabels is a helper type used to add labeled metrics.
struct MyCodable: MetricLabels {
   var thing: String = "*"
}

// - Counter
let counter = myProm.createCounter(forType: Int.self, named: "my_counter", helpText: "Just a counter", initialValue: 42, withLabelType: MyCodable.self)

counter.inc() // Increment by one
counter.inc(12) // Increment by a value
counter.inc(12, labels: MyCodable(thing: "test")) // Increment a labeled counter

// - Gauge
let gauge = myProm.createGauge(forType: Int.self, named: "my_gauge", helpText: "Just a gauge", initialValue: 42, withLabelType: MyCodable.self)
gauge.inc() // Same APIs as Counter
gauge.dec() // Same APIs as `inc()` but reversed.
gauge.set(42) // Set the gauge to a specific value

// - Histogram
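// Histograms use a special label type (conforming to HistogramLabels), different from Counter & Gauge.
// Give the struct its own name so it doesn't clash with the HistogramLabels protocol it conforms to.
struct MyHistogramLabels: HistogramLabels {
   var le: String = "" // the "le" (less than or equal) bucket label
   let route: String

   init() {
       self.route = "*"
   }

   init(_ route: String) {
       self.route = route
   }
}

let histogram = myProm.createHistogram(forType: Double.self, named: "my_histogram", helpText: "Just a histogram", labels: MyHistogramLabels.self)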

histogram.observe(123) // Observes a value

// - Summary
// Like Histograms, Summaries use their own label type (conforming to SummaryLabels).
// Again, the struct needs a name distinct from the protocol it conforms to.
struct MySummaryLabels: SummaryLabels {
   var quantile: String = "" // the "quantile" label
   let route: String

   init() {
       self.route = "*"
   }

   init(_ route: String) {
       self.route = route
   }
}

let summary = myProm.createSummary(forType: Double.self, named: "my_summary", helpText: "Just a summary", labels: MySummaryLabels.self)

summary.observe(123) // Observes a value

Then, once you have created some metrics, you can call .getMetrics() on your PrometheusClient to get a Prometheus-formatted string with all the data.
For example, in a Vapor app:

router.get("/metrics") { req -> String in 
    return myProm.getMetrics()
}
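The string served at /metrics uses the Prometheus text exposition format: a # HELP and a # TYPE line per metric, followed by one sample line per label combination. The function below is a minimal, hypothetical renderer for a single counter (renderCounter is not part of SwiftPrometheus), assuming label values never need escaping; it is only meant to show the shape of the output.

```swift
// Hypothetical sketch of the Prometheus text exposition format for one counter.
// Not SwiftPrometheus' implementation; label values are assumed to need no escaping.
func renderCounter(name: String, help: String,
                   samples: [(labels: [String: String], value: Int)]) -> String {
    var lines = ["# HELP \(name) \(help)", "# TYPE \(name) counter"]
    for sample in samples {
        if sample.labels.isEmpty {
            lines.append("\(name) \(sample.value)")
        } else {
            // Sort label names so the output is deterministic.
            let pairs = sample.labels.sorted { $0.key < $1.key }.map { "\($0.key)=\"\($0.value)\"" }
            lines.append("\(name){\(pairs.joined(separator: ","))} \(sample.value)")
        }
    }
    return lines.joined(separator: "\n") + "\n"
}

let exposition = renderCounter(name: "my_counter", help: "Just a counter",
                               samples: [(labels: [:], value: 3), (labels: ["thing": "test"], value: 12)])
// exposition ==
// # HELP my_counter Just a counter
// # TYPE my_counter counter
// my_counter 3
// my_counter{thing="test"} 12
```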

With swift-metrics

For use with swift-metrics, most of the steps described above work the same. To bootstrap the MetricsSystem, create a client and pass it to MetricsSystem.bootstrap:

let myProm = PrometheusClient()
MetricsSystem.bootstrap(myProm)

After that, you can use the metric types provided by swift-metrics for your metrics. The mapping is as follows:

Counter -> Counter
Gauge -> Gauge
Recorder (aggregating) -> Histogram
Timer -> Summary
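The mapping above can be written down as a small function. Both enums are hypothetical stand-ins for this illustration, not types from either package; the non-aggregating recorder case follows the correction later in this thread that such a recorder is also reported as a Prometheus gauge.

```swift
// Hypothetical enums modelling the swift-metrics -> Prometheus mapping.
enum SwiftMetricsType {
    case counter
    case gauge
    case recorder(aggregate: Bool)
    case timer
}

enum PrometheusType: String {
    case counter, gauge, histogram, summary
}

func prometheusType(for metric: SwiftMetricsType) -> PrometheusType {
    switch metric {
    case .counter: return .counter
    case .gauge: return .gauge
    // An aggregating recorder keeps a distribution of values, so it becomes a histogram;
    // a non-aggregating one only reports its latest value, like a gauge.
    case .recorder(aggregate: true): return .histogram
    case .recorder(aggregate: false): return .gauge
    case .timer: return .summary
    }
}
```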

To get hold of your PrometheusClient, either to:
a) use custom Prometheus behaviour; or
b) get your metrics output,
there is a utility function on MetricsSystem:

let myProm = try MetricsSystem.prometheus()

This will either return the PrometheusClient used with .bootstrap(), or throw an error if MetricsSystem was not bootstrapped with a PrometheusClient.
Note: there is currently no support for retrieving the PrometheusClient when it is used with MultiplexMetricsHandler.
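The lookup described above boils down to a type check on whatever factory was bootstrapped. Below is a minimal sketch using hypothetical stand-in types (MetricsSystemStub, PrometheusClientStub, OtherFactory) rather than the real swift-metrics and SwiftPrometheus types; it uses an instance instead of MetricsSystem's static API so it runs on its own.

```swift
// Hypothetical stand-ins; this only sketches the "return the client or throw" idea.
protocol MetricsFactoryLike {}

final class PrometheusClientStub: MetricsFactoryLike {
    func getMetrics() -> String { return "# metrics would go here\n" }
}

struct OtherFactory: MetricsFactoryLike {}

enum PrometheusLookupError: Error {
    case notBootstrappedWithPrometheus
}

final class MetricsSystemStub {
    private var factory: MetricsFactoryLike?

    func bootstrap(_ factory: MetricsFactoryLike) {
        self.factory = factory
    }

    /// Returns the bootstrapped client, or throws if another factory (or none) was used.
    func prometheus() throws -> PrometheusClientStub {
        guard let client = factory as? PrometheusClientStub else {
            throw PrometheusLookupError.notBootstrappedWithPrometheus
        }
        return client
    }
}

// In an already-throwing context (such as a route handler), one `try` is enough.
func metricsEndpoint(_ system: MetricsSystemStub) throws -> String {
    return try system.prometheus().getMetrics()
}
```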

Maturity Justification

The implementation has the full feature set required for production use and meets the minimum requirements set forth by the SSWG (except for the fact that I'm a one-man army creating this library).

Alternatives considered

Other than using a different metrics backend than Prometheus, there are not many alternatives to consider. One thing I'd like to point out though:

This library supports destroying metrics in the way set forth by the swift-metrics package. However, as described in the Prometheus documentation, once a metric has been created with a specific type (for example, a Counter named my_counter), it is not allowed to later re-create a metric named my_counter with a DIFFERENT type, even after the original has been destroyed. (Creating another counter is fine.) To keep track of this, PrometheusClient holds a dictionary of metric names & types ([String: MetricType]). This means that even if you destroy your metrics, the dictionary keeps their entries, so your memory footprint will (gradually) increase. All of this is process-bound and resets on a process restart.
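The bookkeeping described above can be sketched roughly as follows. NameRegistry, MetricKind and RegistrationError are hypothetical names, and throwing on a type mismatch is an assumption for this sketch; the proposal does not say how PrometheusClient itself reacts.

```swift
// Hypothetical sketch of the name -> type bookkeeping.
// Destroying a metric removes the live instance but deliberately keeps the
// name's type in `registeredTypes`, so that dictionary only ever grows.
enum MetricKind: String {
    case counter, gauge, histogram, summary
}

enum RegistrationError: Error, Equatable {
    case typeMismatch(name: String, existing: MetricKind, requested: MetricKind)
}

final class NameRegistry {
    private var registeredTypes: [String: MetricKind] = [:]
    private(set) var liveMetrics: Set<String> = []

    func create(_ name: String, as kind: MetricKind) throws {
        // Re-creating a name with the same type is fine; a different type is rejected.
        if let existing = registeredTypes[name], existing != kind {
            throw RegistrationError.typeMismatch(name: name, existing: existing, requested: kind)
        }
        registeredTypes[name] = kind
        liveMetrics.insert(name)
    }

    func destroy(_ name: String) {
        liveMetrics.remove(name)
        // registeredTypes[name] is intentionally kept, which is why memory grows over time.
    }

    var registeredNameCount: Int { return registeredTypes.count }
}
```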

Thanks & ending notes

To close this proposal, I would like to thank a few people specifically:

@johannesweiss - Technical help & advice
@ktoso - Technical help & advice
Anyone who gave input during the initial pitch.

Beyond these specific mentions, I'd like to thank you for taking the time to read my proposal, and I would love for you to leave a comment below with your thoughts & comments.

Mordil · June 11, 2019, 3:47am

Great work!

Since I haven't personally used Prometheus, most of my feedback is more from a standpoint of a general API user.

One major point that I didn't see covered in the proposal or the project's README, and that I couldn't deduce from the source code (this might not need answering for people familiar with Prometheus): where do these metrics go?

Is the purpose of this library to connect to a Prometheus client, to act as one, or to generate reports to send to a client (by either a project's homebrewed solution, or a "higher level" library)?

Label Caching

However, as described in the Prometheus documentation, once a metric is created with a specific type, so for example a Counter named my_counter and that counter is destroyed, it's not allowed to, at a later time, re-create a metric named my_counter with a DIFFERENT type. (Creating another counter is fine). To keep track of this, PrometheusClient will hold a dictionary of metric names & types. ([String: MetricType]). This means that even if you destroy your metrics, your memory footprint will (gradually) increase. All of this is process bound and will reset on a process restart.

That's an interesting caveat that seems to put you in a sticky situation.

Some choices I see regarding this:

You acknowledge there's no persistent guarantee of complying with the specification (current stance - will need more "in your face" documentation to warn developers)
You don't attempt to comply with the specification outside of asserts & debug flags (also has complications for use of the library)
Attempt to roll a solution that persists the guarantee of compliance

Factory Methods

Each of the metric types is created with the factory methods on PrometheusClient that have a create* name.

According to the API Design Guidelines - Strive for Fluent Usage

Begin names of factory methods with “ make ”, e.g. x.makeIterator() .

I'm also curious what necessitates these over, say, initializers as the preferred & documented way of creating the various metric types.

Miscellaneous

I sense an API smell when I see get prefixes in method names. For example, getMetrics() -> String could be buildReport() -> String or generateReport() -> String.
Are inc, dec, etc. all specified by Prometheus? They're not entirely descriptive if they aren't.
a. See API Design Guidelines - Terminology
Thoughts on replacing MetricsSystem.prometheus() with an optional computed property, prometheus: PrometheusClient?
a. You're "searching" for the client in the MetricsSystem bootstrapping, so I would expect nil just as much as a throw. But with it throwing, I might still use guard to discard the optionality from try?, or end up with several try! calls or throwing methods in my codebase.
b. You could provide the same assistance to developers by using assert or preconditionFailure

MrLotU · June 11, 2019, 3:17pm

@Mordil Thanks for your input

Prometheus works by scraping: you give your Prometheus server instance the IPs, ports & paths (usually /metrics) of your clients, and the server polls them at a regular interval (once a minute by default) to aggregate metrics.

I'm open to suggestions on this, but the room to manoeuvre is limited, since this is a requirement set by Prometheus.

There is no specific reason for choosing create over make. The reason they're currently factory methods and not initializers is that the metrics hold on to the PrometheusClient they're connected to. I can, however, just make the inits public to resolve this. I'll also rename the factory methods to use make instead of create.
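A rough sketch of why the client is involved in creation: each metric keeps a reference to its client and registers itself there, which works equally well through a make-prefixed factory method or a public initializer that takes the client explicitly. ClientSketch and CounterSketch are hypothetical types; the unowned reference assumes the client outlives its metrics.

```swift
// Hypothetical sketch: a metric keeps a reference to the client that owns it,
// so the client has to be involved when the metric is created.
final class ClientSketch {
    private(set) var metricNames: [String] = []

    // Factory style, following the API Design Guidelines' "make" prefix.
    func makeCounter(named name: String) -> CounterSketch {
        return CounterSketch(named: name, client: self)
    }

    fileprivate func register(_ name: String) { metricNames.append(name) }
    fileprivate func unregister(_ name: String) { metricNames.removeAll { $0 == name } }
}

final class CounterSketch {
    let name: String
    private(set) var value = 0
    private unowned let client: ClientSketch   // assumption: the client outlives its metrics

    // Public-initializer style: the caller passes the client explicitly.
    init(named name: String, client: ClientSketch) {
        self.name = name
        self.client = client
        client.register(name)
    }

    func inc(_ amount: Int = 1) { value += amount }

    // The stored client reference is what lets a metric remove itself later.
    func destroy() { client.unregister(name) }
}
```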

I see where you're coming from. Even though library users should not really need this method much, I checked some other implementations: in Node they use plain .metrics(), and in Python & Go it's .collect(). I'm open to other suggestions, but reading this, I think I'll change it to .collect() as well to match the other packages.

I copied the function names from the Python implementation, but I'm open to changing them to the more descriptive .increment() and .decrement().

If I recall correctly the reason I/we went with a throwing function instead of a computed property was because in lots of scenarios and use cases, you will already be in a throwing function (for example, a Vapor routing closure). In that case, it was (IMO) nicer to be able to plop your try down instead of adding a guard let or if let construction. Open for discussion on this though!

Thanks for your feedback; I hope this addresses everything.

Mordil · June 12, 2019, 5:19am

Ah, so it was just my lack of knowledge. Good to know!

I think your current approach is the best first step, as long as you make a note in the documentation / project README that is explicit about which guarantees are or are not made. In the future, once someone makes a good file logger (or you roll your own solution), one option would be to set up a local file, written asynchronously on a background thread, that is read back at startup.
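A minimal sketch of that persistence idea, assuming the registry can be reduced to a [String: String] map of metric names to type names (PersistedRegistry, save and loadRegistry are hypothetical). It writes synchronously for clarity; in practice the write could be dispatched onto a background queue as suggested.

```swift
import Foundation

// Hypothetical sketch: persist the name -> type registry so it survives a restart.
struct PersistedRegistry: Codable, Equatable {
    var types: [String: String]   // metric name -> Prometheus type, e.g. "counter"
}

func save(_ registry: PersistedRegistry, to url: URL) throws {
    let data = try JSONEncoder().encode(registry)
    // Atomic write so a crash mid-write doesn't leave a truncated file behind.
    try data.write(to: url, options: .atomic)
}

func loadRegistry(from url: URL) -> PersistedRegistry {
    // A missing or unreadable file simply means starting with an empty registry.
    guard let data = try? Data(contentsOf: url),
          let registry = try? JSONDecoder().decode(PersistedRegistry.self, from: data) else {
        return PersistedRegistry(types: [:])
    }
    return registry
}
```

Note that this only restores the guarantee across restarts of a single process on a single machine; it does nothing for multiple instances scraped under the same job.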

Also, instead of a dictionary, perhaps it could just be a collection of hashes of the label & type?

It could be argued that you are then forcing those who don't work in throwing contexts into them, when they could just write MetricsSystem.prometheus?.<whatever>, since you're only gathering metrics and not fulfilling some functional requirement such as fetching models from a database.

But I wouldn't consider this a big point; both designs in Swift have their tradeoffs, and I mostly wanted to hear the rationale.
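For comparison, the optional computed property alternative can be sketched with hypothetical stand-in types (MetricsSystemSketch and PrometheusClientSketch are illustrative names, not library API). Callers in non-throwing code use optional chaining instead of try, at the cost of silently getting nil when the system was bootstrapped with something else.

```swift
// Hypothetical stand-ins used only to illustrate the optional-property style.
final class PrometheusClientSketch {
    func getMetrics() -> String { return "# metrics would go here\n" }
}

final class MetricsSystemSketch {
    private var factory: AnyObject?

    func bootstrap(_ factory: AnyObject) { self.factory = factory }

    // Optional style: nil when the system was not bootstrapped with Prometheus.
    var prometheus: PrometheusClientSketch? {
        return factory as? PrometheusClientSketch
    }
}

// Usable from a non-throwing context via optional chaining.
func metricsBody(from system: MetricsSystemSketch) -> String {
    return system.prometheus?.getMetrics() ?? ""
}
```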

On the API names - it all sounds good. Just wanted to bring it up!

ktoso · July 5, 2019, 5:11am

Hi there,
I made some time to go through the existing repo in depth and left some inline comments, most of which are already addressed; kudos @MrLotU. For reference, the comments are here: https://github.com/MrLotU/SwiftPrometheus/pull/4

This is looking very good in general.

I think that would indeed be good to change; collect() sounds like a good name.
Related comment here: Implement swift-metrics by MrLotU, Pull Request #4, swift-server/swift-prometheus on GitHub.

I think it's fine with the dictionary, it's an internal thing and going for just hashes could sacrifice correctness... Let's go with the existing dict there :)
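To illustrate the correctness concern: a set that stores only hashes cannot tell two different names apart when their hashes collide, whereas a dictionary keyed by the full name compares the names themselves. The toy hash below (a sum of Unicode scalar values) is deliberately weak so that a collision is easy to reproduce; Swift's real Hasher collides far less often, but collisions remain possible for any fixed-size hash. All names here are hypothetical, and the sketch tracks names only, not types.

```swift
// Toy hash for illustration only: anagrams such as "ab" and "ba" collide.
func toyHash(_ s: String) -> Int {
    return s.unicodeScalars.reduce(0) { $0 + Int($1.value) }
}

// Hash-only bookkeeping: remembers that *some* name with this hash exists, nothing more.
struct HashOnlyRegistry {
    private var seen: Set<Int> = []
    mutating func register(_ name: String) { seen.insert(toyHash(name)) }
    func contains(_ name: String) -> Bool { return seen.contains(toyHash(name)) }
}

// Dictionary bookkeeping: keys are the full names, so lookups compare the strings.
struct DictionaryRegistry {
    private var types: [String: String] = [:]
    mutating func register(_ name: String, type: String) { types[name] = type }
    func registeredType(for name: String) -> String? { return types[name] }
}
```

With hashes only, registering "ab" makes "ba" look taken, which could wrongly reject a valid metric; the dictionary answers correctly at the cost of storing the names.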

Otherwise, this looks great. I think the PR can be merged soon and we can proceed to the next SSWG phases with this one. Great work, and I'm looking forward to being able to use this :)

MrLotU · July 5, 2019, 8:27am

Thanks for your input, both @ktoso & @Mordil. I've gone ahead and merged the PR into the master branch & released 1.0.0-alpha.1. I'll soon open a new forum thread with all the addressed feedback, and after that release 1.0.0.

Once again thanks for your time spent on this!

tomerd · July 5, 2019, 6:31pm

Thanks @MrLotU for this library. One correction:

swift-metrics does have an explicit Gauge type, so while it is in fact implemented as a non-aggregating recorder, the mapping above should say Gauge -> Gauge.

MrLotU · July 5, 2019, 6:36pm

Thanks for your comment @tomerd. I'll update this both here and in the PR I opened on GitHub. Should I include both in the list, or just Gauge (since a non-aggregating recorder also becomes a Prometheus Gauge)?

tomerd · July 5, 2019, 6:51pm

just Gauge imo

MrLotU · July 13, 2019, 10:52am

Feedback thread: [Feedback] Swift Prometheus Implementation
CC @tomerd, could you lock this thread?