Client side Prometheus implementation

MrLotU · November 18, 2018, 1:04pm

Swift Prometheus: Client-side Prometheus implementation in Swift

This module, Prometheus, would include Prometheus metric types for developers to create and use, so they can serve Prometheus-formatted metrics any way they want.

Pitch:
A low level Client-Side Prometheus implementation.

Motivation:
A lot of companies and employers don't want their server-side applications running without any sort of insight whatsoever. Prometheus is a monitoring system used by many big names out there to keep track of what their applications are doing.

This package is a Client-Side implementation of Prometheus, allowing you to serve up Prometheus formatted data that a Server-Side Prometheus application can read out, and store.

There are currently some Swift solutions out there that provide Prometheus monitoring, for example SwiftMetrics, but none of them are framework independent, and none of them let the end user add metrics of their own without editing package source.

API Design
Note: All of this is still in early stages, and might very well change based on community input

The Swift Prometheus API is designed to be simple. It has no dependencies, only uses Swift builtin types, and requires no setup.

Using Prometheus in your project is as simple as this:

import Prometheus

let counter = Prometheus.shared.createCounter(forType: Int.self, named: "my_counter")

counter.inc() // Increment value by 1
// OR use counter.inc(12) to increment the counter by a given value

// To get a hold of your metrics

Prometheus.shared.getMetrics()

In this case, incrementing the counter only once, by a value of 1, gives the following output:

# TYPE my_counter counter
my_counter 1

Help texts are also supported; they can be used to clarify what each metric holds:

import Prometheus

let counter = Prometheus.shared.createCounter(forType: Int.self, named: "my_counter", helpText: "Counts up")

counter.inc()

Prometheus.shared.getMetrics()

which will result in:

# HELP my_counter Counts up
# TYPE my_counter counter
my_counter 1
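To make the mapping from a counter to this text output concrete, here is a minimal sketch of how such a counter could render itself. `SketchCounter` and its `render()` method are hypothetical names used only for illustration; this is not the package's actual implementation, and it is not thread safe.

```swift
// Hypothetical sketch, not the package's actual implementation.
struct SketchCounter {
    let name: String
    let helpText: String?
    private(set) var value: Int = 0

    init(name: String, helpText: String? = nil) {
        self.name = name
        self.helpText = helpText
    }

    // Counters only go up, so non-positive amounts are ignored here.
    mutating func inc(_ amount: Int = 1) {
        guard amount > 0 else { return }
        value += amount
    }

    // Renders the optional HELP line, the TYPE line, and the sample line.
    func render() -> String {
        var lines: [String] = []
        if let helpText = helpText {
            lines.append("# HELP \(name) \(helpText)")
        }
        lines.append("# TYPE \(name) counter")
        lines.append("\(name) \(value)")
        return lines.joined(separator: "\n")
    }
}

var counter = SketchCounter(name: "my_counter", helpText: "Counts up")
counter.inc()
print(counter.render())
```

With a help text and a single increment, this prints the same three lines as the output above.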

An example of how this could end up looking in an application (using Vapor syntax here, but the framework should not matter):

import Prometheus

router.get("/status") { req -> String in
    return Prometheus.shared.getMetrics()
}

Metrics currently supported are:

Counter
Gauge
Histogram
Summary
Info

To check out the implementations and a full overview of the module, see the project on GitHub.

Into the future
Right now, the API I created has some quirks and code I'm not fully happy with. I'll work on cleaning these flaws up, but wanted to share this here already, since the API is functional. Next to that, the code is only partially documented, and has no tests yet. I'll also provide those over the coming week(s)

Besides the changes I'll make to this module, I think it would be nice to have default implementations on a per-framework basis, covering metrics that almost anyone would want, like response times, system information, and other defaults.

Thanks for reading

tanner0101 · December 13, 2018, 6:34pm

Thanks for submitting this pitch @MrLotU. I think a Prometheus client built on Swift NIO would be a great fit in the SSWG project list.

One concern I have with the code example is the use of Prometheus.shared. Does this use locking to achieve thread-safety or will it potentially crash if used cross-thread? I think removing that in favor of initializing instances of Prometheus would be a better approach. I might also call it PrometheusClient to avoid using the same name as the module, since that can cause issues in Swift.

Overall though, I think this is a great proposal and I would love to see it move forward. cc @server-work-group.

MrLotU · December 13, 2018, 6:40pm

Right now, no. I am investigating this. I’m also not sure whether manual initialization will fix this if you use one instance and talk to it from more than one event loop, for example. But the initializer of the class is public, so you can most definitely create your own instances.

For the naming, I agree, so I’ll update that!

Thanks for your input!

tanner0101 · December 13, 2018, 6:42pm

Ah perfect. I would just remove the .shared then. For Vapor, at least, you will want to initialize at least one client per event loop. Probably one client per controller. For other frameworks, you might just initialize a single instance for your whole app. But I would let the user decide how that is done.

MrLotU · December 13, 2018, 6:44pm

I don’t think you do, since it would not report metrics from all event loop instances when you request the Prometheus string, so to speak, which is the reason I went for one shared instance.

tanner0101 · December 13, 2018, 6:51pm

@MrLotU oh I see now. Would there be any reason that you would want to cache the prometheus data anywhere other than application memory? For example, in Redis or memcached or something? Or is the idea with prometheus that you store the results local to the instance, and just gather them from all instances?

If you want to keep the door open to swapping out how the data is stored, you could have two layers like:

PrometheusService
PrometheusStorage

Where PrometheusStorage is a protocol that can be implemented however the user likes. By default an InMemoryPrometheusStorage is used that takes advantage of locking for thread-safety.

Then PrometheusService accepts a generic PrometheusStorage and provides a nice API for getting / setting data.

Otherwise, if there's no reason for giving the user the option for modifying how Prometheus stores its data, then just one PrometheusService type that is thread-safe would make sense to me. However, I would still avoid using .shared since being static makes dependency trees less clear, and opt for having the user pass around a single instance of PrometheusService.
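A minimal sketch of the two-layer idea described above, restricted to counters. The type names come from the post, but the method names (`increment(_:by:)`, `snapshot()`, `incrementCounter(named:by:)`, `getMetrics()`) are assumptions made for illustration, since the thread does not specify the protocol requirements.

```swift
import Foundation

// Hypothetical requirements: the thread only names the protocol.
protocol PrometheusStorage {
    func increment(_ name: String, by amount: Int)
    func snapshot() -> [String: Int]
}

// Default in-memory storage, guarded by a lock for thread-safety.
final class InMemoryPrometheusStorage: PrometheusStorage {
    private let lock = NSLock()
    private var counters: [String: Int] = [:]

    func increment(_ name: String, by amount: Int) {
        lock.lock()
        defer { lock.unlock() }
        counters[name, default: 0] += amount
    }

    func snapshot() -> [String: Int] {
        lock.lock()
        defer { lock.unlock() }
        return counters
    }
}

// The service is generic over its storage and is passed around as an
// instance instead of being reached through a static `.shared`.
final class PrometheusService<Storage: PrometheusStorage> {
    private let storage: Storage

    init(storage: Storage) {
        self.storage = storage
    }

    func incrementCounter(named name: String, by amount: Int = 1) {
        storage.increment(name, by: amount)
    }

    // Renders all counters, sorted by name for a stable output.
    func getMetrics() -> String {
        return storage.snapshot()
            .sorted { $0.key < $1.key }
            .map { "# TYPE \($0.key) counter\n\($0.key) \($0.value)" }
            .joined(separator: "\n")
    }
}

let service = PrometheusService(storage: InMemoryPrometheusStorage())
service.incrementCounter(named: "requests_total")
service.incrementCounter(named: "requests_total", by: 2)
print(service.getMetrics())
```

Swapping in a different backend would then only require another type that conforms to `PrometheusStorage`.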

MrLotU · December 13, 2018, 7:37pm

@tanner0101 The way Prometheus works, you don't want to do persistence on the client side (website/app/service) but on the server side (Prometheus server). See this example:

In this chart, my library would be the green Exporter part, exposing data on a HTTP endpoint that can be scraped by the Prometheus server instance. This server instance will take care of the storage, and will feed into alerting and things like Grafana for visualisation. So there is no need to persist anything on the client side (in my Library).

This is also the reason I went with the .shared approach: you want one single source of truth for your metrics. The Prometheus server usually scrapes just one endpoint (in most cases /metrics), so you want ALL your info there. From my perspective, it'd be better to have 1 "shared" but thread safe PrometheusClient class instance, than to have one per .

Tobias_Haeberle · December 13, 2018, 7:42pm

The Prometheus server will scrape a special endpoint on the webserver (e.g. /metrics) at a defined interval (e.g. every 2s) and collect the metrics. Thus it is enough if the application keeps an in-memory representation of the last x measurements. But it has to do so for all requests. So you will want to have a single store per server which collects metrics across all event loops. No way around locking here I guess. But I would also remove the shared instance approach. Also it might be nice to actually spawn a separate server on a different port exclusively for metrics. This port could then be exposed only to an internal network and not to the public.
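As a rough illustration of a single store that collects measurements from many threads, the sketch below uses one lock-protected counter and several concurrent workers standing in for event loops. `SharedCounterStore` is a hypothetical name, and Dispatch is used here only to generate concurrent calls; a NIO-based server would be calling in from its own event loop threads.

```swift
import Foundation
import Dispatch

// Hypothetical single store; every access goes through one lock.
final class SharedCounterStore: @unchecked Sendable {
    private let lock = NSLock()
    private var total = 0

    func inc(_ amount: Int = 1) {
        lock.lock()
        total += amount
        lock.unlock()
    }

    var value: Int {
        lock.lock()
        defer { lock.unlock() }
        return total
    }
}

let store = SharedCounterStore()

// 8 concurrent workers, each recording 1_000 requests into the same store.
DispatchQueue.concurrentPerform(iterations: 8) { _ in
    for _ in 0..<1_000 {
        store.inc()
    }
}

print(store.value)
```

Because every increment takes the lock, no updates are lost, at the cost of all workers contending on the same lock.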

tomerd · December 17, 2018, 5:25pm

@MrLotU this would be an awesome addition to the ecosystem. it's in line with our published focus areas and a great place to start given how useful prometheus has proven itself to be

i would like however, to first define an abstract metrics API (similar to the one we are defining for logging) which will allow application owners to plug-in different metrics backends, and library developers to emit metrics without getting in the way. the reasoning is summarized in here, that post is about logging but the same is true for metrics as well

to avoid losing momentum, i suggest we parallelize the two efforts, but we should be prepared to retrofit the prometheus implementation on top of the generic API once that is finalized to achieve consistency in the ecosystem. your experience in developing the prometheus implementation can also help influence and direct the generic API and vice versa
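To illustrate what "plugging in different metrics backends" could mean in practice, here is a small sketch of a facade with a swappable backend. Every name in it (`CounterBackend`, `SketchMetrics`, `FacadeCounter`, and so on) is hypothetical and chosen for this example only; it is not the API of any published package, and it ignores thread-safety to stay short.

```swift
// Hypothetical names throughout; not the API of any published package.
protocol CounterBackend: AnyObject {
    func increment(label: String, by amount: Int)
}

// A backend an application owner might plug in (not thread safe).
final class InMemoryBackend: CounterBackend {
    private(set) var totals: [String: Int] = [:]

    func increment(label: String, by amount: Int) {
        totals[label, default: 0] += amount
    }
}

// No-op default, so libraries can emit metrics before a backend is chosen.
final class NoOpBackend: CounterBackend {
    func increment(label: String, by amount: Int) {}
}

// Single place where the application selects its backend.
enum SketchMetrics {
    private(set) static var backend: CounterBackend = NoOpBackend()

    static func bootstrap(_ newBackend: CounterBackend) {
        backend = newBackend
    }
}

// What library code would use; it never names a concrete backend.
struct FacadeCounter {
    let label: String

    func increment(by amount: Int = 1) {
        SketchMetrics.backend.increment(label: label, by: amount)
    }
}

let backend = InMemoryBackend()
SketchMetrics.bootstrap(backend)
FacadeCounter(label: "http_requests").increment()
FacadeCounter(label: "http_requests").increment(by: 4)
print(backend.totals)
```

A Prometheus implementation would then be one more type conforming to the backend protocol, which is the kind of retrofit the post anticipates.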

MrLotU · December 17, 2018, 6:31pm

@tomerd Many thanks for the reply!

I agree that a centralized metrics API would be really nice. If I can help out in any way, please do let me know. Once there is a Metrics API thread, I'll definitely keep an eye out!

ddunbar · January 8, 2019, 6:16pm

This is very cool work, I haven't looked at your implementation in detail but definitely have use cases for this.

One thing that I think is absolutely essential is to have a very clear and good story w.r.t. multithreading. In particular, for some applications it is very important that the metrics code have good performance in a server context (which can mean ensuring low lock contention with high core counts).
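One common way to reduce lock contention on a hot counter is to split it into several independently locked shards and sum them when metrics are read. The sketch below shows that technique under stated assumptions: `ShardedCounter` and the `shardHint` parameter are hypothetical, and nothing in this thread says the package works this way.

```swift
import Foundation
import Dispatch

// Hypothetical sharded counter: writers touch one shard, readers sum all.
final class ShardedCounter: @unchecked Sendable {
    private final class Shard {
        let lock = NSLock()
        var value = 0
    }

    private let shards: [Shard]

    init(shardCount: Int = 16) {
        precondition(shardCount > 0, "need at least one shard")
        shards = (0..<shardCount).map { _ in Shard() }
    }

    // `shardHint` would typically be a per-thread or per-event-loop index,
    // so that different writers usually lock different shards.
    func inc(_ amount: Int = 1, shardHint: Int) {
        let shard = shards[abs(shardHint % shards.count)]
        shard.lock.lock()
        shard.value += amount
        shard.lock.unlock()
    }

    // Reads are rarer (one per scrape), so they pay the cost of visiting
    // every shard.
    func value() -> Int {
        var sum = 0
        for shard in shards {
            shard.lock.lock()
            sum += shard.value
            shard.lock.unlock()
        }
        return sum
    }
}

let requests = ShardedCounter(shardCount: 8)

// 8 concurrent workers, each mostly hitting its own shard.
DispatchQueue.concurrentPerform(iterations: 8) { worker in
    for _ in 0..<1_000 {
        requests.inc(shardHint: worker)
    }
}

print(requests.value())
```

The trade-off is that a read is not an atomic snapshot across shards, which is usually acceptable for a counter that is scraped periodically.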

ddunbar · January 8, 2019, 6:17pm

How do you expect to handle the semantic mismatch between different metrics systems? Even bridging Prometheus vs StatsD can be hard w/o compromising on one of performance, flexibility, or usability.

tomerd · January 11, 2019, 8:13pm

https://forums.swift.org/t/metrics/

MrLotU · February 4, 2019, 8:59pm

I have a minor update on this. It's from a while back, but I hadn't had the time to post here yet.

I reworked the logic of the implementation to reflect some design comments @tanner0101 posted above, and worked on getting the implementation thread safe. I myself hadn't had much experience in this, so if you see any obvious issues, please open an issue or PR.

The README is currently outdated and I'll try to update it ASAP to reflect the new thread-safe/async API.

To see the new API in action, the easiest way would be to look at the main.swift:

swift-server-community/SwiftPrometheus/blob/master/Sources/PrometheusExample/main.swift

import Prometheus
import Metrics
import NIO

let myProm = PrometheusClient()

MetricsSystem.bootstrap(PrometheusMetricsFactory(client: myProm))

//for _ in 0...Int.random(in: 10...100) {
//    let c = Counter(label: "test")
//    c.increment()
//}

for _ in 0...Int.random(in: 10...100) {
    let c = Counter(label: "test", dimensions: [("abc", "123")])
    c.increment()
}

//for _ in 0...Int.random(in: 100...500_000) {
//    let r = Recorder(label: "recorder")

This file has been truncated.
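The `dimensions` passed to `Counter` in the example above presumably end up as labels on the exported sample. As a sketch of what that looks like in the Prometheus text format, the hypothetical helper `renderSample` below formats one sample line with labels, escaping backslashes, double quotes, and newlines in label values. It is an illustration only, not code from the repository.

```swift
import Foundation

// Hypothetical helper; not taken from the SwiftPrometheus sources.
func renderSample(name: String, labels: [(String, String)], value: Int) -> String {
    guard !labels.isEmpty else {
        return "\(name) \(value)"
    }
    let rendered = labels.map { pair -> String in
        // Escape backslash first so later escapes are not double-escaped.
        let escaped = pair.1
            .replacingOccurrences(of: "\\", with: "\\\\")
            .replacingOccurrences(of: "\"", with: "\\\"")
            .replacingOccurrences(of: "\n", with: "\\n")
        return "\(pair.0)=\"\(escaped)\""
    }.joined(separator: ",")
    return "\(name){\(rendered)} \(value)"
}

print(renderSample(name: "test", labels: [("abc", "123")], value: 42))
```

For the label pair used in the example, this prints `test{abc="123"} 42`.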