Linode Prometheus Exporter

December 18, 2019 - 5 minutes read - 1054 words

I enjoy using Prometheus and have toyed around with it for some time, particularly in combination with Kubernetes. I signed up with Linode [referral], compelled by the addition of a managed Kubernetes service called Linode Kubernetes Engine (LKE). I worry that I’ll inadvertently leave unused resources running on a cloud platform. Instead of refreshing the relevant billing page, it struck me that Prometheus might (not yet proven) help.

The hypothesis is that a cloud-specific Prometheus exporter, reporting aggregate usage of e.g. Linodes (instances), NodeBalancers, Kubernetes clusters etc., could form the basis of an alert mechanism when combined with Prometheus’ alerting.

The first step is to ‘standardize’ Linode resource metrics using a Prometheus exporter. My solution, heavily influenced by Matthias Loibl’s digitalocean_exporter (which I will use to report Digital Ocean resource metrics), is a Prometheus Exporter for Linode.

It’s a work in progress. It reports basic metrics (Linode and NodeBalancer counts). The exporter uses Linode’s Go SDK. The SDK has a PR for LKE that I will incorporate, and I’ve submitted a PR for NodeBalancer statistics. Hopefully (!) the exporter’s README is sufficient to get started and to combine the exporter with Prometheus and, all being well (soon), with AlertManager too. The rest of this post summarizes the development of the exporter.

The exporter uses Prometheus’ Golang SDK. I found the documentation for this SDK to be basic, so I used the Digital Ocean Exporter both to understand the Prometheus SDK and as a template for the Linode Exporter.

main.go configures the exporter, registers Prometheus collectors, and establishes a basic HTTP server to serve metrics. For each of Linode’s primary resource types (e.g. Account, Instances, NodeBalancers), there should be a collector. Collectors are formulaic. Each implements Prometheus’ Collector interface with its Collect and Describe methods. Each collector defines a struct that primarily statically defines a set of Prometheus metric descriptors. These are instantiated by the collector’s constructor (e.g. NewInstanceCollector). Describe enqueues each descriptor. Collect calculates the value for each metric, sets it and enqueues the result. When a Prometheus server scrapes the endpoint, the Prometheus client realizes these metrics by dequeuing them and rendering the results as text in the Prometheus exposition format.
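The formulaic shape described above can be sketched without any dependencies. This is only an illustration: Desc, Metric and Collector here are local stand-ins that mirror the shape of the Prometheus SDK’s types, and the Instance type, the list function and both metric names are hypothetical rather than taken from the exporter.

```go
package main

import "fmt"

// Desc and Metric are local stand-ins for the Prometheus SDK's descriptor
// and metric types, so that this sketch has no external dependencies.
type Desc struct{ Name, Help string }

type Metric struct {
	Desc  *Desc
	Value float64
}

// Collector mirrors the shape of the Prometheus Collector interface.
type Collector interface {
	Describe(ch chan<- *Desc)
	Collect(ch chan<- Metric)
}

// Instance is a hypothetical, trimmed-down view of a Linode instance.
type Instance struct{ Label, Status string }

// InstanceCollector statically lists one descriptor per metric.
type InstanceCollector struct {
	list    func() []Instance // hypothetical stand-in for the API call
	Count   *Desc
	Running *Desc
}

// NewInstanceCollector instantiates every descriptor once.
func NewInstanceCollector(list func() []Instance) *InstanceCollector {
	return &InstanceCollector{
		list:    list,
		Count:   &Desc{Name: "linode_instance_count", Help: "Number of instances"},
		Running: &Desc{Name: "linode_instance_running", Help: "Number of running instances"},
	}
}

// Describe enqueues each descriptor.
func (c *InstanceCollector) Describe(ch chan<- *Desc) {
	ch <- c.Count
	ch <- c.Running
}

// Collect calculates each value and enqueues it with its descriptor.
func (c *InstanceCollector) Collect(ch chan<- Metric) {
	instances := c.list()
	running := 0
	for _, i := range instances {
		if i.Status == "running" {
			running++
		}
	}
	ch <- Metric{Desc: c.Count, Value: float64(len(instances))}
	ch <- Metric{Desc: c.Running, Value: float64(running)}
}

// Gather drains a collector, roughly as a registry would on each scrape.
func Gather(c Collector) map[string]float64 {
	ch := make(chan Metric)
	go func() {
		c.Collect(ch)
		close(ch)
	}()
	out := map[string]float64{}
	for m := range ch {
		out[m.Desc.Name] = m.Value
	}
	return out
}

func main() {
	c := NewInstanceCollector(func() []Instance {
		return []Instance{{Label: "a", Status: "running"}, {Label: "b", Status: "offline"}}
	})
	for name, value := range Gather(c) {
		fmt.Printf("%s %g\n", name, value)
	}
}
```

With the real SDK, Collect would typically send values built with prometheus.MustNewConstMetric rather than a hand-rolled Metric struct.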

One structural change that I continue to consider is making these collectors’ lists of metric descriptors less repetitive. The Instance collector has 11 metrics. Its struct thus lists these 11 metrics, the constructor must create each of them, Collect must set each of them and Describe must enqueue each of them. It seems (!?) that this may be better done using map[string]*prometheus.Desc. The type definition would then need just the map. The constructor, Collect and Describe could then simply range over the map. The remaining challenge is to provide a generic way for Collect to determine each metric’s value. Needs more work!
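One possible answer to that remaining challenge, offered as an untested idea rather than what the exporter does, is to store a value function alongside each descriptor in the map. In the sketch below, Desc and Metric are local stand-ins for the SDK’s types, and the Instance type, the spec type and the metric names are all hypothetical.

```go
package main

import "fmt"

// Desc and Metric are local stand-ins for the Prometheus SDK's types.
type Desc struct{ Name, Help string }

type Metric struct {
	Desc  *Desc
	Value float64
}

// Instance is a hypothetical, trimmed-down view of a Linode instance.
type Instance struct{ Status string }

// spec pairs a descriptor with the function that computes its value.
type spec struct {
	desc  *Desc
	value func([]Instance) float64
}

// InstanceCollector needs only the map, not one field per metric.
type InstanceCollector struct {
	list  func() []Instance // hypothetical stand-in for the API call
	specs map[string]spec
}

func NewInstanceCollector(list func() []Instance) *InstanceCollector {
	c := &InstanceCollector{list: list, specs: map[string]spec{}}
	c.add("linode_instance_count", "Number of instances",
		func(is []Instance) float64 { return float64(len(is)) })
	c.add("linode_instance_running", "Number of running instances",
		func(is []Instance) float64 {
			n := 0
			for _, i := range is {
				if i.Status == "running" {
					n++
				}
			}
			return float64(n)
		})
	return c
}

// add registers one metric; adding a metric is now a single call.
func (c *InstanceCollector) add(name, help string, value func([]Instance) float64) {
	c.specs[name] = spec{desc: &Desc{Name: name, Help: help}, value: value}
}

// Describe ranges over the map instead of naming each field.
func (c *InstanceCollector) Describe(ch chan<- *Desc) {
	for _, s := range c.specs {
		ch <- s.desc
	}
}

// Collect fetches the resources once and lets each spec compute its value.
func (c *InstanceCollector) Collect(ch chan<- Metric) {
	instances := c.list()
	for _, s := range c.specs {
		ch <- Metric{Desc: s.desc, Value: s.value(instances)}
	}
}

func main() {
	c := NewInstanceCollector(func() []Instance {
		return []Instance{{Status: "running"}, {Status: "offline"}}
	})
	ch := make(chan Metric, len(c.specs))
	c.Collect(ch)
	close(ch)
	for m := range ch {
		fmt.Printf("%s %g\n", m.Desc.Name, m.Value)
	}
}
```

Go randomizes map iteration order, so the metrics are emitted in a different order from run to run. The trade-off is that per-instance metrics with labels would need a richer function signature than the one assumed here.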

The Linode API documentation is decent. For the majority of API calls, it’s trivial to map the responses into Prometheus metrics. I had 2 challenges:

dealing with the API’s statistics;
implementing missing functionality

API statistics

Several API methods provide statistics, e.g. Linode Statistics.

Given a Linode Instance ID and a Linode API token, it’s trivial to explore these:

curl -H "Authorization: Bearer ${LINODE_TOKEN}" https://api.linode.com/v4/linode/instances/${LINODE_INSTANCE_ID}/stats
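
The same request can be sketched in Go using only net/http. This bypasses the Linode Go SDK that the exporter actually uses; NewStatsRequest and apiBase are names made up for the illustration.

```go
package main

import (
	"fmt"
	"net/http"
)

// apiBase matches the host and version used in the curl command.
const apiBase = "https://api.linode.com/v4"

// NewStatsRequest builds (but does not send) the GET request for an
// instance's statistics, with the token as a bearer credential.
func NewStatsRequest(token string, instanceID int) (*http.Request, error) {
	url := fmt.Sprintf("%s/linode/instances/%d/stats", apiBase, instanceID)
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	return req, nil
}

func main() {
	req, err := NewStatsRequest("example-token", 123)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL)
}
```

Passing the request to http.DefaultClient.Do would send it; the sketch stops short of that so that it runs without a real token.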

The documentation is scarce but statistics are [][]float64 (a slice of slices of float64). The first dimension corresponds to each ‘tick’ of the time series and is often 64 units long. The second dimension is (always!?) 2 units. The first of these (e.g. stats[X][0]) is, I believe, the epoch in milliseconds. The second (e.g. stats[X][1]) is the value. I added a simple TimeSeries type that iterates over each of these series and calculates the minimum, maximum and (2 types of) average value. The simple average is the total of the values divided by the number of values. The more accurate (!?) average is the total of the areas represented by the values (the length of time multiplied by the value) divided by the overall time. The latter calculation assumes that the epochs are ordered.
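A minimal sketch of those calculations follows. It is not the exporter’s TimeSeries type: Summary and Summarize are hypothetical names, and the sketch assumes that each value holds from its own tick until the next one, so the final point contributes no area.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// Summary holds the aggregates for one series.
type Summary struct {
	Min, Max     float64
	Mean         float64 // total of the values / number of values
	WeightedMean float64 // total of (value * duration) / overall time
}

// Summarize expects points shaped as [epoch, value], ordered by epoch.
func Summarize(points [][]float64) (Summary, error) {
	if len(points) == 0 {
		return Summary{}, errors.New("no points")
	}
	for i, p := range points {
		if len(p) != 2 {
			return Summary{}, fmt.Errorf("point %d has %d elements, want 2", i, len(p))
		}
	}

	s := Summary{Min: math.Inf(1), Max: math.Inf(-1)}
	var total, area float64
	for i, p := range points {
		v := p[1]
		s.Min = math.Min(s.Min, v)
		s.Max = math.Max(s.Max, v)
		total += v
		// Assumption: a value holds until the next tick.
		if i+1 < len(points) {
			area += v * (points[i+1][0] - p[0])
		}
	}
	s.Mean = total / float64(len(points))

	span := points[len(points)-1][0] - points[0][0]
	if span > 0 {
		s.WeightedMean = area / span
	} else {
		// A single point (or identical epochs) has no duration to weight by.
		s.WeightedMean = s.Mean
	}
	return s, nil
}

func main() {
	// Unevenly spaced ticks make the two averages differ.
	points := [][]float64{{0, 10}, {1000, 20}, {4000, 30}}
	s, err := Summarize(points)
	if err != nil {
		panic(err)
	}
	fmt.Printf("min=%g max=%g mean=%g weighted=%g\n", s.Min, s.Max, s.Mean, s.WeightedMean)
}
```

For the sample points the simple average is 20, while the weighted average is (10 × 1000 + 20 × 3000) / 4000 = 17.5 because the value 20 is held three times longer than the value 10. If the ticks are evenly spaced, the two averages differ only by the weight given to the final point.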

API missing functionality

Several API methods are not implemented by the SDK. I mentioned the new LKE service previously. NodeBalancer Statistics were also not implemented, so I’ve submitted a PR for this method, using Linode Statistics as my template. Thanks to the Linode team for providing more definitive guidance. There’s a template.go that better explains how to do this. The code was trivial and I was able to leverage the existing NodeBalancer tests. The bulk of the code is configuring the method’s name, path etc.

I struggled with the tests but was helped by the Linode team. I realized subsequently this is also somewhat documented.

export LINODE_TOKEN=[[YOUR-LINODE-TOKEN]]
make test

results:

go vet ./...
golangci-lint run
go build ./...
2019/12/18 09:51:51 [INFO] LINODE_FIXTURE_MODE play will be used for tests
...

It’s possible to run the tests under Visual Studio Code too but you must add a LINODE_TOKEN to the settings:

"go.toolsEnvVars": {
    "LINODE_TOKEN": "[[YOUR-LINODE-TOKEN]]"
}

For my addition, I created a new type NodeBalancerStats with a single method GetNodeBalancerStats corresponding to the API.

To test this specific type, I run:

make ARGS="-run ^.*NodeBalancerStats" fixtures

and success is:

* Running fixtures
2019/12/18 09:54:20 [INFO] LINODE_FIXTURE_MODE record will be used for tests
PASS
ok  	github.com/linode/linodego	11.390s
* Sanitizing fixtures

I’m not entirely clear on what’s going on here, but what I do understand is that this test records the API calls that it makes to fixtures/TestGetNodeBalancerStats.yaml.

NB The filename corresponds to the method name.

This file contains (summarized):

---
version: 1
interactions:
- request:
    body: '{"label":"[REDACTED]-linodego-testing","region":"us-west","client_conn_throttle":20,"tags":null}'
    headers:
      Accept:
      - application/json
      Content-Type:
      - application/json
      User-Agent:
      - linodego 0.12.0 https://github.com/linode/linodego
    url: https://api.linode.com/v4beta/nodebalancers
    method: POST
  response:
    body: '{"id": 123, "label": "[REDACTED]-linodego-testing", "region": "us-west",
      "hostname": "[REDACTED].nodebalancer.linode.com", "ipv4": "[REDACTED]",
      "ipv6": "[REDACTED]", "created": "2018-01-02T03:04:05", "updated":
      "2018-01-02T03:04:05", "client_conn_throttle": 20, "tags": [], "transfer": {"in":
      null, "out": null, "total": null}}'
    headers:
    status: 200 OK
    code: 200
- request:
    headers:
    url: https://api.linode.com/v4beta/nodebalancers/123/stats
    method: GET
  response:
    body: '{"errors": [{"reason": "Stats are unavailable at this time."}]}'
    headers:
    status: 400 BAD REQUEST
    code: 400
- request:
    headers:
      User-Agent:
      - linodego 0.12.0 https://github.com/linode/linodego
    url: https://api.linode.com/v4beta/nodebalancers/123
    method: DELETE
  response:
    headers:
    status: 200 OK
    code: 200

NB In this case, the API returned a 400 indicating that no statistics were available for the NodeBalancer; this is OK.
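
The error body recorded in the fixture is simple to decode. The sketch below shows one way to pull out the reasons using only encoding/json. It bypasses the SDK’s own error handling, the apiErrors type and Reasons function are hypothetical names, and the only field assumed is the "reason" field visible in the fixture.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// apiErrors models only what the recorded 400 response body shows.
type apiErrors struct {
	Errors []struct {
		Reason string `json:"reason"`
	} `json:"errors"`
}

// Reasons extracts every "reason" string from an error response body.
func Reasons(body []byte) ([]string, error) {
	var e apiErrors
	if err := json.Unmarshal(body, &e); err != nil {
		return nil, err
	}
	reasons := make([]string, 0, len(e.Errors))
	for _, item := range e.Errors {
		reasons = append(reasons, item.Reason)
	}
	return reasons, nil
}

func main() {
	// Body copied from the recorded fixture.
	body := []byte(`{"errors": [{"reason": "Stats are unavailable at this time."}]}`)
	reasons, err := Reasons(body)
	if err != nil {
		panic(err)
	}
	for _, r := range reasons {
		fmt.Println(r)
	}
}
```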