Prometheus Operator `ScrapeConfig`
TL;DR Enable `ScrapeConfig` to use (discovery|target) proxies
I’ve developed a companion, local daemon (called `ackalctld`) for Ackal that provides a functionally close version of the service.
One way to deploy `ackalctld` is to use Kubernetes, and it would be convenient if its Prometheus metrics were scrapeable by Prometheus Operator.
For this to work, Prometheus Operator needs to be able to scrape Google Cloud Run targets because `ackalctld` creates Cloud Run services for its health check clients.
To authenticate to Cloud Run, Prometheus Operator must be able to generate ID tokens with a per-target audience.
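For context, Prometheus’ scrape-level authentication options are configured per job rather than per target, which is why per-target-audience ID tokens need outside help (the proxy). A minimal sketch of the limitation (illustrative only, not a working configuration):

```yaml
scrape_configs:
  - job_name: "static-credentials-are-not-enough"
    # `authorization` attaches one credential to every target in the job;
    # Cloud Run instead expects an ID token whose audience matches each
    # target's URL, so per-target tokens have to be minted elsewhere.
    authorization:
      type: Bearer
      credentials: <static-token>
    http_sd_configs:
      - url: http://service-discovery-endpoint
```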
As I’ve written before, I’ve two solutions:
- Scraping metrics exposed by Google Cloud Run services that require authentication with OIDC Token Proxy for GCP
- Prometheus HTTP Service Discovery of Cloud Run services
A general solution is to have the authenticating (to Google Cloud Run) proxy described in #2 above. Running such a proxy permits the following Prometheus configuration:
```yaml
scrape_configs:
  - job_name: "Service Discovery of Cloud Run services"
    scheme: http
    proxy_url: http://proxy
    http_sd_configs:
      - url: http://service-discovery-endpoint
        proxy_url: http://proxy
```
NOTE The above configuration is confusing because it combines two uses of the (same) proxy. The first use of the proxy is to authenticate requests to the Service Discovery endpoint. This is described by the `proxy_url` setting as part of `http_sd_configs`. The second use of the proxy is to authenticate requests to the scrape targets returned by the Service Discovery endpoint.
NOTE The `scheme` is `http`. This is to simplify the use of the locally-running proxy. Rather than have it proxy TLS connections too, the proxy receives HTTP requests but it upgrades (every) request to HTTPS. Local connections are insecure. Remote (proxied) connections are secure.
In the case of `ackalctld`, Service Discovery is performed locally and so the desired configuration is simpler:
```yaml
scrape_configs:
  - job_name: "Service Discovery of Cloud Run services"
    scheme: http
    proxy_url: http://proxy
    http_sd_configs:
      - url: http://service-discovery-endpoint
```
NOTE The second `proxy_url`, which configured the proxy for requests to the Service Discovery endpoint, is not needed. The remaining `proxy_url` proxies requests to Google Cloud Run to authenticate the requests to the scrape targets (only).
I discovered that Prometheus Operator includes a new Custom Resource: `ScrapeConfig`. The documentation is slightly inconsistent and I’ll include some updates in what follows.
Here’s the proposal for the `ScrapeConfig` CRD.
Here’s Prometheus Operator API documentation for:
The variant that’s of interest for `ackalctld` is `http_sd`:
```yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: ScrapeConfig
metadata:
  name: http-sd
  namespace: my-namespace
  labels:
    prometheus: system-monitoring-prometheus
    app.kubernetes.io/name: scrape-config-example
spec:
  httpSDConfigs:
    - url: http://my-external-api/discovery
      refreshInterval: 15s
```
NOTE `ScrapeConfig` is `monitoring.coreos.com/v1alpha1`.
However, it’s not currently possible to represent proxies with Service Discovery with the Prometheus Operator, and the version of Prometheus Operator that was deployed to my latest (!?) MicroK8s installation did not include the `ScrapeConfig` (`v1alpha1`) CRDs.
As we would expect, `HTTPSDConfig` follows Prometheus’ HTTP Service Discovery and expects the discovery endpoint (!) to return JSON in the HTTP_SD format:
```json
[
  {
    "targets": [ "<host>", ... ],
    "labels": {
      "<labelname>": "<labelvalue>", ...
    }
  },
  ...
]
```
Using `ackalctld` deployed to Kubernetes, we can query its Service Discovery endpoint:
```bash
NAMESPACE="..."
SERVICE="..."

FILTER="{.spec.ports[?(@.name==\"service-disco\")].nodePort}"

PORT=$(\
  kubectl get services/${SERVICE} \
  --namespace=${NAMESPACE} \
  --output=jsonpath="${FILTER}")

curl \
--silent \
--get http://localhost:${PORT} \
| jq -r .
```
Yielding:
```json
[
  {
    "targets": [
      "{host}.a.run.app:443"
    ],
    "labels": {
      "customer_id": "{customer_id}",
      "endpoint": "{endpoint}",
      "location": "{location}",
      "period": "{period}"
    }
  },
  ...
]
```
As I mentioned, the Prometheus Operator deployed as an addon to MicroK8s was outdated. You will need v0.68.0 or more recent:
```bash
FILTER="{.spec.template.spec.containers[?(@.name==\"prometheus-operator\")].image}"

kubectl get deployment/prometheus-operator \
--namespace=monitoring \
--output=jsonpath="${FILTER}"
```
```
quay.io/prometheus-operator/prometheus-operator:v0.68.0
```
Although MicroK8s says that you can disable then re-enable addons to get the latest version, evidently the latest (supported) version with the latest MicroK8s is not v0.68.0, and so I disabled the addon, deleted the namespace, and deployed the latest version of kube-prometheus from GitHub:
https://github.com/prometheus-operator/kube-prometheus
```bash
kubectl apply \
--server-side \
--filename=manifests/setup

kubectl wait \
--for condition=Established \
--all CustomResourceDefinition \
--namespace=monitoring

kubectl apply \
--filename=manifests/
```
And then:
```bash
FILTER="{.items[?(@.metadata.name==\"scrapeconfigs.monitoring.coreos.com\")].metadata.name}"

kubectl get crds \
--output=jsonpath="${FILTER}"
```
```
scrapeconfigs.monitoring.coreos.com
```
I then – naively – tried to deploy a `ScrapeConfig`:
```yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: ScrapeConfig
metadata:
  name: healthchecks
  labels:
    prometheus: system-monitoring-prometheus
    app.kubernetes.io/name: healthchecks
spec:
  httpSDConfigs:
    - url: http://ackalctld.ackalctld.svc.cluster.local
      refreshInterval: 15s
```
But, nothing happened 😞
Interestingly, I realized that the operator’s `Prometheus` Kind did not include any config for `ScrapeConfig`. Specifically, there is no `scrapeConfigSelector` nor (and it’s not documented) `scrapeConfigNamespaceSelector`.
```bash
kubectl get prometheus/k8s \
--namespace=monitoring \
--output=jsonpath="{.spec}" \
| jq -r .
```
```json
{
  "alerting": {
    "alertmanagers": [
      {
        "apiVersion": "v2",
        "name": "alertmanager-main",
        "namespace": "monitoring",
        "port": "web"
      }
    ]
  },
  "enableFeatures": [],
  "evaluationInterval": "30s",
  "externalLabels": {},
  "image": "quay.io/prometheus/prometheus:v2.47.0",
  "nodeSelector": {
    "kubernetes.io/os": "linux"
  },
  "podMetadata": {
    "labels": {
      "app.kubernetes.io/component": "prometheus",
      "app.kubernetes.io/instance": "k8s",
      "app.kubernetes.io/name": "prometheus",
      "app.kubernetes.io/part-of": "kube-prometheus",
      "app.kubernetes.io/version": "2.47.0"
    }
  },
  "podMonitorNamespaceSelector": {},
  "podMonitorSelector": {},
  "portName": "web",
  "probeNamespaceSelector": {},
  "probeSelector": {},
  "replicas": 2,
  "resources": {
    "requests": {
      "memory": "400Mi"
    }
  },
  "ruleNamespaceSelector": {},
  "ruleSelector": {},
  "scrapeInterval": "30s",
  "securityContext": {
    "fsGroup": 2000,
    "runAsNonRoot": true,
    "runAsUser": 1000
  },
  "serviceAccountName": "prometheus-k8s",
  "serviceMonitorNamespaceSelector": {},
  "serviceMonitorSelector": {},
  "version": "2.47.0"
}
```
I `patch`’ed the `k8s` resource and added:
```json
{
  ...
  "scrapeConfigNamespaceSelector": {},
  "scrapeConfigSelector": {},
  ...
}
```
NOTE The documentation suggests using a label selector matching `prometheus: system-monitoring-prometheus` but I left the configuration to include all (`{}`) `ScrapeConfig`’s.
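For comparison, a selector that only matches `ScrapeConfig`s carrying that label would look something like this (a sketch based on the example labels above, not what I deployed):

```yaml
spec:
  # {} selects ScrapeConfigs in every namespace
  scrapeConfigNamespaceSelector: {}
  # Only ScrapeConfigs carrying this label are selected
  scrapeConfigSelector:
    matchLabels:
      prometheus: system-monitoring-prometheus
```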
After this change was made, Prometheus Service Discovery included the `ScrapeConfig` and the Targets included the Cloud Run services returned by `ackalctld`’s Service Discovery endpoint.
However, as expected, absent the ability to proxy requests to the Cloud Run services (to add authentication), Prometheus fails when it tries to scrape the endpoints.
I’d hoped I could hack the `proxy_url` into the configuration that’s generated by the Prometheus Operator and applied to Prometheus (by restarting it).
It doesn’t work 😞
I suspect (but don’t know) that the Operator catches the update (to the Secret that contains the configuration) and reverts it (to the `ScrapeConfig` configuration) before Prometheus has a chance to scrape the endpoint successfully.
I tried:
```bash
NAMESPACE="..."
SCRAPECONFIG="..."

EXPR="
.scrape_configs[]
| select(.job_name==\"scrapeconfig/${SCRAPECONFIG}/${NAMESPACE}\")
"

kubectl get secret/prometheus-k8s \
--namespace=monitoring \
--output=jsonpath="{.data.prometheus\.yaml\.gz}" \
| base64 --decode \
| gunzip -c \
| yq eval \
  --expression="${EXPR}"
```
```yaml
job_name: scrapeconfig/ackalctld/ackalctld
http_sd_configs:
- url: http://ackalctld.ackalctld.svc.cluster.local:8080
```
And then `PATCH`’ing it:
```bash
NAMESPACE="..."
SCRAPECONFIG="..."
SERVICE="..."

PRXY="7777"

EXPR="
with(
  .scrape_configs[]
  | select(.job_name==\"scrapeconfig/${SCRAPECONFIG}/${NAMESPACE}\")
  ; .proxy_url|=\"http://${SERVICE}.${NAMESPACE}.svc.cluster.local:${PRXY}\"
)
"

VALUE=$(\
  kubectl get secret/prometheus-k8s \
  --namespace=monitoring \
  --output=jsonpath="{.data.prometheus\.yaml\.gz}" \
  | base64 --decode \
  | gunzip -c \
  | yq eval \
    --expression="${EXPR}" \
  | gzip -c \
  | base64 --wrap=0)

PATCH="
[
  {
    \"op\":\"replace\",
    \"path\":\"/data/prometheus.yaml.gz\",
    \"value\": \"${VALUE}\"
  }
]"

kubectl patch secret/prometheus-k8s \
--namespace=monitoring \
--type=json \
--patch="${PATCH}"
```
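For reference, if the patch were to persist, the relevant job in the generated configuration would look something like this (placeholders substituted with the example values above; the proxy service name is illustrative):

```yaml
job_name: scrapeconfig/ackalctld/ackalctld
# Added by the patch: route scrapes of the discovered targets through the proxy
proxy_url: http://{service}.ackalctld.svc.cluster.local:7777
http_sd_configs:
- url: http://ackalctld.ackalctld.svc.cluster.local:8080
```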
So, the next step, as shown in the TL;DR atop this post, is to revise Prometheus Operator to reflect `proxy_url` in the `ScrapeConfig` CRD.
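What I’d like to be able to write is something like the following; the `proxyUrl` field here is hypothetical (it’s the addition the CRD needs), and the proxy’s address is illustrative:

```yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: ScrapeConfig
metadata:
  name: healthchecks
  labels:
    app.kubernetes.io/name: healthchecks
spec:
  httpSDConfigs:
    - url: http://ackalctld.ackalctld.svc.cluster.local:8080
      refreshInterval: 15s
  # Hypothetical field: proxy scrapes of the discovered targets through the
  # authenticating proxy so that each request carries a Cloud Run ID token
  proxyUrl: http://proxy.ackalctld.svc.cluster.local:7777
```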