Prometheus Operator `ScrapeConfig`
TL;DR Enable `ScrapeConfig` to use (discovery|target) proxies
I’ve developed a companion local daemon for Ackal (called ackalctld) that provides a functionally similar version of the service.
One way to deploy ackalctld is to use Kubernetes, and it would be convenient if its Prometheus metrics were scrapeable by Prometheus Operator.
For this to work, Prometheus Operator needs to be able to scrape Google Cloud Run targets because ackalctld creates Cloud Run services for its health check clients.
In order to authenticate to Cloud Run, Prometheus Operator must be able to generate ID tokens with a per-target audience.
As I’ve written before, I’ve two solutions:
- Scraping metrics exposed by Google Cloud Run services that require authentication with OIDC Token Proxy for GCP
- Prometheus HTTP Service Discovery of Cloud Run services
A general solution is to run the authenticating (to Google Cloud Run) proxy described in #2 above. Running such a proxy permits the following Prometheus configuration:
scrape_configs:
  - job_name: "Service Discovery of Cloud Run services"
    scheme: http
    proxy_url: http://proxy
    http_sd_configs:
    - url: http://service-discovery-endpoint
      proxy_url: http://proxy
NOTE The above configuration is confusing because it combines two uses of the (same) proxy. The first use of the proxy is to authenticate requests to the Service Discovery endpoint. This is described by the `proxy_url` setting under `http_sd_configs`. The second use of the proxy is to authenticate requests to the scrape targets returned by the Service Discovery endpoint.
NOTE The `scheme` is `http`. This is to simplify the use of the locally-running proxy. Rather than have it proxy TLS connections too, the proxy receives HTTP requests but upgrades (every) request to HTTPS. Local connections are insecure; remote (proxied) connections are secure.
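To make this concrete, here’s one way to exercise such a proxy by hand. This is a sketch: the proxy address (localhost:7777) and the Cloud Run hostname ({host}) are placeholders. curl sends a plain-HTTP request via the proxy, and the proxy is responsible for upgrading it to HTTPS and attaching the ID token.
# Placeholder proxy address and Cloud Run host
# The request leaves curl as HTTP; the proxy forwards it
# to Cloud Run over HTTPS with authentication added
curl \
--silent \
--proxy http://localhost:7777 \
http://{host}.a.run.app/metrics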
In the case of ackalctld, Service Discovery is performed locally and so the desired configuration is simpler:
scrape_configs:
  - job_name: "Service Discovery of Cloud Run services"
    scheme: http
    proxy_url: http://proxy
    http_sd_configs:
    - url: http://service-discovery-endpoint
NOTE The second `proxy_url`, which configured the proxy to proxy requests to the Service Discovery endpoint, is not needed. The remaining `proxy_url` proxies requests to Google Cloud Run to authenticate the requests to the scrape targets (only).
I discovered that Prometheus Operator includes a new Custom Resource: `ScrapeConfig`. The documentation is slightly inconsistent and I’ll include some updates in what follows.
Here’s the proposal for the `ScrapeConfig` CRD.
Here’s the Prometheus Operator API documentation for `ScrapeConfig`.
The variant that’s of interest for ackalctld is `http_sd`:
apiVersion: monitoring.coreos.com/v1alpha1
kind: ScrapeConfig
metadata:
  name: http-sd
  namespace: my-namespace
  labels:
    prometheus: system-monitoring-prometheus
    app.kubernetes.io/name: scrape-config-example
spec:
  httpSDConfigs:
    - url: http://my-external-api/discovery
      refreshInterval: 15s
NOTE `ScrapeConfig` is `monitoring.coreos.com/v1alpha1`
However, it’s not currently possible to represent proxies with Service Discovery using Prometheus Operator and, besides, the version of Prometheus Operator that was deployed to my latest (!?) MicroK8s installation did not include the `ScrapeConfig` (v1alpha1) CRD.
As we would expect, `HTTPSDConfig` follows Prometheus’ HTTP Service Discovery and expects the discovery endpoint (!) to return JSON in the HTTP SD format:
[
  {
    "targets": [ "<host>", ... ],
    "labels": {
      "<labelname>": "<labelvalue>", ...
    }
  },
  ...
]
Using ackalctld deployed to Kubernetes, we can query its Service Discovery endpoint:
NAMESPACE="..."
SERVICE="..."
FILTER="{.spec.ports[?(@.name==\"service-disco\")].nodePort}"
PORT=$(\
  kubectl get services/${SERVICE} \
  --namespace=${NAMESPACE} \
  --output=jsonpath="${FILTER}")
curl \
--silent \
--get http://localhost:${PORT} \
| jq -r .
Yielding:
[
  {
    "targets": [
      "{host}.a.run.app:443"
    ],
    "labels": {
      "customer_id": "{customer_id}",
      "endpoint": "{endpoint}",
      "location": "{location}",
      "period": "{period}"
    }
  },
  ...
]
As I mentioned, the Prometheus Operator deployed as an addon to MicroK8s was outdated. You will need v0.68.0 or more recent:
FILTER="{.spec.template.spec.containers[?(@.name==\"prometheus-operator\")].image}"
kubectl get deployment/prometheus-operator \
--namespace=monitoring \
--output=jsonpath="${FILTER}"
quay.io/prometheus-operator/prometheus-operator:v0.68.0
Although MicroK8s says that you can disable and then re-enable addons to get the latest version, evidently the latest version supported by the latest MicroK8s is older than v0.68.0, so I disabled the addon, deleted the namespace and deployed the latest version from GitHub:
https://github.com/prometheus-operator/kube-prometheus
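The following assumes the kube-prometheus repository has been cloned locally and that the commands are run from its root directory:
# Clone kube-prometheus and work from the repository root
git clone https://github.com/prometheus-operator/kube-prometheus.git
cd kube-prometheus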
kubectl apply \
--server-side \
--filename=manifests/setup
kubectl wait \
--for condition=Established \
--all CustomResourceDefinition \
--namespace=monitoring
kubectl apply \
--filename=manifests/
And then:
FILTER="{.items[?(.metadata.name==\"scrapeconfigs.monitoring.coreos.com\")].metadata.name}"
kubectl get crds \
--output=jsonpath="${FILTER}"
scrapeconfigs.monitoring.coreos.com
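One way to check which fields the new CRD exposes (and, in this case, to confirm that there’s no `proxy_url` equivalent yet) is `kubectl explain`:
# Describe the spec of the ScrapeConfig CRD
kubectl explain scrapeconfig.spec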
I then – naively – tried to deploy a ScrapeConfig:
apiVersion: monitoring.coreos.com/v1alpha1
kind: ScrapeConfig
metadata:
  name: healthchecks
  labels:
    prometheus: system-monitoring-prometheus
    app.kubernetes.io/name: healthchecks
spec:
  httpSDConfigs:
    - url: http://ackalctld.ackalctld.svc.cluster.local
      refreshInterval: 15s
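Applying the manifest and confirming the resource exists is straightforward. A sketch, assuming the manifest is saved as scrapeconfig.yaml and that the ackalctld namespace is the intended destination:
# Assumes scrapeconfig.yaml holds the manifest above and that
# the ackalctld namespace is where the resource belongs
kubectl apply \
--namespace=ackalctld \
--filename=scrapeconfig.yaml

kubectl get scrapeconfigs \
--namespace=ackalctld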
But, nothing happened 😞
Interestingly, I realized that the operator’s `Prometheus` Kind (the `k8s` resource) did not include any configuration for `ScrapeConfig`. Specifically, there is no `scrapeConfigSelector` nor (and it’s not documented) `scrapeConfigNamespaceSelector`.
kubectl get prometheus/k8s \
--namespace=monitoring \
--output=jsonpath="{.spec}" \
| jq -r .
{
  "alerting": {
    "alertmanagers": [
      {
        "apiVersion": "v2",
        "name": "alertmanager-main",
        "namespace": "monitoring",
        "port": "web"
      }
    ]
  },
  "enableFeatures": [],
  "evaluationInterval": "30s",
  "externalLabels": {},
  "image": "quay.io/prometheus/prometheus:v2.47.0",
  "nodeSelector": {
    "kubernetes.io/os": "linux"
  },
  "podMetadata": {
    "labels": {
      "app.kubernetes.io/component": "prometheus",
      "app.kubernetes.io/instance": "k8s",
      "app.kubernetes.io/name": "prometheus",
      "app.kubernetes.io/part-of": "kube-prometheus",
      "app.kubernetes.io/version": "2.47.0"
    }
  },
  "podMonitorNamespaceSelector": {},
  "podMonitorSelector": {},
  "portName": "web",
  "probeNamespaceSelector": {},
  "probeSelector": {},
  "replicas": 2,
  "resources": {
    "requests": {
      "memory": "400Mi"
    }
  },
  "ruleNamespaceSelector": {},
  "ruleSelector": {},
  "scrapeInterval": "30s",
  "securityContext": {
    "fsGroup": 2000,
    "runAsNonRoot": true,
    "runAsUser": 1000
  },
  "serviceAccountName": "prometheus-k8s",
  "serviceMonitorNamespaceSelector": {},
  "serviceMonitorSelector": {},
  "version": "2.47.0"
}
I patched the `k8s` (`Prometheus`) resource and added:
{
  ...
  "scrapeConfigNamespaceSelector": {},
  "scrapeConfigSelector": {},
  ...
}
NOTE The documentation suggests selecting on the label `prometheus: system-monitoring-prometheus` but I left the configuration to include all (`{}`) `ScrapeConfig`s.
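A merge patch along these lines adds the two (empty) selectors (a sketch):
# Add empty (match-everything) ScrapeConfig selectors to the Prometheus resource
kubectl patch prometheus/k8s \
--namespace=monitoring \
--type=merge \
--patch='{"spec":{"scrapeConfigNamespaceSelector":{},"scrapeConfigSelector":{}}}'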
After this change was made, Prometheus Service Discovery included the ScrapeConfig and the Targets included the Cloud Run services returned by ackalctld’s Service Discovery endpoint.
However, as expected, absent the ability to proxy requests to the Cloud Run services (to add authentication), the scrapes fail when Prometheus tries to scrape the endpoints.
I’d hoped I could hack the `proxy_url` into the configuration that’s generated by the Prometheus Operator and applied to Prometheus (by restarting it).
It doesn’t work 😞
I suspect, but don’t know, that the Operator is catching the update (to the Secret that’s used to contain the configuration) and reverting it (to the `ScrapeConfig` configuration) before Prometheus has a chance to scrape the endpoint successfully.
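One way to test this suspicion, before attempting the patch, is to watch the Secret and see whether it gets rewritten:
# Each time the Secret changes (e.g. the Operator rewriting it),
# kubectl prints another line
kubectl get secret/prometheus-k8s \
--namespace=monitoring \
--watch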
I tried:
NAMESPACE="..."
SCRAPECONFIG="..."
EXPR="
  .scrape_configs[]
  | select(.job_name==\"scrapeconfig/${SCRAPECONFIG}/${NAMESPACE}\")
"
kubectl get secret/prometheus-k8s \
--namespace=monitoring \
--output=jsonpath="{.data.prometheus\.yaml\.gz}" \
| base64 --decode \
| gunzip -c \
| yq eval \
  --expression="${EXPR}"
job_name: scrapeconfig/ackalctld/ackalctld
http_sd_configs:
  - url: http://ackalctld.ackalctld.svc.cluster.local:8080
And then patching it:
NAMESPACE="..."
SCRAPECONFIG="..."
SERVICE="..."
PRXY="7777"
EXPR="
  with(
    .scrape_configs[]
    | select(.job_name==\"scrapeconfig/${SCRAPECONFIG}/${NAMESPACE}\")
    ; .proxy_url|=\"http://${SERVICE}.${NAMESPACE}.svc.cluster.local:${PRXY}\"
  )
"
VALUE=$(\
  kubectl get secret/prometheus-k8s \
  --namespace=monitoring \
  --output=jsonpath="{.data.prometheus\.yaml\.gz}" \
  | base64 --decode \
  | gunzip -c \
  | yq eval \
    --expression="${EXPR}" \
  | gzip -c \
  | base64 --wrap=0)
PATCH="
[
  {
    \"op\":\"replace\",
    \"path\":\"/data/prometheus.yaml.gz\",
    \"value\" : \"${VALUE}\"
  }
]"
kubectl patch secret/prometheus-k8s \
--namespace=monitoring \
--type=json \
--patch="${PATCH}"
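For reference, the intent is for the generated job to end up looking something like this. A sketch: the proxy Service name (proxy) is a placeholder since the script above elides it; the port matches PRXY above:
job_name: scrapeconfig/ackalctld/ackalctld
# "proxy" is a placeholder for the authenticating proxy's Service name
proxy_url: http://proxy.ackalctld.svc.cluster.local:7777
http_sd_configs:
  - url: http://ackalctld.ackalctld.svc.cluster.local:8080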
So, the next step, as shown in the TL;DR atop this post, is to revise Prometheus Operator to support `proxy_url` in the `ScrapeConfig` CRD.
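Something along these lines is what I’d like to be able to write. The `proxyUrl` fields are hypothetical and not part of the CRD at the time of writing, and the proxy Service name is a placeholder:
apiVersion: monitoring.coreos.com/v1alpha1
kind: ScrapeConfig
metadata:
  name: healthchecks
  labels:
    app.kubernetes.io/name: healthchecks
spec:
  # Hypothetical field: proxy used when scraping the discovered targets
  proxyUrl: http://proxy.ackalctld.svc.cluster.local:7777
  httpSDConfigs:
    - url: http://ackalctld.ackalctld.svc.cluster.local:8080
      refreshInterval: 15s
      # Hypothetical field: proxy used when calling the Service Discovery
      # endpoint itself (not needed for ackalctld's local discovery)
      # proxyUrl: http://proxy.ackalctld.svc.cluster.local:7777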