Prometheus Operator `ScrapeConfig`
TL;DR Enable `ScrapeConfig` to use (discovery|target) proxies
I’ve developed a companion, local daemon (called `ackalctld`) for Ackal that provides a functionally close version of the service.
One way to deploy `ackalctld` is to use Kubernetes, and it would be convenient if its Prometheus metrics were scrapeable by Prometheus Operator.
For this to work, Prometheus Operator needs to be able to scrape Google Cloud Run targets because `ackalctld` creates Cloud Run services for its health check clients.
To authenticate to Cloud Run, Prometheus Operator must be able to generate ID tokens with a per-target audience.
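For context, Prometheus’ scrape-level authentication options are configured per job rather than per target, which is why per-target-audience ID tokens need outside help (the proxy). A minimal sketch of the limitation (illustrative only, not a working configuration):

```yaml
scrape_configs:
  - job_name: "static-credentials-are-not-enough"
    # `authorization` attaches one credential to every target in the job;
    # Cloud Run instead expects an ID token whose audience matches each
    # target's URL, so per-target tokens have to be minted elsewhere.
    authorization:
      type: Bearer
      credentials: <static-token>
    http_sd_configs:
      - url: http://service-discovery-endpoint
```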
As I’ve written before, I’ve two solutions:
- Scraping metrics exposed by Google Cloud Run services that require authentication with OIDC Token Proxy for GCP
- Prometheus HTTP Service Discovery of Cloud Run services
A general solution is to have the authenticating (to Google Cloud Run) proxy described in #2 above. Running such a proxy permits the following Prometheus configuration:
```yaml
scrape_configs:
  - job_name: "Service Discovery of Cloud Run services"
    scheme: http
    proxy_url: http://proxy
    http_sd_configs:
      - url: http://service-discovery-endpoint
        proxy_url: http://proxy
```
NOTE The above configuration is confusing because it combines two uses of the (same) proxy. The first use of the proxy is to authenticate requests to the Service Discovery endpoint. This is described by the `proxy_url` setting as part of `http_sd_configs`. The second use of the proxy is to authenticate requests to the scrape targets returned by the Service Discovery endpoint.
NOTE The `scheme` is `http`. This is to simplify the use of the locally-running proxy. Rather than have it proxy TLS connections too, the proxy receives HTTP requests but it upgrades (every) request to HTTPS. Local connections are insecure. Remote (proxied) connections are secure.
In the case of `ackalctld`, Service Discovery is performed locally and so the desired configuration is simpler:
```yaml
scrape_configs:
  - job_name: "Service Discovery of Cloud Run services"
    scheme: http
    proxy_url: http://proxy
    http_sd_configs:
      - url: http://service-discovery-endpoint
```
NOTE The second `proxy_url`, which configured the proxy for requests to the Service Discovery endpoint, is not needed. The remaining `proxy_url` proxies requests to Google Cloud Run to authenticate the requests to the scrape targets (only).
I discovered that Prometheus Operator includes a new Custom Resource: `ScrapeConfig`. The documentation is slightly inconsistent and I’ll include some updates in what follows.
Here’s the proposal for the `ScrapeConfig` CRD.
Here’s Prometheus Operator API documentation for:
The variant that’s of interest for `ackalctld` is `http_sd`:
```yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: ScrapeConfig
metadata:
  name: http-sd
  namespace: my-namespace
  labels:
    prometheus: system-monitoring-prometheus
    app.kubernetes.io/name: scrape-config-example
spec:
  httpSDConfigs:
    - url: http://my-external-api/discovery
      refreshInterval: 15s
```
NOTE `ScrapeConfig` is `monitoring.coreos.com/v1alpha1`.
However, it’s not currently possible to represent proxies with Service Discovery with the Prometheus Operator, and the version of Prometheus Operator that was deployed to my latest (!?) MicroK8s installation did not include the `ScrapeConfig` (`v1alpha1`) CRDs.
As we would expect, `HTTPSDConfig` follows Prometheus’ HTTP Service Discovery and expects the discovery endpoint (!) to return JSON in the HTTP_SD format:
```json
[
  {
    "targets": [ "<host>", ... ],
    "labels": {
      "<labelname>": "<labelvalue>", ...
    }
  },
  ...
]
```
Using `ackalctld` deployed to Kubernetes, we can query its Service Discovery endpoint:
```bash
NAMESPACE="..."
SERVICE="..."

FILTER="{.spec.ports[?(@.name==\"service-disco\")].nodePort}"

PORT=$(\
  kubectl get services/${SERVICE} \
  --namespace=${NAMESPACE} \
  --output=jsonpath="${FILTER}")

curl \
--silent \
--get http://localhost:${PORT} \
| jq -r .
```
Yielding:
```json
[
  {
    "targets": [
      "{host}.a.run.app:443"
    ],
    "labels": {
      "customer_id": "{customer_id}",
      "endpoint": "{endpoint}",
      "location": "{location}",
      "period": "{period}"
    }
  },
  ...
]
```
As I mentioned, the Prometheus Operator deployed as an addon to MicroK8s was outdated. You will need v0.68.0 or more recent:
```bash
FILTER="{.spec.template.spec.containers[?(@.name==\"prometheus-operator\")].image}"

kubectl get deployment/prometheus-operator \
--namespace=monitoring \
--output=jsonpath="${FILTER}"
```
```
quay.io/prometheus-operator/prometheus-operator:v0.68.0
```
Although MicroK8s says that you can disable then re-enable addons to get the latest version, evidently the latest (supported) version with the latest MicroK8s is not v0.68.0, and so I disabled the addon, deleted the namespace, and deployed the latest version of kube-prometheus from GitHub:
https://github.com/prometheus-operator/kube-prometheus
```bash
kubectl apply \
--server-side \
--filename=manifests/setup

kubectl wait \
--for condition=Established \
--all CustomResourceDefinition \
--namespace=monitoring

kubectl apply \
--filename=manifests/
```
And then:
```bash
FILTER="{.items[?(@.metadata.name==\"scrapeconfigs.monitoring.coreos.com\")].metadata.name}"

kubectl get crds \
--output=jsonpath="${FILTER}"
```
```
scrapeconfigs.monitoring.coreos.com
```
I then – naively – tried to deploy a `ScrapeConfig`:
```yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: ScrapeConfig
metadata:
  name: healthchecks
  labels:
    prometheus: system-monitoring-prometheus
    app.kubernetes.io/name: healthchecks
spec:
  httpSDConfigs:
    - url: http://ackalctld.ackalctld.svc.cluster.local
      refreshInterval: 15s
```
But, nothing happened 😞
Interestingly, I realized that the operator’s `Prometheus` Kind did not include any config for `ScrapeConfig`. Specifically, there is no `scrapeConfigSelector` nor (and it’s not documented) `scrapeConfigNamespaceSelector`.
```bash
kubectl get prometheus/k8s \
--namespace=monitoring \
--output=jsonpath="{.spec}" \
| jq -r .
```
```json
{
  "alerting": {
    "alertmanagers": [
      {
        "apiVersion": "v2",
        "name": "alertmanager-main",
        "namespace": "monitoring",
        "port": "web"
      }
    ]
  },
  "enableFeatures": [],
  "evaluationInterval": "30s",
  "externalLabels": {},
  "image": "quay.io/prometheus/prometheus:v2.47.0",
  "nodeSelector": {
    "kubernetes.io/os": "linux"
  },
  "podMetadata": {
    "labels": {
      "app.kubernetes.io/component": "prometheus",
      "app.kubernetes.io/instance": "k8s",
      "app.kubernetes.io/name": "prometheus",
      "app.kubernetes.io/part-of": "kube-prometheus",
      "app.kubernetes.io/version": "2.47.0"
    }
  },
  "podMonitorNamespaceSelector": {},
  "podMonitorSelector": {},
  "portName": "web",
  "probeNamespaceSelector": {},
  "probeSelector": {},
  "replicas": 2,
  "resources": {
    "requests": {
      "memory": "400Mi"
    }
  },
  "ruleNamespaceSelector": {},
  "ruleSelector": {},
  "scrapeInterval": "30s",
  "securityContext": {
    "fsGroup": 2000,
    "runAsNonRoot": true,
    "runAsUser": 1000
  },
  "serviceAccountName": "prometheus-k8s",
  "serviceMonitorNamespaceSelector": {},
  "serviceMonitorSelector": {},
  "version": "2.47.0"
}
```
I `patch`’ed the `k8s` resource and added:
```json
{
  ...
  "scrapeConfigNamespaceSelector": {},
  "scrapeConfigSelector": {},
  ...
}
```
NOTE The documentation suggests using a label selector matching `prometheus: system-monitoring-prometheus` but I left the configuration to include all (`{}`) `ScrapeConfig`’s.
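For comparison, a selector that only matches `ScrapeConfig`s carrying that label would look something like this (a sketch based on the example labels above, not what I deployed):

```yaml
spec:
  # {} selects ScrapeConfigs in every namespace
  scrapeConfigNamespaceSelector: {}
  # Only ScrapeConfigs carrying this label are selected
  scrapeConfigSelector:
    matchLabels:
      prometheus: system-monitoring-prometheus
```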
After this change was made, Prometheus Service Discovery included the `ScrapeConfig` and the Targets included the Cloud Run services returned by `ackalctld`’s Service Discovery endpoint.
However, as expected, absent the ability to proxy requests to the Cloud Run services (to add authentication), Prometheus fails when it tries to scrape the endpoints.
I’d hoped I could hack the `proxy_url` into the configuration that’s generated by the Prometheus Operator and applied to Prometheus (by restarting it).
It doesn’t work 😞
I suspect (but don’t know) that the Operator catches the update (to the Secret that contains the configuration) and reverts it (to the `ScrapeConfig` configuration) before Prometheus has a chance to scrape the endpoint successfully.
I tried:
```bash
NAMESPACE="..."
SCRAPECONFIG="..."

EXPR="
.scrape_configs[]
| select(.job_name==\"scrapeconfig/${SCRAPECONFIG}/${NAMESPACE}\")
"

kubectl get secret/prometheus-k8s \
--namespace=monitoring \
--output=jsonpath="{.data.prometheus\.yaml\.gz}" \
| base64 --decode \
| gunzip -c \
| yq eval \
  --expression="${EXPR}"
```
```yaml
job_name: scrapeconfig/ackalctld/ackalctld
http_sd_configs:
- url: http://ackalctld.ackalctld.svc.cluster.local:8080
```
And then `PATCH`’ing it:
```bash
NAMESPACE="..."
SCRAPECONFIG="..."
SERVICE="..."

PRXY="7777"

EXPR="
with(
  .scrape_configs[]
  | select(.job_name==\"scrapeconfig/${SCRAPECONFIG}/${NAMESPACE}\")
  ; .proxy_url|=\"http://${SERVICE}.${NAMESPACE}.svc.cluster.local:${PRXY}\"
)
"

VALUE=$(\
  kubectl get secret/prometheus-k8s \
  --namespace=monitoring \
  --output=jsonpath="{.data.prometheus\.yaml\.gz}" \
  | base64 --decode \
  | gunzip -c \
  | yq eval \
    --expression="${EXPR}" \
  | gzip -c \
  | base64 --wrap=0)

PATCH="
[
  {
    \"op\":\"replace\",
    \"path\":\"/data/prometheus.yaml.gz\",
    \"value\": \"${VALUE}\"
  }
]"

kubectl patch secret/prometheus-k8s \
--namespace=monitoring \
--type=json \
--patch="${PATCH}"
```
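For reference, if the patch were to persist, the relevant job in the generated configuration would look something like this (placeholders substituted with the example values above; the proxy service name is illustrative):

```yaml
job_name: scrapeconfig/ackalctld/ackalctld
# Added by the patch: route scrapes of the discovered targets through the proxy
proxy_url: http://{service}.ackalctld.svc.cluster.local:7777
http_sd_configs:
- url: http://ackalctld.ackalctld.svc.cluster.local:8080
```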
So, the next step, as shown in the TL;DR atop this post, is to revise Prometheus Operator to reflect `proxy_url` in the `ScrapeConfig` CRD.
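What I’d like to be able to write is something like the following; the `proxyUrl` field here is hypothetical (it’s the addition the CRD needs), and the proxy’s address is illustrative:

```yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: ScrapeConfig
metadata:
  name: healthchecks
  labels:
    app.kubernetes.io/name: healthchecks
spec:
  httpSDConfigs:
    - url: http://ackalctld.ackalctld.svc.cluster.local:8080
      refreshInterval: 15s
  # Hypothetical field: proxy scrapes of the discovered targets through the
  # authenticating proxy so that each request carries a Cloud Run ID token
  proxyUrl: http://proxy.ackalctld.svc.cluster.local:7777
```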