Below you will find pages that utilize the taxonomy term “Prometheus Operator”
Capturing e.g. CronJob metrics with GMP
The deployment of Kube State Metrics for Google Managed Prometheus creates both a PodMonitoring and a ClusterPodMonitoring resource. The PodMonitoring resource exposes metrics published on the metric-self port (8081). The ClusterPodMonitoring resource exposes metrics published on the metric port (8080), but this doesn’t include cronjob-related metrics:
kubectl get clusterpodmonitoring/kube-state-metrics \
--output=jsonpath="{.spec.endpoints[0].metricRelabeling}" \
| jq -r .
[
  {
    "action": "keep",
    "regex": "kube_(daemonset|deployment|replicaset|pod|namespace|node|statefulset|persistentvolume|horizontalpodautoscaler|job_created)(_.+)?",
    "sourceLabels": [
      "__name__"
    ]
  }
]
NOTE The regex does not include kube_cronjob and only includes kube_job_created patterns.
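To retain the CronJob (and broader Job) metrics, one option is to extend the keep regex. A minimal sketch, assuming the endpoint and relabeling indices shown above (verify them against your manifest before patching):

# Extend the keep regex so kube_cronjob and kube_job metrics survive relabeling;
# the replacement regex below is illustrative, not a recommendation.
kubectl patch clusterpodmonitoring/kube-state-metrics \
  --type=json \
  --patch='[{
    "op": "replace",
    "path": "/spec/endpoints/0/metricRelabeling/0/regex",
    "value": "kube_(daemonset|deployment|replicaset|pod|namespace|node|statefulset|persistentvolume|horizontalpodautoscaler|cronjob|job)(_.+)?"
  }]'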
Prometheus Operator support an auth proxy for Service Discovery
CRD linting
Returning to yesterday’s failing tests, it’s unclear how to introspect the E2E tests.
kubectl get namespaces
NAME                      STATUS   AGE
...
allns-s2os2u-0-90f56669   Active   22h
allns-s2qhuw-0-6b33d5eb   Active   4m23s
kubectl get all \
--namespace=allns-s2os2u-0-90f56669
No resources found in allns-s2os2u-0-90f56669 namespace.
kubectl get all \
--namespace=allns-s2qhuw-0-6b33d5eb
NAME                                                        READY   STATUS             RESTARTS   AGE
pod/prometheus-operator-6c96477b9c-q6qm2                    1/1     Running            0          4m12s
pod/prometheus-operator-admission-webhook-68bc9f885-nq6r8   0/1     ImagePullBackOff   0          4m7s

NAME                          TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
service/prometheus-operator   ClusterIP   10.152.183.247   <none>        443/TCP   4m9s

NAME                                                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/prometheus-operator                     1/1     1            1           4m12s
deployment.apps/prometheus-operator-admission-webhook   0/1     1            0           4m7s

NAME                                                              DESIRED   CURRENT   READY   AGE
replicaset.apps/prometheus-operator-6c96477b9c                    1         1         1       4m13s
replicaset.apps/prometheus-operator-admission-webhook-68bc9f885   1         1         0       4m8s
kubectl logs deployment/prometheus-operator-admission-webhook \
--namespace=allns-s2qhuw-0-6b33d5eb
Error from server (BadRequest): container "prometheus-operator-admission-webhook" in pod "prometheus-operator-admission-webhook-68bc9f885-nq6r8" is waiting to start: trying and failing to pull image
NAME="prometheus-operator-admission-webhook"
FILTER="{.spec.template.spec.containers[?(@.name==\"${NAME}\")].image}"
kubectl get deployment/prometheus-operator-admission-webhook \
--namespace=allns-s2qjz2-0-fad82c03 \
--output=jsonpath="${FILTER}"
quay.io/prometheus-operator/admission-webhook:52d1e55af
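That tag looks like a (truncated) commit SHA rather than a published release tag, which would explain the ImagePullBackOff. One way to check whether the tag exists in the registry, assuming skopeo is available:

# Query the registry for the image's manifest; a missing tag returns an error.
skopeo inspect docker://quay.io/prometheus-operator/admission-webhook:52d1e55af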
Want:
Prometheus Operator support an auth proxy for Service Discovery
For ackalctld to be deployable to Kubernetes with Prometheus Operator, it is necessary to Enable ScrapeConfig to use (discovery|target) proxies #5966. While I’m familiar with Kubernetes, Kubernetes operators (Ackal uses one built with the Operator SDK) and Prometheus Operator, I’m unfamiliar with developing Prometheus Operator itself. This post (and subsequent ones) will document some preliminary work on this.
Cloned Prometheus Operator
Branched scrape-config-url-proxy
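Roughly the following commands (the upstream repository URL is real; the branch name is taken from the steps above):

# Clone upstream Prometheus Operator and create a working branch.
git clone https://github.com/prometheus-operator/prometheus-operator
cd prometheus-operator
git switch --create scrape-config-url-proxy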
I’m unsure how to effect these changes and whether documentation describing the process exists.
Clearly, I will need to revise the ScrapeConfig CRD to add proxy_url fields (one proxy_url defines a proxy for the Service Discovery endpoint; the second defines a proxy for the targets themselves), and it would be useful for this to closely mirror the existing Prometheus HTTP Service Discovery configuration, namely http_sd_config:
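Prometheus itself already supports both proxies: proxy_url inside http_sd_configs proxies the discovery requests, while proxy_url at the scrape-config level proxies the target scrapes. A hypothetical sketch of what the mirrored ScrapeConfig fields might look like (the proxyUrl fields are the proposed additions, not existing CRD fields; names and URLs are illustrative):

apiVersion: monitoring.coreos.com/v1alpha1
kind: ScrapeConfig
metadata:
  name: ackalctld
spec:
  httpSDConfigs:
  - url: https://discovery.example.com/targets
    # Proposed: proxy for the Service Discovery requests
    proxyUrl: http://sd-proxy.example.com:3128
  # Proposed: proxy for scrapes of the discovered targets
  proxyUrl: http://target-proxy.example.com:3128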
Prometheus Operator `ScrapeConfig`
TL;DR Enable ScrapeConfig to use (discovery|target) proxies

I’ve developed a companion, local daemon (called ackalctld) for Ackal that provides a functionally close version of the service.

One way to deploy ackalctld is to use Kubernetes, and it would be convenient if the Prometheus metrics were scrapeable by Prometheus Operator. In order for this to work, Prometheus Operator needs to be able to scrape Google Cloud Run targets, because ackalctld creates Cloud Run services for its health check clients.
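Cloud Run services that aren’t exposed publicly require authenticated requests, which is why a plain scrape fails and why an authenticating proxy is needed. As an illustration (the service URL is hypothetical), a manual scrape looks something like:

# Cloud Run expects a Google-signed identity token in the Authorization header;
# Prometheus cannot mint these tokens itself, hence the need for a proxy.
curl \
  --header "Authorization: Bearer $(gcloud auth print-identity-token)" \
  https://ackalctld-xxxxxxxxxx-uc.a.run.app/metrics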