Capturing e.g. CronJob metrics with GMP

January 4, 2024

The deployment of Kube State Metrics for Google Managed Prometheus creates both a PodMonitoring and a ClusterPodMonitoring resource.

The PodMonitoring resource scrapes the metrics published on the metrics-self port (8081).

The ClusterPodMonitoring resource scrapes the metrics published on the metrics port (8080), but its relabeling rule doesn’t keep cronjob-related metrics:

kubectl get clusterpodmonitoring/kube-state-metrics \
--output=jsonpath="{.spec.endpoints[0].metricRelabeling}" \
| jq -r .

[
  {
    "action": "keep",
    "regex": "kube_(daemonset|deployment|replicaset|pod|namespace|node|statefulset|persistentvolume|horizontalpodautoscaler|job_created)(_.+)?",
    "sourceLabels": [
      "__name__"
    ]
  }
]

NOTE The regex does not match kube_cronjob metrics and, of the kube_job metrics, matches only kube_job_created.

To capture them, you need to extend the regex to cover the kube_cronjob and kube_job metrics that you want.
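The effect of the keep rule can be checked locally. The following is a minimal sketch, assuming that `grep -E -x` (whole-line match) is a close enough stand-in for Prometheus's fully anchored relabel regex; `relabel_action` is a hypothetical helper name, not part of any tool.

```shell
#!/usr/bin/env bash
# Regex copied from the ClusterPodMonitoring output above.
REGEX='kube_(daemonset|deployment|replicaset|pod|namespace|node|statefulset|persistentvolume|horizontalpodautoscaler|job_created)(_.+)?'

# Hypothetical helper: prints "keep" or "drop" for a metric name.
# grep -x requires the whole name to match, approximating the anchored
# matching that Prometheus applies to relabel regexes.
relabel_action() {
  if printf '%s\n' "$1" | grep -Eqx -- "${REGEX}"; then
    echo "keep"
  else
    echo "drop"
  fi
}

for NAME in kube_job_created kube_job_status_failed kube_cronjob_next_schedule_time; do
  printf '%s -> %s\n' "${NAME}" "$(relabel_action "${NAME}")"
done
```

With the default regex, only kube_job_created is reported as kept; the other two names are dropped.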

The better approach is to extend the regex by editing the Google-provided YAML before you install Kube State Metrics, per the comments inline in the YAML file.

One way (!) to do this after you’ve deployed Kube State Metrics is to kubectl patch the ClusterPodMonitoring resource:

VALUE="kube_(cronjob|daemonset|deployment|job|replicaset|pod|namespace|node|statefulset|persistentvolume|horizontalpodautoscaler)(_.+)?"

PATCH="
[
    {
        'op':'replace',
        'path': '/spec/endpoints/0/metricRelabeling/0/regex',
        'value':'${VALUE}'
    }
]"

kubectl patch clusterpodmonitoring/kube-state-metrics \
--type=json \
--patch="${PATCH}"

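NOTE The new VALUE makes 2 changes to the regex:

Adds cronjob, so that all kube_cronjob_* metrics are kept
Replaces job_created with job, so that all kube_job_* metrics are kept (kube_job_created_* is still covered, which makes the narrower pattern redundant)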
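To confirm that the patch took effect, the regex can be read back from the cluster. This is a sketch under assumptions: `current_regex` and `patch_applied` are hypothetical helper names, and it presumes the rule to check is the first metricRelabeling entry of the first endpoint, as in the patch path above.

```shell
#!/usr/bin/env bash
# Expected regex, identical to the VALUE used in the patch.
VALUE="kube_(cronjob|daemonset|deployment|job|replicaset|pod|namespace|node|statefulset|persistentvolume|horizontalpodautoscaler)(_.+)?"

# Hypothetical helper: prints the regex currently set on the resource.
current_regex() {
  kubectl get clusterpodmonitoring/kube-state-metrics \
    --output=jsonpath='{.spec.endpoints[0].metricRelabeling[0].regex}'
}

# Hypothetical helper: succeeds only if the live regex equals VALUE.
patch_applied() {
  [ "$(current_regex)" = "${VALUE}" ]
}
```

For example, `patch_applied && echo "regex updated"` prints the message only when the live resource carries the new regex.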

To confirm that Cloud Monitoring now ingests the metrics, use Metrics Explorer (prometheus.googleapis.com/kube_cronjob_next_schedule_time/gauge) or query Cloud Monitoring’s Prometheus API, either through APIs Explorer or with curl:

PROJECT="..." # Your Project ID
ENDPOINT="https://monitoring.googleapis.com/v1/projects/${PROJECT}/location/global/prometheus/api/v1/query"

TOKEN="$(gcloud auth print-access-token)"

METRIC="kube_cronjob_next_schedule_time"

curl \
--silent \
--request POST \
--header "Authorization: Bearer ${TOKEN}" \
--header "Accept: application/json" \
--header "Content-Type: application/json" \
--data "{\"query\":\"${METRIC}\"}" \
"${ENDPOINT}" \
| jq -r .

{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "kube_cronjob_next_schedule_time",
          "cluster": "...",
          "cronjob": "hello",
          "instance": "kube-state-metrics-0:metrics",
          "job": "kube-state-metrics",
          "location": "...",
          "namespace": "test",
          "project_id": "..."
        },
        "value": [
          1703893639.8,
          "1703893680"
        ]
      }
    ]
  }
}

NOTE In this case I’d created a CronJob called hello in the test namespace.
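In the response, each value pair holds the evaluation timestamp followed by the sample value as a string; for kube_cronjob_next_schedule_time the sample is itself a Unix timestamp. As a rough sketch, a jq filter can turn it into a readable time. `next_runs` is a hypothetical helper, and RESPONSE is a trimmed copy of the output above with the elided fields left out.

```shell
#!/usr/bin/env bash
# Trimmed copy of the response shown above (elided fields omitted).
RESPONSE='{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"kube_cronjob_next_schedule_time","cronjob":"hello","namespace":"test"},"value":[1703893639.8,"1703893680"]}]}}'

# Hypothetical helper: reads a query response on stdin and prints
# "namespace/cronjob next-run-time-in-UTC" for each returned series.
# value[0] is the evaluation time; value[1] is the sample, as a string.
next_runs() {
  jq -r '.data.result[] | "\(.metric.namespace)/\(.metric.cronjob) \(.value[1] | tonumber | todate)"'
}

printf '%s\n' "${RESPONSE}" | next_runs
# prints: test/hello 2023-12-29T23:48:00Z
```

The same filter could replace `jq -r .` at the end of the curl pipeline.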