Prometheus Exporter for Azure (Container Apps)
I've written Prometheus Exporters for various cloud platforms. My motivation for writing these Exporters is that I want a unified mechanism to track my usage of these platforms' services. It's easy to deploy a service on a platform and inadvertently leave it running (running up a bill). The set of Exporters is:
- Prometheus Exporter for Azure
- Prometheus Exporter for Fly.io
- Prometheus Exporter for GCP
- Prometheus Exporter for Linode
- Prometheus Exporter for Vultr
This post describes the recently added Azure Exporter, which only provides metrics for Container Apps and Resource Groups.
NOTE The Exporters are self-similar and, for consistency across Exporters, employ patterns that I don't advocate, including:
- `main.go` in the module root folder
- inconsistent metric naming
- the GCP Exporter's exclusive use of Google's API Client Libraries

At some point, I'll spend the time to update the Exporters.
In the case of this Azure Exporter, my singular goal is to have an Alerting rule:
```yaml
name: azure_container_apps_running
expr: min_over_time(azure_container_apps_total[15m]) > 0
for: 6h
labels:
  severity: page
annotations:
  summary: Azure Container Apps ({{ $value }}) running (resource group: {{ $labels.resourcegroup }})
```
I want to know if I inadvertently leave a Container App deployed for more than 6 hours.
As with the other Exporters in this series, this one is implemented in Go using the Prometheus Go client library, including `promhttp`.
`main.go` registers the two Azure-specific collectors and an `ExporterCollector` that provides `start_time` and `build_info` metrics for the Exporter itself. `main.go` also constructs `account`, `subscription` and `creds` (credentials) values that are used to create new instances of the Azure collectors.
`account` uses the Exporter's `azure` package. This provides resources intended to be shared across collectors. In the case of Azure, Resource Groups provide a container for Azure resources (including Container Apps). To save repeatedly enumerating this list of Resource Groups, the `azure` package enumerates the Resource Groups once, caches the results in a map and shares this map across the collectors.
NOTE This caching of Resource Groups means that changes to the set of Resource Groups aren't automatically detected by the Exporter; it must be restarted to reflect them. This functionality could be improved by expiring the cached value, as sketched below.
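Here's a minimal sketch of such an improvement. Everything here (the field and method names, the TTL mechanism, the constructor signature) is hypothetical; the Exporter's actual `azure` package simply caches the map for the lifetime of the process:

```go
// Hypothetical sketch: a TTL-bounded cache for the Resource Group map.
package azure

import (
	"sync"
	"time"

	"github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/resources/armresources"
)

type Account struct {
	mu     sync.Mutex
	ttl    time.Duration
	expiry time.Time
	groups map[string]*armresources.ResourceGroup
}

func NewAccount(ttl time.Duration) *Account {
	return &Account{ttl: ttl}
}

// ResourceGroups returns the cached map, invoking refresh (the caller's
// enumeration logic) once the TTL has elapsed.
func (a *Account) ResourceGroups(refresh func() map[string]*armresources.ResourceGroup) map[string]*armresources.ResourceGroup {
	a.mu.Lock()
	defer a.mu.Unlock()
	if a.groups == nil || time.Now().After(a.expiry) {
		a.groups = refresh()
		a.expiry = time.Now().Add(a.ttl)
	}
	return a.groups
}
```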
`subscription` corresponds to the Azure Subscription ID which the Prometheus collectors will use to gather metrics. You can determine this using the Azure Portal (Subscriptions) or the Azure CLI: `az account list`, perhaps even `az account list --output=json | jq -r .[0].id`.
The `creds` are created using [`azidentity.NewDefaultAzureCredential`](https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/azidentity#NewDefaultAzureCredential), which is a developer-friendly mechanism. When running locally, this function grabs the `az` CLI's authenticated user credentials. When running in a container (as this Exporter will), it's possible to configure the client using environment variables (`AZURE_CLIENT_ID`, `AZURE_TENANT_ID` and `AZURE_CLIENT_CERTIFICATE_PATH`) to acquire credentials from an Azure Service Principal without having to change the code. I'll explain how this is done later.
```go
// Environment variables
subscription, ok := os.LookupEnv(envSubscription)
if !ok {
	log.Fatalf("Expected environment to contain `%s`",
		envSubscription,
	)
}

// Azure Identity (uses local `az` CLI credentials)
creds, err := azidentity.NewDefaultAzureCredential(nil)
if err != nil {
	log.Fatal(err)
}

// Object that holds Azure-specific resources (e.g. Resource Groups)
account := azure.NewAccount()

registry := prometheus.NewRegistry()
registry.MustRegister(collector.NewExporterCollector(
	OSVersion,
	GoVersion,
	GitCommit,
	StartTime,
))
registry.MustRegister(collector.NewContainerAppsCollector(
	account,
	subscription,
	creds,
))
registry.MustRegister(collector.NewResourceGroupsCollector(
	account,
	subscription,
	creds,
))
```
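`main.go` then exposes this registry over HTTP. The following is a sketch of that wiring: the `endpoint` and `path` values are assumptions (they correspond to the container flags shown later, but the flag parsing isn't shown in this post), while `promhttp.HandlerFor` is the Prometheus client library's mechanism for serving a custom registry:

```go
// Sketch (assumed wiring): serve the custom registry over HTTP.
// `endpoint` and `path` correspond to the --endpoint and --path flags
// used in the container invocation later in this post.
mux := http.NewServeMux()
mux.Handle(path, promhttp.HandlerFor(
	registry,
	promhttp.HandlerOpts{},
))
log.Fatal(http.ListenAndServe(endpoint, mux))
```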
The code for the Collectors is similar and so I'll focus on `ContainerAppsCollector`. Each Collector exposes a single (gauge) metric:
- `ContainerAppsCollector` exports `container_apps_total`
- `ResourceGroupsCollector` exports `resource_groups_total`
The `collector` package includes a struct (e.g. `ContainerAppsCollector`) for each Collector and this implements Prometheus' `prometheus.Collector` interface of two methods: `Collect` and `Describe`.
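For reference, the interface (from the Prometheus Go client library) is:

```go
// github.com/prometheus/client_golang/prometheus
type Collector interface {
	Describe(chan<- *Desc)
	Collect(chan<- Metric)
}
```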
```go
// ContainerAppsCollector represents Azure Container Apps
type ContainerAppsCollector struct {
	account *azure.Account
	client  *armappcontainers.ContainerAppsClient

	Apps *prometheus.Desc
}
```
Each Collector includes a function to create a new Collector (e.g. `NewContainerAppsCollector`) which uses a service-specific client (e.g. `armappcontainers.ContainerAppsClient`) created from a client factory (e.g. `armappcontainers.NewClientFactory`) using the previously described `subscription` and `creds`.
NOTE Being new to Azure, I was confused by the recurring use of `arm` in Azure-related content until I realized that it's an acronym of Azure Resource Manager.
```go
clientFactory, err := armappcontainers.NewClientFactory(
	subscription,
	creds,
	nil,
)
if err != nil {
	// The factory is unusable if creation fails
	log.Fatal(err)
}

client := clientFactory.NewContainerAppsClient()
```
The function completes by returning the `account`, the just-created `client` and appropriate Prometheus metric descriptors (`prometheus.Desc`):
```go
return &ContainerAppsCollector{
	account: account,
	client:  client,
	Apps: prometheus.NewDesc(
		prometheus.BuildFQName(namespace, subsystem, "total"),
		"Number of Container Apps deployed",
		[]string{
			"resourcegroup",
		},
		nil,
	),
}
```
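`prometheus.BuildFQName` joins its non-empty arguments with underscores; assuming the `namespace` and `subsystem` constants are `azure` and `container_apps`, this yields the `azure_container_apps_total` metric name seen in the output below.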
Each Collector then needs to implement `Collect` and `Describe` methods. The latter are trivial and add the Prometheus metric descriptors to a channel (a sketch of `Describe` appears below, after the `Collect` listing). The `Collect` method must enumerate the cloud service's resources (e.g. Container Apps) and create and add Prometheus metrics (`prometheus.Metric`):
```go
ctx := context.Background()

var wg sync.WaitGroup
for _, resourcegroup := range c.account.ResourceGroups {
	wg.Add(1)
	go func(rg *armresources.ResourceGroup) {
		defer wg.Done()

		count := 0
		pager := c.client.NewListByResourceGroupPager(
			to.String(rg.Name),
			nil,
		)
		for pager.More() {
			page, err := pager.NextPage(ctx)
			if err != nil {
				log.Print(err)
				// Avoid looping forever on a persistent paging error
				break
			}

			containerapps := page.Value
			count += len(containerapps)
		}

		ch <- prometheus.MustNewConstMetric(
			c.Apps,
			prometheus.GaugeValue,
			float64(count),
			[]string{
				to.String(rg.Name),
			}...,
		)
	}(resourcegroup)
}
wg.Wait()
```
To enumerate the Azure Subscription's (!) Resource Groups, the `Collect` method iterates over the previously enumerated `ResourceGroups` (now cached in `azure.Account`). Each iteration spawns a goroutine which uses the client (`armappcontainers.ContainerAppsClient`) to page through the results, tallying a `count`. This `count` value becomes the `Apps` metric's value that is added to the channel before the `Collect` method completes.
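For completeness, here's a sketch of the trivial `Describe` method, consistent with the struct above (the Exporter's actual code may differ slightly):

```go
// Describe adds the Collector's metric descriptors to the channel.
func (c *ContainerAppsCollector) Describe(ch chan<- *prometheus.Desc) {
	ch <- c.Apps
}
```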
As long as you have an authenticated Azure CLI client, you can invoke the Exporter:

```sh
export SUBSCRIPTION="..." # Azure Subscription ID
PORT="8080"

go run .
```
And, all being well, you can show the exported metrics with:

```sh
curl --silent --get http://localhost:${PORT}/metrics
```

This should yield something of the form:
```
# HELP azure_container_apps_total Number of Container Apps deployed
# TYPE azure_container_apps_total gauge
azure_container_apps_total{resourcegroup="foo"} 1
# HELP azure_exporter_build_info A metric with a constant '1' value labeled by OS version, Go version, and the Git commit of the exporter
# TYPE azure_exporter_build_info counter
azure_exporter_build_info{git_commit="",go_version="go1.20",os_version=""} 1
# HELP azure_exporter_start_time Exporter start time in Unix epoch seconds
# TYPE azure_exporter_start_time gauge
azure_exporter_start_time 1.234567890e+09
# HELP azure_resource_groups_total Number of Resource Groups
# TYPE azure_resource_groups_total gauge
azure_resource_groups_total 1
```
And here is the Prometheus rule I described at the outset:

```yaml
- name: azure_exporter
  rules:
    - alert: azure_container_apps_running
      expr: min_over_time(azure_container_apps_total{}[15m]) > 0
      for: 2h
      labels:
        severity: page
      annotations:
        summary: "Azure Container Apps ({{ $value }}) running (resource group: {{ $labels.resourcegroup }})"
```
I mentioned above that, when deploying the Exporter as e.g. a container, a different auth mechanism is needed than when running locally, where you can leverage the credentials used by the Azure CLI (`az`). To address this, I used an Azure Service Principal, assigned it a `Reader` role and provided an X509 certificate:
NAME="azure-exporter"
# Create self-signed cert
openssl req \
-x509 \
-newkey rsa:4096 \
-keyout ${NAME}.key \
-out ${NAME}.crt \
-sha256 \
-days 365 \
-nodes \
-subj "/CN=${NAME}"
# Need key+crt PEMs combined
cat ${NAME}.key >> ${NAME}.key+crt
cat ${NAME}.crt >> ${NAME}.key+crt
# Create Service Principal
SUBSCRIPTION="..." # Azure Subscription ID
GROUP="..." # Azure Resource Group ID
az ad sp create-for-rbac \
--name=${NAME} \
--role="Reader" \
--scopes="/subscriptions/${SUBSCRIPTION}/resourceGroups/${GROUP}" \
--cert=@${PWD}/${NAME}.crt
The `az ad sp create-for-rbac` command yields JSON of the form:

```json
{
  "appId": "{AZURE_CLIENT_ID}",
  "displayName": "{NAME}",
  "password": null,
  "tenant": "{AZURE_TENANT_ID}"
}
```
You'll need the values of `AZURE_CLIENT_ID` and `AZURE_TENANT_ID` for the next step.
NAME="azure-exporter"
SUBSCRIPTION="..." # Azure Subscription ID
AZURE_CLIENT_ID="..." # Use values from Service Principal
AZURE_TENANT_ID="..."
AZURE_CLIENT_CERTIFICATE_PATH="${PWD}/${NAME}.key+crt"
PORT="8080"
podman run \
--interactive --tty --rm \
--name=azure-exporter \
--env=SUBSCRIPTION=${SUBSCRIPTION} \
--env=AZURE_CLIENT_ID=${AZURE_CLIENT_ID} \
--env=AZURE_TENANT_ID=${AZURE_TENANT_ID} \
--env=AZURE_CLIENT_CERTIFICATE_PATH=/secrets/${NAME}.key+crt \
--volume=${AZURE_CLIENT_CERTIFICATE_PATH}:/secrets/${NAME}.key+crt \
--publish=${PORT}:${PORT}/tcp \
ghcr.io/dazwilkin/azure-exporter:v0.0.2 \
--endpoint=0.0.0.0:${PORT} \
--path="/metrics"
NOTE
- `AZURE_CLIENT_CERTIFICATE_PATH` refers to the container's (!) folder (not the host's)
- The `--volume` mapping maps the host's `AZURE_CLIENT_CERTIFICATE_PATH` to the container's `/secrets` folder
- In the above, the same name is used for the host's and container's `key+crt` filename
- In the above, the same `HOST:CONTAINER` port value (`${PORT}`) is used
- The value of `--endpoint=0.0.0.0:${PORT}` refers to the container's (!) port
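Finally, to have Prometheus scrape the Exporter, a scrape config along these lines works (the target assumes the host port published by `podman` above):

```yaml
scrape_configs:
  - job_name: azure-exporter
    static_configs:
      - targets:
          - localhost:8080 # host port published by podman
```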