Automatic Certs w/ Golang gRPC service on Compute Engine
I needed to deploy a healthcheck-enabled, TLS-enabled gRPC service. Fortunately, most (all?) of the SDKs include a healthcheck implementation, e.g. Golang has grpc-go/health.
I learned in my travels that:
- DigitalOcean's App Platform does not work (link) with TLS-based gRPC apps.
- Fly has a regression (link) that breaks gRPC.
So, I resorted to Google Cloud Platform (GCP). Although Cloud Run would be well-suited to running the gRPC app, it uses a proxy (sidecar) to provision a cert for the app; I wanted to be able to easily use a custom domain and to give myself a somewhat general-purpose solution.
autocert
Golang includes autocert as part of the Go cryptography module (golang.org/x/crypto). The autocert package “provides automatic access to certificates from Let’s Encrypt and any other ACME-based CA”, which is ideal.
I had a facepalm moment getting autocert to work (see “Aside”) but, once I understood my error, the module worked flawlessly.
Aside: autocert logging
autocert doesn’t log. Its methods and functions return errors but, in my usage, none of the methods I called returned an error. A consequence of this was that, when I was initially unable to get the library to work, debugging the issue was a challenge. Ultimately, I cloned the module locally, replaced the published version with the clone and added a bunch of logging statements to it. After doing this, my error (!) surfaced almost immediately and I was able to resolve it.
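If you need to do the same, a replace directive in go.mod points the build at the local clone; a minimal sketch (the relative path is illustrative):
// go.mod
// Build against the local clone instead of the published module
replace golang.org/x/crypto => ../crypto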
The gist of the working code is really as simple as the documentation suggests (although the documentation didn’t include the HTTP challenge handler shown below):
// Imports used by this snippet
import (
	"log"
	"net"
	"net/http"

	"golang.org/x/crypto/acme/autocert"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
	"google.golang.org/grpc/health"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

// Create Manager using defaults
// `path` is the location where certificates will be cached
// `host` is the fully-qualified domain name(s) for this server
// `email` is where the ACME service can send notifications
m := &autocert.Manager{
	Prompt:     autocert.AcceptTOS,
	Cache:      autocert.DirCache(path),
	HostPolicy: autocert.HostWhitelist(host),
	Email:      email,
}

// Serve the ACME (HTTP-01) challenge handler on port 80
go func() {
	if err := http.ListenAndServe(":http", m.HTTPHandler(nil)); err != nil {
		log.Fatalf("HTTP failure\n%s", err)
	}
}()

// Once the ACME flow completes, Manager provides a TLS config
c := m.TLSConfig()

// The TLS config can be used as gRPC credentials
opts := grpc.Creds(credentials.NewTLS(c))

// To define a TLS-enabled gRPC server
server := grpc.NewServer(opts)

// Register the SDK-provided healthcheck implementation
healthcheck := health.NewServer()
healthpb.RegisterHealthServer(server, healthcheck)

// Listen on the gRPC port (`port`, e.g. "443") and run the server
lis, err := net.Listen("tcp", ":"+port)
if err != nil {
	log.Fatalf("Listen failure\n%s", err)
}
if err := server.Serve(lis); err != nil {
	log.Fatalf("gRPC failure\n%s", err)
}
I containerized the app; you can grab the image here.
It’s challenging to test services that acquire TLS certificates because the server must be exposed and available on the domain name for which the certificate is issued. There are ways around this (ngrok, inlets, WireGuard) but, given the cloud, there are equally easy alternatives. I chose Compute Engine, using a VM that’s oriented to running only containers.
Compute Engine
BILLING="[[YOUR-BILLING]]"
PROJECT="[[YOUR-PROJECT]]"
ZONE="[[YOUR-ZONE]]"
INSTANCE="autocert"
IMAGE="ghcr.io/brabantcourt/autocert:094a49007286bedc6db20b7397e0aa92465c3b32"
HOST="[[YOUR-HOST]]"
PORT="443" # Or your preference
STARTUP=... # See below
gcloud compute instances create-with-container ${INSTANCE} \
--container-image=${IMAGE} \
--container-arg=--host=${HOST} \
--container-arg=--port=${PORT} \
--container-arg=--path=/certs \
--tags=http-server,https-server \
--machine-type=f1-micro \
--image-family=cos-stable \
--image-project=cos-cloud \
--container-mount-host-path=mount-path=/certs,host-path=/tmp/certs,mode=rw \
--zone=${ZONE} \
--project=${PROJECT} \
--metadata-from-file=startup-script=${STARTUP}
The command creates a Compute Engine VM running Google’s Container-Optimized OS, a minimal OS that’s tailored to running containers. Container-Optimized OS permits writing to /tmp (see file system) and the gcloud command mounts the host’s /tmp/certs into the container’s /certs (read-write). This is where the container can read and write the certs that it acquires.
But, so as not to lose these certs, it’s a good idea to copy any certificates that are created to a file system more persistent than the container or the VM. I’m using Google Cloud Storage (GCS) but only for recreating the container. I leave it to the reader to explore persisting certificates to GCS (or elsewhere), perhaps before the VM (!) is terminated (when the certs would otherwise be lost).
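If you do want to automate that persistence, here’s a minimal sketch (not the author’s code) using the cloud.google.com/go/storage client; the bucket name and cache directory are illustrative, and the VM’s service account needs write access to the bucket:
package main

import (
	"context"
	"io"
	"log"
	"os"
	"path/filepath"

	"cloud.google.com/go/storage"
)

func main() {
	ctx := context.Background()

	// Authenticates as the VM's service account via Application Default Credentials
	client, err := storage.NewClient(ctx)
	if err != nil {
		log.Fatal(err)
	}
	bucket := client.Bucket("[[YOUR-BUCKET]]")

	// Copy everything in the autocert cache to the bucket
	files, err := filepath.Glob("/tmp/certs/*")
	if err != nil {
		log.Fatal(err)
	}
	for _, file := range files {
		src, err := os.Open(file)
		if err != nil {
			log.Fatal(err)
		}
		dst := bucket.Object(filepath.Base(file)).NewWriter(ctx)
		if _, err := io.Copy(dst, src); err != nil {
			log.Fatal(err)
		}
		if err := dst.Close(); err != nil {
			log.Fatal(err)
		}
		src.Close()
	}
}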
The command expects a startup script (${STARTUP}) and I’m using this to copy certificates from a GCS bucket onto the VM on creation.
startup.sh:
#!/usr/bin/env bash

# Obtain an access token for the VM's identity (service account) from the Metadata service
TOKEN=$(\
  curl \
  --silent \
  --header "Metadata-Flavor: Google" \
  http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token)

# Extract the access token from the JSON response
ACCESS=$(\
  echo ${TOKEN} \
  | grep --extended-regexp --only-matching "(ya29.[0-9a-zA-Z._-]*)")

HOST="[[YOUR-HOST]]"
BUCKET="[[YOUR-BUCKET]]"

DIR="/tmp/certs"
mkdir -p ${DIR}

# Copy ACME account key and any host certs from the bucket to the cache
for FILE in "acme_account+key" "${HOST}"
do
  echo "File: ${FILE}"
  curl \
  --silent \
  --request GET \
  --header "Authorization: Bearer ${ACCESS}" \
  --output ${DIR}/${FILE} \
  https://www.googleapis.com/storage/v1/b/${BUCKET}/o/${FILE}?alt=media
done
The script is straightforward. It uses the Metadata service to obtain an access token for the VM’s identity (service account) and uses this to authenticate to GCS so that it may copy files (acme_account+key and any existing host certs) from the bucket to the cache (/tmp/certs) on the VM.
If you don’t have existing certs, you’ve nothing to copy and the startup script may be excluded. If you do create certs during the lifetime of a VM, I recommend copying them locally for future use. You can use:
gcloud compute scp --recurse ${INSTANCE}:/tmp/certs ${PWD} \
--zone=${ZONE} \
--project=${PROJECT}
Unless you want to pay for a static IP, each time the VM is restarted it will obtain an ephemeral IP address from Google’s pool. You’ll need to program your DNS service with the updated IP too.
I’m using Google Domains and it has a useful feature named Dynamic DNS. This feature is intended for e.g. naming machines on home networks, which are often behind internet providers that use DHCP to issue the home network’s IP address(es).
I’m taking advantage of this mechanism to dynamically program my Google Domains-hosted domains (!) with the IP address obtained by the VM.
Customarily, one can get a Compute Engine VM’s first public IP address with:
gcloud compute instances describe ${INSTANCE} \
--zone=${ZONE} \
--project=${PROJECT} \
--format="value(networkInterfaces[0].accessConfigs[0].natIP)"
But I was unable to call the Google Domains endpoint on behalf of a specific IP; the endpoint always used the caller’s IP address. So, I’m using the fact that the Compute Engine VM can make the call to Google Domains itself:
USER="[[GOOGLE_DOMAINS_USER]]"
PASS="[[GOOGLE_DOMAINS_PASS]]"
COMMAND="curl https://${USER}:${PASS}@domains.google.com/nic/update?hostname=${HOST}"
# SSH into the Compute Engine VM and run the command to update Google Domains
gcloud compute ssh ${INSTANCE} \
--zone=${ZONE} \
--project=${PROJECT} \
--command="${COMMAND}"
You’ll need to configure your domain in Google Domains per Google’s instructions and then you can grab the username and password from “View credentials” on the Dynamic DNS tab.
NOTE These are not Google account credentials but credentials generated by Google Domains solely for use with this Dynamic DNS host.
If, like me, you’re scripting this entire process, you may want to block after gcloud compute instances create-with-container and before you use gcloud compute ssh, to give the VM time to be provisioned and started. The following should suffice:
until [ "RUNNING" = "$(gcloud compute instances describe ${INSTANCE} --zone=${ZONE} --project=${PROJECT} --format='value(status)')" ]
do
sleep 5s
done
Testing
Once the container is deployed, you may want to check that it’s running correctly and that any certificates have been copied:
# List containers
COMMAND="docker container ls"
# Follow logs from autocert
# You'll need to determine the container's ID (or name)
ID="..."
COMMAND="docker container logs --follow ${ID}"
# List certs
COMMAND="ls -la tmp/certs"
gcloud compute ssh ${INSTANCE} \
--zone=${ZONE} \
--project=${PROJECT} \
--command="${COMMAND}"
The proof, though, is whether you can e.g. gRPCurl the server using TLS (and its FQDN):
# If you don't have health.proto
wget https://raw.githubusercontent.com/grpc/grpc/master/src/proto/grpc/health/v1/health.proto \
--output-document=${PWD}/health.proto
grpcurl \
-proto ./health.proto \
${HOST}:${PORT} \
grpc.health.v1.Health/Check
You expect:
{
"status": "SERVING"
}
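Alternatively, a minimal sketch of the same check from Go (replace HOST:PORT with your server’s FQDN and port):
package main

import (
	"context"
	"crypto/tls"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// TLS using the system roots; Let's Encrypt is trusted by default
	conn, err := grpc.DialContext(ctx, "HOST:PORT",
		grpc.WithTransportCredentials(credentials.NewTLS(&tls.Config{})))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// An empty service name queries the server's overall health
	resp, err := healthpb.NewHealthClient(conn).Check(ctx, &healthpb.HealthCheckRequest{})
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("status: %s", resp.GetStatus()) // expect SERVING
}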
Hope that helps!