Infrastructure as Code
Problem
I’m building an application that comprises:
- Kubernetes¹
- Kubernetes Operator
- Cloud Firestore
- Cloud Functions
- Cloud Run
- Cloud Endpoints
- Stripe
- Firebase Authentication
¹ - I’m using Google Kubernetes Engine (GKE) but may include other managed Kubernetes offerings (e.g. DigitalOcean, Linode, Oracle). GKE clusters are manageable by gcloud but other platforms require other CLI tools. All are accessible from bash, but are they supported by e.g. Terraform (see below)?
Many of the components are packaged as container images and, because I’m using GitHub to host the project’s repos (I’ll leave the monorepo discussion for another post), I’ve grown accustomed to using GitHub Container Registry (GHCR) as the container registry.
I’ve spent most of this week consolidating (!) the app’s deployment into a single repo so that I may most easily stand the app up and tear it down.
I’ve been using bash scripts for deployment and… this post…
I feel hypocritical because I frequently encourage other developers not to, e.g., write Python code that shells out to run gcloud commands and yet I find myself writing bash scripts that use gcloud extensively. OK, I’m not shelling out from a process running some higher-level language code to do this, but I wonder whether bash scripts are best for… infrastructure as code.
So, why do I discourage… and I’ll use this as an exemplar… Python code calling out to gcloud? For example:
import subprocess

# The pattern I discourage: one formatted string handed to a shell
print(subprocess.call(
    "gcloud container clusters get-credentials {name} --zone={zone} --project={project}".format(
        name=cluster_name,
        zone=zone,
        project=project),
    shell=True)
)
My issues are:
- Loss of type: “pipe”‘ing everything into and out of subprocess.call as strings
- Challenges with error handling
- gcloud is essentially CLI “sugar” over Google’s REST APIs
- That Google provides definitive SDKs for all its services
- That gcloud is written in Python: python->shell->python->REST
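Where shelling out can't be avoided, the first two issues can at least be reduced. The following is a minimal sketch, not a recommendation over the SDKs: it passes arguments as a list (no shell), fails loudly on a non-zero exit code, and asks gcloud for JSON so that the result is structured data rather than text to scrape. The helper names run_json and describe_cluster_argv are hypothetical, introduced only for this illustration.

```python
import json
import subprocess


def run_json(argv):
    """Run a command without a shell and parse its stdout as JSON.

    Raises subprocess.CalledProcessError if the command exits non-zero,
    so failures are not silently printed as a return code.
    """
    completed = subprocess.run(
        argv,
        check=True,           # raise on non-zero exit
        capture_output=True,  # keep stdout/stderr instead of inheriting them
        text=True,            # decode bytes to str
    )
    return json.loads(completed.stdout)


def describe_cluster_argv(name, zone, project):
    """Build the argument list for `gcloud container clusters describe`."""
    return [
        "gcloud", "container", "clusters", "describe", name,
        f"--zone={zone}",
        f"--project={project}",
        "--format=json",
    ]
```

Calling run_json(describe_cluster_argv(cluster_name, zone, project)) would return a dict describing the cluster. The python->shell->python->REST chain is still there; only the quoting and error-handling problems shrink.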
Bash
So, why do I prefer to use gcloud directly in bash?
- Familiarity
- Simplicity
- Comprehensiveness
- Consistency
I’m very familiar and comfortable with gcloud. The tool is very easy to use and there are (increasingly) few(er) methods that aren’t available in gcloud. For ~90% of what I need to deploy (see the list at the top of this post), I’m able to achieve the goal using gcloud directly.
So, why do I dislike using bash?
- Lack of familiarity
- Lack of simplicity
- Lack of consistency
I’d consider myself almost a power user of bash but I still learn new “tricks” frequently, and I don’t consider it particularly simple to use, particularly given the inconsistency of its comparisons, its syntax and its error handling.
I find non-simple bash scripts to be rather daunting too. My current deployment script is more than 500 lines long. I’ve decomposed it into a set of functions but (my use of) bash functions isn’t especially modular: they continue to pollute the global namespace and have side-effects, and my error handling still sucks.
Alternatives
What are the alternatives?
In order of likelihood:
- Golang
- Terraform
- Cloud Deployment Manager
- curl
Golang
Given the fact that Google provides (Golang) SDKs for every service and that the majority of the app is written in Golang (there’s a bunch of JavaScript, bash and config too), I could write the deployment in Golang too.
The advantages would be familiarity, consistency, elegance and better (!?) error handling (if I’m not lazy).
The disadvantages include getting partway through and realizing there’s a missing critical component that I’d need to drop into bash to solve, and rewriting ostensibly simple gcloud container clusters create commands as 100+ lines of Golang code (?). And, probably the most important, duplicating effort that’s already been put into e.g. Terraform and Cloud Deployment Manager, for something that I’ll use once.
Terraform
This is the winning (cloud) infrastructure as code tool. Google develops a Terraform Provider for GCP and appears committed to the project: Using Terraform with Google Cloud
Time invested learning Terraform would be well spent as it’s probable I’ll use the tool on other projects.
My chief concerns with adopting Terraform are the catch-22 up-front investment learning Terraform so that I can use it (I should probably discount this concern) and – perhaps more importantly – the concern that, as with using Golang, I’d get a chunk of the way through the implementation only to find that a critical piece of the effort is not implemented in the provider.
Cloud Deployment Manager (CDM)
I included Google’s CDM in my list but I think it’s unlikely I’d choose it over Terraform. Google develops both Terraform and CDM solutions for GCP but CDM has always felt like a must-provide solution by Google that doesn’t seem (!?) to get sufficient investment.
Like Terraform, it also needs its own implementation of a provider for GCP, and that implementation is not exhaustive.
Unfortunately, CDM is also only applicable to GCP: it would either not work with non-GCP resources or be difficult to extend to them.
curl
Since most everything is a REST API call, I could rewrite everything using curl (or equivalent) calls. The only advantage would be direct access to the HTTP response object (and error codes). This would take a bunch of time and is mostly string munging (see earlier criticism), so I feel it’s not much better than bash.
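To make the trade-off concrete, here is a rough sketch of one such call, written in Python rather than curl so that it matches the earlier example. It targets the GKE REST API's cluster "get" method. The helper names (cluster_request, fetch_json) are hypothetical, and the access token is assumed to have been obtained separately, e.g. from gcloud auth print-access-token.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

GKE_API = "https://container.googleapis.com/v1"


def cluster_request(project, location, cluster, access_token):
    """Build (but do not send) a GET request for a single GKE cluster."""
    # Quote each path segment so unexpected characters can't alter the URL
    parts = [urllib.parse.quote(p, safe="") for p in (project, location, cluster)]
    url = "{}/projects/{}/locations/{}/clusters/{}".format(GKE_API, *parts)
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )


def fetch_json(request):
    """Send the request; return (status, body) for successes and HTTP errors."""
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, json.load(response)
    except urllib.error.HTTPError as err:
        # Error responses usually carry a JSON body; fall back to raw text
        raw = err.read().decode("utf-8", errors="replace")
        try:
            return err.code, json.loads(raw)
        except json.JSONDecodeError:
            return err.code, {"error": raw}
```

The status code and error body are directly available, which is the advantage noted above, but the URL is still assembled by hand from strings and every other resource type would need its own builder.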
Kubernetes
Kubernetes (Resources) are a critical component of this app. A combination of Custom Resource Definitions (CRDs) and a Kubernetes Operator are at the heart of the solution.
As part of the app’s “standup” and for testing, I need to be able to apply core and custom resources to the Kubernetes cluster.
In bash, I can leverage a combination of Kubernetes’ CLI (kubectl) and my own Golang-written unit tests that e.g. create Firestore documents whose (Firestore) triggers connect to the Kubernetes cluster to create Kubernetes resources.
It’s unclear how easily this functionality would map to the alternative tools. There’s a Terraform Provider for Kubernetes but does this support CRDs? I’m confident that CDM doesn’t consider Kubernetes resources to be part of its remit and, while I could use curl against the cluster, yeah, no.
Testing
As part of the deployment process, I’d like to test each step as I go:
- Was the GCP project provisioned?
- Were the Service Accounts and Keys created?
- Was the Firestore (default) database created?
- Were the Firestore Triggers deployed successfully?
- Were the Cloud Run services including the Cloud Endpoints proxy deployed?
- Was the Kubernetes cluster created? CRDs installed? Operator running?
The scope of these tests feels intimidating in anything other than bash.
I think there’s an argument for not using the same tool to test the app as to deploy the app but, I want to keep life simple(st).
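For comparison, the step-by-step checks listed above have roughly this shape outside bash. It is only a sketch under stated assumptions: each check is a command whose zero exit code means "passed", the run stops at the first failure, and the names Check, run_checks and example_checks (and the crd_name parameter) are hypothetical.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class Check:
    name: str
    argv: list  # command whose exit code 0 means the check passed


def run_checks(checks):
    """Run checks in order, stopping at the first failure.

    Returns a list of (name, passed) for the checks that actually ran.
    """
    results = []
    for check in checks:
        try:
            completed = subprocess.run(check.argv, capture_output=True, text=True)
            passed = completed.returncode == 0
        except FileNotFoundError:
            # The CLI itself is missing; treat that as a failed check
            passed = False
        results.append((check.name, passed))
        if not passed:
            break
    return results


def example_checks(project, crd_name):
    """Two of the checks from the list, expressed as gcloud/kubectl commands."""
    return [
        Check("GCP project provisioned", ["gcloud", "projects", "describe", project]),
        Check("CRD installed", ["kubectl", "get", "crd", crd_name]),
    ]
```

This is still shelling out to the same CLIs, so it mostly relocates the bash rather than replacing it; the gain is limited to structured results and explicit failure handling.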
Conclusion
I’m going to stick with bash for now.
For each component of the app, I’ve already written bash scripts to deploy and test the component and so creating the all-up deployment script includes significant copy-and-paste, although I am refactoring into bash functions to help with the scale of the script.
I continue to think the approach is sub-optimal but, for now, it’s what I’m doing.