Automating Scheduled Firestore Exports
For my “thing”, I use Firestore to persist state. I like Firestore a lot and, having been around Google for almost (!) a decade, I much prefer it to Datastore.
Firestore has a managed export|import service and I use this to back up Firestore collections|documents.
I’d been doing backups manually (using gcloud) and decided today to take the plunge and use Cloud Scheduler for the first time. I’d been reluctant to do this until now because I’d assumed, incorrectly, that I’d need to write a wrapping service to invoke the export.
While walking my dog this morning, I realized that if Cloud Scheduler supported plain-old HTTP requests (it does) then, because every gcloud command is actually an underlying HTTP/REST API invocation, I could invoke any gcloud command periodically just by converting the command into the underlying REST call. Tada!
The easiest way to determine the underlying REST call for a gcloud command is to append --log-http to the command, e.g.:
gcloud firestore export gs://${BUCKET} \
--async \
--project=${PROJECT} \
--format="value(name.scope(operations))" \
--log-http
For me, the above yields:
uri: https://firestore.googleapis.com/v1/projects/${PROJECT}/databases/(default):exportDocuments?alt=json
method: POST
== headers start ==
...
b'authorization': [[REDACTED]]
== headers end ==
== body start ==
{"outputUriPrefix": "gs://${BUCKET}"}
== body end ==
==== request end ====
Another way to determine this is via Google’s APIs Explorer, by finding the correct service (Cloud Firestore API) and version. As gcloud uses v1 (see the uri above), I went with that too. Here’s the link to projects.databases.exportDocuments.
You can test the API call easily using APIs Explorer by plugging the values into the form.
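You can also test it from the command line with curl. Here’s a minimal sketch, assuming PROJECT and BUCKET are set as before and that you have gcloud credentials available locally (gcloud auth print-access-token mints a token for the active account):
curl --request POST \
--header "Authorization: Bearer $(gcloud auth print-access-token)" \
--header "Content-Type: application/json" \
--data "{\"outputUriPrefix\": \"gs://${BUCKET}\"}" \
"https://firestore.googleapis.com/v1/projects/${PROJECT}/databases/(default):exportDocuments"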
NOTE I crafted the body before using gcloud --log-http and realized it uses {"outputUriPrefix": "gs://${BUCKET}"} whereas I’d put a placeholder for collectionIds in my body. I’m going to remove that for clarity.
You will need a Google Cloud Storage (GCS) Bucket into which the Firestore Collection(s) and their Document(s) will be exported. I’ll leave it to you to create the bucket but you will need to grant the Firestore managed export|import service account permission on it:
PROJECT="[YOUR-PROJECT]"
BUCKET="[YOUR-GCS-BUCKET]"
NUMBER=$(\
gcloud projects describe ${PROJECT} \
--format="value(projectNumber)")
EMAIL="service-${NUMBER}@gcp-sa-firestore.iam.gserviceaccount.com"
gsutil iam ch serviceAccount:${EMAIL}:admin gs://${BUCKET}
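To confirm the grant took, you can read the bucket’s IAM policy back (gsutil iam get is the read-side counterpart of the ch command above); the output should include the Firestore service account:
gsutil iam get gs://${BUCKET}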
The next step is to determine the gcloud command for gcloud scheduler jobs create http. As always, this is clearly documented.
We need a PROJECT and REGION (we may also need to enable the cloudscheduler service).
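If the Cloud Scheduler API isn’t yet enabled in your project, enabling it is a one-liner (this assumes the usual cloudscheduler.googleapis.com service name):
gcloud services enable cloudscheduler.googleapis.com \
--project=${PROJECT}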
I’m not going to recommend a cron schedule because it’s such a black art, but I find crontab guru to be very useful. You can take the output from the guru as the string for your SCHEDULE.
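For example, to export nightly at 02:00 (this particular schedule is just an illustration; pick your own):
SCHEDULE="0 2 * * *"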
As I mentioned, I’d included collectionIds in my request body but, if you want every collection to be included, I think you should omit this field. You need to provide the GCS Bucket URL as the value for outputUriPrefix and this should point to a GCS Bucket that you’ve created and to which you’ve granted the Firestore managed export|import service permission:
{
"outputUriPrefix": "gs://${BUCKET}"
}
NOTE Don’t forget the gs:// prefix.
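Conversely, if you do want to limit the export to specific collections, the exportDocuments request body also accepts a collectionIds array; the collection names below are hypothetical:
{
"outputUriPrefix": "gs://${BUCKET}",
"collectionIds": ["customers", "orders"]
}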
Lastly, before we can create the job, we will create a Service Account. We will run the job as this Service Account and so we must also grant it suitable IAM permissions. I’m granting the account the predefined datastore.importExportAdmin role:
NOTE Yes, datastore (not firestore) as Datastore underpins Firestore.
ACCOUNT="[YOUR-SERVICE-ACCOUNT]"
EMAIL="${ACCOUNT}@${PROJECT}.iam.gserviceaccount.com"
gcloud iam service-accounts create ${ACCOUNT} \
--project=${PROJECT}
gcloud projects add-iam-policy-binding ${PROJECT} \
--member=serviceAccount:${EMAIL} \
--role=roles/datastore.importExportAdmin
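To sanity-check the binding, you can filter the project’s IAM policy for the new account; a sketch using gcloud’s documented --flatten and --filter mechanics:
gcloud projects get-iam-policy ${PROJECT} \
--flatten="bindings[].members" \
--filter="bindings.members:${EMAIL}" \
--format="value(bindings.role)"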
JOB="exporter"
PROJECT="[YOUR-PROJECT]"
REGION="[YOUR-REGION]"
ROOT="https://firestore.googleapis.com/v1"
NAME="projects/${PROJECT}/databases/(default)"
ENDPOINT="${ROOT}/${NAME}:exportDocuments"
BUCKET="[YOUR-GCS-BUCKET]"
BODY="{\"outputUriPrefix\":\"gs://${BUCKET}\"}"
ACCOUNT="[YOUR-SERVICE-ACCOUNT]"
EMAIL="${ACCOUNT}@${PROJECT}.iam.gserviceaccount.com"
gcloud scheduler jobs create http ${JOB} \
--location=${REGION} \
--schedule="${SCHEDULE}" \
--uri="${ENDPOINT}" \
--http-method=post \
--message-body="${BODY}" \
--oauth-service-account-email=${EMAIL} \
--oauth-token-scope="https://www.googleapis.com/auth/datastore" \
--project=${PROJECT}
Issuing the above commands schedules the job. We can then list and describe the job:
gcloud scheduler jobs list \
--project=${PROJECT} \
--location=${REGION}
gcloud scheduler jobs describe ${JOB} \
--project=${PROJECT} \
--location=${REGION}
NOTE I was expecting describe to include details of runs of the job but it appears not to.
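Rather than waiting for the schedule to fire, you can also force an immediate run, which is handy for testing the job end-to-end:
gcloud scheduler jobs run ${JOB} \
--project=${PROJECT} \
--location=${REGION}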
You can view your job’s logs:
FILTER="
resource.type=\"cloud_scheduler_job\"
resource.labels.job_id=\"${JOB}\"
resource.labels.location=\"${REGION}\"
"
gcloud logging read "${FILTER}" \
--project=${PROJECT} \
--format=json
And, importantly, check that objects are being written to the GCS bucket:
gsutil ls -r gs://${BUCKET}
And, most importantly, I encourage you to create a test project, enable Firestore in it and import one of the backups to ensure that it restores correctly.
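The import is the mirror of the export. Here’s a sketch, assuming [YOUR-TEST-PROJECT] is your test project and [EXPORT-PREFIX] is one of the timestamped prefixes the export wrote to the bucket (both are placeholders):
gcloud firestore import gs://${BUCKET}/[EXPORT-PREFIX] \
--project=[YOUR-TEST-PROJECT]
Note that, because the test project differs from the exporting project, its Firestore managed export|import service account will also need permission on the bucket, granted the same way as earlier.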
That’s all!