Firestore Export & Import
I’m using Firestore to maintain state in my “thing”.
To make sure that I can restore the database, I run scheduled backups using Cloud Scheduler (see Automating Scheduled Firestore Exports) and I’ve been testing imports to ensure that the process works.
It does.
I thought I’d document an important but subtle consideration with Firestore exports (which I’d not initially understood).
Google facilitates the backup process with the sibling commands gcloud firestore export and gcloud firestore import.
I’ve been exporting to a bucket (${BUCKET}) and not specifying --collection-ids. The command’s help says that, when this flag is omitted, “all collections are included”. That is true but, importantly, it also means that you can only import all collections too.
BUCKET=...
PREFIX="2022-11-10T10:23:00.00000"
OUTPUT="${BUCKET}/${PREFIX}"
gcloud storage ls gs://${OUTPUT} \
--recursive
NOTE Google is (finally!) adding Cloud Storage functionality to gcloud and there’s now a storage command group which is functionally equivalent to a subset of gsutil and saves some of the impedance mismatch in having to use gcloud for everything other than storage. I wanted to be able to use gcloud storage objects list instead of the quirky gcloud storage ls in the above but gcloud storage objects list doesn’t work (correctly).
Here’s an example of the objects created:
{OUTPUT}/{PREFIX}.overall_export_metadata
{OUTPUT}/all_namespaces/all_kinds/all_namespaces_all_kinds.export_metadata
{OUTPUT}/all_namespaces/all_kinds/output-0
There are 3 files:
- *.overall_export_metadata (1)
- all_namespaces/all_kinds/* (2)
If you then try to gcloud firestore import gs://${OUTPUT}, the command will succeed, but you cannot gcloud firestore import gs://${OUTPUT} --collection-ids={COLLECTION-FOO}.
Google documents the reason for this under Export Data:
Note: You must export specific collections if you plan to:
- Import only specific collections
Here’s an example exporting (all) collections using the flag:
gcloud firestore export gs://${OUTPUT} \
--collection-ids={FOO},{BAR},...{BAZ}
Repeating the previous gcloud storage command against this export yields differently structured output:
{OUTPUT}/{PREFIX}.overall_export_metadata
{OUTPUT}/all_namespaces/kind_{FOO}/all_namespaces_kind_{FOO}.export_metadata
{OUTPUT}/all_namespaces/kind_{FOO}/output-0
{OUTPUT}/all_namespaces/kind_{BAR}/all_namespaces_kind_{BAR}.export_metadata
{OUTPUT}/all_namespaces/kind_{BAR}/output-1
{OUTPUT}/all_namespaces/kind_{BAZ}/all_namespaces_kind_{BAZ}.export_metadata
{OUTPUT}/all_namespaces/kind_{BAZ}/output-2
NOTE In the previous export without the --collection-ids flag, there was an object prefix of all_kinds. In this export, with the --collection-ids flag specified, there is an object prefix for each collection. Presumably (!) this is the mechanism that then permits gcloud firestore import to import specific collections.
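The difference between the two layouts can be checked from a script. Here's a minimal sketch that reads an object listing and reports which collections the export contains. The function name export_collections is hypothetical, and it relies only on the all_kinds and kind_{ID} prefixes observed above, which I have not seen documented as a stable format:

```shell
# Sketch: infer an export's contents from its object listing.
# Assumes the layout observed in this post:
#   .../all_namespaces/all_kinds/...   (exported without --collection-ids)
#   .../all_namespaces/kind_<ID>/...   (exported with --collection-ids)

# Reads object paths on stdin. Prints "all_kinds" for an unfiltered export,
# otherwise one collection ID per line (sorted, de-duplicated).
export_collections() {
  listing="$(cat)"
  case "${listing}" in
    */all_namespaces/all_kinds/*)
      echo "all_kinds"
      return 0
      ;;
  esac
  # Pull <ID> out of ".../all_namespaces/kind_<ID>/..." on each matching line.
  printf '%s\n' "${listing}" \
    | sed -n 's|.*/all_namespaces/kind_\([^/]*\)/.*|\1|p' \
    | sort -u
}
```

Something like gcloud storage ls gs://${OUTPUT} --recursive | export_collections should then print either all_kinds (import everything or nothing) or the collection IDs that can be passed to --collection-ids on import.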
IMPORTANT CAVEAT Google’s documentation for exporting specific collections is ambiguous. It refers to “collection groups” and says that “The collection group includes all collections and subcollections (at any path) with the specified collection ID”.
If you gcloud firestore export --collection-ids={FOO} and {FOO} contains subcollections, the subcollections are not exported. The documentation references [SUBCOLLECTION_ID_1] in its example command but does not explain how a subcollection is referenced. Through trial and error, it appears that you must specify the subcollections (I’ve only tried subcollections of collections, not anything deeper) as if they were root collections, i.e. if {FOO} contains a subcollection {BAR}, then you can:
gcloud firestore export gs://${OUTPUT} \
--collection-ids=${FOO},${BAR}
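Since subcollections have to be listed alongside their parents, it can help to keep the IDs in one place and build the flag from them. A minimal sketch follows; the helper name collection_ids_flag is hypothetical, and it only joins whatever IDs it is given with commas:

```shell
# Sketch: build the --collection-ids flag from a list of collection IDs.
# Parent collections and subcollections are passed the same way, following
# the trial-and-error finding above.
collection_ids_flag() {
  # "$*" joins the arguments using the first character of IFS; the subshell
  # keeps the IFS change from leaking into the caller.
  ( IFS=','; printf '%s%s\n' '--collection-ids=' "$*" )
}
```

It could then be used as gcloud firestore export gs://${OUTPUT} "$(collection_ids_flag "${FOO}" "${BAR}")".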
In my “thing” Firestore database, document IDs are all computed (from other fields in the document). I don’t use autogenerated fields. It’s unclear whether this would impact the effectiveness of imports.
In a similar vein, I’ve decided to be more deterministic with the output path prefixes of gcloud firestore export. In the above scenarios, where I’ve provided only a bucket (${BUCKET}) to the export, i.e. gcloud firestore export gs://${BUCKET} --collection-ids=..., the export process constructs an object path from the export’s datetime. This approach ensures uniqueness but it makes it more difficult to script exports and imports.
The current format is YYYY-MM-DDTHH:MM:SS_DDDDD
I’ve decided to reduce the resolution to the day and prefer the simpler YYMMDD format, i.e.:
PREFIX="$(date +%y%m%d)"
gcloud firestore export gs://${BUCKET}/${PREFIX} \
--collection-ids={FOO},{BAR}...{BAZ}
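With a day-resolution prefix, the import side can compute the same location instead of having to look it up. Here's a minimal sketch; the helper name export_uri and its optional second argument are my own assumptions, and the gcloud lines are left as comments so the snippet runs without touching a project:

```shell
# Sketch: derive the export location from a bucket name and a YYMMDD date.
export_uri() {
  bucket="$1"
  # Default to today; pass an explicit YYMMDD to target an earlier export.
  prefix="${2:-$(date +%y%m%d)}"
  printf 'gs://%s/%s\n' "${bucket}" "${prefix}"
}

# Example usage (not executed here):
#   gcloud firestore export "$(export_uri "${BUCKET}")" --collection-ids=...
#   gcloud firestore import "$(export_uri "${BUCKET}" 221110)" --collection-ids=...
```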
That’s all!