I am thinking about how to model a collection where each document is a building with a geolocation. I know I should use geohashes, but what I am worried about is that each query would end up reading dozens (if not hundreds) of documents.
Would using a single document as a cluster of dozens of buildings be a bad idea? Is there a better solution to this problem?
2 Answers
No, using a single document as a cluster of buildings would not be a bad idea, as long as you stay below the maximum document size, which according to the official documentation is 1 MiB. If you expect a very large number of document reads, you should also consider the Realtime Database, which has a different billing mechanism: it charges for bandwidth and storage rather than per document read.
There is no "perfect", "best", or "correct" solution for structuring a Cloud Firestore database. In your case, you have to do some math to see which of the three options above (one document per building, cluster documents, or the Realtime Database) best fits your needs.
At some point I experimented with storing multiple geohash values in a single document, with the document ID being the common prefix of those hashes. For each point, the document keeps the exact lat/lon and the ID of the actual document for that point.
Doing this for all documents, you end up with a set of additional documents (I called them geo-index documents) holding this extra metadata.
Here’s an example of such a document:
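A minimal sketch of what it could look like, shown here as a plain object (the `points` map and the `lat`/`lng`/`docId` field names are placeholders I chose for illustration, not the original structure):

```typescript
// Geo-index document stored under ID "cu" in a hypothetical
// "geo-index" collection. Each entry maps a full geohash (sharing
// the "cu" prefix) to the exact coordinates of one point and the
// ID of the actual building document it indexes.
// Hashes and coordinates are illustrative.
const geoIndexDoc = {
  points: {
    cuv2r9kd: { lat: 69.112, lng: -95.301, docId: 'building-A' },
    cuv2rbmt: { lat: 69.118, lng: -95.287, docId: 'building-B' },
    cuzq84we: { lat: 70.463, lng: -92.655, docId: 'building-C' },
  },
};
```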
So here, the common prefix is `cu`, meaning that the document contains the metadata for all geohashes starting with `cu`.
To execute a geoquery, I'd then calculate the geohash ranges that contained potential matches, read the geo-index documents for those ranges, and qualify/disqualify the actual documents based on the lat/lon from the geo-index documents.
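A compressed sketch of that flow, assuming two-character prefixes, a hypothetical `geo-index` collection shaped like the document above, and the geofire-common helpers for the geohash range math:

```typescript
import { doc, getDoc, Firestore } from 'firebase/firestore';
import { geohashQueryBounds, distanceBetween } from 'geofire-common';

const PREFIX_LENGTH = 2; // length of the geo-index document IDs, e.g. "cu"

// Returns the IDs of the building documents within radiusInM of center.
async function geoquery(
  db: Firestore,
  center: [number, number],
  radiusInM: number,
): Promise<string[]> {
  // Geohash ranges ([start, end] pairs) that cover the search circle.
  const bounds = geohashQueryBounds(center, radiusInM);

  // Reduce the ranges to geo-index document IDs. Taking only the two
  // endpoints of each range is a simplification; a complete version
  // would enumerate every prefix between start and end.
  const prefixes = new Set<string>();
  for (const [start, end] of bounds) {
    prefixes.add(start.substring(0, PREFIX_LENGTH));
    prefixes.add(end.substring(0, PREFIX_LENGTH));
  }

  const matches: string[] = [];
  for (const prefix of prefixes) {
    const snap = await getDoc(doc(db, 'geo-index', prefix));
    if (!snap.exists()) continue;

    // Qualify/disqualify each indexed point by its exact distance;
    // the geohash ranges alone are only an approximation.
    const points = snap.data().points as Record<
      string,
      { lat: number; lng: number; docId: string }
    >;
    for (const p of Object.values(points)) {
      const distanceInKm = distanceBetween([p.lat, p.lng], center);
      if (distanceInKm * 1000 <= radiusInM) {
        matches.push(p.docId);
      }
    }
  }
  return matches; // fetch only these building documents afterwards
}
```

With two-character prefixes this reads a handful of geo-index documents per query, instead of one read per candidate building.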
With this approach I was able to drastically reduce the number of documents that had to be read to determine the matches. On the other hand, it required me to create new data structures, essentially building my own index type on top of the database.
You’ll have to determine whether that trade-off of complexity vs cost is worth it for your use-case and other requirements.