I’m working with 2 Firestore collections:
- "sold-items", where each document has two fields: "sellerId" (the salesperson’s ID) and "buyerId" (the buyer’s ID).
- "sellers-xyz", where each seller document has a "userId" field.
For an API call, I need to retrieve all unique buyers of each seller from "sold-items". Because the "sold-items" collection is very large (approx. 500,000 documents), I need to paginate the results. My initial approach was to query "sold-items" where "sellerId" equals the seller’s "userId" (from "sellers-xyz") and read each "buyerId", but how can I retrieve only the unique buyers?
For example, the API call for the 1st page returns 20 unique buyerIds. When the user then goes to the 20th page, the API is stateless, so there is no way for me to know which "buyerIds" were already returned on earlier pages, and I may end up returning the same "buyerIds" over and over again.
I can’t seem to figure out the optimal solution for this task; any advice is greatly appreciated.
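To illustrate the problem, here is a small in-memory sketch in plain TypeScript (no Firestore calls; the "naivePage" helper and the sample data are hypothetical). It pages over one seller’s sales and de-duplicates buyer IDs only within each page, which is all a stateless API can do with this approach:

```typescript
// Hypothetical in-memory stand-in for documents in "sold-items".
interface SoldItem {
  sellerId: string;
  buyerId: string;
}

// Naive approach: take one page of the seller's sales, then de-duplicate
// buyer IDs within that page only. Nothing is remembered between calls.
function naivePage(
  items: SoldItem[],
  sellerId: string,
  page: number,
  pageSize: number,
): string[] {
  const sales = items.filter((item) => item.sellerId === sellerId);
  const slice = sales.slice(page * pageSize, (page + 1) * pageSize);
  return Array.from(new Set(slice.map((item) => item.buyerId)));
}

const soldItems: SoldItem[] = [
  { sellerId: "s1", buyerId: "b1" },
  { sellerId: "s1", buyerId: "b2" },
  { sellerId: "s1", buyerId: "b1" },
  { sellerId: "s1", buyerId: "b3" },
];

const firstPage = naivePage(soldItems, "s1", 0, 2);
const secondPage = naivePage(soldItems, "s1", 1, 2);
console.log(firstPage, secondPage); // "b1" shows up on both pages
```

Each page is duplicate-free on its own, but "b1" comes back on the second page, which is the repetition described above.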
2 Answers
What you’re describing sounds like a sort of GROUP BY clause, which Firestore doesn’t support.
If you want Firestore to return unique buyer IDs, you will need to store unique buyer IDs somewhere. That’s pretty much the nature of all Firestore operations, and is one of the reasons it scales so well.
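For example, assume a hypothetical per-seller list of unique buyer IDs (such as one document per buyer under each seller, with the buyer ID as the document ID). The API can then stay stateless by returning a cursor, the last buyer ID of the page, which the client sends back with the next request. In Firestore this roughly corresponds to a query ordered by document ID with startAfter(cursor) and limit(pageSize); the sketch below only models that logic in memory and is not Firestore code.

```typescript
// Models a paginated read over a hypothetical index of unique buyer IDs.
// Not Firestore code: it mirrors "order by ID, start after cursor, limit".
interface BuyerPage {
  buyerIds: string[];
  nextCursor: string | undefined; // undefined when there are no more buyers
}

function buyersPage(
  uniqueBuyers: Set<string>,
  pageSize: number,
  cursor?: string,
): BuyerPage {
  // Sort lexicographically so every request sees the same stable order.
  const sorted = Array.from(uniqueBuyers).sort();
  // Keep only IDs strictly after the cursor (the last ID of the previous page).
  const remaining = sorted.filter((id) => cursor === undefined || id > cursor);
  const buyerIds = remaining.slice(0, pageSize);
  const nextCursor =
    remaining.length > pageSize ? buyerIds[buyerIds.length - 1] : undefined;
  return { buyerIds, nextCursor };
}

const buyersOfS1 = new Set(["b3", "b1", "b2"]);
const page1 = buyersPage(buyersOfS1, 2); // first two buyers in sorted order
const page2 = buyersPage(buyersOfS1, 2, page1.nextCursor); // the rest
console.log(page1, page2);
```

Because the buyer IDs are unique at the source, no page can repeat an ID. Note that cursor-based pagination moves page by page; jumping straight to page 20 still requires the cursor from page 19.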
If you don’t want to store the (additional) data to allow tracking unique buyer IDs, consider using a database with stronger querying capabilities that match your requirements.
As @FrankvanPuffelen mentioned in his answer, if you want Firestore to return unique buyer IDs, you need to store those unique IDs somewhere. For that, I recommend one of the following solutions:
- Store all buyer IDs in an array inside a Firestore document. The IDs are unique by definition, because the arrayUnion operator only adds elements that aren’t already present in the array, so there is no way to add a duplicate ID.
- Store all buyer IDs in the Firebase Realtime Database. Since data is stored as key-value pairs, you can use each buyer ID as a key and the boolean true as its value. The buyer IDs are then always unique, because writing a key that already exists simply overwrites it. Besides that, write operations in the Realtime Database are not billed per operation.
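Why both options stay duplicate-free can be modeled in a few lines. The sketch below is plain TypeScript that only imitates the semantics; "arrayUnionLike" and "setKeyTrue" are hypothetical helpers, not Firebase APIs. With the real SDKs you would pass arrayUnion(buyerId) in an update of the Firestore document, or set true at a path such as "buyers/{buyerId}" in the Realtime Database (that path is an assumption).

```typescript
// Imitates Firestore's arrayUnion semantics: each element is appended
// only if it is not already present in the array.
function arrayUnionLike(existing: string[], ...ids: string[]): string[] {
  const result = [...existing];
  for (const id of ids) {
    if (!result.includes(id)) {
      result.push(id);
    }
  }
  return result;
}

// Imitates writing buyerId -> true under a Realtime Database node:
// writing a key that already exists just overwrites it.
function setKeyTrue(
  node: Record<string, true>,
  buyerId: string,
): Record<string, true> {
  return { ...node, [buyerId]: true };
}

let buyerArray: string[] = [];
let buyerNode: Record<string, true> = {};
for (const buyerId of ["b1", "b2", "b1"]) {
  buyerArray = arrayUnionLike(buyerArray, buyerId);
  buyerNode = setKeyTrue(buyerNode, buyerId);
}
console.log(buyerArray, Object.keys(buyerNode)); // each contains b1 and b2 once
```

The second write of "b1" changes nothing in either structure, which is what keeps the stored buyer IDs unique without any read-side de-duplication.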
Also remember that denormalization is quite a common practice in NoSQL databases like Firestore and the Realtime Database.
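Applied to this question, denormalization means that whenever a sale is recorded, the seller’s unique-buyer list is updated at the same time, so reads never have to de-duplicate the large "sold-items" collection. The sketch below models this in memory; the "Store" shape and the "recordSale" helper are hypothetical. In Firestore, the two writes would typically go into one batched write (writeBatch) so that they succeed or fail together.

```typescript
// Hypothetical shape of the data after denormalization.
interface Sale {
  sellerId: string;
  buyerId: string;
}

interface Store {
  soldItems: Sale[]; // stand-in for the "sold-items" collection
  uniqueBuyers: Map<string, Set<string>>; // sellerId -> unique buyerIds
}

// Writes the sale and updates the seller's unique-buyer set together.
// Here both are plain in-memory updates; in Firestore they would be two writes in one batch.
function recordSale(store: Store, sale: Sale): void {
  store.soldItems.push(sale);
  let buyers = store.uniqueBuyers.get(sale.sellerId);
  if (buyers === undefined) {
    buyers = new Set<string>();
    store.uniqueBuyers.set(sale.sellerId, buyers);
  }
  buyers.add(sale.buyerId); // no-op if this buyer already bought from this seller
}

const store: Store = { soldItems: [], uniqueBuyers: new Map() };
recordSale(store, { sellerId: "s1", buyerId: "b1" });
recordSale(store, { sellerId: "s1", buyerId: "b2" });
recordSale(store, { sellerId: "s1", buyerId: "b1" });
recordSale(store, { sellerId: "s2", buyerId: "b1" });

const s1Buyers = store.uniqueBuyers.get("s1");
console.log(store.soldItems.length, s1Buyers ? Array.from(s1Buyers) : []);
```

Every sale is still kept, while each seller’s buyer list only grows when a new buyer appears. Existing "sold-items" documents would need a one-time pass through the same logic to fill the list.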