skip to Main Content

In a collection, approximately 1000 records are written per day per operating user (approximately 10 operating users in the field), I have a query that obtains the records made during the day in order to keep track of the points covered by mobile users in the field:

const lastActividadGPS = await firestore
          .collection("Actividad_GPS")
          .where("instancia", "==", instancia) //instancia
          .where("user_id", "==", parseInt(ejecutivo.user_id))
          .where("fecha_hora", ">=", dayStart.toDate())
          .where("fecha_hora", "<=", dayEnd.toDate())
          .orderBy("fecha_hora", "desc")
          .get();

The collection:

enter image description here

It happens that when I have 100K+ of documents accumulated in the collection and 1-3 users make queries (they execute this query internally from the web platform) up to 30M readings are made per day, but if I delete all the data it normalizes and I have up to 50K of readings per day. I have not been able to identify the problem with the excessive readings – when the collection has a lot of data – and I do not know if firebase at the time of sorting reads all the documents and then filters it.

We also tried including limit in the query but it didn’t work.

2

Answers


  1. I do not know if firebase at the time of sorting reads all the documents and then filters it

    Firestore queries execute on indexes that you define, only allows queries that it can actually execute on those indexes, and only read (and charge document reads and bandwidth) for documents that are matched by those conditions in your query.

    So if you are charged for more reads than you expect, it’s because those documents are actually matched in the queries you execute.

    Login or Signup to reply.
  2. I can tell you that time series data deserves special handling. It boils down to hardware utilization, how data is stored and accessed at the physical layer. Mongo talks about it. BTW I’m not saying that Firebase/Firestore isn’t appropriate for your needs, just sharing that time series records get special attention / modeling.

    Benefits

    Compared to normal collections, storing time series data in time series collections improves query efficiency and reduces the
    disk usage for time series data and secondary indexes.

    Time series collections use an underlying columnar storage format and
    store data in time-order with an automatically created clustered
    index. The columnar storage format provides the following benefits:

    • Reduced complexity for working with time series data

    • Improved query efficiency

    • Reduced disk usage

    • Reduced I/O for read operations

    • Increased WiredTiger cache usage

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search