
I’m using MongoDB 7 to store data in a time series collection. When I query for documents between a start and end timestamp, some documents I’m sure are there (I can find them by _id) don’t get retrieved. The failing queries are ordinary range queries along these lines (a sketch, using the timestamps from the error below):
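// Range query that should return every document in the window,
// but silently misses some (they remain reachable by _id)
db.time_series_minute.find({
  timestamp: {
    $gte: ISODate("2024-10-24T00:00:00Z"),
    $lt: ISODate("2024-10-24T04:00:00Z")
  }
});

Since the documents clearly exist, I ran .validate() on the collection, which returned a warning: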

Detected one or more documents in this collection incompatible with time-series specifications. For more info, see logs with log id 6698300.
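
Those entries end up in the mongod log; with the Docker setup from the reproduction below, something like this pulls them out (a sketch):

docker logs test 2>&1 | grep '"id":6698300'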

Looking at the logs, I see messages like:

{
  "t": { "$date": "2024-10-28T20:27:43.194+00:00" },
  "s": "W",
  "c": "STORAGE",
  "id": 6698300,
  "ctx": "conn193873",
  "msg": "Document is not compliant with time-series specifications",
  "attr": {
    "namespace": "mydb.system.buckets.time_series_minute",
    "recordId": "6467198e00ee59c36ca9d9d489",
    "reason": {
      "code": 53,
      "codeName": "InvalidIdField",
      "errmsg": "Mismatch between the embedded timestamp 2024-10-24T00:00:00.000+00:00 in the time-series bucket '_id' field and the timestamp 2024-10-24T04:00:00.000+00:00 in 'control.min' field."
    }
  }
}

I can fix the issue by dumping the collection to JSON and then re-importing it:

mongoexport --db="timeSeriesTestDB" --collection="time_series_minute" --out="time_series.json" --jsonArray
mongoimport --db="new_db" --collection="time_series_minute" --file="time_series.json" --jsonArray

Then .validate() returns no warnings and my queries return what they should (presumably because mongoexport reads the logical documents through the time-series view and mongoimport rebuilds the buckets from scratch). That seems to confirm it’s not an issue with the data or the query but with how MongoDB internally buckets data: apparently a mismatch between the bucket’s control.min and the document’s timestamp? How does that happen?
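
For reference, the raw buckets live in a hidden system.buckets collection and can be inspected directly (a sketch; the timestamp embedded in a bucket’s ObjectId _id is what validate() compares against control.min):

// Peek at one raw bucket: the _id's embedded timestamp should match control.min
db.getSiblingDB("timeSeriesTestDB")
  .getCollection("system.buckets.time_series_minute")
  .findOne({}, { "control.min": 1, "control.max": 1, meta: 1 });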

The issue can be reproduced by simply spinning up a fresh MongoDB instance and inserting data:

docker run --name test -p 27017:27017 -d mongo:7
docker exec -it test mongosh

Then in mongosh paste this script:

// Switch to or create a new database for testing
use timeSeriesTestDB;

// Drop the collection if it exists (for repeatability)
db.time_series_minute.drop();

// Create the time-series collection
db.createCollection("time_series_minute", {
  timeseries: {
    timeField: "timestamp",
    metaField: "metadata",
    granularity: "minutes"
  }
});

// Total number of data points to insert
const totalDataPoints = 100_000_000;

// Number of data points per batch
const batchSize = 10_000; // Adjust batch size based on available memory

// Total number of batches
const totalBatches = Math.ceil(totalDataPoints / batchSize);

// Start time for data generation
const startTime = new Date();

// Interval between data points in seconds
const intervalSeconds = 60; // One data point per minute

// Function to generate data batches with unique metadata every 1000 documents
function generateDataBatch(batchNumber, batchSize, startTime, intervalSeconds) {
  const dataBatch = [];
  const batchStartIndex = batchNumber * batchSize;
  for (let i = 0; i < batchSize; i++) {
    const index = batchStartIndex + i;
    // Break if we've reached the total data points
    if (index >= totalDataPoints) break;

    // Generate a unique metadata value every 1000 documents
    const metadataValue = `group_${Math.floor(index / 1000)}`;

    dataBatch.push({
      timestamp: new Date(startTime.getTime() + index * intervalSeconds * 1000),
      value: Math.random() * 100,
      metadata: metadataValue
    });
  }
  return dataBatch;
}

// Insert data in batches
for (let batchNumber = 0; batchNumber < totalBatches; batchNumber++) {
  // Generate the data batch
  const dataBatch = generateDataBatch(batchNumber, batchSize, startTime, intervalSeconds);

  // Insert the data batch into the collection
  db.time_series_minute.insertMany(dataBatch);

  // Log progress every 10 batches
  if (batchNumber % 10 === 0) {
    print(`Inserted batch ${batchNumber + 1} of ${totalBatches}`);
  }
}

// Validate the collection
const validationResults = db.time_series_minute.validate({ full: true });

// Output the validation results
printjson(validationResults);

After running the script, validation should fail and some documents won’t be found when querying by timestamp (even though they can be found by _id, or even by a different timestamp range). For such a glaring bug there doesn’t seem to be anything written about it online, yet the time series collection type doesn’t seem to stand up to even the most basic test: inserting data and later retrieving it. Is there something I’m missing about time series collections, or is the feature fundamentally broken in MongoDB?
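
One way to confirm the symptom directly (a sketch; pick an arbitrary document and read it back two ways):

// Grab any document from the collection
const doc = db.time_series_minute.findOne();

// Lookup by _id succeeds...
printjson(db.time_series_minute.findOne({ _id: doc._id }));

// ...but an exact range on the document's own timestamp may come back empty
printjson(db.time_series_minute.find({
  timestamp: { $gte: doc.timestamp, $lte: doc.timestamp }
}).toArray());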


Answers


  1. Chosen as BEST ANSWER

    This is an open bug in MongoDB. I opened a ticket, which was closed as a duplicate of SERVER-87065, SERVER-86451, and SERVER-88043.


  2. Looking at your code, you’re defining your Time Series collection as:

    db.createCollection("time_series_minute", {
      timeseries: {
        timeField: "timestamp",
        metaField: "metadata",
        granularity: "minutes"
      }
    });
    

    So your metaField is metadata. This is the field used to create buckets of data in your Time Series collection.

    The problem is that you’re creating just one bucket series, as all of your documents have the same value for metadata: "initial". But the maximum size of a bucket is 1000 measurements.

    You set this value here:

    dataBatch.push({
      timestamp: new Date(startTime.getTime() + index * intervalSeconds * 1000),
      value: Math.random() * 100,
      metadata: "initial"
    });
    

    So looking at your code, the problem seems to be this metaField. A good example of a metaField is:

    {
       timestamp: ISODate("2021-05-18T00:00:00.000Z"),
       metadata: { sensorId: 5578, type: 'temperature' },
       temp: 12,
       _id: ObjectId("62f11bbf1e52f124b84479ad")
    }
    

    In your example, the bucket id could be a good metadata field.
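
    For instance (a sketch, using a hypothetical ts_demo collection), an object-valued metaField gives each sensor its own series of buckets:

    // Create a time-series collection whose metaField is an object
    db.createCollection("ts_demo", {
      timeseries: {
        timeField: "timestamp",
        metaField: "metadata",
        granularity: "minutes"
      }
    });

    // Documents with the same metadata object are bucketed together
    db.ts_demo.insertOne({
      timestamp: new Date(),
      metadata: { sensorId: 5578, type: "temperature" },
      temp: 12
    });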
