I want to read a Polars DataFrame from a JSON string containing dates in the standard ISO format "yyyy-mm-dd".
When I try to read the string in and set the dtype of the date column with either schema or schema_overrides,
the result is a column of only null values.
MRE
from datetime import datetime, timedelta
from io import StringIO
import polars as pl
# Generate a list of dates
start_date = datetime.today()
dates = [start_date + timedelta(days=i) for i in range(100)]
date_strings = [date.strftime("%Y-%m-%d") for date in dates]
# Create a Polars DataFrame
df = pl.DataFrame({"dates": date_strings})
df_reread = pl.read_json(
    StringIO(df.write_json()),
    schema_overrides={"dates": pl.Date},
)
Error
Output of print(df_reread):
shape: (100, 1)
┌───────┐
│ dates │
│ --- │
│ date │
╞═══════╡
│ null │
│ null │
│ null │
│ null │
│ null │
│ … │
│ null │
│ null │
│ null │
│ null │
│ null │
└───────┘
Question
Is there any way to correctly read in the Date dtype from a JSON string?
2 Answers
pl.read_json() doesn't automatically infer or correctly handle the Date type when reading from JSON, since JSON has no native date type. Dates are typically represented as strings in JSON, so you can explicitly cast the column to Date after reading the JSON. Try this:
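A minimal sketch of that approach, reusing the df from the question: read the JSON without any overrides so "dates" stays a string column, then convert it (the str.to_date alternative in the comment assumes a recent Polars version):

from io import StringIO
import polars as pl

# Read without schema_overrides so "dates" comes back as a string column,
# then convert the ISO "yyyy-mm-dd" strings to pl.Date.
df_reread = pl.read_json(StringIO(df.write_json()))
df_reread = df_reread.with_columns(pl.col("dates").cast(pl.Date))
# Equivalent, with an explicit format string:
# df_reread = df_reread.with_columns(pl.col("dates").str.to_date("%Y-%m-%d"))
print(df_reread)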
After having a bit of a play around, it looks like, unfortunately, dates read from JSON have a bit of a quirk: it seems that currently they must be written as days since the Unix epoch (which is how Polars internally represents dates) for things to work as you expect.
I have raised this feature request on their GitHub to hopefully get that improved.
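To illustrate that quirk, here is a hedged sketch based on the claim above (not verified behaviour): if the column holds integer days since the epoch, the schema_overrides call from the question should round-trip as expected:

from datetime import date, timedelta
from io import StringIO
import polars as pl

# Per the quirk described above: write the dates as integer days since the
# Unix epoch, Polars' internal physical representation for pl.Date.
epoch = date(1970, 1, 1)
start = date.today()
epoch_days = [(start + timedelta(days=i) - epoch).days for i in range(100)]

df_days = pl.DataFrame({"dates": epoch_days})
df_reread = pl.read_json(
    StringIO(df_days.write_json()),
    schema_overrides={"dates": pl.Date},
)
print(df_reread)  # should show actual dates rather than nulls, per the claim above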
In the meantime, you can write the dates as days since the epoch as described above, or do as you say with
df_reread.with_columns(pl.col("dates").cast(pl.Date))