
I’d like to generate HTML output from a dataframe, with thousands separators in the numbers.
However, pl.Config does not seem to have any effect:

import polars as pl

df = pl.DataFrame({'prod':['apple','banana','melon'], 'price':[7788, 1122, 4400]})
with pl.Config(
    thousands_separator=" "
):
    html = '<html><body><table><tr><th>Product</th><th>Price</th></tr>'
    html += ''.join(df.with_columns((pl.lit('<tr><td>')+
                                     pl.col('prod')+
                                     pl.lit('</td><td class="right">')+
                                     pl.col('price').cast(pl.String)+
                                     pl.lit('</td></tr>')
                                     ).alias('x')
                                    )
                      .get_column('x')
                      .to_list()
                    )
    html += '</table></body></html>'
    print(html)

2 Answers


  1. thousands_separator is only applied to numeric columns.

    with pl.Config(thousands_separator=','):
        pl.select(int = pl.lit(1000), str = pl.lit('1000'))
    
    shape: (1, 2)
    ┌───────┬──────┐
    │ int   ┆ str  │
    │ ---   ┆ ---  │
    │ i32   ┆ str  │
    ╞═══════╪══════╡
    │ 1,000 ┆ 1000 │
    └───────┴──────┘
    

    But these formatting settings only come into play when you "print" the frame/series.

    You could attempt to format the string using Polars expressions.

    def to_string_with_thousands(expr):
        expr = expr.cast(pl.String)
        expr = expr.str.splitn('.', 2).struct.rename_fields(['num', 'frac'])
        return (
            expr.struct['num']
                .str.reverse()
                .str.replace_all(r'\d{3}', '$0,')
                .str.reverse()
                .str.strip_prefix(',')
            + pl.when(expr.struct['frac'].is_not_null())
                .then('.' + expr.struct['frac'])
                .otherwise(pl.lit(''))
        )
    

    Although it may be simpler to just do it outside of Polars using regular Python.

    # Output

    df = (
       pl.select(
          pl.lit(''.join(['1'] + ['0'] * N)).cast(int).alias(str(N)) 
          for N in range(10)
       )
       .melt()
    )
    
    (df.with_columns(float = pl.col('value') + 1.234567)
       .with_columns(
          pl.col('value', 'float').pipe(to_string_with_thousands)
            .name.suffix('_str')
       )
    )
    
    shape: (10, 5)
    ┌──────────┬────────────┬───────────────┬──────────────────────┬───────────────┐
    │ variable ┆ value      ┆ float         ┆ float_str            ┆ value_str     │
    │ ---      ┆ ---        ┆ ---           ┆ ---                  ┆ ---           │
    │ str      ┆ i64        ┆ f64           ┆ str                  ┆ str           │
    ╞══════════╪════════════╪═══════════════╪══════════════════════╪═══════════════╡
    │ 0        ┆ 1          ┆ 2.234567      ┆ 2.234567             ┆ 1             │
    │ 1        ┆ 10         ┆ 11.234567     ┆ 11.234567            ┆ 10            │
    │ 2        ┆ 100        ┆ 101.234567    ┆ 101.234567           ┆ 100           │
    │ 3        ┆ 1000       ┆ 1001.234567   ┆ 1,001.234567         ┆ 1,000         │
    │ 4        ┆ 10000      ┆ 10001.234567  ┆ 10,001.234567        ┆ 10,000        │
    │ 5        ┆ 100000     ┆ 100001.234567 ┆ 100,001.234567       ┆ 100,000       │
    │ 6        ┆ 1000000    ┆ 1.0000e6      ┆ 1,000,001.234567     ┆ 1,000,000     │
    │ 7        ┆ 10000000   ┆ 1.0000e7      ┆ 10,000,001.234567    ┆ 10,000,000    │
    │ 8        ┆ 100000000  ┆ 1.0000e8      ┆ 100,000,001.234567   ┆ 100,000,000   │
    │ 9        ┆ 1000000000 ┆ 1.0000e9      ┆ 1,000,000,001.234567 ┆ 1,000,000,000 │
    └──────────┴────────────┴───────────────┴──────────────────────┴───────────────┘
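
For comparison, the "regular Python" route mentioned above could look roughly like the sketch below. The helper name `with_thousands` and its `sep` parameter are made up for illustration; the helper relies only on Python's built-in `,` format specifier:

```python
def with_thousands(value, sep=","):
    """Format an int or float with `sep` between groups of three digits."""
    # The "," format specifier groups the integer part only;
    # swap the comma afterwards if a different separator is wanted.
    return format(value, ",").replace(",", sep)


prices = [7788, 1122, 4400]
print([with_thousands(p, sep=" ") for p in prices])
print(with_thousands(1001.5))
```

With a Polars frame, the input list could come from something like `df.get_column('price').to_list()`.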
    
  2. In addition to the answer provided, note that you are printing Python strings, which Polars does not format for you regardless of type.

    with pl.Config(thousands_separator=','):
        df = pl.select(int = pl.lit(1000), str = pl.lit('1000'))
    
        string = ''.join(df.get_column("str").to_list())
        print(f"{type(string) = }, {string = }")
        # type(string) = <class 'str'>, string = '1000'
    
        integer = df.get_column("int").item()
        print(f"{type(integer) = }, {integer = }")
        # type(integer) = <class 'int'>, integer = 1000
    

    Here’s a pure Python solution that uses f"{number:,}" to add a comma as the thousands separator and then replaces the comma with a space. This may not be what you are after, but I figured I’d put it out there.

    data = {"prod": ["apple", "banana", "melon"], "price": [7788, 1122, 4400]}
    
    html_template = (
        "<html><body><table><tr><th>Product</th><th>Price</th></tr>{}</table></body></html>"
    )
    table_data = "".join(
        # Replace commas only in the formatted price, not in the product name
        f'<tr><td>{prod}</td><td class="right">{format(price, ",").replace(",", " ")}</td></tr>'
        for prod, price in zip(data["prod"], data["price"])
    )
    
    print(html_template.format(table_data))
    

    Polars does have a format function (pl.format), but it does not support Python format specifiers (e.g., f"{1000:,.2f}").
