skip to Main Content

I am trying to filter out data that is stored as strings but represents numeric values.
I filter out non-numeric strings then convert to a float. I am then trying to filter values above or below a specific amount. Below is a simplified version of my query that works with the filters disabled

WITH attributes AS (

    SELECT
        property_id,
        CASE WHEN regexp_replace(value, '[^0-9.]', '')='' THEN
            null
        WHEN regexp_replace(value, '[^0-9.]', '') = '.' THEN NULL
        ELSE
            regexp_replace(value, '[^0-9.]', '')::float
        END AS sqft, 'platform' FROM mv_prop_attributes
        WHERE
            display_name = 'Livable Area'
            and sqft is not null
            and length(sqft)>0
)

    SELECT
            *
        FROM
            attributes
        WHERE true
--          and sqft < 100000 --works
--          and sqft>200 --works
--          and sqft < 100000 and sqft>200 --does not work
--          and sqft between 200 and 100000 --does not work

As noted in the commented portions I can filter above or below a value but not a range. Attempts to do so produce the following error:
ERROR: Invalid digit, Value ‘.’, Pos 0, Type: Double
DETAIL:

error: Invalid digit, Value ‘.’, Pos 0, Type: Double
code: 1207
context: .
query: 438646641
location: :0
process: query0_117_438646641 [pid=0]

This is running on an up to date redshift cluster. Thank you for any suggestions on how to resolve this at the DB level.

2

Answers


  1. I think the problem is likely that you didn’t account for a period (vs. a decimal point). The string "one thousand square feet." would come through your SQL as a single ‘.’ which cannot be converted to a float.

    I suspect that you aren’t getting the error with inequality as Redshift can push down WHERE clauses where it can (but this case seems extreme). Inequality can be applied to string – ‘1’ < ‘2’ – but this doesn’t work the same for ranges. This could be letting the query run but is a long shot.

    ==================================================================

    A way to test if the source data has extra ‘.’s in it would be to run:

    SELECT count(1) 
    FROM mv_prop_attributes
    WHERE value like '%.%.%';
    
    Login or Signup to reply.
  2. Try regexp_replace(value, ‘[^0-9.]’, ”,1,0,’g’) – replace all non numeric symbols and ‘.’, not only first.

    WITH attributes AS (
    
        SELECT
            property_id,
            CASE WHEN regexp_replace(value, '[^0-9.]', '',1,0,'g')='' THEN
                null
            WHEN regexp_replace(value, '[^0-9.]', '',1,0,'g') = '.' THEN NULL
            ELSE
                regexp_replace(value, '[^0-9.]', '',1,0,'g')::float
            END AS sqft, 'platform' FROM mv_prop_attributes
            WHERE
                display_name = 'Livable Area'
                and sqft is not null
                and length(sqft)>0
    )
    
        SELECT
                *
            FROM
                attributes
            WHERE true
    --          and sqft < 100000 --works
    --          and sqft>200 --works
              and sqft < 100000 and sqft>200 --does not work
              and sqft between 200 and 100000 --does not work
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search