Postgresql - Redshift/Postgres between function produces seemingly unrelated error when ">" or "<" both work but not together

darthtye
April 1, 2024
217 views
0 votes
2 Answers

I am trying to filter out data that is stored as strings but represents numeric values.
I filter out non-numeric strings then convert to a float. I am then trying to filter values above or below a specific amount. Below is a simplified version of my query that works with the filters disabled

WITH attributes AS (

    SELECT
        property_id,
        CASE WHEN regexp_replace(value, '[^0-9.]', '')='' THEN
            null
        WHEN regexp_replace(value, '[^0-9.]', '') = '.' THEN NULL
        ELSE
            regexp_replace(value, '[^0-9.]', '')::float
        END AS sqft, 'platform' FROM mv_prop_attributes
        WHERE
            display_name = 'Livable Area'
            and sqft is not null
            and length(sqft)>0
)

    SELECT
            *
        FROM
            attributes
        WHERE true
--          and sqft < 100000 --works
--          and sqft>200 --works
--          and sqft < 100000 and sqft>200 --does not work
--          and sqft between 200 and 100000 --does not work

As noted in the commented portions I can filter above or below a value but not a range. Attempts to do so produce the following error:
ERROR: Invalid digit, Value ‘.’, Pos 0, Type: Double
DETAIL:

error: Invalid digit, Value ‘.’, Pos 0, Type: Double
code: 1207
context: .
query: 438646641
location: :0
process: query0_117_438646641 [pid=0]

This is running on an up to date redshift cluster. Thank you for any suggestions on how to resolve this at the DB level.

Answers

- BillWeiner
- March 30, 2024 at 6:43 am
- 0 votes
0
I think the problem is likely that you didn’t account for a period (vs. a decimal point). The string "one thousand square feet." would come through your SQL as a single ‘.’ which cannot be converted to a float.

I suspect that you aren’t getting the error with inequality as Redshift can push down WHERE clauses where it can (but this case seems extreme). Inequality can be applied to string – ‘1’ < ‘2’ – but this doesn’t work the same for ranges. This could be letting the query run but is a long shot.

==================================================================

A way to test if the source data has extra ‘.’s in it would be to run:
```
SELECT count(1) 
FROM mv_prop_attributes
WHERE value like '%.%.%';
```
Login or Signup to reply.

Try regexp_replace(value, ‘[^0-9.]’, ”,1,0,’g’) – replace all non numeric symbols and ‘.’, not only first.

WITH attributes AS (

    SELECT
        property_id,
        CASE WHEN regexp_replace(value, '[^0-9.]', '',1,0,'g')='' THEN
            null
        WHEN regexp_replace(value, '[^0-9.]', '',1,0,'g') = '.' THEN NULL
        ELSE
            regexp_replace(value, '[^0-9.]', '',1,0,'g')::float
        END AS sqft, 'platform' FROM mv_prop_attributes
        WHERE
            display_name = 'Livable Area'
            and sqft is not null
            and length(sqft)>0
)

    SELECT
            *
        FROM
            attributes
        WHERE true
--          and sqft < 100000 --works
--          and sqft>200 --works
          and sqft < 100000 and sqft>200 --does not work
          and sqft between 200 and 100000 --does not work

Please signup or login to give your own answer.

Click here to cancel reply.

Postgresql – Redshift/Postgres between function produces seemingly unrelated error when ">" or "<" both work but not together

As noted in the commented portions I can filter above or below a value but not a range. Attempts to do so produce the following error: ERROR: Invalid digit, Value ‘.’, Pos 0, Type: Double DETAIL:

error: Invalid digit, Value ‘.’, Pos 0, Type: Double code: 1207 context: . query: 438646641 location: :0 process: query0_117_438646641 [pid=0]

Answers

As noted in the commented portions I can filter above or below a value but not a range. Attempts to do so produce the following error:
ERROR: Invalid digit, Value ‘.’, Pos 0, Type: Double
DETAIL:

error: Invalid digit, Value ‘.’, Pos 0, Type: Double
code: 1207
context: .
query: 438646641
location: :0
process: query0_117_438646641 [pid=0]