skip to Main Content

I have a table of data with multiple columns and containing a column named ‘weight’ set to VARCHAR character type. It contains a mix of characters such as ‘400’, ‘203.12’, and ‘602.35 grams’, and NULL values.

Task is to clean and validate the entire column and set character type to float and replace missing data with average column weight to 2 decimal places.

I have the following CASE statement:

CASE
    WHEN weight LIKE ' grams' 
        THEN CAST(REPLACE(weight, ' grams', '') AS FLOAT) 
    WHEN weight LIKE ' g' 
        THEN CAST(REPLACE(weight, ' g', '') AS FLOAT) 
    WHEN weight IS NULL 
        THEN AVG(weight::FLOAT)  
    ELSE 
        weight::FlOAT
END AS validated_weight,

But it generates this error:

SQL Error [22P02]: ERROR: invalid input syntax for type double precision: "602.61 grams"

The expected column data output should set to float, 2 decimal places

I using PostgreSQL 16. Support required to resolve this error.

2

Answers


  1. You are using the like expression wrong. In order to be different from an equal expression you need to have at least one wildcard in the expression you compare to. Wildcards are % standing for multiple, possibly zero arbitrary characters and _ standing for a single character.

    So your expression should be

    CASE
        WHEN weight LIKE '% grams' 
            THEN CAST(REPLACE(weight, ' grams', '') AS FLOAT) 
        WHEN weight LIKE '% g' 
            THEN CAST(REPLACE(weight, ' g', '') AS FLOAT) 
        WHEN weight IS NULL 
            THEN AVG(weight::FLOAT)  
        ELSE 
            weight::FlOAT
    END AS validated_weight
    
    Login or Signup to reply.
  2. Replacing the CASE expression with the following produces the described results:

    CASE
      WHEN weight IS NULL THEN
        ROUND(AVG(REGEXP_REPLACE(weight,
                                 '^((?:d+(?:.d*)?|.d+))s*(g(rams)?)?',
                                 '1')::NUMERIC) OVER (PARTITION BY NULL),
              2)::FLOAT
      ELSE
        REGEXP_REPLACE(weight,
                            '^((?:d+(?:.d*)?|.d+))s*(g(rams)?)?',
                            '1')::FLOAT
    END
    

    This assumes that the input values are well formed.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search