skip to Main Content

I have an Athena table with a string column which looks like:

+-------------------+
| employee_size     |
+-------------------+
| GREATER THAN 2000 |
+-------------------+
| 500 - 999         |
+-------------------+
| 28.00             |
+-------------------+
| unknown           |
+-------------------+
| 563               |
+-------------------+

I want to convert the column values into integer if possible, else it should be null. So, the desired output should look like:

+---------------+
| employee_size |
+---------------+
|               |
+---------------+
|               |
+---------------+
| 28            |
+---------------+
|               |
+---------------+
| 563           |
+---------------+

I tried to use combination of queries the closest I think I could get was:

SELECT 
    CASE
        WHEN employee_size LIKE '% %' THEN NULL
        WHEN employee_size LIKE '%-%' THEN NULL
        WHEN regexp_like(employee_size,'([A-Za-z]') THEN NULL
        WHEN employee_size LIKE '%.%' THEN CAST(employee_size AS decimal)
        ELSE CAST(employee_size AS integer) 
    END AS employee_size 
FROM 
    "table_name";

But this code results in an error:

INVALID_FUNCTION_ARGUMENT: end pattern with unmatched parenthesis

If you guys have faced something similar, please suggest a solution.

EDIT: I forgot to mention that if there is decimal value like 28.00 or 5.64 it should ignore whatever is there after the decimal and only have 28 or 5

2

Answers


  1. You could try the following logic:

    SELECT
        CASE WHEN REGEXP_LIKE(employee_size, '^[0-9]+(.[0-9]+)?$')
             THEN REGEXP_EXTRACT(employee_size, '^[0-9]+') END AS employee_size
    FROM yourTable;
    
    Login or Signup to reply.
  2. Are you looking for something like that?

    with
     t as (
       select 'GREATER THAN 2000' employee_size
       union all
       select '500 - 999' employee_size
       union all
       select '28.00' employee_size
       union all
       select 'unknown' employee_size
       union all
       select '563' employee_size
       )
    select cast(try_cast(employee_size as double) as integer) from t
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search