
I'm trying to use array_position to return the position of a specific string in an array. Unfortunately, Redshift doesn't have this function.

2 Answers


  1. Chosen as BEST ANSWER

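    To solve this problem, I created a Python UDF. It takes the array as a comma-separated string and returns the 0-based index of every match, formatted as a string.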

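    create or replace function array_position (anyarray varchar(65535),
    anyelement varchar(65535))
    returns varchar(65535)
    stable as $$
        # SQL NULL arrives as None; return NULL rather than searching the text 'None'
        if anyarray is None or anyelement is None:
            return None
        # Treat the first argument as a comma-separated list of values
        a_list = str(anyarray).split(',')
        # Collect the 0-based index of every exact match
        indices = []
        for idx, value in enumerate(a_list):
            if value == anyelement:
                indices.append(idx)
        # Returned as text, e.g. '[0, 1, 4]'
        return str(indices)
    $$ language plpythonu;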
    

    Code sample:

    ===>
    select array_position('1,1,2,2,1','1');
     array_position
    ----------------
     [0, 1, 4]
    (1 row)
    
    ===>
    create temp table test (col varchar(100));
    insert into test (col) values ('1,1,1,2,2,2,1');
    select array_position(col,'1') from test;
    
     array_position
    ----------------
     [0, 1, 2, 6]
    (1 row)
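
    Note that this UDF behaves differently from PostgreSQL's array_position, which returns only the first match as a 1-based position. If you want to check the matching logic outside Redshift, here is a minimal local sketch in plain Python. The names array_positions and first_position are made up for this illustration and are not part of the UDF above:

    ```python
    def array_positions(csv_text, target):
        """Return the 0-based index of every exact match in a comma-separated string."""
        items = str(csv_text).split(',')
        return [idx for idx, value in enumerate(items) if value == target]


    def first_position(csv_text, target):
        """Return the 1-based position of the first match, or None if there is none."""
        matches = array_positions(csv_text, target)
        return matches[0] + 1 if matches else None


    print(array_positions('1,1,2,2,1', '1'))  # [0, 1, 4]
    print(first_position('1,1,2,2,1', '2'))   # 3
    ```

    The comparison is an exact string match, so a value stored with a leading space (for example '1, 2') will not match '2' unless you strip the whitespace first.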
    

    I hope this helps someone simulate the array_position function in Redshift.


  2. You will need to unroll / unnest / unpivot the array and apply a WHERE clause. See: https://docs.aws.amazon.com/redshift/latest/dg/query-super.html#unnest

    Using a UDF as you propose will work fine for a small number of rows, but Redshift tables are often very large, and UDFs do not perform well on large row counts. Calling a UDF millions of times is simply too much overhead, so watch for scaling issues.
