I have a DataFrame column that contains long strings of comma-separated values, like:
2000,2001,2002:a,2003=b,2004,100,101,500,20
101,102,20
What I want to do is create a new Array<Int> column out of it where:
- only values starting with 2 are included
- when a value contains an additional delimiter, only the first part is kept (e.g. 2002:a becomes 2002)
- some specific values will be excluded (let’s say value = 20)
- if the array is empty it should be filled with a default value (let’s say [199])
So basically the 2 test strings should be returned as:
stringColumn | arrayColumn |
---|---|
2000,2001,2002:a,2003=b,2004,100,101,500,20 | [2000,2001,2002,2003,2004] |
101,102,20 | [199] |
2 Answers
Just divide the problem into subproblems:
a) split the string into substrings
b) try to find a number at the start of every substring
c) if a number is found, check whether it meets the conditions
e.g.:
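A minimal sketch of those three steps in plain Python, since the question does not say which language or DataFrame library is in use. `leading_number` and `parse_row` are hypothetical names, and "starting with 2" is read here as "the first digit is 2":

```python
import re
from typing import List, Optional

EXCLUDED = {20}   # values to drop (taken from the question's example)
DEFAULT = [199]   # fallback when nothing qualifies


def leading_number(token: str) -> Optional[str]:
    """Step b: return the leading run of digits in a token, or None."""
    match = re.match(r"\d+", token.strip())
    return match.group(0) if match else None


def parse_row(row: str) -> List[int]:
    result = []
    for token in row.split(","):          # step a: split into substrings
        digits = leading_number(token)    # step b: find the number
        if digits is None:
            continue
        # step c: keep it only if it starts with 2 and is not excluded
        if digits.startswith("2") and int(digits) not in EXCLUDED:
            result.append(int(digits))
    return result or list(DEFAULT)


print(parse_row("2000,2001,2002:a,2003=b,2004,100,101,500,20"))
# [2000, 2001, 2002, 2003, 2004]
print(parse_row("101,102,20"))
# [199]
```

Matching only the leading digits means the extra delimiter (`:` or `=`) never has to be named explicitly; anything after the number is simply ignored.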
I would prefer a custom user defined function to extract and filter values. Here is an approach:
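A sketch of such a function, assuming the DataFrame is a Spark DataFrame accessed through PySpark (the question does not say so explicitly). `extract_values` and `add_array_column` are hypothetical names, the column names come from the table in the question, and returning the default for null cells is an assumption:

```python
import re
from typing import List, Optional


def extract_values(s: Optional[str]) -> List[int]:
    """Keep leading integers that start with '2', drop 20, default to [199]."""
    if s is None:  # assumption: null cells also get the default
        return [199]
    # Leading digits of each comma-separated token; tokens without any are skipped.
    found = (re.match(r"\s*(\d+)", tok) for tok in s.split(","))
    values = [int(m.group(1)) for m in found if m and m.group(1).startswith("2")]
    values = [v for v in values if v != 20]
    return values or [199]


def add_array_column(df, source: str = "stringColumn", target: str = "arrayColumn"):
    """Apply extract_values as a UDF (PySpark is imported only when this is called)."""
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import ArrayType, IntegerType

    extract_udf = udf(extract_values, ArrayType(IntegerType()))
    return df.withColumn(target, extract_udf(col(source)))
```

Keeping the parsing logic in an ordinary function means it can be unit-tested without a Spark session; only `add_array_column` touches the DataFrame.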
Result