So I need to extract numbers from a varchar column.
create table test (USER_INPUT varchar);
insert into test (USER_INPUT)
values ('222 222'),
('Foo: 333333'),
('Bar blah blah 444444');
But when I try to:
select substring(USER_INPUT, '[ds]+') from test;
The number in the third row does not get parsed. What am I missing?
The output looks like this:
1 222 222
2 333333
3
2
Answers
The issue is your third row has a whitespace prior to the numbers, so that is what is returned in your final result — a single whitespace.
To change the behavior, insist on at least one digit be present:
This means optional whitespace, followed by a digit, followed by any number of whitespace and digits (including none).
That is going to match the first sequence in the string which consists of a non-empty run of digits and spaces. In the "bad" example the first such sequence is the single space (a sequence of one space and zero digits) between ‘Bar’ and ‘blah’, so that is what gets returned.
To force the sequence to have at least one digit, you could use ‘d[ds]*’, but then that would cut off any preceding spaces. If you need to retain those, then ‘s*d[ds]*’