I have a H2 database (version 2.2.220) with data in a JSON
column; I need to migrate the column via a SQL script so that the data is in a conventional relational database structure. The main stumbling block is that on extraction of a JSONValue string value (surrounded by double quotes in JSON format), I am struggling to convert this into a VARCHAR
value that does not include the surrounding double quotes.
I hope the below SQL statements illustrate the issue in the most rudimentary way:
-
create table my_table ( id int generated by default as identity, my_json JSON null default null, ); insert into my_table (my_json) values (JSON '"abcdef"');
The double quotes inside the string constant is necessary, else it is not recognized as valid JSON.
- 1st query (naive):
select my_json from my_table;
This returns
"abcdef"
with quotes included in the string. I would however like the raw string valueabcdef
to be returned without the surrounding quotes. - 2nd query (typecasting):
select CAST(my_json as VARCHAR(255)) from my_table;
This still returns
"abcdef"
with quotes included in the string. - 3rd query ("brute-force" trimming):
select BTRIM(my_json, '"') from my_table;
This achieves the desired result
abcdef
, but is problematic. One way this will cause a bug is when the string value should intentionally include a double quote at the beginning and/or end, e.g.xyz"
(JSON '"xyz""'
) – this will be trimmed to returnxyz
.
I am wondering if there is a "recommended"/cleaner way to do this conversion that is less bug-prone.
2
Answers
Im afraid that the safest method to achieve this in the H2 database is by using a user-defined function (UDF). Unfortunately, there’s no direct way to remove those quotes using H2’s built-in functions, as the H2 database doesn’t allow for querying individual JSON values directly. H2 stores JSON data as plain strings. So, when you save
"abcdef"
into a table (whether you useJSON '"abcdef"'
, or'abcdef'::JSON
), the queryselect my_json from my_table;
always returns the entire JSON object (which is"abcdef"
) rather than just the value (abcdef
), just like if you would query for a JSON object (for exampleselect JSON '{"foo":"bar"}';
), there is no native way to extract thebar
value from thefoo
field.So if you need an individual value, I recommend creating a UDF like this:
And then you can call it like this:
I wouldn’t recomend using
BTRIM
, as it removes all leading and trailing characters, not just the outermost ones. For example, in a rare case where you want to store a value with quotes, BTRIM would remove too much:Use
regexp_replace()
to delete (replace with blank) quotes but only at the start or end of the value:Regex breakdown:
^
means "start of input"|
means "logical OR" (an alternation in regex speak)$
means "end of input"