skip to Main Content

I have ingested XML file in Postgresql DB and I trying to perform XPATH operation on the field with date comparison but it is not working. From Postgresql official site I found out that Postgresql uses Xpath1 and Xpath1 does not support date comparison. I tried changing my Xpath query to use string based comparison but still it is not matching.

Input Xpath:

//val:key[@modifiedDate>"2024-06-06"]

Since Direct date comparison is not available in Xpath1 I tried String based comparison.

Converted Xpath:

//val:key[substring(@modifiedDate, 1, 10) > "2024-06-06"]

But this XPath is also not matching even though there is a matching Xpath. How should I frame my Xpath to match the field with date compaission.

The sample structure of XML ingested in table.

<root>
    <val:key modifiedDate="2024-06-07T12:34:56">Key1</val:key>
    <val:key modifiedDate="2024-06-05T08:21:00">Key2</val:key>
    <val:key modifiedDate="2024-06-10T14:22:33">Key3</val:key>
    <val:key modifiedDate="2024-06-06T09:00:00">Key4</val:key>
</root>

The expected result is we should be able to match this XML becuase it xpaths <val:key modifiedDate="2024-06-07T12:34:56">Key1</val:key> and <val:key modifiedDate="2024-06-10T14:22:33">Key3</val:key>, as their modifiedDate attributes start with dates later than "2024-06-06".

SQL Query for reproducing the problem:

CREATE TABLE xml_data (
    id SERIAL PRIMARY KEY,
    document XML
);

INSERT INTO xml_data (document)
VALUES ('<root>
            <val:key modifiedDate="2024-06-07T12:34:56">Key1</val:key>
            <val:key modifiedDate="2024-06-05T08:21:00">Key2</val:key>
            <val:key modifiedDate="2024-06-10T14:22:33">Key3</val:key>
            <val:key modifiedDate="2024-06-06T09:00:00">Key4</val:key>
        </root>');

Query to filter data:

SELECT 
    id, document
FROM 
    xml_data
WHERE 
    xpath_exists('//val:key[substring(@modifiedDate, 1, 10) > "2024-06-06"]', document);

2

Answers


  1. Considering that

    XPath 1.0 has no ordering comparison operator for strings. Both "cat"
    < "dog" and "cat" > "dog" are false, because each is a numeric
    comparison of two NaNs.

    You could access the values on your XML document and cast to date or timestamp and then make the comparison, this way (fixing a missing namespace xmlns:val="http://some.uri&quot; to make it valid):

    select key, modified_date from
    xmltable(xmlnamespaces('http://some.uri' as val),
    '//val:key'
    passing (select document from xml_data) 
    columns key varchar path 'text()',  
    modified_date timestamp path '@modifiedDate')
    where modified_date>'2024-06-06';
    

    Resulting

    key modified_date
    Key1 2024-06-07 12:34:56.000
    Key3 2024-06-10 14:22:33.000
    Key4 2024-06-06 09:00:00.000

    Fiddle to test

    Login or Signup to reply.
  2. As Pepe N O points out, using the > operator in your XPath expression will convert the date operands from strings into numbers, but since both operands contain non-numeric characters, the conversion will produce NaN in both cases.

    However, you could modify your XPath expression to use the XPath translate function to convert your dates into purely numerical values.

    For example the expression translate('2024-06-07T12:34:56', '-T:', '') equals "20240607123456".

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search