I would like to extract the numeric (float) value from this HTML using Xidel:

<p class="price">
    <span class="woocommerce-Price-amount amount">
        <bdi>
            304.00
            <span class="woocommerce-Price-currencySymbol">
                €
            </span>
        </bdi>
    </span>
</p>

I am trying the following command:

xidel -s '<p class="price"><span class="woocommerce-Price-amount amount"><bdi>304.00 <span class="woocommerce-Price-currencySymbol">€</span></bdi></span></p>' -e "//p[@class='price']/translate(normalize-space(substring-before(., '€')),' ','')"

The translate() call should remove the space, but it isn't working: the output still contains one trailing space after the number ("304.00_").

Answers


  1. Try changing the XPath expression to

    -e  "substring-before(//p[@class='price']//bdi/normalize-space(.),' ')"
    

    or

     -e "substring-before(//p[@class='price']//bdi/.,' ')"
    

    or use tokenize()

     -e "tokenize(//p[@class='price']//bdi/.,' ')[1]"
    

    The output should be

    '304.00'
    
  2. You’re going to have to process the no-break space separately with one of the following queries:

    -e "//p[@class='price']/span/bdi/substring-before(text(),'&#160;')"
    -e "//p[@class='price']/span/bdi/translate(text(),x:cps(160),'')"
    -e "//p[@class='price']/span/bdi/replace(text(),'&#xA0;','')"
    

    You can’t use normalize-space(), because…

    https://www.w3.org/TR/xpath-functions-31/#func-normalize-space:

    The definition of whitespace is unchanged in [Extensible Markup Language (XML) 1.1 Recommendation]. It is repeated here for convenience:

    S ::= (#x20 | #x9 | #xD | #xA)+
    

    …it processes spaces, tabs, carriage returns and line feeds, but not no-break spaces:

    xidel -s "<x>   test   </x>" -e "x'[{x}]'"
    [   test   ]
    
    xidel -s "<x>   test   </x>" -e "x'[{normalize-space(x)}]'"
    [test]
    
    xidel -s "<x>&nbsp;&nbsp;&nbsp;test&nbsp;&nbsp;&nbsp;</x>" -e "x'[{x}]'"
    [   test   ]
    
    xidel -s "<x>&nbsp;&nbsp;&nbsp;test&nbsp;&nbsp;&nbsp;</x>" -e "x'[{normalize-space(x)}]'"
    [   test   ]
    
    xidel -s "<x>&nbsp;&nbsp;&nbsp;test&nbsp;&nbsp;&nbsp;</x>" -e "x'[{translate(x,'&#160;','')}]'"
    xidel -s "<x>&nbsp;&nbsp;&nbsp;test&nbsp;&nbsp;&nbsp;</x>" -e "x'[{replace(x,x:cps(160),'')}]'"
    xidel -s "<x>&nbsp;&nbsp;&nbsp;test&nbsp;&nbsp;&nbsp;</x>" -e "x'[{replace(x,'&#xA0;','')}]'"
    [test]
    

    By the way, here is an alternative way to get the price from that website:

    xidel -s "https://kenzel.sk/produkt/bicykle/zivotny-styl/signora/" -e ^"^
      parse-json(^
        //body/script[@type='application/ld+json']^
      )//priceSpecification/price^
    "
    304.00
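
For readers who want to see the whitespace rule outside of Xidel, here is a minimal Python sketch of the same behaviour. It is not how Xidel implements anything: `xpath_normalize_space` is only a rough emulation of normalize-space() based on the whitespace definition quoted above, `extract_price` is a hypothetical helper, and `bdi_text` is an assumed reconstruction of the text content of the `<bdi>` element (number, no-break space, euro sign).

```python
import re

# Whitespace as defined by XML: space, tab, carriage return, line feed.
XML_WHITESPACE = " \t\r\n"
NBSP = "\u00a0"  # no-break space: what &nbsp;, &#160; and &#xA0; decode to


def xpath_normalize_space(text: str) -> str:
    """Rough emulation of normalize-space(): collapse and trim XML whitespace only."""
    collapsed = re.sub(f"[{XML_WHITESPACE}]+", " ", text)
    # Pass the characters explicitly: a bare str.strip() would also remove NBSP.
    return collapsed.strip(XML_WHITESPACE)


def extract_price(text: str) -> float:
    """Hypothetical helper: keep the part before the euro sign, drop NBSP, convert."""
    before_symbol = text.split("€", 1)[0]
    cleaned = before_symbol.replace(NBSP, "")
    return float(xpath_normalize_space(cleaned))


# Assumed text content of <bdi>: the number is followed by a no-break space.
bdi_text = "\n    304.00" + NBSP + "€\n"

# The no-break space survives normalization, like the trailing "space" in the question.
print(repr(xpath_normalize_space(bdi_text.split("€", 1)[0])))

# Removing U+00A0 explicitly gives a clean number.
print(extract_price(bdi_text))
```

The explicit `replace(NBSP, "")` plays the same role as `translate(text(),x:cps(160),'')` in the queries above: the no-break space has to be named, because the whitespace-normalizing step never touches it.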
    