
I’m trying to create a personalized RSS feed for an Italian institutional site belonging to a municipality. The idea is to generate the feed with the PolitePol online tool (the feed built into the site doesn’t satisfy me, since it doesn’t always create new items when there are already too many) and forward everything to a Telegram channel via RSS-to-Telegram-Bot. I have correctly identified the essential elements for each post: the Title (td[1]), the Description (td[4]) and the Link (td[5]). In the Title, however, I would also like to concatenate the information contained in td[2] and td[5]. This is the code I have used so far:

concat('Atto numero ', td[1]/child::node(), '                                                                          Tipo: ', td[2]/child::node(), '  ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀                      Data: ',td[5]/child::node())

This solution is certainly rudimentary: it pads the title with blank characters to force line wraps. I haven’t found anything better so far, and the result is obviously terrible. Is it possible to write an expression that puts the various XPath values in the title, each on its own line?

What I would like to achieve:

Atto numero 0203/2024
Tipo: Manifesti
Data: 20/06/2024

REVISIONE SEMESTRALE DELLE LISTE ELETTORALI

What I get instead:

Atto numero 0203/2024 ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀ Tipo: Manifesti ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀ Data: 20/06/2024

REVISIONE SEMESTRALE DELLE LISTE ELETTORALI

I tried the code above, but the result was poor: the spacing is inconsistent and the text is messy.

2 Answers


  1. Try this:

    (concat("Atto numero ", //table[@class="tablesorter"]//tr[7]/td[1][string-length(normalize-space())=9]), concat("Tipo: ", //table[@class="tablesorter"]//tr[7]/td[2]), concat("Data: ", //table[@class="tablesorter"]//tr[7]/td[5]), concat("", //table[@class="tablesorter"]//tr[7]/td[4]))
    

    Or, using a `for` expression:

    for $x in //table[@class="tablesorter"]
    return
        (
            concat("Atto numero ", $x//tr[7]/td[1][string-length(normalize-space())=9]),
            concat("Tipo: ", $x//tr[7]/td[2]),
            concat("Data: ", $x//tr[7]/td[5]),
            concat("", $x//tr[7]/td[4])
        )
    

    Result:

    Atto numero 0203/2024
    Tipo: Manifesti
    Data: 20/06/2024
    REVISIONE SEMESTRALE DELLE LISTE ELETTORALI
    

    PS: Adjust the tr and td indices to match the rows and cells you want to select.

  2. I’m not familiar with PolitePol, but if it doesn’t work for you, why not use an HTML/XML parser, such as the command-line tool xidel, to create your own RSS feed?

    With "direct element constructors":

    $ xidel -s "http://alboserrata.asmenet.it/index.php?sez=p" -e '(
      <rss version="2.0">
        <channel>
          <title>{//title/text()}</title>
          <link>{$url}</link>
          {for $x in //table[@class="tablesorter"]/tbody/tr[td[@style=""]] return
          <item>
            <title>{$x/concat(td[1],", ",td[2])}</title>
            <description>{$x/concat(td[3],", ",td[4])}</description>
            <pubDate>{$x/td[5]/text()}</pubDate>
            <link>{resolve-uri($x/td[7]/a/@href)}</link>
          </item>}
        </channel>
      </rss>
    )' --output-node-format=xml --output-node-indent
    

    Or with "computed constructors":

    $ xidel -s "http://alboserrata.asmenet.it/index.php?sez=p" -e '
      element rss {attribute version {"2.0"},
        element channel {
          element title {//title/text()},
          element link {$url},
          for $x in //table[@class="tablesorter"]/tbody/tr[td[@style=""]] return
          element item {
            element title {$x/concat(td[1],", ",td[2])},
            element description {$x/concat(td[3],", ",td[4])},
            element pubDate {$x/td[5]/text()},
            element link {resolve-uri($x/td[7]/a/@href)}
          }
        }
      }
    ' --output-node-format=xml --output-node-indent
    

    You can customize it however you like. Both queries produce this output:

    <rss version="2.0">
      <channel>
        <title>Albo pretorio on-line Comune di Serrata</title>
        <link>http://alboserrata.asmenet.it/index.php?sez=p</link>
        <item>
          <title>0209/2024, Avvisi vari</title>
          <description>MINISTERO PUBBLICA ISTRUZIONE, (Esami di Stato per l&#x2019;abilitazione all&#x2019;esercizio della libera professione di Perito agrario e di Perito agrario laureato per la sessione 2024)</description>
          <pubDate>28/06/2024</pubDate>
          <link>http://alboserrata.asmenet.it/allegati.php?id_doc=28100615&amp;sez=p&amp;data1=28/06/2024&amp;data2=25/07/2024</link>
        </item>
        <item>
          <title>0208/2024, Delibere di giunta  </title>
          <description>Comune di SERRATA, APPROVAZIONE ORGANIGRAMMA E FUNZIONIGRAMMA DELL&apos;ENTE</description>
          <pubDate>27/06/2024</pubDate>
          <link>http://alboserrata.asmenet.it/allegati.php?id_doc=27120629&amp;sez=p&amp;data1=27/06/2024&amp;data2=12/07/2024</link>
        </item>
        [...]
      </channel>
    </rss>
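
For the layout asked for in the question (one field per line in the title), the same idea can be sketched in Python using only the standard library. This is a minimal illustration, not a drop-in replacement for PolitePol or xidel: it works on a hard-coded, simplified sample row whose values are copied from the output above rather than fetched from the live page, and the helper names `cell_text`, `build_title` and `build_item` are made up for this example.

```python
import xml.etree.ElementTree as ET

# Simplified sample row (assumption: the real page has more cells and
# attributes). Values are copied from the feed output shown above.
SAMPLE_ROW = """
<tr>
  <td>0208/2024</td>
  <td>Delibere di giunta  </td>
  <td>Comune di SERRATA</td>
  <td>APPROVAZIONE ORGANIGRAMMA E FUNZIONIGRAMMA DELL'ENTE</td>
  <td>27/06/2024</td>
</tr>
"""


def cell_text(row, index):
    """Return the whitespace-normalised text of the index-th <td> (1-based, like XPath)."""
    cell = row.findall("td")[index - 1]
    return " ".join("".join(cell.itertext()).split())


def build_title(row):
    """Join the three title fields with real newline characters instead of padding."""
    lines = [
        "Atto numero " + cell_text(row, 1),
        "Tipo: " + cell_text(row, 2),
        "Data: " + cell_text(row, 5),
    ]
    return "\n".join(lines)


def build_item(row):
    """Build an RSS <item> with a multi-line title and the act text as description."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = build_title(row)
    ET.SubElement(item, "description").text = cell_text(row, 4)
    return item


row = ET.fromstring(SAMPLE_ROW.strip())
item = build_item(row)
print(build_title(row))
print(ET.tostring(item, encoding="unicode"))
```

The key point is that the separator is an actual newline character rather than a run of blank characters. In XPath 2.0 and later, `codepoints-to-string(10)` produces the same character and can be passed as the separator to `string-join`. Whether PolitePol accepts XPath 2.0 functions, and whether a given feed reader or Telegram bot preserves line breaks inside `<title>`, are open questions that this sketch does not settle.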
    