
I currently define a list of IDs manually in a list.txt file and build one long string from them.
I want to create the same long string, but by parsing an HTML file and extracting the IDs from the links inside it.

Note: the number of IDs in list.txt or list.html can vary.

// list.txt

450814997
463939057

// list.bat

@echo off
setlocal EnableDelayedExpansion
set "my_directory=c:server"
set "list="
for /F "tokens=*" %%A IN ('Type "list.txt"') do (
    set "list=!list!%my_directory%%%A;"
)
echo %list%

// list.html

  <body>
    <div class="mod-list">
      <table>
        <tr data-type="ModContainer">
          <td data-type="DisplayName">CBA_A3</td>
          <td>
            <span class="from-steam">Steam</span>
          </td>
          <td>
            <a href="https://steamcommunity.com/sharedfiles/filedetails/?id=450814997" data-type="Link">https://steamcommunity.com/sharedfiles/filedetails/?id=450814997</a>
          </td>
        </tr>
        <tr data-type="ModContainer">
          <td data-type="DisplayName">ace</td>
          <td>
            <span class="from-steam">Steam</span>
          </td>
          <td>
            <a href="https://steamcommunity.com/sharedfiles/filedetails/?id=463939057" data-type="Link">https://steamcommunity.com/sharedfiles/filedetails/?id=463939057</a>
          </td>
        </tr>
      </table>
    </div>
    <div class="dlc-list">
      <table />
    </div>
    <div class="footer">
      <span>Created by Arma 3 Launcher by Bohemia Interactive.</span>
    </div>
  </body>

expected output:

c:server450814997;c:server463939057;

2 Answers


  1. @ECHO OFF
    SETLOCAL ENABLEDELAYEDEXPANSION 
    set "my_directory=c:server"
    set "list="
    for /F "tokens=3 delims=?" %%E IN ('Type "q77047266.txt"') do (
     SET "token3=%%E"
     IF DEFINED token3 set "list=!list!%my_directory%!token3:~3,-4!;"
    )
    echo %list%
    
    GOTO :EOF
    

    I used a file named q77047266.txt containing your HTML data for my testing.

    You don’t specify whether the required string should be extracted from its first or second occurrence on the line. I chose the last.

    Using ? as the delimiter, this grabs the part of the line after the second ? (token 3), then appends it to list with the directory prefix and the trailing semicolon, but without the first 3 characters (id=) and the last 4 characters (</a>).
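
The tokenising step can be checked outside of cmd. The following is a minimal Python sketch of the same idea, not the batch code itself: it splits a line on ? and trims the third token. The names SAMPLE_LINE, extract_id and build_list are hypothetical, and the c:server prefix is copied as-is from the question.

```python
from typing import Iterable, Optional

# One anchor line from the question's list.html (hard-coded here for illustration).
SAMPLE_LINE = (
    '            <a href="https://steamcommunity.com/sharedfiles/filedetails/?id=450814997" '
    'data-type="Link">https://steamcommunity.com/sharedfiles/filedetails/?id=450814997</a>'
)


def extract_id(line: str) -> Optional[str]:
    """Approximate `for /F "tokens=3 delims=?"` followed by `!token3:~3,-4!`."""
    # for /F collapses runs of delimiters, so empty fields are dropped here too.
    tokens = [field for field in line.split("?") if field]
    if len(tokens) < 3:
        return None  # no third token, e.g. a <td> or <tr> line
    token3 = tokens[2]  # 'id=450814997</a>' for SAMPLE_LINE
    # Strip the leading 'id=' (3 characters) and the trailing '</a>' (4 characters).
    return token3[3:-4]


def build_list(lines: Iterable[str], prefix: str = "c:server") -> str:
    """Concatenate prefix + id + ';' for every line that yields an id."""
    ids = (extract_id(line) for line in lines)
    return "".join(f"{prefix}{mod_id};" for mod_id in ids if mod_id)


if __name__ == "__main__":
    print(build_list([SAMPLE_LINE, "          <td>"]))
```

Like the batch version, this relies on each link sitting on a single line with the URL appearing twice, so it is tied to this particular HTML layout.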

  2. I want create same long string but to parse a html file and export IDs from links inside.

    Doing a quick search, that HTML is from https://atwar.online/arma/Arma_3_Preset_hoggit_no_jsrs.html, right?

    To parse this HTML source I’d highly recommend Xidel, an XML/HTML/JSON parser.

    First the two <tr>-nodes you’re after:

    xidel -s "https://atwar.online/arma/Arma_3_Preset_hoggit_no_jsrs.html"^
          -e "//table/tbody/tr[td[@data-type='DisplayName']=('CBA_A3','ace')]"^
          --output-node-format=xml --output-node-indent
    <tr data-type="ModContainer">
      <td data-type="DisplayName">CBA_A3</td>
      <td>
        <span class="from-steam">Steam</span>
      </td>
      <td>
        <a href="http://steamcommunity.com/sharedfiles/filedetails/?id=450814997" data-type="Link">http://steamcommunity.com/sharedfiles/filedetails/?id=450814997</a>
      </td>
    </tr>
    <tr data-type="ModContainer">
      <td data-type="DisplayName">ace</td>
      <td>
        <span class="from-steam">Steam</span>
      </td>
      <td>
        <a href="http://steamcommunity.com/sharedfiles/filedetails/?id=463939057" data-type="Link">http://steamcommunity.com/sharedfiles/filedetails/?id=463939057</a>
      </td>
    </tr>
    
    xidel -s "https://atwar.online/arma/Arma_3_Preset_hoggit_no_jsrs.html"^
          -e "//table/tbody/tr[td[@data-type='DisplayName']=('CBA_A3','ace')]/td/a/@href"
    http://steamcommunity.com/sharedfiles/filedetails/?id=450814997
    http://steamcommunity.com/sharedfiles/filedetails/?id=463939057
    

    Next you can use request-decode() to retrieve the ids:

    xidel -s "https://atwar.online/arma/Arma_3_Preset_hoggit_no_jsrs.html"^
          -e "//table/tbody/tr[td[@data-type='DisplayName']=('CBA_A3','ace')]/td/a/request-decode(@href)"
    {
      "url": "http://steamcommunity.com/sharedfiles/filedetails/?id=450814997",
      "protocol": "http",
      "host": "steamcommunity.com",
      "path": "sharedfiles/filedetails/",
      "query": "id=450814997",
      "params": {
        "id": "450814997"
      }
    }
    {
      "url": "http://steamcommunity.com/sharedfiles/filedetails/?id=463939057",
      "protocol": "http",
      "host": "steamcommunity.com",
      "path": "sharedfiles/filedetails/",
      "query": "id=463939057",
      "params": {
        "id": "463939057"
      }
    }
    
    xidel -s "https://atwar.online/arma/Arma_3_Preset_hoggit_no_jsrs.html"^
          -e "//table/tbody/tr[td[@data-type='DisplayName']=('CBA_A3','ace')]/td/a/request-decode(@href)/params/id"
    450814997
    463939057
    

    Then it’s just a matter of creating the specific string you want. You can do this with concat() of course, or with XPath 4.0 String Templates (provided you’re using an up-to-date Xidel binary):

    xidel -s "https://atwar.online/arma/Arma_3_Preset_hoggit_no_jsrs.html"^
          -e "//table/tbody/tr[td[@data-type='DisplayName']=('CBA_A3','ace')]/td/a/request-decode(@href)/params/concat('c:server',id,';')"
    #or
    xidel -s "https://atwar.online/arma/Arma_3_Preset_hoggit_no_jsrs.html"^
          -e "//table/tbody/tr[td[@data-type='DisplayName']=('CBA_A3','ace')]/td/a/request-decode(@href)/params/`c:server{id};`"
    c:server450814997;
    c:server463939057;
    

    And finally string-join() or --output-separator='' to put everything on a single line:

    xidel -s "https://atwar.online/arma/Arma_3_Preset_hoggit_no_jsrs.html"^
          -e "string-join(//table/tbody/tr[td[@data-type='DisplayName']=('CBA_A3','ace')]/td/a/request-decode(@href)/params/`c:server{id};`)"
    #or
    xidel -s "https://atwar.online/arma/Arma_3_Preset_hoggit_no_jsrs.html"^
          -e "//table/tbody/tr[td[@data-type='DisplayName']=('CBA_A3','ace')]/td/a/request-decode(@href)/params/`c:server{id};`"^
          --output-separator=''
    c:server450814997;c:server463939057;
    

    If you want all ids, then simply remove the condition (between [ ]):

    xidel -s "https://atwar.online/arma/Arma_3_Preset_hoggit_no_jsrs.html"^
          -e "//table/tbody/tr/td/a/request-decode(@href)/params/`c:server{id};`"^
          --output-separator=''
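
For comparison, the same extract-and-join idea can be sketched without Xidel, using only Python's standard library. This is a swapped-in technique, not what the commands above do: html.parser collects the href of every <a data-type="Link"> element, and urllib.parse reads the id query parameter, roughly what request-decode(@href)/params/id returns. The names LinkIdParser, build_list_from_html and SAMPLE_HTML are hypothetical, and the c:server prefix is taken from the question.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlsplit


class LinkIdParser(HTMLParser):
    """Collect the `id` query parameter from every <a data-type="Link"> href."""

    def __init__(self) -> None:
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        if attributes.get("data-type") != "Link":
            return
        # Split the URL and decode its query string, e.g. 'id=450814997'.
        query = parse_qs(urlsplit(attributes.get("href") or "").query)
        self.ids.extend(query.get("id", []))


def build_list_from_html(html: str, prefix: str = "c:server") -> str:
    """Return prefix + id + ';' for every link id, joined on one line."""
    parser = LinkIdParser()
    parser.feed(html)
    parser.close()
    return "".join(f"{prefix}{mod_id};" for mod_id in parser.ids)


# Reduced copy of the question's list.html, used only for illustration.
SAMPLE_HTML = """
<table>
  <tr data-type="ModContainer">
    <td><a href="https://steamcommunity.com/sharedfiles/filedetails/?id=450814997" data-type="Link">CBA_A3</a></td>
  </tr>
  <tr data-type="ModContainer">
    <td><a href="https://steamcommunity.com/sharedfiles/filedetails/?id=463939057" data-type="Link">ace</a></td>
  </tr>
</table>
"""

if __name__ == "__main__":
    print(build_list_from_html(SAMPLE_HTML))  # c:server450814997;c:server463939057;
```

Because it reads the attribute rather than counting delimiters, this sketch does not depend on how the HTML is split across lines.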
    