I currently define the list of IDs manually in a list.txt file and build one long string from it.
I want to create the same long string, but by parsing an HTML file and extracting the IDs from the links inside it.
Note: the number of IDs in list.txt or list.html can vary.
// list.txt
450814997
463939057
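// list.bat
@echo off
rem Delayed expansion is required for !list! to be updated inside the loop.
setlocal EnableDelayedExpansion
set "my_directory=c:server"
set "list="
for /F "tokens=*" %%A in ('type "list.txt"') do (
    set "list=!list!%my_directory%%%A;"
)
echo %list%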
// list.html
<body>
<div class="mod-list">
<table>
<tr data-type="ModContainer">
<td data-type="DisplayName">CBA_A3</td>
<td>
<span class="from-steam">Steam</span>
</td>
<td>
<a href="https://steamcommunity.com/sharedfiles/filedetails/?id=450814997" data-type="Link">https://steamcommunity.com/sharedfiles/filedetails/?id=450814997</a>
</td>
</tr>
<tr data-type="ModContainer">
<td data-type="DisplayName">ace</td>
<td>
<span class="from-steam">Steam</span>
</td>
<td>
<a href="https://steamcommunity.com/sharedfiles/filedetails/?id=463939057" data-type="Link">https://steamcommunity.com/sharedfiles/filedetails/?id=463939057</a>
</td>
</tr>
</table>
</div>
<div class="dlc-list">
<table />
</div>
<div class="footer">
<span>Created by Arma 3 Launcher by Bohemia Interactive.</span>
</div>
</body>
Expected output:
c:server450814997;c:server463939057;
2 Answers
I used a file named q77047266.txt containing your HTML data for my testing.
You don’t specify whether the required string should be extracted from its first or second occurrence on the line. I chose the last.
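A minimal sketch of a batch file for this, assuming the HTML is saved as q77047266.txt next to the script and that only the link lines contain two ? characters. The variable names filename1 and id are illustrative, not taken from the question.

```batch
@echo off
setlocal EnableDelayedExpansion

rem Assumption: the HTML data sits next to this script under this name.
set "filename1=q77047266.txt"
set "my_directory=c:server"
set "list="

rem Split every line on the question mark and keep token 3 only.
rem Lines with fewer than three tokens never run the loop body.
rem Token 3 is the id= text of the second URL plus the closing anchor tag,
rem so the substring drops the first 3 and the last 4 characters.
for /F "usebackq tokens=3 delims=?" %%A in ("%filename1%") do (
    set "id=%%A"
    set "list=!list!%my_directory%!id:~3,-4!;"
)
echo %list%
```

With the sample HTML from the question this should produce the expected string. A file that has other lines with two ? characters (an XML declaration, for example) would need an extra filter before the loop.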
Using ? as a delimiter, grab the part of the line after the second ? (token 3), then append the result to list with the decoration (the my_directory prefix and a trailing ;), but without the first 3 characters (id=) and without the trailing closing tag (</a>, 4 characters).

Doing a quick search, that HTML is from https://atwar.online/arma/Arma_3_Preset_hoggit_no_jsrs.html, right?
To parse this HTML-source I’d highly recommend the XML/HTML/JSON parser xidel.
First, select the two <tr>-nodes you’re after.
Next, you can use request-decode() to retrieve the ids.
Then it’s just a matter of creating the specific string you want. You can do this with concat() of course, or with XPath 4.0 String Templates (provided you’re using an up-to-date Xidel binary).
And finally, use string-join() or --output-separator='' to put everything on a single line.
If you want all ids, then simply remove the condition (the part between [ and ]).
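The steps above can be sketched as a single command. This is a hedged sketch under stated assumptions, not the commands from the answer: it assumes the HTML is saved as list.html and that xidel is on the PATH, it selects every link carrying data-type="Link" rather than specific <tr> nodes, and it swaps in the standard XPath function substring-after() in place of request-decode().

```batch
rem Assumption: list.html is in the current directory and xidel is on PATH.
rem -s silences status messages, -e evaluates the XPath expression.
rem For each matching link, take the text after "id=" in its href,
rem wrap it with the c:server prefix and a trailing semicolon,
rem then join all pieces into one line.
xidel -s list.html -e "string-join(//a[@data-type='Link']/concat('c:server', substring-after(@href, 'id='), ';'), '')"
```

The predicate in square brackets is what restricts the selection; narrowing or widening it changes which links contribute an id to the string.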