I am scraping version information from a website. I can retrieve the information, but not without the surrounding formatting. I am currently targeting the div with id j_idt19. Is there a way to get the value from the td within the table with id page_footer? I have not been able to reach the specific td that contains the text.
I would like to write the result to a CSV file, and also write just the text to a text file in the form Num.NumNum.NumNumNum.
# Retrieve the page that contains the version number
$response = Invoke-WebRequest -Uri "https://www.somesite.com/index.xhtml"
# Select the text content of the div with id j_idt19
$results1 = $response.ParsedHtml.getElementsByTagName("Div") | Where-Object {$_.id -eq "j_idt19"} | Select-Object -Property TextContent
$results2 = $response.ParsedHtml.getElementsByTagName("Div") | Where-Object {$_.id -eq "j_idt19"} | Select-Object -Property TextContent | Out-String
Write-Output $results1
$results1 | Export-Csv -Path "C:\Users\ASTRTW3\Desktop\David_Scripts\URL_TEST5.csv"
$results2 | Out-File -FilePath "C:\Users\ASTRTW3\Desktop\David_Scripts\URL_TEST5.txt"
HTML code being scraped
<div id="j_idt19" class="ui-layout-unit ui-widget ui-widget-content ui-corner-all ui-layout-south ui-layout-pane ui-layout-pane-south" style="position: absolute; margin: 0px; inset: auto 5px 0px; width: auto; z-index: 0; height: 26px; display: block; visibility: visible;"><div class="ui-layout-unit-content ui-widget-content" style="position: relative; height: 22px; visibility: visible;">
<table id="page_footer" style="width: 100%; border-top: 1px solid #cbc3be !important;">
<tbody><tr>
<td style="width: 30%;">
</td>
<td style="width: 40%; text-align: center;"><span style="font-weight: bold;">1.14.012</span>
</td>
<td style="width: 15%; text-align: right;"> </td>
<td style="text-align: right; width: 20px; margin-top: 2px;"><div id="j_idt23" style="width:18px;height:18px;position:fixed;right:130px;bottom:2px"><div id="j_idt23_start" style="display:none"><img id="progressBar" src="/CSDB/resources/images/loader_footer.gif"></div><div id="j_idt23_complete" style="display:none"></div></div>
</td>
</tr>
</tbody></table></div></div>
CSV result
#TYPE Selected.System.__ComObject
"textContent"
"
1.14.012
?
"
Text result
textContent
-----------
...
Expected result
CSV
#TYPE Selected.System.__ComObject
"textContent"
1.14.012
text
1.14.012
2 Answers
I’ll assume what you’re after is always a version contained in a <span> within a <td>, in which case the code you could use would be:
To scrape the specific td within the table with id page_footer, you can use Invoke-WebRequest to fetch the page, and then drill down to the desired table and td using a combination of ParsedHtml.getElementsByTagName and filtering by id. Once you’re at the table level, navigate to your target td by indexing or additional filtering. PowerShell doesn’t directly support CSS selectors, so you’ll have to step through the DOM elements. To output just the text to a CSV or text file, use PowerShell’s Export-Csv and Out-File commands with the text content you’ve extracted.
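As a rough sketch of the extraction and output steps, the block below pulls the version out of the raw HTML text instead of walking the DOM. This is a swapped-in technique, not the method either answer describes: ParsedHtml is only available in Windows PowerShell, so the sketch applies regular expressions to the page source. The function name Get-FooterVersion, the $sampleHtml variable, and the output file names are hypothetical, and the sample is a trimmed copy of the HTML posted in the question.

```powershell
function Get-FooterVersion {
    param(
        [Parameter(Mandatory = $true)]
        [string]$Html
    )

    # Isolate the table with id "page_footer".
    # (?is) = ignore case, and let "." match line breaks.
    $table = [regex]::Match($Html, '(?is)<table[^>]*\bid="page_footer"[^>]*>.*?</table>')
    if (-not $table.Success) { return $null }

    # Capture the contents of the first <span> inside that table.
    $span = [regex]::Match($table.Value, '(?is)<span[^>]*>(.*?)</span>')
    if (-not $span.Success) { return $null }

    # Trim() removes the whitespace and line breaks around the text.
    return $span.Groups[1].Value.Trim()
}

# Trimmed copy of the HTML from the question (assumption: the live page has the same shape).
$sampleHtml = @'
<table id="page_footer" style="width: 100%;">
<tbody><tr>
<td style="width: 30%;">
</td>
<td style="width: 40%; text-align: center;"><span style="font-weight: bold;">1.14.012</span>
</td>
</tr>
</tbody></table>
'@

$version = Get-FooterVersion -Html $sampleHtml

# Text file: just the bare version string.
$version | Out-File -FilePath 'version.txt'

# CSV: wrap the string in an object so Export-Csv writes a named column.
# -NoTypeInformation leaves out the "#TYPE ..." header line.
[pscustomobject]@{ Version = $version } | Export-Csv -Path 'version.csv' -NoTypeInformation
```

Against the live page, the same function could presumably be given $response.Content from Invoke-WebRequest in place of the sample. The patterns assume that the footer table contains no nested table and that the first span inside it holds the version; if the real markup differs, they would need adjusting.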