I am iterating through a CSV file of URLs and using Invoke-WebRequest to get the innerHTML and href values of links that match a specified criterion. This works for some URLs but not for others unless I add the -UseBasicParsing parameter, which doesn’t provide the property access and filtering capabilities I need.

A common denominator is that the URLs that don’t work all use a www subdomain. A couple of them are also reachable without www but still fail, and I am not sure the subdomain should matter anyway, since other www URLs do work.

As mentioned above, I have tried adding -UseBasicParsing, which does allow a connection but restricts the data I have access to. I have also compared the HTTP headers of the URLs to try to understand the differences, but I am unsure what the issue is.

This works correctly and returns the innerHTML text and href for each link on the page that matches the filter:

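# Full (Internet Explorer based) parsing: no -UseBasicParsing switch
$currentRequest = Invoke-WebRequest -Uri 'https://moz.com/learn/seo/what-is-seo'
$currentRequest | Get-Member
# Keep only links whose inner HTML mentions SEO
$currentRequest = $currentRequest.Links |
    Select-Object innerHTML, href |
    Where-Object innerHTML -like '*SEO*'
$currentRequest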

Using exactly the same code with the following URL, the console just freezes until I exit the script:

https://www.redevolution.com/what-is-seo

When I run the script with the working URL, I get a pair of values for each link, as shown below:

innerHTML : Recommended SEO Companies
href      : https://moz.com/community/recommended

With the non-working URL, as mentioned above, the command line just sits at a blinking cursor.

This is just one example, and I need to query other data as well, so it would be great to understand how I can run Invoke-WebRequest consistently without issues.

Many thanks!!

Mike

2 Answers


  1. Not so much an answer, as a long comment…

    In PowerShell 5.1, Invoke-WebRequest uses the Internet Explorer engine to parse the HTML into a DOM, which can also execute any scripts on the page. It’s possible that something is going wrong in one of those scripts, or that the headless Internet Explorer instance doesn’t like the page content for whatever reason.

    There are other reports of the same problem elsewhere, for example “Invoke-WebRequest hangs in some cases, unless -UseBasicParsing is used”.

    Adding the -UseBasicParsing switch bypasses Internet Explorer and uses a much simpler internal HTML parser. If you need to extract additional information, you can use an HTML parser library like the HtmlAgilityPack or AngleSharp to parse and query the $currentRequest.Content property.
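As a rough illustration of querying the raw content yourself, here is a minimal sketch that pulls anchors out of an HTML string with a regular expression instead of a parser library. The function name Get-LinkFromHtml and the sample markup are hypothetical, and a regex is a deliberately simpler stand-in for HtmlAgilityPack or AngleSharp: it will miss links in malformed or unusual markup, so treat it as a starting point only.

```powershell
# Hypothetical helper (not a built-in cmdlet): extracts <a> elements from raw HTML.
# Assumption: anchors are reasonably well formed and href values are quoted.
function Get-LinkFromHtml {
    param(
        [Parameter(Mandatory)]
        [string]$Html
    )

    # (?is) = ignore case, let '.' match newlines.
    # Group 1 captures the href value, group 2 the markup between <a ...> and </a>.
    $pattern = '(?is)<a\b[^>]*?\bhref\s*=\s*["'']([^"'']*)["''][^>]*>(.*?)</a>'

    foreach ($match in [regex]::Matches($Html, $pattern)) {
        [pscustomobject]@{
            innerHTML = $match.Groups[2].Value.Trim()
            href      = $match.Groups[1].Value
        }
    }
}

# Offline sample so the filter can be tried without a network call.
$sampleHtml = '<p><a href="/a">SEO basics</a> <a class="x" href="/b">Other</a></p>'
Get-LinkFromHtml -Html $sampleHtml |
    Where-Object innerHTML -like '*SEO*'
```

To apply this to a live page, pass the Content property of a basic-parsing response as the -Html argument, for example (Invoke-WebRequest -Uri $url -UseBasicParsing).Content, where $url holds the page address.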

    Note that PowerShell Core 6.0 and up have made the -UseBasicParsing behaviour the default, and there’s effectively no way to turn it off. If you want to write future-proof scripts now, it’s probably best to find a way to solve your problem using -UseBasicParsing so you don’t have to rewrite the script if or when you move up to PowerShell Core. (See Breaking Changes for PowerShell 6.0 -> Changes to Web Cmdlets)
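One way to stay on basic parsing while keeping the innerHTML/href shape from the question is to derive the inner markup from each link's outerHTML. The sketch below assumes that the objects in the Links collection of a basic-parsing response expose outerHTML and href properties. ConvertTo-LinkSummary is a hypothetical helper name, and stripping the outer tag with -replace is only an approximation of innerHTML.

```powershell
# Hypothetical helper: reshapes link objects into innerHTML/href pairs.
# Assumption: each input object has outerHTML and href properties.
function ConvertTo-LinkSummary {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [psobject]$Link
    )

    process {
        # Remove the opening <a ...> tag and the closing </a> tag.
        $inner = $Link.outerHTML -replace '(?is)^\s*<a\b[^>]*>', '' -replace '(?is)</a>\s*$', ''

        [pscustomobject]@{
            innerHTML = $inner.Trim()
            href      = $Link.href
        }
    }
}

# Offline sample shaped like a basic-parsing link object (values are illustrative).
$sampleLink = [pscustomobject]@{
    outerHTML = '<a href="/community/recommended">Recommended SEO Companies</a>'
    tagName   = 'A'
    href      = '/community/recommended'
}
$sampleLink | ConvertTo-LinkSummary | Where-Object innerHTML -like '*SEO*'
```

Against a real page, the same pipeline would start from (Invoke-WebRequest -Uri $url -UseBasicParsing).Links, where $url holds the page address, so nothing depends on the Internet Explorer engine.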

    See How to parse html in Powershell Core for a related question.

  2. Firstly, the code that “works” (your first sample) is missing -UseBasicParsing. The documentation explains why that matters: https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/invoke-webrequest?view=powershell-5.1

    To quote: “By default, script code in the web page may be run when the page is being parsed to populate the ParsedHtml property. Use the -UseBasicParsing switch to suppress this.”

    In PowerShell v6, basic parsing became the default, so the behaviour of -UseBasicParsing is always in effect (see here: https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/invoke-webrequest?view=powershell-6).

    It is not great, because of the reasons you mentioned, and unfortunately there is no relief coming (see the comment from a PowerShell dev here: https://twitter.com/Steve_MSFT/status/1153456742719639552?s=20).
