Using .NET, how can I programmatically get the HTML code that is shown in a browser (and that can be saved from browsers such as Chrome or Opera through the Save As command)?
Using HtmlDocument.Load() or wget is to no avail: I do not get what I want.
See also the discussion here.
EDIT
Unfortunately, the .NET WebClient class (or rather the newer System.Net.Http.HttpClient class) did not help (see the answer by bdcoder). I got the same result as with HtmlDocument.Load() or wget, not the HTML code that the browsers save.
    let myHtml =
        async {
            // HttpClient returns only the raw server response; scripts on the page are not executed
            use client = new System.Net.Http.HttpClient()
            let! responseBody =
                client.GetStringAsync("https://www.kodis.cz/lines/region?tab=232-293")
                |> Async.AwaitTask
            return responseBody
        }
        |> Async.RunSynchronously
3 Answers
Another potential solution to my problem is suggested here (an answer by Tomáš Petříček).
Have you tried the .NET WebClient class? You should be able to fetch a page from any URL, save the result, and then process the HTML code accordingly.
Hope that helps.
If you look in the network panel of the browser dev tools, you can see the endpoint that the JavaScript calls to get the PDF data. You can use HttpClient to request the same data and then parse the JSON to get the PDF links.
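A minimal sketch of that approach in F#, under stated assumptions: the real endpoint URL and the shape of its JSON are not given here, so the endpointUrl parameter is hypothetical (copy the actual URL from the network panel), and instead of assuming particular field names the code walks the whole JSON tree and keeps every string value that ends with ".pdf". The names pdfLinks, parsePdfLinks and fetchPdfLinks are made up for this illustration.

```fsharp
open System
open System.Net.Http
open System.Text.Json

// Recursively collect every string value in the JSON tree that ends with ".pdf".
// No field names are assumed, because the real response shape is unknown.
let rec pdfLinks (element: JsonElement) : string list =
    match element.ValueKind with
    | JsonValueKind.Object ->
        [ for prop in element.EnumerateObject() do
            yield! pdfLinks prop.Value ]
    | JsonValueKind.Array ->
        [ for item in element.EnumerateArray() do
            yield! pdfLinks item ]
    | JsonValueKind.String ->
        let s = element.GetString()
        if s.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase) then [ s ] else []
    | _ -> []

// Parse a JSON string and return the PDF links found in it.
let parsePdfLinks (json: string) : string list =
    use doc = JsonDocument.Parse(json)
    pdfLinks doc.RootElement

// endpointUrl is hypothetical: use the URL seen in the browser's network panel.
let fetchPdfLinks (endpointUrl: string) : Async<string list> =
    async {
        use client = new HttpClient()
        let! json = client.GetStringAsync(endpointUrl) |> Async.AwaitTask
        return parsePdfLinks json
    }
```

Calling fetchPdfLinks "<endpoint from the network panel>" |> Async.RunSynchronously would then return the list of links. If the endpoint requires specific headers or returns relative paths, the request and the filter would need to be adjusted to match.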