Using .NET, how can I programmatically get the HTML that a browser displays (the same HTML that browsers such as Chrome or Opera save through their Save As command)?

Neither HtmlDocument.Load() nor wget works: both return something other than the HTML the browser saves.

See also the discussion here.

EDIT

Unfortunately, the .NET WebClient class (or rather the newer System.Net.Http.HttpClient) did not help (see the answer by bdcoder). I got the same result as with HtmlDocument.Load() or wget, not the HTML that the browsers save.

// Downloads the raw HTML the server returns; no JavaScript is executed.
let myHtml =
    async {
        use client = new System.Net.Http.HttpClient()
        let! responseBody =
            client.GetStringAsync("https://www.kodis.cz/lines/region?tab=232-293")
            |> Async.AwaitTask
        return responseBody
    }
    |> Async.RunSynchronously

3 Answers


  1. Chosen as BEST ANSWER

    Another potential solution to my problem is suggested here (an answer by Tomáš Petříček).


  2. Have you tried the .NET WebClient class? You should be able to fetch a page from any URL, save the result, and then process the HTML as needed.

    Hope that helps.

  3. If you look in the network panel of the browser dev tools, you can see the endpoint that the page's JavaScript calls to get the PDF data. You can use HttpClient to request the same data and then parse the JSON to get the PDF links.
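
A minimal sketch of the approach in the last answer, under some loud assumptions: the endpoint URL has to be copied from the network panel (the one in the usage note is a made-up placeholder), the shape of the JSON is unknown, so the code simply walks the whole document and keeps every string value ending in ".pdf", and the names collectPdfLinks, pdfLinksFromJson and fetchPdfLinks are hypothetical helpers, not part of any library.

```fsharp
open System
open System.Net.Http
open System.Text.Json

// Recursively collect every JSON string value that ends in ".pdf".
// Nothing is assumed about the payload's shape; links followed by a
// query string (e.g. "x.pdf?v=1") would NOT be matched by this check.
let rec collectPdfLinks (element: JsonElement) : string list =
    match element.ValueKind with
    | JsonValueKind.String ->
        let s = element.GetString()
        if s.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase) then [ s ] else []
    | JsonValueKind.Array ->
        [ for item in element.EnumerateArray() do
            yield! collectPdfLinks item ]
    | JsonValueKind.Object ->
        [ for prop in element.EnumerateObject() do
            yield! collectPdfLinks prop.Value ]
    | _ -> []

// Parse a JSON string and return the PDF links found in it.
let pdfLinksFromJson (json: string) : string list =
    use doc = JsonDocument.Parse(json)
    collectPdfLinks doc.RootElement

// Download the JSON from the endpoint seen in the network panel
// and extract the PDF links from the response body.
let fetchPdfLinks (endpoint: string) : string list =
    async {
        use client = new HttpClient()
        let! json = client.GetStringAsync(endpoint) |> Async.AwaitTask
        return pdfLinksFromJson json
    }
    |> Async.RunSynchronously
```

Usage would look something like `fetchPdfLinks "https://example.invalid/api/lines"`, with the placeholder replaced by the real endpoint from the dev tools. If that endpoint expects particular headers or a POST body, the request would need to be adjusted to match what the browser sends.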
