I have searched the web and Stack Overflow, but I am still having trouble extracting data from a table on a website. I can retrieve the full table with the code below, but I need to extract selected values:
Url = "https://www.multpl.com/shiller-pe/table/by-month";
web = new HtmlWeb();
doc = web.Load(Url);
pe = doc.DocumentNode.SelectSingleNode("//*[@id='datatable']").InnerText.ToString();
Console.Write(pe);
The XPath //*[@id='datatable']/tbody/tr[3]/td[2] for a single data point does not work and throws an error.
This also does not work:
Url = "https://www.multpl.com/shiller-pe/table/by-month";
web = new HtmlWeb();
doc = web.Load(Url);
var table = doc.DocumentNode.SelectSingleNode("//*[@id='datatable']");
var tableRows = table.SelectNodes("tr");
var columns = tableRows[0].SelectNodes("th/text()");
for (int i = 1; i < tableRows.Count; i++)
{
    for (int e = 0; e < columns.Count; e++)
    {
        var value = tableRows[i].SelectSingleNode("td[e + 1]");
        Console.Write(columns[e].InnerText + ":" + value.InnerText);
    }
}
Any direction will help, thank you.
2 Answers
Found a solution finally.
OK, I found two problems.
1. td[e + 1]: you are trying to use the variable e, but without string interpolation, so XPath reads e as an element name rather than as your loop counter. Change the selector to $"td[{e + 1}]".
2. th/text(): you want to count columns, so change the selector to th. In the HTML at https://www.multpl.com/shiller-pe/table/by-month the second column's header contains several nodes, while each data row has a single cell for that column, so th is the right selector to use.
The second column's header is still unusual, so you will still have issues with columns[e].InnerText. It may be better to handle the headers manually. Column values should be trimmed too, because the second column contains line separators. Here is my final code:
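A minimal sketch that applies both fixes, assuming the HtmlAgilityPack NuGet package is installed and that the table has one header row of th cells followed by rows of td cells. The TableScraper class, the ParseRows helper, the hard-coded header names, and the sample markup are illustrative assumptions, not taken from the site:

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package, assumed to be installed

public static class TableScraper
{
    // Hypothetical helper: returns one array of trimmed cell values per data row.
    public static List<string[]> ParseRows(string html, string[] headers)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<string[]>();
        var table = doc.DocumentNode.SelectSingleNode("//*[@id='datatable']");
        // ".//tr" finds the rows whether or not the markup contains a <tbody>.
        var tableRows = table?.SelectNodes(".//tr");
        if (tableRows == null) return result; // SelectNodes returns null when nothing matches

        for (int i = 1; i < tableRows.Count; i++) // row 0 is the header row
        {
            var values = new string[headers.Length];
            for (int e = 0; e < headers.Length; e++)
            {
                // Interpolation puts the number into the XPath: td[1], td[2], ...
                var cell = tableRows[i].SelectSingleNode($"td[{e + 1}]");
                // Trim removes the line separators around the cell text.
                values[e] = cell == null ? "" : HtmlEntity.DeEntitize(cell.InnerText).Trim();
            }
            result.Add(values);
        }
        return result;
    }

    public static void Main()
    {
        // Made-up sample markup standing in for the downloaded page.
        const string sample = @"<table id='datatable'>
<tr><th>Date</th><th><span>Value</span> Value</th></tr>
<tr><td>Jan 1, 2000</td><td>
10.00
</td></tr>
</table>";

        // Header names are supplied manually instead of being read from the th cells.
        var headers = new[] { "Date", "Value" };
        foreach (var row in ParseRows(sample, headers))
        {
            for (int e = 0; e < headers.Length; e++)
                Console.Write(headers[e] + ":" + row[e] + " ");
            Console.WriteLine();
        }
    }
}
```

For the live page, the same helper could be fed with new HtmlWeb().Load(Url).DocumentNode.OuterHtml. Supplying the header names by hand sidesteps the multi-node header in the second column, and the null checks avoid the exception that SelectSingleNode(...).InnerText raises when a selector matches nothing.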