I’m trying to build a timetable Telegram bot for my university. To do so, I used HtmlAgilityPack to read data from the university’s HTML table into a Pair object (a simple object with Date, Time, Discipline, Lecturer’s Name, and Auditorium properties). The problem is that my code pulls individual cells, but I need to compose them into Pair objects so that I can return an object in response to a user’s request. I think I need to use LINQ, but I don’t have much experience with it.
Later, the Pair’s Date property will be compared with the current date in order to return the whole schedule.
My code is as follows:
public List<Pair> Scrape(string groupNumber)
{
    // Build the URL of the group's timetable page and load it
    string groupUrl = _websiteUrl + groupNumber + ".xml";
    var web = new HtmlWeb();
    var doc = web.Load(groupUrl);

    // This gets all the cells in the HTML table.
    // ".//tr" stays inside the selected table; "//tr" would search the whole document.
    var htmlTableCell = from table in doc.DocumentNode.SelectNodes("/html/body/div[6]/div[2]/div/table")
                        from row in table.SelectNodes(".//tr")
                        from cell in row.SelectNodes("th|td")
                        select new { CellText = cell.InnerText };

    // This shows all the cells in the logger
    foreach (var cell in htmlTableCell)
    {
        _logger.LogCritical(cell.CellText);
    }

    return _pairs;
}
3 Answers
If anyone else has this problem, first check how the website’s table is actually structured. In my case it was a list of table rows interleaved with header rows that contain dates (a single date row would break the assignment of values to an object). To still get the timetable, I decided to pull the whole week, using the week’s header date. My code now looks something like this:
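For readers who want something concrete, here is a minimal sketch of the "pull the whole week" idea. It is not the code from this answer. It assumes the rows have already been turned into Pair objects, and the Pair record, its property names, and the WeekSchedule helper are all hypothetical. It also covers the date comparison mentioned in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of the Pair object; the property names are assumptions.
public record Pair(DateTime Date, string Time, string Discipline, string Lecturer, string Auditorium);

public static class WeekSchedule
{
    // Monday of the week that contains the given day.
    public static DateTime WeekStart(DateTime day)
    {
        int daysSinceMonday = ((int)day.DayOfWeek + 6) % 7; // DayOfWeek.Sunday is 0
        return day.Date.AddDays(-daysSinceMonday);
    }

    // All pairs in the seven days starting at weekStart, grouped by day.
    public static ILookup<DateTime, Pair> ForWeek(IEnumerable<Pair> pairs, DateTime weekStart)
    {
        DateTime start = weekStart.Date;
        DateTime end = start.AddDays(7);
        return pairs
            .Where(p => p.Date >= start && p.Date < end)
            .OrderBy(p => p.Date)
            .ToLookup(p => p.Date.Date);
    }
}
```

With this, `WeekSchedule.ForWeek(pairs, WeekSchedule.WeekStart(DateTime.Today))[DateTime.Today]` would give today's pairs. A lookup returns an empty sequence for a day without classes, so no extra null check is needed.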
As I’ve stated above, you really have to understand how your website is structured. In my case the rows all looked alike and had no attributes to distinguish them. I pulled every row and then the cells of each row. After that I added an if statement that detects the date rows, which had been ruining the pull in the first place, and created an object for each row that holds a pair. In terms of code it looks something like this:
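The steps described here could be sketched roughly as follows. This is an illustration under assumptions, not the original code. It assumes a date row has exactly one cell in "dd.MM.yyyy" format and a pair row has at least four cells in the order time, discipline, lecturer, auditorium. The Pair record and the ScheduleParser name are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical shape of the Pair object; the property names are assumptions.
public record Pair(DateTime Date, string Time, string Discipline, string Lecturer, string Auditorium);

public static class ScheduleParser
{
    // Assumed layout: one-cell rows hold a date, rows with four or more cells hold a pair.
    public static List<Pair> Parse(string html, string dateFormat = "dd.MM.yyyy")
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var pairs = new List<Pair>();
        DateTime? currentDate = null;

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes("//table//tr");
        if (rows == null) return pairs;

        foreach (HtmlNode row in rows)
        {
            HtmlNodeCollection cellNodes = row.SelectNodes("th|td");
            if (cellNodes == null) continue;

            List<string> cells = cellNodes
                .Select(c => HtmlEntity.DeEntitize(c.InnerText).Trim())
                .ToList();

            if (cells.Count == 1 &&
                DateTime.TryParseExact(cells[0], dateFormat, CultureInfo.InvariantCulture,
                                       DateTimeStyles.None, out DateTime date))
            {
                // Date row: remember the date for the pair rows that follow it.
                currentDate = date;
            }
            else if (cells.Count >= 4 && currentDate.HasValue)
            {
                // Pair row: combine the remembered date with this row's cells.
                pairs.Add(new Pair(currentDate.Value, cells[0], cells[1], cells[2], cells[3]));
            }
        }

        return pairs;
    }
}
```

Rows that match neither shape are skipped, so an unexpected header or spacer row does not shift values into the wrong properties.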
Additional information: you don’t need to understand how the whole website is structured. In fact, I’d say that relying on it is somewhat harmful, because a single change in the HTML structure of the website will completely break your code. What I’d advise instead is to select the elements you want to scrape by their class.
Example (the class is somewhat different, but the principle is the same):
Here it uses the class. (If you have trouble figuring out how to scrape certain data, consider putting the information you want to scrape into ChatGPT.)
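As a rough illustration of selecting by class instead of by an absolute path such as /html/body/div[6]/div[2]/div/table, something like the sketch below could work. The ClassSelector helper and the class name passed to it are hypothetical, since the real class depends on the page. The tableClass argument is assumed to be a plain class name without quotes.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class ClassSelector
{
    // Returns the trimmed cell texts of every row in tables that carry the given class.
    // tableClass is assumed to be a plain class name (no quotes or spaces).
    public static List<List<string>> RowsByTableClass(string html, string tableClass)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Match the class as a whole token, so "schedule" does not match "schedule-old".
        string xpath =
            $"//table[contains(concat(' ', normalize-space(@class), ' '), ' {tableClass} ')]//tr";

        var result = new List<List<string>>();
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes(xpath);
        if (rows == null) return result; // SelectNodes returns null when nothing matches

        foreach (HtmlNode row in rows)
        {
            HtmlNodeCollection cells = row.SelectNodes("th|td");
            if (cells == null) continue;
            result.Add(cells.Select(c => HtmlEntity.DeEntitize(c.InnerText).Trim()).ToList());
        }

        return result;
    }
}
```

Because the XPath no longer counts div positions, moving the table elsewhere on the page should not break the selection, as long as the class itself stays the same.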