Elementor - Get main content in a page while web scraping node js, Puppeteer, Cheerio

MahmudulHasanSagar
February 13, 2022
330 views
0 votes
2 Answers

I have a Node.js web scraping project where I need to scrape the headings and text from the main content of a page. The problem is that I can't determine which part is the main content when there is no aside or main tag, and no class, id, or role named aside or main. I'm using the Puppeteer and Cheerio libraries. I tried Mercury Web Parser, but it has its own problems; for example, it doesn't return any content from pages built with the Elementor theme builder on WordPress. If anyone has an idea of how to differentiate the main content from the rest of the page, it would be really helpful.

Answers

- ALEmran
- February 13, 2022 at 2:49 pm
- 0 votes
You can check out the Readability.js library from Mozilla. Firefox uses it for its Reader View.
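A minimal sketch of that suggestion, assuming the `@mozilla/readability` and `jsdom` packages are installed alongside Puppeteer. The function names `collapseWhitespace` and `extractArticle` are hypothetical, and the HTML is assumed to come from something like `await page.content()`. Readability returns `null` when it cannot find article-like content, so treat the result as a best guess rather than a guarantee.

```javascript
// Hypothetical helper: tidy the text Readability returns, which tends to
// contain runs of spaces and newlines left over from the original markup.
function collapseWhitespace(text) {
  return text.replace(/\s+/g, ' ').trim();
}

// Hypothetical helper: run Mozilla's Readability over HTML that Puppeteer
// has already rendered (for example, the result of `await page.content()`).
function extractArticle(html, url) {
  // Required lazily so the pure helper above works without these packages.
  const { JSDOM } = require('jsdom');
  const { Readability } = require('@mozilla/readability');

  // Passing the page URL lets relative links in the content resolve.
  const dom = new JSDOM(html, { url });
  const article = new Readability(dom.window.document).parse();
  if (!article) return null; // Readability found no article-like content

  return {
    title: article.title,
    html: article.content, // cleaned HTML, can be passed to cheerio.load()
    text: collapseWhitespace(article.textContent),
  };
}
```

The cleaned `html` field can then be loaded into Cheerio to pull out headings and paragraphs separately.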


- EdiImanto
- February 13, 2022 at 5:49 pm
- 0 votes
Try to learn more about CSS selectors and specificity.
If you're scraping an Elementor page, be sure to use this trick for the selector:
Use the data-elementor-(attributename) attributes to select elements in the DOM.
```
// Wait until the Elementor page wrapper is visible (timeout: 0 waits indefinitely)
const mainContent = await page.waitForSelector('[data-elementor-type="wp-page"]', {visible: true, timeout: 0})
```
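To get from that wrapper element to headings and text, one option is to hand the rendered HTML to Cheerio and query inside the Elementor wrapper. This is only a sketch under assumptions: `elementorSelector` and `scrapeMainContent` are hypothetical names, `wp-page` is the document type used in the selector above, and other templates may carry a different `data-elementor-type` value.

```javascript
// Hypothetical helper: build an attribute selector for an Elementor document.
function elementorSelector(type = 'wp-page') {
  return `[data-elementor-type="${type}"]`;
}

// Hypothetical helper: render the page with Puppeteer, then collect the
// headings and paragraphs found inside the Elementor wrapper with Cheerio.
async function scrapeMainContent(url) {
  // Required lazily so the selector helper works without a browser installed.
  const puppeteer = require('puppeteer');
  const cheerio = require('cheerio');

  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });

    const selector = elementorSelector();
    await page.waitForSelector(selector, { visible: true });

    // Parse the fully rendered HTML and search only inside the wrapper.
    const $ = cheerio.load(await page.content());
    const blocks = [];
    $(selector).first().find('h1, h2, h3, h4, h5, h6, p').each((_, el) => {
      const text = $(el).text().trim();
      if (text) blocks.push({ tag: el.name, text }); // skip empty elements
    });
    return blocks;
  } finally {
    await browser.close(); // always release the browser, even on errors
  }
}
```

Calling `scrapeMainContent(url)` would resolve to an array of `{ tag, text }` objects. Elementor text widgets do not always use `p` tags, so the tag list may need adjusting for a given site.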
