OK, so I am fairly new to this, so I'll try to be as clear as I can about my problem.
I have created a simple web scraper using Node.js, Express and Axios. It scrapes the text of elements with a given class on a defined URL and prints the results to the console as a string using JSON.stringify.
The problem I am having is that I cannot work out a way to display this array of text on an HTML page in a `<p>` tag.
For now, I have removed the URL and class names because I want to keep what I am doing fairly low-key, and replaced them with uppercase text. MY URL GOES HERE is the URL of the site I am scraping, and CLASSNAME is the name of the class I am pulling text from.
Node.js `index.js` file:

```javascript
const path = require('path')
const axios = require('axios')
const cheerio = require('cheerio')
const express = require('express')

const app = express()
const url = 'MY URL GOES HERE'
const PORT = 3000 // assumption: any free port works

// express.static expects a directory, not a single file, so send index.html explicitly
app.get('/', (req, res) => res.sendFile(path.join(__dirname, 'index.html')))

axios(url)
  .then(response => {
    const html = response.data
    const $ = cheerio.load(html)
    const textArray = []
    // Collect the text of every element with the target class
    $('.CLASSNAME').each(function () {
      const myTextArray = $(this).text()
      textArray.push({
        myTextArray
      })
    })
    const finalResult = JSON.stringify(textArray)
    console.log(finalResult)
  })
  .catch(err => console.log(err))

app.listen(PORT, () => console.log(`Server running on ${PORT}`))
```
index.html:
```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>Title</title>
  <script src="index.js"></script>
</head>
<body>
  <h1 class="headline">This is the ingredients list</h1>
  <p class="array"></p>
  <script>
    var results = document.getElementsByClassName('array')
    console.log(results)
  </script>
</body>
</html>
```
I am not sure how to get the array from the Node.js file and show it within the `<p>` tag in my HTML file.
I tried to install Puppeteer, as I read that it lets Node.js use standard JavaScript DOM commands, but I got these errors:
```
npm ERR! code EEXIST
npm ERR! syscall mkdir
npm ERR! path /Users/MYNAME/.npm/_cacache/content-v2/sha512/88/53
npm ERR! errno EEXIST
npm ERR! Invalid response body while trying to fetch https://registry.npmjs.org/js-yaml: EACCES: permission denied, mkdir '/Users/MYNAME/.npm/_cacache/content-v2/sha512/88/53'
npm ERR! File exists: /Users/MYNAME/.npm/_cacache/content-v2/sha512/88/53
npm ERR! Remove the existing file and try again, or run npm
npm ERR! with --force to overwrite files recklessly.
npm ERR! A complete log of this run can be found in: /Users/MYNAME/.npm/_logs/2023-12-15T01_00_27_889Z-debug-0.log
```
I am very new to this, so if I have missed anything or you need any extra context, please let me know and I'll do my best to provide it.
2 Answers
I saw that you load index.js in your HTML, which is a little worrying: index.js can only be run by Node. It is your server on localhost, not a script the browser can execute. I hope you can follow the code I modified below.
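A minimal sketch of that split, assuming you keep axios, cheerio and Express from the question: the Node file does the scraping and returns the text as JSON from a route, and the page requests that route instead of loading `index.js`. The route `/api/results`, the `PORT` default and the function names `scrapeTexts` and `startServer` are illustrative assumptions, not the answerer's original code. The third-party modules are required inside the functions so the helpers can be loaded on their own.

```javascript
// server.js: runs in Node only; the browser never loads this file.
const path = require('path');

// Placeholders from the question; swap in the real URL and class name.
const TARGET_URL = 'MY URL GOES HERE';
const SELECTOR = '.CLASSNAME';
// Assumption: port taken from the environment, defaulting to 3000.
const PORT = process.env.PORT || 3000;

// Download the page and return the trimmed text of every matching element.
async function scrapeTexts(url, selector) {
  const axios = require('axios');
  const cheerio = require('cheerio');
  const response = await axios.get(url);
  const $ = cheerio.load(response.data);
  // .map(...).get() turns the cheerio selection into a plain array of strings
  return $(selector)
    .map((i, el) => $(el).text().trim())
    .get();
}

// Serve index.html plus a JSON route that the page's script can fetch.
function startServer() {
  const express = require('express');
  const app = express();

  app.get('/', (req, res) => {
    res.sendFile(path.join(__dirname, 'index.html'));
  });

  // Hypothetical route name; the page just has to request the same path.
  app.get('/api/results', async (req, res) => {
    try {
      const texts = await scrapeTexts(TARGET_URL, SELECTOR);
      res.json(texts);
    } catch (err) {
      console.error(err);
      res.status(500).json({ error: 'Scraping failed' });
    }
  });

  return app.listen(PORT, () => console.log(`Server running on ${PORT}`));
}

module.exports = { scrapeTexts, startServer };
```

To run it, append `startServer();` to the file and start it with `node server.js` (the file name is also an assumption). In `index.html`, remove `<script src="index.js"></script>` and have the inline script call `fetch('/api/results')` and write the result into the `<p class="array">` element.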
Since you are trying to build a web scraping application, you can use Puppeteer with Node. Puppeteer is a library that controls a headless Chrome/Chromium browser, so it can load the given URL, run the page's JavaScript, and extract whatever content you need. You don't need axios in that case. First, install Puppeteer with `npm install puppeteer`.
Create a server script:
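A minimal sketch of such a server script, assuming Express to serve the data and Puppeteer (`puppeteer.launch`, `page.goto`, `page.$$eval`) to do the scraping. The placeholders come from the question; the `/api/results` route, port 3000 and the function names are hypothetical. Because the HTML file is opened straight from disk, the route sends an `Access-Control-Allow-Origin` header so the browser allows the cross-origin request.

```javascript
// server.js (Puppeteer variant): Puppeteer loads the page itself, so axios is not needed.
const TARGET_URL = 'MY URL GOES HERE';
const SELECTOR = '.CLASSNAME';
// Assumption: port taken from the environment, defaulting to 3000.
const PORT = process.env.PORT || 3000;

// Open the page in headless Chromium and collect the text of every matching element.
async function scrapeWithPuppeteer(url, selector) {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });
    // $$eval runs the callback inside the page with all matching elements
    return await page.$$eval(selector, (els) =>
      els.map((el) => el.textContent.trim())
    );
  } finally {
    // Close the browser even if navigation or evaluation fails
    await browser.close();
  }
}

function startServer() {
  const express = require('express');
  const app = express();

  // Hypothetical route name; the HTML file must request the same URL.
  app.get('/api/results', async (req, res) => {
    // The page is opened from disk (file://), so allow any origin to read this response
    res.set('Access-Control-Allow-Origin', '*');
    try {
      res.json(await scrapeWithPuppeteer(TARGET_URL, SELECTOR));
    } catch (err) {
      console.error(err);
      res.status(500).json({ error: 'Scraping failed' });
    }
  });

  return app.listen(PORT, () => console.log(`Server running on ${PORT}`));
}

module.exports = { scrapeWithPuppeteer, startServer };
```

Append `startServer();` to begin listening. Launching a fresh browser on every request keeps the sketch short; if the route is hit often, launching once and reusing the browser is likely the better choice.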
Run the server with Node, for example `node server.js` (assuming you saved the script as `server.js`).
Create a client-side HTML file to display the fetched data:
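A sketch of the script part of that HTML file, assuming the server returns a JSON array of strings at `http://localhost:3000/api/results` (a hypothetical URL; match whatever your server uses). It writes the items into the `<p class="array">` element from the question. `querySelector` is used because `getElementsByClassName` returns a collection rather than a single element.

```javascript
// Client-side script: runs in the browser, not in Node.
// Assumption: hypothetical endpoint returning a JSON array of strings.
const RESULTS_URL = 'http://localhost:3000/api/results';

// Put the scraped strings into the element as one comma-separated line.
function renderResults(element, items) {
  // textContent (not innerHTML) so scraped text cannot inject markup
  element.textContent = items.join(', ');
}

// Fetch the results and show them, or show the error in the same element.
async function loadResults(doc) {
  const target = doc.querySelector('p.array');
  try {
    const response = await fetch(RESULTS_URL);
    if (!response.ok) throw new Error(`HTTP ${response.status}`);
    renderResults(target, await response.json());
  } catch (err) {
    target.textContent = `Could not load results: ${err.message}`;
  }
}

// Only touch the page when running in a browser.
if (typeof document !== 'undefined') {
  loadResults(document);
}
```

Place it in a `<script>` tag at the end of `<body>`, after the `<p class="array">` element, and leave out `<script src="index.js">`, since that file only runs in Node.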
Open the created file and you will see the scraped content.
Note that this is only one way of doing it. You don't need to create an HTML file at all: you can keep the data inside the Puppeteer script and manipulate it there as well.