
OK, I am fairly new to this, so I’ll try to be as clear as I can about my problem.

I have created a simple web scraper using NodeJS, Express, Axios and Cheerio. It scrapes the text from every element with a given class on a defined URL and prints the results to the console as a string using JSON.stringify.

The problem is that I cannot work out how to display the text from this array on an HTML page in a p tag.

For now, I have removed the URL and classnames as I want to keep what I am doing fairly low key, but I have replaced it with uppercase text. The MY URL GOES HERE is the URL of the site I am scraping from, and the CLASSNAME is the name of the specific class I am pulling text from.

Node JS index.js file:

const axios = require('axios')
const cheerio = require('cheerio')
const express = require('express')

const app = express()

const url = 'MY URL GOES HERE'
app.use('/', express.static('index.html'))

axios(url)
    .then(response => {
        const html = response.data
        const $ = cheerio.load(html)
        const textArray = []

        $('.CLASSNAME', html).each(function() {
            const myTextArray = $(this).text()
            textArray.push({
                myTextArray
            })
        })
        const finalResult = JSON.stringify(textArray)
        console.log(finalResult)
    }).catch(err => console.log(err))

app.listen(PORT, () => console.log(`Server running on ${PORT}`))

index.html:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Title</title>
    <script src="index.js"></script>
</head>
<body>

<h1 class="headline">This is the ingredients list</h1>
<p class="array"></p>

<script>

    var results = document.getElementsByClassName('array')
    console.log(results)

</script>
</body>
</html>

I am not sure how to get the array from the NodeJS js file and show it within the p tag within my HTML file.

I tried to install Puppeteer as I read this allows NodeJS to use standard JS commands in the DOM but I was getting these errors:

npm ERR! code EEXIST
npm ERR! syscall mkdir
npm ERR! path /Users/MYNAME/.npm/_cacache/content-v2/sha512/88/53
npm ERR! errno EEXIST
npm ERR! Invalid response body while trying to fetch https://registry.npmjs.org/js-yaml: EACCES: permission denied, mkdir '/Users/MYNAME/.npm/_cacache/content-v2/sha512/88/53'
npm ERR! File exists: /Users/MYNAME/.npm/_cacache/content-v2/sha512/88/53
npm ERR! Remove the existing file and try again, or run npm
npm ERR! with --force to overwrite files recklessly.

npm ERR! A complete log of this run can be found in: /Users/MYNAME/.npm/_logs/2023-12-15T01_00_27_889Z-debug-0.log

I am very new to this, so if I have missed anything or you need any extra context please let me know and I’ll do my best to provide information where I can.

2 Answers


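  1. I see that you load index.js with a script tag in your HTML. That cannot work: index.js can only be run by Node, because it is your server running on localhost, not a browser script. Instead, add a route to the server that returns the scraped array, and have the page request it. I hope the modified code below makes this clear.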

    index.js:

    const path = require('path')
    const axios = require('axios')
    const cheerio = require('cheerio')
    const express = require('express')

    const app = express()
    const PORT = 8080
    const url = 'MY URL GOES HERE'

    // Serve the page itself at http://localhost:8080/
    app.get('/', (req, res) => res.sendFile(path.join(__dirname, 'index.html')))

    // Scrape on each request and return the result as JSON
    app.get('/getArray', (req, res) => {
        axios(url)
            .then(response => {
                const $ = cheerio.load(response.data)
                const textArray = []
                $('.CLASSNAME').each(function () {
                    textArray.push({ myTextArray: $(this).text() })
                })
                console.log(JSON.stringify(textArray))
                res.json(textArray)
            })
            .catch(err => {
                // Answer the request even on failure so the browser is not left waiting
                console.log(err)
                res.status(500).json({ error: 'Scrape failed' })
            })
    })

    app.listen(PORT, () => console.log(`Server running on ${PORT}`))

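    index.html:

    <!DOCTYPE html>
    <html lang="en">

    <head>
        <meta charset="UTF-8">
        <title>Title</title>
    </head>

    <body>
        <h1 class="headline">This is the ingredients list</h1>
        <p class="array"></p>

        <!-- <script src="https://unpkg.com/axios/dist/axios.min.js"></script> -->
        <script src="https://cdn.jsdelivr.net/npm/axios/dist/axios.min.js"></script>
        <script>
            // Open this page through the Express server (http://localhost:8080/)
            // so that the relative URL below reaches the /getArray route.
            axios.get('/getArray')
                .then(function (response) {
                    // response.data is the parsed array of { myTextArray } objects
                    var results = response.data
                    console.log(results)
                    // textContent keeps the scraped text from being treated as markup
                    document.querySelector('.array').textContent =
                        results.map(function (item) { return item.myTextArray }).join(', ')
                })
                .catch(function (err) { console.error(err) })
        </script>
    </body>

    </html>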
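
    The array that comes back from /getArray holds objects of the form { myTextArray: '...' }, so it has to be turned into text before it goes into the p tag. Here is a minimal sketch of one way to do that, putting each entry on its own line. The names escapeHtml and toParagraphHtml are hypothetical helpers made up for this example, not part of Axios or any other library.

    ```javascript
    // Hypothetical helper: escape characters that have special meaning in HTML,
    // because scraped text should not be trusted as markup.
    function escapeHtml(text) {
        return String(text)
            .replace(/&/g, '&amp;')
            .replace(/</g, '&lt;')
            .replace(/>/g, '&gt;')
            .replace(/"/g, '&quot;')
            .replace(/'/g, '&#39;')
    }

    // Hypothetical helper: turn [{ myTextArray: 'a' }, { myTextArray: 'b' }]
    // into the string 'a<br>b', skipping entries that are empty after trimming.
    function toParagraphHtml(items) {
        return items
            .map(function (item) { return escapeHtml(item.myTextArray.trim()) })
            .filter(function (text) { return text.length > 0 })
            .join('<br>')
    }

    // In the page (not executed here):
    //   document.querySelector('.array').innerHTML = toParagraphHtml(response.data)
    ```

    Escaping matters here because the text comes from a third-party site and is assigned through innerHTML.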
  2. Since you are building a web scraping application, you can use Puppeteer with Node. Puppeteer is a library that controls a headless browser from Node, so it can load the given URL and hand the rendered page back to your code. The example below shows basic usage.

    You don’t need axios in that case.

    First, install Puppeteer:

    npm install puppeteer
    

    Create a server script (server.js):

    const express = require('express');
    const puppeteer = require('puppeteer');

    const app = express();
    const port = 3000;

    app.get('/', async (req, res) => {
      // Let the HTML file below read this response even when it is opened
      // from another origin (for example directly from disk)
      res.set('Access-Control-Allow-Origin', '*');

      let browser;
      try {
        browser = await puppeteer.launch();
        const page = await browser.newPage();

        // Replace 'https://example.com' with the URL you want to fetch content from
        await page.goto('https://example.com');

        // Get the full HTML of the rendered page
        const content = await page.content();

        res.send(content);
      } catch (error) {
        console.error('Error:', error);
        res.status(500).send('Internal Server Error');
      } finally {
        // Close the browser even if loading the page failed
        if (browser) await browser.close();
      }
    });

    app.listen(port, () => {
      console.log(`Server running at http://localhost:${port}`);
    });
    

    Run the server with the command below:

    node server.js
    

    Create a client-side HTML file to display the fetched data:

    <!DOCTYPE html>
    <html lang="en">
    <head>
      <meta charset="UTF-8">
      <meta name="viewport" content="width=device-width, initial-scale=1.0">
      <title>Fetched Content</title>
    </head>
    <body>
      <div id="content-container"></div>
      <script>
        // Fetch content from the server and display it in the 'content-container' div
        fetch('http://localhost:3000')
          .then(response => response.text())
          .then(content => {
            document.getElementById('content-container').innerHTML = content;
          })
          .catch(error => console.error('Error fetching content:', error));
      </script>
    </body>
    </html>
    

    Open the created file in a browser and you will see the scraped content.
    Note that this is only one way of doing it. You don’t need to create an HTML file at all: you can keep the data inside the Puppeteer script and manipulate it there.
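
    As a sketch of that last point: instead of sending back the whole page, the route could extract only the text of the matching elements with page.$$eval, which runs a function inside the page over every element that matches a selector. This assumes the '.CLASSNAME' placeholder from the question, and collectText is a hypothetical helper name chosen for this example.

    ```javascript
    // Hypothetical helper, written to be passed to page.$$eval: it receives the
    // array of matched elements and returns their trimmed, non-empty text.
    function collectText(elements) {
        return elements
            .map(function (el) { return (el.textContent || '').trim() })
            .filter(function (text) { return text.length > 0 })
    }

    // Inside the route handler, after page.goto(...) (not executed here):
    //   const texts = await page.$$eval('.CLASSNAME', collectText)
    //   res.json(texts)
    ```

    Returning a plain array of strings keeps the response small and leaves the display decisions to the page.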
