Javascript - How to get the preceding characters of a given string?

MeltingDog
July 27, 2023
255 views
1 vote
2 Answers

I am creating a scraper in Node.js and I want it to look for all .css files.

I’m passing the HTML of the page as a string and simply using indexOf() to look for instances of .css, eg:

const searchHTMLIndex = htmlString.indexOf(".css");
// indexOf() returns -1 when there is no match; 0 is a valid match position
if (searchHTMLIndex !== -1) {
    // Count the newlines before the match to get a 1-based line number
    let tempString = htmlString.substring(0, searchHTMLIndex);
    let lineNumber = tempString.split('\n').length;
    jsonObj[getPageId] = pageObj;
    pageObj.pageUrl = url;
    return pageObj.searchTerm[item] = "CSS on line number: " + lineNumber;
}

However, I’d like to get the full CSS file name (and full path) if possible, eg: /assets/css/myCSSfile.css.

How do I get the preceding characters of a given string (up until, say " or =)?

Tags: javascript node.js

Answers

- AlexanderNenashev
- July 27, 2023 at 8:31 am
- 0 votes
Use jsdom to parse the HTML:

https://github.com/jsdom/jsdom
```
import {JSDOM} from 'jsdom';

const dom = new JSDOM(htmlString);
const cssUrls = [...dom.window.document.querySelectorAll('link[rel=stylesheet]')].map(link => link.href);
```

You could use a regexp to extract the href from <link rel="stylesheet" href="URL">:

const htmlString = `
    <link rel="stylesheet" type="text/css" href="https://cdn.sstatic.net/Shared/stacks.css?v=312b43e78b51">
    <link rel="stylesheet" type="text/css" href="https://cdn.sstatic.net/Sites/stackoverflow/primary.css?v=134475a13287">
    <link type="text/css" href="https://cdn.sstatic.net/Shared/Channels/channels.css?v=a4d77abedec3" rel="stylesheet">
`;
  
const cssUrls = htmlString.match(/(?<=<link[^>]*(rel="stylesheet")?[^>]+href=")[^"]+(?=([^>]*rel="stylesheet")?)/g);
console.log(cssUrls);
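
For the literal question (getting the characters that precede a match, back to the nearest quote), a minimal sketch with lastIndexOf() might look like the following. The helper name findCssPaths is hypothetical, and it assumes every path sits inside a quoted attribute value; anything after ".css", such as a ?v=... query string, is dropped. A parser or the regexp above is likely more robust on real pages.

```javascript
// Hypothetical helper: collects every quoted value that ends at a ".css" match.
// Assumption: the path is wrapped in single or double quotes.
function findCssPaths(html) {
  const needle = ".css";
  const paths = [];
  let idx = html.indexOf(needle);
  while (idx !== -1) {
    const end = idx + needle.length;
    // Search backwards from the match for the nearest quote of either kind
    const start = Math.max(
      html.lastIndexOf('"', idx),
      html.lastIndexOf("'", idx)
    );
    if (start !== -1) {
      paths.push(html.slice(start + 1, end));
    }
    // Continue scanning after the current match
    idx = html.indexOf(needle, end);
  }
  return paths;
}

console.log(findCssPaths('<link rel="stylesheet" href="/assets/css/myCSSfile.css">'));
```

Because this only looks at characters, it will also pick up ".css" occurrences outside of link tags (for example inside inline scripts), so the results may need filtering.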
