I am creating a scraper in Node.js and I want it to look for all .css files.
I'm passing the HTML of the page as a string and simply using indexOf() to look for instances of .css, e.g.:
const searchHTMLIndex = htmlString.indexOf(".css");
// indexOf() returns -1 when there is no match
if (searchHTMLIndex !== -1) {
    // Count the lines that precede the match to get its line number
    let tempString = htmlString.substring(0, searchHTMLIndex);
    let lineNumber = tempString.split('\n').length;
    jsonObj[getPageId] = pageObj;
    pageObj.pageUrl = url;
    return pageObj.searchTerm[item] = "CSS on line number: " + lineNumber;
}
However, I'd like to get the full CSS file name (and full path) if possible, e.g. /assets/css/myCSSfile.css.
How do I get the preceding characters of a given string (up until, say, " or =)?
2 Answers
Use jsdom to parse the HTML: https://github.com/jsdom/jsdom
You could use a regexp to extract the href from <link rel="stylesheet" href="URL">.
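A minimal sketch of the regexp approach, assuming the stylesheets are referenced through <link> tags with a quoted href. The function name findStylesheetHrefs and the shape of the returned objects are made up for illustration; they are not from the question. It matches each <link> tag first and then checks rel and href separately, so the attribute order does not matter, and it reuses the question's trick of counting newlines to report a line number.

```javascript
// Hypothetical helper (name and return shape are assumptions):
// collect the href of every stylesheet <link> in an HTML string.
function findStylesheetHrefs(htmlString) {
  const results = [];
  // Step 1: find every <link ...> tag.
  const linkTagPattern = /<link\b[^>]*>/gi;
  // Step 2: within a tag, look for rel containing "stylesheet"
  // and for a single- or double-quoted href value.
  const relPattern = /\brel\s*=\s*["']?[^"'>]*\bstylesheet\b/i;
  const hrefPattern = /\bhref\s*=\s*(?:"([^"]*)"|'([^']*)')/i;

  for (const match of htmlString.matchAll(linkTagPattern)) {
    const tag = match[0];
    if (!relPattern.test(tag)) continue; // e.g. rel="icon"
    const href = hrefPattern.exec(tag);
    if (!href) continue; // no quoted href on this tag
    // Line number = count of lines up to the start of the tag.
    const lineNumber = htmlString.slice(0, match.index).split("\n").length;
    results.push({ href: href[1] ?? href[2], lineNumber });
  }
  return results;
}
```

Calling findStylesheetHrefs(htmlString) returns an array of { href, lineNumber } objects, one per stylesheet link. A regexp like this can be fooled by commented-out markup or unquoted attribute values, so a real parser such as jsdom is the more robust choice for arbitrary pages.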