I’m trying to create a regular expression to capture all the possible HTML classes of a file.
Let’s take this content as an example:
<html>
<body>
<div class="bg-secondary-7 -mx-5 p-5 hover:bg-secondary-8"></div>
</body>
<script>
const isTrue = 1 === 1;
const z = `hello ${isTrue ? "is-true" : "is-false"}`;
</script>
</html>
It should match:
bg-secondary-7
-mx-5
p-5
hover:bg-secondary-8
hello
is-true
is-false
I’ve tried this so far:
/(["'`])(?:(?=(\\?))\2.)*?\1/g
It matches:
bg-secondary-7 -mx-5 p-5 hover:bg-secondary-8
hello ${isTrue ? "is-true" : "is-false"}
Then I can separate the class names using string.split(' ').
The problem is that the expressions inside template literals are not matched on their own: "is-true" and "is-false" are never captured as separate strings.
What would be a good solution for this?
Edit:
The goal is to take an HTML or JS file and get all the HTML classes that could possibly be used. That’s easy for the HTML, because we could just use a DOM parser, but it’s important that we also scan with a regex because the user can add classes from JS using element.classList.add("class").
2 Answers
As mentioned by other users, this is a hard task.
I've found a simple regex that could be helpful.
When used with the content provided in the original question, it finds any string. Although it includes a lot of false positives (like html or const), it gives us a grasp of what we're trying to achieve: getting the possible class names from a file.
Thank you all for taking the time to answer and comment on this question. It's appreciated.
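For illustration, here is a minimal sketch of that kind of broad scan. The token pattern and the helper name extractCandidateClasses are assumptions made for this example, not the regex referred to above. It treats every class-like token in the file as a candidate, so it picks up "is-true" and "is-false" inside the template literal, along with false positives such as html and const.

```javascript
// Hypothetical helper: collect every class-like token in a source string.
// Assumption: a candidate is an optional leading "-", then a letter or "_",
// then any run of word characters, ":" or "-".
function extractCandidateClasses(source) {
  const tokenPattern = /-?[A-Za-z_][\w:-]*/g;
  // A Set removes duplicates while keeping first-seen order.
  return [...new Set(source.match(tokenPattern) ?? [])];
}

// The sample content from the question, one array entry per line.
const sample = [
  '<html>',
  '<body>',
  '<div class="bg-secondary-7 -mx-5 p-5 hover:bg-secondary-8"></div>',
  '</body>',
  '<script>',
  'const isTrue = 1 === 1;',
  'const z = `hello ${isTrue ? "is-true" : "is-false"}`;',
  '</script>',
  '</html>',
].join('\n');

const candidates = extractCandidateClasses(sample);
console.log(candidates);
```

The result is deliberately a superset: it would still need to be filtered, for example against the class names actually defined in the stylesheet.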
In general, avoid using regex to parse HTML; it’s not designed for that. If you can parse the DOM, do that instead.
You can use MutationObserver to detect when the class attribute is changed.
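A minimal sketch of that idea, assuming it runs in a browser page where the classes are added at runtime. The helper name watchClassChanges and its callback are hypothetical; MutationObserver, its attributeFilter option and classList are standard DOM APIs.

```javascript
// Hypothetical helper: call onClasses with the element's current class list
// every time a class attribute changes on target or any of its descendants.
function watchClassChanges(target, onClasses) {
  const observer = new MutationObserver((mutations) => {
    for (const mutation of mutations) {
      if (mutation.type === 'attributes' && mutation.attributeName === 'class') {
        onClasses([...mutation.target.classList]);
      }
    }
  });
  // attributeFilter limits notifications to the class attribute only.
  observer.observe(target, {
    attributes: true,
    attributeFilter: ['class'],
    subtree: true,
  });
  return observer;
}

// Browser usage (not executed here):
// const observer = watchClassChanges(document.documentElement, (classes) => {
//   console.log(classes);
// });
// observer.disconnect(); // stop watching when done
```

Note that this only reports classes that are actually applied while the page is running, so a branch that never executes (such as "is-false" in the question's example) would not show up.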