I am creating a query to extract description of customers in mongodb. Unfortunately, the description is in HTML Format. Is there a way to replace all HTML tags and make it as " ". Either replace it with " " or remove HTML Tags.
Below is a sample document
{
"_id" : ObjectId("61f72aefdc85500a8baa6bb8")
"CustomerPin" : "22010871",
"CustomerName" : "TestLastName, TestFirstName",
"Age" : 39.0,
"Gender" : "Male",
"Description" : "<p><span>This will be a test description</span><br/></p>",
}
The output should remove "p", "span", and "br". Is there a function in mongodb to remove them all at once without repeating $project
This is the expected output:
{
"_id" : ObjectId("61f72aefdc85500a8baa6bb8")
"CustomerPin" : "22010871",
"CustomerName" : "TestLastName, TestFirstName",
"Age" : 39.0,
"Gender" : "Male",
"Description" : "This will be a test description",
}
Thanks!
2
Answers
One way to do it is by removing all tags by regex in pre hook of save method
See hooks here
If you use Mongo 4.2 then you have to find the exact regex which will extract content from HTML. Below you can find an aggregate pipeline and the regex also.