
I am writing a query to extract customer descriptions from MongoDB. Unfortunately, the descriptions are stored as HTML. Is there a way to either remove all HTML tags or replace each of them with " "?

Below is a sample document

{ 
        "_id" : ObjectId("61f72aefdc85500a8baa6bb8"),
        "CustomerPin" : "22010871", 
        "CustomerName" : "TestLastName, TestFirstName", 
        "Age" : 39.0, 
        "Gender" : "Male", 
        "Description" : "<p><span>This will be a test description</span><br/></p>", 
}

The output should have the "p", "span", and "br" tags removed. Is there a function in MongoDB that removes them all at once, without repeating $project?

This is the expected output:

{ 
        "_id" : ObjectId("61f72aefdc85500a8baa6bb8"),
        "CustomerPin" : "22010871", 
        "CustomerName" : "TestLastName, TestFirstName", 
        "Age" : 39.0, 
        "Gender" : "Male", 
        "Description" : "This will be a test description", 
}

Thanks!

2 Answers


  1. One way to do it is to strip all tags with a regex in a pre hook on the save method:

    // replace() returns a new string, so assign the result back
    Description = Description.replace(/<[^>]+>/g, "");
    

    See hooks here

  2. If you are on MongoDB 4.2 or later, you can use $regexFind with a regex that extracts the text content from the HTML. Below is an aggregation pipeline that uses such a regex.

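    db.getCollection("name_of_your_collection").aggregate([
        {
            $set: {
                contentRegex: {
                    // $regexFind returns only the first match; MongoDB's regex options
                    // are i, m, x and s, so the JavaScript g flag is dropped.
                    $regexFind: { input: "$Description", regex: /([^<>]+)(?!([^<]+)?>)/ }
                }
            }
        },
        {
            // Fall back to the original value when nothing matched.
            $set: {
                content: { $ifNull: ["$contentRegex.match", "$Description"] }
            }
        },
        {
            // Drop the temporary field.
            $unset: [ "contentRegex" ]
        }
    ])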
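
The pipeline above keeps only the first run of text, because $regexFind stops at the first match. For descriptions with text spread over several elements, one possible sketch is to collect every run with $regexFindAll (also available from 4.2) and join the runs with $reduce. The name `stripTagsStage` and the simplified regex are my own assumptions, and text that contains a literal ">" would probably not be handled correctly.

```javascript
// Sketch only: stripTagsStage is a hypothetical name, not part of MongoDB.
// The regex matches runs of characters that are not inside a tag.
const stripTagsStage = {
  $set: {
    Description: {
      $reduce: {
        // Every text run outside of a tag, in document order.
        input: {
          $regexFindAll: { input: "$Description", regex: "[^<>]+(?![^<]*>)" }
        },
        initialValue: "",
        // Append each matched run to the accumulated string.
        in: { $concat: ["$$value", "$$this.match"] }
      }
    }
  }
};

// In the shell this would be used as (collection name is a placeholder):
// db.getCollection("name_of_your_collection").aggregate([stripTagsStage])
```

This overwrites Description only in the query result; the stored documents are not modified.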
    
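
The regex from the first answer can also be wrapped in a small helper and called from application code, for example inside a save hook or after the documents have been read. This is a minimal sketch: `stripHtmlTags` and its `replacement` parameter are names I made up, and a regex like this is only suitable for simple markup, not arbitrary HTML.

```javascript
// Hypothetical helper: removes anything that looks like <...> from a string.
// Pass " " as the second argument to replace tags with a space instead.
function stripHtmlTags(html, replacement = "") {
  // Leave non-string values (null, undefined, numbers) untouched.
  if (typeof html !== "string") return html;
  return html.replace(/<[^>]+>/g, replacement);
}

const doc = {
  CustomerPin: "22010871",
  Description: "<p><span>This will be a test description</span><br/></p>"
};
doc.Description = stripHtmlTags(doc.Description);
console.log(doc.Description); // This will be a test description

// If the hook in question is a Mongoose pre-save hook (an assumption, the
// answer does not name the library), it could call the helper like this:
// schema.pre("save", function () {
//   this.Description = stripHtmlTags(this.Description);
// });
```

Stripping on save changes what is stored, while stripping after the read leaves the original HTML in the database, so the choice depends on whether the markup is still needed elsewhere.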