
I use the DOMPurify library (https://github.com/cure53/DOMPurify) to clean up HTML copied from Google Docs.

I would like to remove the <span> tags from the copied text while keeping the text inside them, as well as any <strong> tags nested inside the removed <span> tags.

I managed to remove the <span> tags while keeping their text, but any <strong> tags inside them are removed as well.

Example

DOMPurify.sanitize(`<p>
<span style="background-color:transparent;color:#000000;"><strong>Some strong text</strong></span>
</p>`, {
    ALLOWED_TAGS: ['p', 'strong']
})

Output

<p>Some strong text</p>

Expected output

<p><strong>Some strong text</strong></p>

I also tried a hook like this:

DOMPurify.addHook("afterSanitizeAttributes", function (node, data, config) {
  if (node.nodeName === "SPAN") {
    node.replaceWith(node.textContent ?? "");
  }
});

But the output is the same: the <strong> tags inside the <span> are deleted as well.

Can you please help me keep the nested <strong> tags after sanitizing?

Many thanks


Answers


  1. Chosen as BEST ANSWER

    IT goldman, thanks for your reply, and especially for pointing out that my code worked. Your comment prompted me to test further. I actually use the DOMPurify library together with CKEditor 5 (https://ckeditor.com/ckeditor-5/). Your comment pushed me to test my DOMPurify code outside the CKEditor context, and doing so I realized that my code worked as expected.

    So I kept testing in the CKEditor context to understand what was happening.

    It turned out I had left the Font plugin active in the CKEditor configuration, and because of that CKEditor kept adding this unwanted <span> tag after DOMPurify sanitization, which is what confused me:

    <span style="background-color:transparent;color:#000000;"></span>
    

    This post helped me identify and fix my problem: https://github.com/ckeditor/ckeditor5/issues/6492

    After removing the Font plugin from the CKEditor configuration, everything works as expected.

    In fact, I no longer even need DOMPurify, because CKEditor's Paste from Office / Paste from Google Docs feature cleans up Google Docs markup well enough for my needs.

    Thanks again for your help and for taking the time to answer, because it allowed me to take a step back and fix my problem.

    Thanks also for the idea of temporarily converting <strong> to [strong]. I will keep this trick in my back pocket, because it will certainly be useful to me in other situations.

    I will change the title of my post and add the ckeditor tag, because the problem concerned CKEditor more than DOMPurify.

    Thanks again


  2. First of all, your code does work.

    But a workaround would have been to replace all <strong> tags with [strong], purify, then replace them back. To make this robust, you could first check whether the string already contains [strong]; here I will assume it doesn't.

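    // Turn <strong> / </strong> into [strong] / [/strong] so the sanitizer
    // sees them as plain text and cannot strip them.
    function replaceStrong(text) {
      return text.replace(/<\/?strong>/gi, match =>
        match.startsWith('</') ? '[/strong]' : '[strong]'
      )
    }
    
    // Turn the [strong] / [/strong] placeholders back into real tags.
    function restoreStrong(text) {
      return text.replace(/\[\/?strong\]/g, match =>
        match.startsWith('[/') ? '</strong>' : '<strong>'
      )
    }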
    
    function myPurify(html) {
      var stronger = replaceStrong(html)
      var replaced = DOMPurify.sanitize(stronger, {
        ALLOWED_TAGS: ['p', 'strong']
      })
      return restoreStrong(replaced)
    }
    
    var result = myPurify(`<p>
      <span style="background-color:transparent;color:#000000;"><strong>Some strong text</strong></span>
    </p>`)
    
    console.log(result)
    <script src="https://cdnjs.cloudflare.com/ajax/libs/dompurify/3.1.7/purify.js" integrity="sha512-QrJgumdAGShrxG5uB7fPRQmjs4cQU6FJee9lspckHT6YYwpilnoUuj2+v5L29mmA/kvLQBjkphsROlRPjtC61w==" crossorigin="anonymous" referrerpolicy="no-referrer"></script>
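    The answer suggests checking beforehand whether the input already contains [strong]. Below is a minimal sketch of that guard, assuming it is acceptable to reject such input with an error. preserveStrong and stripAllButP are hypothetical names, and the sanitizer is passed in as a function so that any sanitizer can be plugged in; the stand-in used here is only for illustration and is not a safe HTML sanitizer.

    ```javascript
    // Hypothetical helper (not part of DOMPurify): protects <strong> tags from a
    // sanitizer by masking them as [strong] placeholders, then restoring them.
    // `sanitize` is any function that takes and returns an HTML string.
    function preserveStrong(html, sanitize) {
      // Refuse input that already contains a placeholder; otherwise the restore
      // step would turn user-typed "[strong]" text into real markup.
      if (/\[\/?strong\]/i.test(html)) {
        throw new Error('Input already contains a [strong] placeholder')
      }
      // Only bare <strong> / </strong> tags are masked; tags with attributes are not.
      const masked = html.replace(/<(\/?)strong>/gi, '[$1strong]')
      const cleaned = sanitize(masked)
      return cleaned.replace(/\[(\/?)strong\]/g, '<$1strong>')
    }

    // Stand-in sanitizer for illustration only: strips every tag except <p>.
    // It is NOT a safe HTML sanitizer; use DOMPurify for real input.
    const stripAllButP = s => s.replace(/<(?!\/?p>)[^>]*>/g, '')

    console.log(preserveStrong(
      '<p><span style="color:#000000;"><strong>Some strong text</strong></span></p>',
      stripAllButP
    ))
    // prints "<p><strong>Some strong text</strong></p>"
    ```

    With DOMPurify the call would look like preserveStrong(html, s => DOMPurify.sanitize(s, { ALLOWED_TAGS: ['p'] })). Throwing is only one possible policy; picking a placeholder that is known not to occur in the input would work as well.
    
    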