
I have lots of HTML image elements stored as strings, but they often contain rubbish I don't need. How can I remove the title, height, class, etc. attributes?

E.g. <img class="img-fruit" src="apple.png" title="apple" height="25" width="25">

Would become
<img src="apple.png">

The order of attributes varies.

Struggling to think of an easy solution, any ideas?

I have tried searching for specific attributes and calculating their lengths to remove them, but it's messy.


Answers


  1. You can use a regular expression to remove unwanted attributes from the HTML image elements string. Here’s a simple example in JavaScript 👇

    const htmlString = '<img src="image.jpg" alt="Image" title="Title" height="200" class="img-thumbnail">';
    
    // \s+ also removes the space before each attribute; values must be quoted with " or '
    const cleanedString = htmlString.replace(/\s+(?:title|height|class)=(?:"[^"]*"|'[^']*')/g, '');
    
    console.log(cleanedString); // <img src="image.jpg" alt="Image">
    
  2. To clean up HTML tags in a string using C# and remove unwanted attributes while retaining only the src attribute, you can use the HtmlAgilityPack library. This library makes it easier to parse and manipulate HTML.

    Here’s how you can achieve this in C#:

    Step-by-Step Solution

    Install HtmlAgilityPack:
    You can install the HtmlAgilityPack library via NuGet Package Manager.

    Install-Package HtmlAgilityPack
    

    Define a Function:
    Create a function that takes the HTML string, finds all <img> tags, and removes every attribute except src.

    using System;
    using HtmlAgilityPack;
    
    class Program
    {
        static void Main()
        {
            // Verbatim string: each doubled quote ("") stands for one literal " character.
            string htmlString = @"<img class=""img-fruit"" src=""apple.png"" title=""apple"" height=""25"" width=""25"">";
            string cleanedHtml = CleanImgTags(htmlString);
            Console.WriteLine(cleanedHtml);
        }
    
        static string CleanImgTags(string html)
        {
            var doc = new HtmlDocument();
            doc.LoadHtml(html);
    
            // SelectNodes returns null (not an empty collection) when no <img> is found.
            HtmlNodeCollection images = doc.DocumentNode.SelectNodes("//img");
            if (images == null)
            {
                return html;
            }
    
            foreach (HtmlNode img in images)
            {
                string srcValue = img.GetAttributeValue("src", null);
    
                if (srcValue != null)
                {
                    // Remove all attributes
                    img.Attributes.RemoveAll();
    
                    // Add only the src attribute back
                    img.SetAttributeValue("src", srcValue);
                }
            }
    
            return doc.DocumentNode.OuterHtml;
        }
    }

    Load HtmlAgilityPack:
    Add using HtmlAgilityPack; at the top of your file, then parse the HTML string with HtmlDocument.LoadHtml.

    Select and Clean Tags:
    Use the XPath expression //img to select all <img> elements. For each image element:

    Store the src attribute value.
    Remove all attributes using img.Attributes.RemoveAll().
    Add the src attribute back to the element.
    Images without a src attribute are left unchanged.

    Return the Cleaned HTML:
    Convert the modified HTML document back to a string using doc.DocumentNode.OuterHtml.

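The regex from the first answer can be wrapped in a small helper that takes the list of attribute names to strip, which also covers the width attribute from the question. This is a minimal sketch, assuming simple tags whose attribute values are quoted with " or ' (a regex is not an HTML parser, so comments, a > inside a value, or unquoted values can trip it up). The names removeAttributes and escapeRegExp are hypothetical helpers, not part of any library.

```javascript
// Hypothetical helpers for illustration; not from any library.
// Escape regex metacharacters so attribute names are matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Remove every occurrence of the named attributes from an HTML string.
// Assumption: values are quoted with " or ' and tags are simple (regex, not a parser).
function removeAttributes(html, names) {
  const alternatives = names.map(escapeRegExp).join("|");
  // \s+ requires whitespace right before the name, so "class" does not match
  // inside "data-class"; the "i" flag also catches upper-case names like CLASS.
  const pattern = new RegExp(
    `\\s+(?:${alternatives})\\s*=\\s*(?:"[^"]*"|'[^']*')`,
    "gi"
  );
  return html.replace(pattern, "");
}

const input =
  '<img class="img-fruit" src="apple.png" title="apple" height="25" width="25">';
console.log(removeAttributes(input, ["class", "title", "height", "width"]));
// <img src="apple.png">
```

Requiring at least one whitespace character before the name is what keeps data-class intact when you strip class.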
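If you would rather list the attributes to keep (the second answer keeps only src) than enumerate everything to remove, the same "remember src, drop the rest, put src back" idea can be sketched in JavaScript without HtmlAgilityPack. Note that this swaps the HTML parser for a regex-based tokenizer, so it is only a sketch under stated assumptions: attribute values are quoted with " or ', no value contains >, and valueless or unquoted attributes are not recognised (so they are dropped). keepOnlyAttributes is a hypothetical helper name.

```javascript
// Hypothetical helper; a regex-based sketch, not a real HTML parser.
// Assumption: attribute values are quoted with " or ' and contain no ">".
const IMG_TAG = /<img\b([^>]*)>/gi;
const ATTRIBUTE = /([^\s=\/>]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/g;

// Rewrite each <img> tag so only the allowed attributes survive, in their original order.
// Valueless (boolean) or unquoted attributes are not matched and are therefore dropped.
function keepOnlyAttributes(html, allowed) {
  const keep = new Set(allowed.map((name) => name.toLowerCase()));
  return html.replace(IMG_TAG, (_tag, attributeText) => {
    const kept = [];
    for (const match of attributeText.matchAll(ATTRIBUTE)) {
      const name = match[1].toLowerCase();
      // match[2] holds a double-quoted value, match[3] a single-quoted one.
      const value = match[2] !== undefined ? match[2] : match[3];
      if (keep.has(name)) {
        // Re-emit with double quotes, escaping any literal " from a single-quoted value.
        kept.push(`${name}="${value.replace(/"/g, "&quot;")}"`);
      }
    }
    return kept.length > 0 ? `<img ${kept.join(" ")}>` : "<img>";
  });
}

const input =
  '<img class="img-fruit" src="apple.png" title="apple" height="25" width="25">';
console.log(keepOnlyAttributes(input, ["src"]));
// <img src="apple.png">
```

Other tags in the string are left untouched because only <img ...> is matched. For untrusted or messy markup, a real parser (HtmlAgilityPack in C#, or DOMParser in a browser) is the safer choice.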