
I need to write an ASP.NET C# function that removes all attributes from image tags except "src", "align", "alt" and "title". The function must only change content inside img tags. The input is HTML used for displaying articles, in which I need to clean up the image attributes.

public static string FixImageAttributes(string htmlString)
{
    // Remove all attributes in htmlString here, except: "src", "align", "alt" and "title".

    return htmlString;
}

Example:

If the function input (htmlString) is this:

<html>
<body>
<div>
<h1>Some html here</h1>
<p><img align="right" title="" border="0" hspace="7" alt="" vspace="7" src="/upload/content/images/bla/bla/test.jpg"></p>
</div>
<div>
<h2>Lorem impum</h2>
<p><img src="/upload/content/test/blah/image.jpg" width="624" height="255" alt="Text here" title="Hello" border="0" vspace="0" hspace="0"></p>
</div>
</body>
</html>

The function output should be this:

<html>
<body>
<div>
<h1>Some html here</h1>
<p><img align="right" title="" alt="" src="/upload/content/images/bla/bla/test.jpg"></p>
</div>
<div>
<h2>Lorem impum</h2>
<p><img src="/upload/content/test/blah/image.jpg" alt="Text here" title="Hello"></p>
</div>
</body>
</html>

2 Answers


  1. You can use HtmlAgilityPack for this and write something like this:

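    // Requires: using System.Collections.Generic; using System.Linq;
    public string RemoveAllAttributesFromEveryNode(string html)
    {
        var htmlDocument = new HtmlAgilityPack.HtmlDocument();
        htmlDocument.LoadHtml(html);
        var filterList = new List<string>{"src", "align", "alt", "title"};

        // "//*" selects every element; use "//img" to restrict this to image tags.
        // SelectNodes returns null when nothing matches.
        var nodes = htmlDocument.DocumentNode.SelectNodes("//*");
        if (nodes == null) return html;

        foreach (var node in nodes)
        {
           // Compare attribute names (not the attribute objects) against the filter list.
           var toRemove = node.Attributes.Where(x => !filterList.Contains(x.Name)).ToList();
           foreach (var attribute in toRemove)
           {
               attribute.Remove();
           }
        }

        html = htmlDocument.DocumentNode.OuterHtml;

        return html;
    }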
    

    More about HtmlAgilityPack can be found here:

    https://html-agility-pack.net/?z=codeplex

  2. I modified Ran Turner’s answer a bit:

    // Requires: using System.Linq; using HtmlAgilityPack;
    public static string RemoveAllAttributesFromImgNode(string html)
    {
        var htmlDocument = new HtmlDocument();
        htmlDocument.LoadHtml(html);
        string[] filter = { "src", "align", "alt", "title" };

        // SelectNodes returns null (not an empty collection) when there are no img tags.
        var nodes = htmlDocument.DocumentNode.SelectNodes("//img");
        if (nodes == null) return html;

        foreach (var node in nodes)
        {
            // Collect the attributes to drop first, so the collection is not modified while enumerating.
            var attributes = node.Attributes.Where(x => !filter.Contains(x.Name)).ToList();
            foreach (var attribute in attributes)
            {
                node.Attributes.Remove(attribute);
            }
        }
        html = htmlDocument.DocumentNode.OuterHtml;
        return html;
    }
    

    Output from console:

    Old HTML:
    
    <html>
    <body>
    <div>
    <h1>Some html here</h1>
    <p>
    <img align="right" title="" border="0" hspace="7" alt="" vspace="7" src="/upload/content/images/bla/bla/test.jpg">
    </p>
    </div>
    <div>
    <h2> Lorem impum</h2 >
    <p>
    <img src="/upload/content/test/blah/image.jpg" width="624" height="255" alt="Text here" title ="Hello" border="0" vspace="0" hspace="0">
    </p>
    </div>
    </body>
    </html>
    
    ===============================
    
    New HTML:
    
    <html>
    <body>
    <div>
    <h1>Some html here</h1>
    <p>
    <img align="right" title="" alt="" src="/upload/content/images/bla/bla/test.jpg">
    </p>
    </div>
    <div>
    <h2> Lorem impum</h2>
    <p>
    <img src="/upload/content/test/blah/image.jpg" alt="Text here" title="Hello">
    </p>
    </div>
    </body>
    </html>
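
For anyone who wants to try the approach from the answers end to end, here is a minimal console sketch. It assumes the HtmlAgilityPack NuGet package is referenced. The class name `ImageAttributeDemo`, the `Allowed` set and the sample markup in `Main` are made up for illustration; only the function name `FixImageAttributes` and the four attribute names come from the question. It uses a case-insensitive `HashSet<string>` instead of a list or array, which is a small variation and not what either answer posted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public static class ImageAttributeDemo // hypothetical class name
{
    // Attribute names to keep on img tags; compared case-insensitively.
    private static readonly HashSet<string> Allowed =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "src", "align", "alt", "title" };

    public static string FixImageAttributes(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when no img tag is present.
        var images = doc.DocumentNode.SelectNodes("//img");
        if (images == null) return html;

        foreach (var img in images)
        {
            // Copy to a list first so the collection is not modified while enumerating.
            foreach (var attribute in img.Attributes.Where(a => !Allowed.Contains(a.Name)).ToList())
            {
                attribute.Remove();
            }
        }
        return doc.DocumentNode.OuterHtml;
    }

    public static void Main()
    {
        // Made-up sample input: the div's class attribute should survive, the img's width and border should not.
        const string oldHtml =
            "<div class=\"x\"><p><img src=\"/a.jpg\" width=\"624\" border=\"0\" alt=\"Text here\"></p></div>";

        Console.WriteLine("Old HTML:");
        Console.WriteLine(oldHtml);
        Console.WriteLine("===============================");
        Console.WriteLine("New HTML:");
        Console.WriteLine(FixImageAttributes(oldHtml));
    }
}
```

Because the XPath is `//img`, attributes on other elements are left alone, which is what the question asks for.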
    