How can I strip attributes from an HTML image string and keep only the source?

Elbusta
May 13, 2024
98 views
0 votes
2 Answers

I have lots of HTML image elements as strings, but they often contain attributes I don’t need. How can I remove title, height, class, etc.?

E.g. <img class="img-fruit" src="apple.png" title="apple" height="25" width="25">

Would become
<img src="apple.png">

The order of attributes varies.

I’m struggling to think of an easy solution. Any ideas?

I have tried searching for specific attributes and calculating their lengths to remove them, but it’s messy.

Tags: c#, html, string

Answers

- ParsaHeshmati
- May 13, 2024 at 10:13 pm
- 0 votes
You can use a regular expression to remove unwanted attributes from the HTML image string. Here’s a simple example in JavaScript that strips title, height and class (other attributes, such as alt, are left in place) 👇
```
const htmlString = '<img src="image.jpg" alt="Image" title="Title" height="200" class="img-thumbnail">';

// \s* consumes the whitespace before each removed attribute
const cleanedString = htmlString.replace(/\s*(?:title|height|class)=['"][^'"]*['"]/g, '');

console.log(cleanedString); // <img src="image.jpg" alt="Image">
```
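Since the question asks to keep only src, an allowlist may fit better than listing every attribute to remove. The following is a minimal sketch, not a definitive implementation: `keepOnlySrc` is a hypothetical helper name, and it assumes src values are quoted and that no attribute value contains a `>` character. For arbitrary real-world HTML, a proper parser is the safer choice.

```javascript
// Hypothetical helper: rebuild every <img> tag so it carries only its src attribute.
function keepOnlySrc(html) {
  return html.replace(/<img\b[^>]*>/gi, (tag) => {
    // Match src="..." or src='...'; the leading \s avoids matching data-src
    const match = tag.match(/\ssrc\s*=\s*(?:"([^"]*)"|'([^']*)')/i);
    if (!match) return tag; // no quoted src found: leave the tag untouched
    const src = match[1] !== undefined ? match[1] : match[2];
    return `<img src="${src}">`;
  });
}

const input = '<img class="img-fruit" src="apple.png" title="apple" height="25" width="25">';
console.log(keepOnlySrc(input)); // <img src="apple.png">
```

Because the tag is rebuilt from the captured src value, the order of the original attributes does not matter.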

- AICoding
- May 13, 2024 at 10:55 pm
- 0 votes
To clean up HTML tags in a string using C# and remove unwanted attributes while retaining only the src attribute, you can use the HtmlAgilityPack library. This library makes it easier to parse and manipulate HTML.

Here’s how you can achieve this in C#:

Step-by-Step Solution

Install HtmlAgilityPack:
You can install the HtmlAgilityPack library via NuGet Package Manager.
```
Install-Package HtmlAgilityPack
```
Define a Function:
Create a function that takes the HTML string, finds all <img> tags, and removes unwanted attributes, keeping only the src attribute.
```
using System;
using HtmlAgilityPack;

class Program
{
static void Main()
{
    // Inner double quotes must be escaped inside a regular C# string literal
    string htmlString = "<img class=\"img-fruit\" src=\"apple.png\" title=\"apple\" height=\"25\" width=\"25\">";
    string cleanedHtml = CleanImgTags(htmlString);
    Console.WriteLine(cleanedHtml);
}

static string CleanImgTags(string html)
{
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(html);

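    // SelectNodes returns null (not an empty collection) when nothing matches
    HtmlNodeCollection imgNodes = doc.DocumentNode.SelectNodes("//img");
    if (imgNodes == null)
    {
        return html;
    }

    foreach (HtmlNode img in imgNodes)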
    {
        string srcValue = img.GetAttributeValue("src", null);

        if (srcValue != null)
        {
            // Remove all attributes
            img.Attributes.RemoveAll();

            // Add only the src attribute back
            img.SetAttributeValue("src", srcValue);
        }
    }

    return doc.DocumentNode.OuterHtml;
}
}
```

Load HtmlAgilityPack:
Include using HtmlAgilityPack at the top of your file.
Parse the HTML string using HtmlDocument.

Select and Clean Tags:
Use the XPath expression //img to select all <img> elements.
For each <img> element, store the src attribute value.
Remove all attributes using img.Attributes.RemoveAll().
Reassign the src attribute back to the element.

Return the Cleaned HTML:
Convert the modified HTML document back to a string using doc.DocumentNode.OuterHtml.