Html - Regex to transform link tag to img tag

Samuele
September 20, 2023
250 views
4 votes
4 Answers

I need a regex to find all occurrences (there could be several) of an a tag with the text "Graphic source" and transform each one into an img tag whose src attribute contains the href URL.

So FROM

<small><a href="https://www.url.com/image.png" target="_blank" rel="noopener">Graphic source</a></small>

TO

<img src="https://www.url.com/image.png"/>

So for example:

Some text
Other tag <b>test</b>
<small><a href="https://www.url.com/name1.png" target="_blank" rel="noopener">Graphic source</a></small>test
<small><a href="https://www.url.com/name2.jpg" target="_blank" rel="noopener">Graphic source</a></small>Text text<small><a href="www.url.com">Do not transform</a></small>

Needs to be transformed into:

Some text
Other tag <b>test</b>
<img src="https://www.url.com/name1.png"/>test
<img src="https://www.url.com/name2.jpg"/>Text text<small><a href="www.url.com">Do not transform</a></small>

I almost got it working:
<small.*?href="(.*?)"

I don’t understand how to NOT match a tags that do not contain the words "Graphic source" as their text, or how to NOT carry over the other attributes of the a tag when it is transformed into an img tag.

https://regex101.com/r/OReOCd/1

Tags: html regex

Answers

- Dimava
- September 20, 2023 at 12:44 am
- 0 votes
Obligatory disclaimer: Stop Parsing (X)HTML with Regular Expression
```
<a href="(.*?)"[^>]*?>Graphic source</a>
```
https://regex101.com/r/2Wd9le/1

- AztecCodes
- September 20, 2023 at 12:46 am
- 0 votes
This should do the job:
```
<a href="(https?://[^"]+)"[^>]+>Graphic source</a>
```
For your replacement you could do:
```
<img src="$1"/>
```
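As a minimal sketch of how this pattern and replacement could be applied, here is a Python version. The question does not say which tool runs the regex, so the use of Python's `re` module is an assumption, and `LINK_RE` and `links_to_images` are hypothetical names. Python writes the backreference as `\1` rather than `$1`. The pattern is also extended, as an assumption, to require the enclosing `<small>` wrapper shown in the question, so that the wrapper is removed together with the link.

```python
import re

# Assumed extension of the pattern above: the link must sit inside
# <small>...</small>, and the whole wrapper is replaced.
LINK_RE = re.compile(
    r'<small>\s*'
    r'<a href="(https?://[^"]+)"[^>]*>'   # group 1: the href value
    r'Graphic source</a>'                 # only links with this exact text
    r'\s*</small>'
)

def links_to_images(html_text: str) -> str:
    """Replace each wrapped "Graphic source" link with an <img> tag."""
    return LINK_RE.sub(r'<img src="\1"/>', html_text)

sample = (
    'Other tag <b>test</b>\n'
    '<small><a href="https://www.url.com/name1.png" target="_blank" '
    'rel="noopener">Graphic source</a></small>test\n'
    '<small><a href="https://www.url.com/name2.jpg" target="_blank" '
    'rel="noopener">Graphic source</a></small>Text text'
    '<small><a href="www.url.com">Do not transform</a></small>'
)

print(links_to_images(sample))
```

The "Do not transform" link is left alone twice over: its text differs, and its href does not start with `http://` or `https://`.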

Don’t use `regex` to parse `HTML/XML`

Better use a programming language and proper libraries to parse HTML.

With one of the most widely used languages, Python:

import requests
from lxml import html

res = requests.get('https://sputnick.fr/downloads/regex-to-transform-link-tag-to-img-tag.html')
tree = html.fromstring(res.text)
    
# using proper XPath query language:
elts = tree.xpath('//a[text()="Graphic source"]')

for a_elt in elts:
    img_elt = html.Element("img", src=a_elt.get("href"))
    a_elt.getparent().replace(a_elt, img_elt)

transformed_html = html.tostring(tree, encoding="unicode")

print(transformed_html)

Output

<html lang="en">
  <head>
    <title>Example</title>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1, user-scalable=0">
  </head>
  <body>
Some text
Other tag <b>test</b>
<small><img src="https://www.url.com/name1.png"></small>test
<small><img src="https://www.url.com/name2.jpg"></small>Text text<small><a href="www.url.com">Do not transform</a></small>
  </body>
</html>

- Reilas
- September 20, 2023 at 4:17 am
- 0 votes
"… I need a regex to find all the occurrences … and transform [them] to an img tag with the src attribute that contains the href url. …"

The regex pattern itself won’t replace any values; it only matches.
You’ll need a program or programming language to perform the substitution.

"… I don’t understand how to NOT include the a tag that do not contains the words Graphic source as text …"

Assert that the text following the > is "Graphic source<"
```
<.+?href\s*=\s*("|')(.+?)(?<!\\)\1.+?>Graphic source<.+>
```
The substitution text would be,
```
<img src="$2"/>
```
Also, I presume you could use \s* preceding and following the text.
```
<.+?href\s*=\s*("|')(.+?)(?<!\\)\1.+?>\s*Graphic source\s*<.+>
```
"… and how to NOT include all the other attributes of the a tag when transformed to img tag. …"

In this type of situation, where there are repeated keys and values, you can use the lazy-quantifier, ?, to match up to the first encountered quotation-mark.

For example,
```
="(.+?)"
```
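To illustrate the difference, here is a small Python sketch comparing the greedy `.+` with the lazy `.+?` on the tag from the question. The choice of Python's `re` module and the variable names are assumptions for illustration, not part of the answer.

```python
import re

tag = '<a href="https://www.url.com/image.png" target="_blank" rel="noopener">'

# Greedy: .+ runs to the LAST quotation mark in the tag,
# so the capture drags in the other attributes.
greedy = re.search(r'="(.+)"', tag).group(1)

# Lazy: .+? stops at the FIRST quotation mark it can,
# so only the href value is captured.
lazy = re.search(r'="(.+?)"', tag).group(1)

print(greedy)  # https://www.url.com/image.png" target="_blank" rel="noopener
print(lazy)    # https://www.url.com/image.png
```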
Here is an example output
```
Some text
Other tag test
<img src="https://www.url.com/name1.png"/>test
<img src="https://www.url.com/name2.jpg"/>
```
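Note that the last line of this output no longer contains `Text text` or the "Do not transform" link from the question's input. The trailing `<.+>` is greedy and runs to the last `>` on the line. A possible tightening is sketched below in Python, assuming each link is wrapped in `<small>` exactly as in the question. The variable names and the `tight_tail` pattern are hypothetical, not from the answer. It ends the match at the first `</a></small>` and keeps the href value from crossing quotes or tag boundaries.

```python
import re

line = (
    '<small><a href="https://www.url.com/name2.jpg" target="_blank" '
    'rel="noopener">Graphic source</a></small>Text text'
    '<small><a href="www.url.com">Do not transform</a></small>'
)

# Pattern from the answer: the final <.+> is greedy.
greedy_tail = r'''<.+?href\s*=\s*("|')(.+?)(?<!\\)\1.+?>\s*Graphic source\s*<.+>'''

# Assumed variant: bounded href value and an explicit </a></small> tail.
tight_tail = (
    r'''<small>\s*<a\s[^>]*?href\s*=\s*("|')([^"'>]+)\1[^>]*>'''
    r'''\s*Graphic source\s*</a>\s*</small>'''
)

# Group 2 holds the URL in both patterns (group 1 is the quote character).
greedy_result = re.sub(greedy_tail, r'<img src="\2"/>', line)
tight_result = re.sub(tight_tail, r'<img src="\2"/>', line)

print(greedy_result)  # the rest of the line is swallowed
print(tight_result)   # trailing text and the other link survive
```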