Here is a test file that contains links and their names inside <a></a> tags.
/tmp/test_html.txt
<tr>
<td><a href="http://www.example.com/link1">example link 1</a></td>
</tr>
<tr>
<td><a href="http://www.example.com/link2">example link 2</a></td>
</tr>
<tr>
<td><a href="http://www.example.com/link3">example link 3</a></td>
</tr>
<tr>
<td><a href="https://www.example.com/4/0/1/40116601-1FDC-real-world-link/bar" target="_blank" class="real-world-class">Real World Link</a> </td>
</tr>
The following command finds all the links in the file, but it can’t print the link and the name together:
# sed -n 's/.*href="\([^"]*\).*/\1/p' /tmp/test_html.txt
I want the command to print every link on its own line, with the name first, followed by the href.
Here is the expected output:
# sed <...command....> /tmp/test_html.txt
example link 1 | http://www.example.com/link1
example link 2 | http://www.example.com/link2
example link 3 | http://www.example.com/link3
Real World Link | https://www.example.com/4/0/1/40116601-1FDC-real-world-link/bar
How do I write the sed command?
2 Answers
This might work for you (GNU sed):
Filter lines using the -n option and simplify the regexp using the -E option. Match lines containing href followed by the inner text, and format the output as required using back references.
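A minimal sketch of a command matching that description, assuming GNU sed and that each anchor sits on a single line as in the sample file. The regexp, the temporary file, and the variable names are illustrative assumptions, not necessarily the answerer's exact command:

```shell
# Hypothetical scratch copy of two sample lines from the question.
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
<tr>
<td><a href="http://www.example.com/link1">example link 1</a></td>
</tr>
<tr>
<td><a href="https://www.example.com/4/0/1/40116601-1FDC-real-world-link/bar" target="_blank" class="real-world-class">Real World Link</a> </td>
</tr>
EOF

# -n suppresses default output; -E enables extended regexps (unescaped groups).
# Group 1 captures the href value; [^>]* skips any remaining attributes;
# group 2 captures the text between > and </a>.
# The replacement prints the name first, then the link; p prints matches only.
out=$(sed -nE 's/.*href="([^"]*)"[^>]*>([^<]*)<\/a>.*/\2 | \1/p' "$tmpfile")
printf '%s\n' "$out"

rm -f "$tmpfile"
```

Against the real file, the same substitution would be run as sed -nE '...' /tmp/test_html.txt. It handles one anchor per line; for anchors that span lines or nest other tags, an HTML parser is likely a safer choice than sed.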