I would like to read the href URL that is stored next to the <td>blue</td> element:
<html>
<body>
<table>
<tr>
<td>
<a href="localhost/url1">url1</a>
</td>
<td>blue</td>
</tr>
<tr>
<td>
<a href="localhost/url2">url2</a>
</td>
<td>green</td>
</tr>
</table>
</body>
</html>
I first tried to capture the surrounding <tr> tag, but even this does not work:
#!/bin/bash
HTML_FILE="html_content.html"
tr_tag=$(grep -o '<tr>.*blue.*</tr>' "$HTML_FILE")
echo $tr_tag
My output is always blank. Why?
2 Answers
If you are using GNU grep which supports PCRE, try this:
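A minimal sketch of such a command, assuming GNU grep built with PCRE support (-P) and the exact markup from the question. The -z option makes grep treat the whole file as a single record, so the pattern can span several lines. The temporary file and the href variable are only there to make the example self-contained.

```shell
#!/bin/bash
# Recreate the sample document from the question in a temporary file.
HTML_FILE=$(mktemp)
cat > "$HTML_FILE" <<'EOF'
<html>
<body>
<table>
<tr>
<td>
<a href="localhost/url1">url1</a>
</td>
<td>blue</td>
</tr>
<tr>
<td>
<a href="localhost/url2">url2</a>
</td>
<td>green</td>
</tr>
</table>
</body>
</html>
EOF

# -z : read the whole file as one record, so \s can match the newlines
# -o : print only the matched text
# -P : PCRE; \K drops the leading '<a href="' from the reported match
# The lookahead requires the link's cell to be followed by <td>blue</td>.
# With -z each match ends in a NUL byte, which tr turns into a newline.
href=$(grep -zoP '<a href="\K[^"]+(?="[^<]*</a>\s*</td>\s*<td>blue</td>)' "$HTML_FILE" | tr '\0' '\n')
echo "$href"
rm -f "$HTML_FILE"
```

This only works as long as the link cell comes directly before the colour cell; for anything less regular, a real HTML parser is the safer choice.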
Using any awk in any shell on every Unix box and only reading 1 line at a time into memory, assuming your input is always formatted exactly as shown in your question:
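One way such a script could look, as a sketch that relies on the layout shown in the question (each link on its own line, followed a few lines later by the colour cell). The temporary file and the result variable are assumptions added to make the example runnable on its own.

```shell
#!/bin/bash
# Recreate the sample document from the question in a temporary file.
HTML_FILE=$(mktemp)
cat > "$HTML_FILE" <<'EOF'
<html>
<body>
<table>
<tr>
<td>
<a href="localhost/url1">url1</a>
</td>
<td>blue</td>
</tr>
<tr>
<td>
<a href="localhost/url2">url2</a>
</td>
<td>green</td>
</tr>
</table>
</body>
</html>
EOF

# Split each line on double quotes, so on a link line $2 is the href value.
# Remember the most recent href and print it when the blue cell is reached.
result=$(awk -F'"' '
    /<a href=/        { href = $2 }
    /<td>blue<\/td>/  { print href }
' "$HTML_FILE")
echo "$result"
rm -f "$HTML_FILE"
```

Because awk processes the file one line at a time, only the current line and the last seen href are held in memory.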
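Regarding the surrounding tag: blue is surrounded by <td>, not <tr>, tags. In addition, grep matches one line at a time, so a pattern that has to span from <tr> to </tr> cannot match when those tags sit on separate lines, which is why the output is blank.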
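The blank output can be reproduced with a short check. This is a sketch using the markup from the question; the temporary file and the variable names multi and single are made up for the example. grep tests each line on its own, so the pattern from the question never sees <tr>, blue and </tr> together, while a pattern that fits on one line does match.

```shell
#!/bin/bash
# Recreate the relevant part of the sample document in a temporary file.
HTML_FILE=$(mktemp)
cat > "$HTML_FILE" <<'EOF'
<table>
<tr>
<td>
<a href="localhost/url1">url1</a>
</td>
<td>blue</td>
</tr>
</table>
EOF

# Count lines matching the original pattern: no single line contains
# <tr>, blue and </tr>, so the count is 0 (|| true hides grep's exit status 1).
multi=$(grep -c '<tr>.*blue.*</tr>' "$HTML_FILE" || true)

# A pattern that fits within one line matches as expected.
single=$(grep -o '<td>blue</td>' "$HTML_FILE")

echo "multi-line pattern matches: $multi"
echo "single-line pattern match:  $single"
rm -f "$HTML_FILE"
```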