How to return the whole line where a string match is found in a webpage using python request - Telegram API

rbutrnz
August 9, 2021
176 views
0 votes
2 Answers

I am trying to grab the lines that match the strings I am searching for in a webpage. I tried a few approaches, but they read and displayed everything. Below is a partial snippet.

import requests
url = "https://bscscan.com/address/0x88c20beda907dbc60c56b71b102a133c1b29b053#code"
queries = ["Website", "Telegram", "Submitted"]

r = requests.get(url)
for q in queries:
    q = q.lower()
    if q in r.text.lower():
        print(q, 'Found')
    else:
        print(q, 'Not Found')

Current Output:

    website Found
    telegram Found
    submitted Found

Wanted Output:

    Submitted Found - *Submitted for verification at BscScan.com on 2021-08-08
    Website Found - *Website: www.shibuttinu.com
    Telegram Found - *Telegram: https://t.me/Shibuttinu

Answers

- JackFleeting
- August 9, 2021 at 4:01 am
- 0 votes
requests returns an HTML page, which you have to parse with an HTML parser. One problem is that your target outputs are stuck in the middle of a long string, which you have to extract with some string manipulation after parsing.

You can parse the HTML with either BeautifulSoup and CSS selectors, or lxml and XPath:

First, using lxml:
```
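import lxml.html as lh

# r and queries come from the snippet in the question
doc = lh.fromstring(r.text)

# The contract source sits in a <pre> element with these classes
loc = doc.xpath('//pre[@class="js-sourcecopyarea editor"]')[0]
# The header comment is one long string; split it on '*' into separate entries
targets = list(loc.itertext())[0].split('*')
for target in targets:
    for query in queries:
        if query in target:
            print(target)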
```
With BeautifulSoup:
```
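from bs4 import BeautifulSoup as bs

# r and queries come from the snippet in the question
soup = bs(r.text, 'lxml')

# Select the same <pre> element with a CSS selector
pre = soup.select_one('pre.js-sourcecopyarea.editor')
# Take its first text string and split it on '*' into separate entries
ss = list(pre.stripped_strings)[0].split('*')
for s in ss:
    for query in queries:
        if query in s:
            print(s)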
```
Output, in either case:
```
Submitted for verification at BscScan.com on 2021-08-08

Website: www.shibuttinu.com
 
Telegram: https://t.me/Shibuttinu
```
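To get output in the "Query Found - line" shape asked for in the question, the matching can also be done line by line once the text has been extracted. The following is only a minimal sketch: `find_matching_lines` and `sample_text` are hypothetical names introduced here, and the sample is modeled on the question's wanted output rather than fetched from the page.

```python
def find_matching_lines(text, queries):
    """Return 'Query Found - line' strings for each line containing a query."""
    results = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        for query in queries:
            # Case-insensitive substring match, as in the question's code.
            if query.lower() in line.lower():
                results.append(f"{query} Found - {line}")
    return results


# Hypothetical sample modeled on the question's "Wanted Output";
# in practice `text` would be the string extracted from the <pre> element.
sample_text = """/**
 *Submitted for verification at BscScan.com on 2021-08-08
*/

/*
 *Website: www.shibuttinu.com
 *Telegram: https://t.me/Shibuttinu
 */
"""

for result in find_matching_lines(sample_text, ["Website", "Telegram", "Submitted"]):
    print(result)
```

Results come back in the order the lines appear in the text, not in the order of the queries, and a line that contains several queries is reported once per query.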

- AndreGoulart
- August 9, 2021 at 4:10 am
- 0 votes
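You are only printing q, which is your query. You also want to print the part of the response text in which it was found.

In short, loop over the lines of the response text and print each matching line together with the query:
```
import requests

url = "https://bscscan.com/address/0x88c20beda907dbc60c56b71b102a133c1b29b053#code"
queries = ["Website", "Telegram", "Submitted"]

text = requests.get(url).text

# Iterate over lines; iterating over the string itself would yield single characters
for line in text.splitlines():
    for q in queries:
        if q.lower() in line.lower():
            print(q, 'Found in', line.strip())
```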
Lastly, on this website you won’t find any result this way, because the text you are looking for is not inside a plain text tag. You probably want to filter the response, looking for a div with class="ace_line_group".
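One way to act on this suggestion is to collect the text of every element that carries a given class and then search those strings. The sketch below is a minimal illustration using only the standard library's `html.parser`; the `ClassTextCollector` class and `sample_html` are hypothetical, written for this example and not taken from the BscScan page. Elements with class `ace_line_group` are typically created in the browser by the Ace editor's JavaScript, so they may be missing from the HTML that requests receives. In that case the same collector can be pointed at another class, such as the `js-sourcecopyarea` class used in the first answer.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextCollector(HTMLParser):
    """Collect the text inside every element that carries a given CSS class."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0      # > 0 while inside a matching element
        self.chunks = []    # text pieces of the element being read
        self.texts = []     # one entry per matching element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # nested element inside a match
        elif self.class_name in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.chunks = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.texts.append("".join(self.chunks).strip())

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


# Hypothetical markup standing in for the real page.
sample_html = """
<div class="ace_line_group"><div class="ace_line">*Website: www.shibuttinu.com</div></div>
<div class="ace_line_group"><div class="ace_line">*Telegram: https://t.me/Shibuttinu</div></div>
<div class="footer">Contact us on Telegram</div>
"""

collector = ClassTextCollector("ace_line_group")
collector.feed(sample_html)

queries = ["Website", "Telegram", "Submitted"]
for text in collector.texts:
    for q in queries:
        if q.lower() in text.lower():
            print(q, "Found -", text)
```

Restricting the search to one class keeps unrelated matches, such as the footer text in the sample, out of the results.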