How to use Requests.Get to produce a list of status codes for a large number of sites - SEO

sleepy
April 4, 2022

I’m a novice Python programmer and I’m trying to use the requests library to find the HTTP status codes for a large list of URLs, put those status codes into their own array, then add that array back to the dataframe as a new column.

Here’s a very basic version of the code that I’m using.

import requests
import pandas as pd

targets =pd.read_csv('/file/path.csv',header=None)
targetList =targets.values
for i in targetList:
    r = requests.get (f"{i}")
    r.status_code

I’m not concerned about the dataframe manipulation; that seems simple enough. And I can get a single request to work on its own:

r=requests.get(targetList.item(0))
code=r.status_code
code

200

When I try to run the for loop however I get the following error.

InvalidSchema: No connection adapters were found for "['https://www.google.com']"

Clearly the program is at least getting far enough to understand that the items in the list are strings, and understands the contents of those strings. But there’s a disconnect happening that I don’t understand.

Answers

- keramat
- April 4, 2022 at 4:46 am
Inside the loop, slice the brackets and quotes off the stringified row:
```
f"{i}"[2:-2]
```
The following code reproduces your error for me:
```
import requests
u = "['https://www.google.com']"
r=requests.get(u)
code=r.status_code
code
```
and the following returns 200:
```
import requests
u = "['https://www.google.com']"
r=requests.get(u[2:-2])
code=r.status_code
code
```
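The brackets and quotes in the failing URL come from formatting a whole row with an f-string. The short sketch below illustrates that, using a plain Python list as a stand-in for one row of targets.values (an assumption; a one-element NumPy row of strings prints the same way). It compares the slicing workaround with simply indexing the row.

```python
# Stand-in for one row of targets.values (assumption: a single URL column).
row = ["https://www.google.com"]

# This is what the loop in the question hands to requests.get:
as_text = f"{row}"
print(as_text)        # the row's text form, brackets and quotes included

# Workaround from this answer: cut off the leading "['" and trailing "']".
print(as_text[2:-2])

# Alternative: take the element itself, no string slicing needed.
print(row[0])
```

Both approaches yield the bare URL here, but slicing depends on the exact text form of the row, while indexing does not.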

The variable i gives you a list with all the values in the row, even if you have only one column, so you have to get a single value from this list, i.e. i[0].

import pandas as pd

data = {
    'urls': ['url1','url2','url2'], 
} 

df = pd.DataFrame(data)
 
for row in df.values:
    url = row[0]
    #print('row:', f'{row}')
    #print('url:', f'{url}')
    print('row:', row)
    print('url:', url)

    #requests.get(url)

    print('---')

Result:

row: ['url1']
url: url1
---
row: ['url2']
url: url2
---
row: ['url2']
url: url2
---

Or you can select a single column, df['urls'], and iterate over it directly:

for url in df['urls']:
    #print('url:', f'{url}')
    print('url:', url)

    #requests.get(url)

    print('---')

Result:

url: url1
---
url: url2
---
url: url2
---
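Putting this together for the original goal (a status-code column in the dataframe), a minimal sketch might look like the following. The names fetch_status_codes and add_status_column, the column names "url" and "status", the 10-second timeout, and the choice to record None for failed requests are all assumptions for illustration, not something taken from the question.

```python
def fetch_status_codes(urls, get, timeout=10):
    """Return one status code per URL, or None when the request fails.

    `get` is any callable with the same shape as requests.get.
    """
    codes = []
    for url in urls:
        try:
            response = get(url, timeout=timeout)
            codes.append(response.status_code)
        except OSError:
            # requests.RequestException derives from IOError (an alias of
            # OSError), so this covers connection errors, timeouts and
            # invalid URLs without stopping the loop.
            codes.append(None)
    return codes


def add_status_column(csv_path):
    """Read a one-column CSV of URLs and add a 'status' column."""
    # Imported here so the helper above can be tried without these packages.
    import pandas as pd
    import requests

    # header=None matches the question; "url" is a hypothetical column name.
    df = pd.read_csv(csv_path, header=None, names=["url"])
    # Iterating over a single column yields plain strings, not one-item rows.
    df["status"] = fetch_status_codes(df["url"], get=requests.get)
    return df
```

Calling add_status_column('/file/path.csv') would then return the dataframe with the extra column. For a very large list the requests run one after another, so expect this to be slow.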
