
I have a Python script that imports a list of URLs from a CSV named list.csv, scrapes them, and outputs any anchor text and href links found on each URL from the CSV:

(For reference, the URLs in the CSV are all in column A.)

from urllib.request import urlopen
from bs4 import BeautifulSoup
import csv

contents = []
with open('list.csv','r') as csvf: # Open file in read mode
    urls = csv.reader(csvf)
    for url in urls:
        contents.append(url) # Add each url to list contents
    

for url in contents: 
    page = urlopen(url[0]).read()
    soup = BeautifulSoup(page, "lxml")

    for link in soup.find_all('a'):
        if len(link.text)>0:
            print(url, link.text, '-', link.get('href'))

The output looks something like this, where https://www.example.com/csv-url-one/ and https://www.example.com/csv-url-two/ are the URLs in column A of the CSV:

['https://www.example.com/csv-url-one/'] Creative - https://www.example.com/creative/
['https://www.example.com/csv-url-one/'] Web Design - https://www.example.com/web-design/
['https://www.example.com/csv-url-two/'] PPC - https://www.example.com/ppc/
['https://www.example.com/csv-url-two/'] SEO - https://www.example.com/seo/

The issue is I want the output to look more like this, i.e. not repeatedly print the CSV URL before each result, AND have a line break after each group of results for a URL:

['https://www.example.com/csv-url-one/'] 
Creative - https://www.example.com/creative/
Web Design - https://www.example.com/web-design/

['https://www.example.com/csv-url-two/'] 
PPC - https://www.example.com/ppc/
SEO - https://www.example.com/seo/

Is this possible?

Thanks

3 Answers


  1. It is possible.

    Simply add \n at the end of the print call.
    \n is the newline (line break) special character.

    for url in contents: 
        page = urlopen(url[0]).read()
        soup = BeautifulSoup(page, "lxml")
    
        for link in soup.find_all('a'):
            if len(link.text) > 0:
                print(url, '\n', link.text, '-', link.get('href'), '\n')
    
  2. Does the following solve your problem?

    for url in contents: 
        page = urlopen(url[0]).read()
        soup = BeautifulSoup(page, "lxml")
        print('\n', '********', ', '.join(url), '********', '\n')
        for link in soup.find_all('a'):
            if len(link.text) > 0:
                print(link.text, '-', link.get('href'))
    
  3. To add a separation between URLs, print a \n before each URL.

    If you want to print a URL only when it has valid links (i.e. links where len(link.text) > 0), use the for loop to save the valid links to a list, and only print the URL and its links if that list is not empty.

    Try this:

    for url in contents: 
        page = urlopen(url[0]).read()
        soup = BeautifulSoup(page, "lxml")
        
        valid_links = []
        for link in soup.find_all('a'):
            if len(link.text) > 0:
                valid_links.append(link)  # keep the whole tag so .text and .get() work below

        if len(valid_links):
            print('\n', url)
            for item in valid_links:
                print(item.text, '-', item.get('href'))