beautifulsoup Questions

BeautifulSoup not finding XML tag, how do I fix this? – Shopify

December 31, 2018
ahmadafiquddin
4 Answers

I tried using BeautifulSoup to scrape a Shopify site, but findAll('url') returns an empty list. How do I retrieve the desired content? import requests from bs4 import BeautifulSoup as soupify import lxml webSite = requests.get('https://launch.toytokyo.com/sitemap_pages_1.xml') pageSource = webSite.text webSite.close() pageSource =…
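
The excerpt is cut off before the parsing step, so the cause of the empty list cannot be determined from it. Purely as an illustration, here is a minimal sketch that reads the `<url>` entries of a sitemap with the standard library's xml.etree.ElementTree instead of BeautifulSoup. The sample document, the example.com addresses, and the function name sitemap_locations are assumptions made for the example; only the sitemap namespace URI is the standard one.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the standard sitemap format. A real sitemap would be
# downloaded first (for example with requests) and its text passed in.
SAMPLE_SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/pages/about</loc></url>
  <url><loc>https://example.com/pages/contact</loc></url>
</urlset>"""

# Sitemap elements live in this namespace, so lookups must be qualified.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locations(xml_text):
    """Return the text of every <loc> found inside a <url> element."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


print(sitemap_locations(SAMPLE_SITEMAP))
```

When staying with BeautifulSoup, it may also be worth checking which parser is passed to the constructor, since HTML parsers and XML parsers treat such documents differently.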

How to get a few characters after a string to identify whether the string is in a heading tag or a list item? – Artificial Intelligence

November 29, 2018
9113303
2 Answers

I have collected all the heading tags in the given data with heads=str(soup.find_all(re.compile('^h[1-6]$'))). Then I collect the data in between the heading tags. A portion of the source code is given. import bs4 import re data = ''' <html> <body> <div…

Python request missing part of the content – Artificial Intelligence

November 21, 2018
Aimee Huang
3 Answers

I'm scraping job content from a website (https://www.104.com.tw/job/?jobno=66wee). When I send the request, only part of the content in the 'p' elements is returned. I want the entire div class="content" part. My code: import requests from bs4 import BeautifulSoup payload = {'jobno':'66wee'}…

How to get all external links found on a page using BeautifulSoup? – Artificial Intelligence

September 23, 2018
Rishabh Chopra
2 Answers

I'm reading the book Web Scraping with Python, which has the following function to retrieve external links found on a page: #Retrieves a list of all external links found on a page def getExternalLinks(bs, excludeUrl): externalLinks = [] #Finds all…
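
The book's function is truncated in the excerpt, so the following is not a reconstruction of it. It is a rough sketch of the general idea (collect every href and keep those whose host differs from the page's own domain), written with the standard library's html.parser and urllib.parse rather than BeautifulSoup. The names HrefCollector and get_external_links and the sample markup are assumptions made for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class HrefCollector(HTMLParser):
    """Collects the href attribute of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def get_external_links(html, own_domain):
    """Return unique links whose host is present and differs from own_domain."""
    collector = HrefCollector()
    collector.feed(html)
    external = []
    for href in collector.hrefs:
        host = urlparse(href).netloc
        # Relative links have no host and are treated as internal.
        if host and host != own_domain and href not in external:
            external.append(href)
    return external


# Hypothetical page content used only to exercise the function.
SAMPLE_HTML = """
<a href="/about">About</a>
<a href="https://example.com/contact">Contact</a>
<a href="https://other.org/page">Elsewhere</a>
<a href="https://other.org/page">Elsewhere again</a>
"""

print(get_external_links(SAMPLE_HTML, "example.com"))
```

Note that this simple host comparison treats subdomains such as www.example.com as external; whether that is desirable depends on the site being crawled.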

To remove base url – Artificial Intelligence

August 14, 2018
S.Nandhini
2 Answers

I wrote a Python script to extract the href value from all links on a given web page: from BeautifulSoup import BeautifulSoup import urllib2 import re html_page = urllib2.urlopen("http://kteq.in/services") soup = BeautifulSoup(html_page) for link in soup.findAll('a'): print link.get('href') When I…
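
The question text is cut off, so exactly what should be removed is an assumption here. Assuming "base URL" means the scheme and host of the page being scraped, one possible sketch with urllib.parse (Python 3) is shown below. The function name strip_base is hypothetical; the kteq.in address is the one from the excerpt.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_base(href, base_url):
    """Drop scheme and host from href when it points at base_url's host.

    Links to other hosts, and non-HTTP links such as mailto:, are returned
    unchanged.
    """
    target = urlsplit(href)
    base = urlsplit(base_url)
    if target.scheme not in ("", "http", "https"):
        return href
    if target.netloc and target.netloc != base.netloc:
        return href
    # Rebuild the URL from path, query and fragment only.
    return urlunsplit(("", "", target.path, target.query, target.fragment))


print(strip_base("http://kteq.in/services/web?x=1", "http://kteq.in/services"))
```

In a scraping loop, each value returned by link.get('href') could be passed through such a function before printing.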

Python JSONDecodeError – SEO

June 27, 2018
Kyle
2 Answers

I am not too sure what I am doing wrong. I am trying to parse the specific contents within JavaScript. This is the output of "s" (for the code below it): <script type="text/javascript">window._sharedData = {"activity_counts":{"comment_likes":0,"comments":0,"likes":0,"relationships":0,"usertags":0},"config":{"csrf_token":"OIXAF5a6FwMQJj3vCaUQXCGUGL3sFb0Z","viewer":{"allow_contacts_sync":false,"biography":"Follow for the best social media…
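
The poster's code is not visible in the excerpt, so the actual cause of the error is unknown. One common way to hit json.JSONDecodeError with input like this is to pass the whole script text, including the `window._sharedData =` assignment and the trailing JavaScript, to json.loads. Under that assumption, a minimal sketch that decodes only the JSON object is given below. The shortened sample string and the function name extract_shared_data are made up for the example.

```python
import json

# Hypothetical, shortened stand-in for the script tag shown in the question.
SAMPLE_SCRIPT = (
    '<script type="text/javascript">window._sharedData = '
    '{"activity_counts": {"likes": 0}, "config": {"viewer": null}};</script>'
)


def extract_shared_data(script_text):
    """Decode the JSON object assigned to window._sharedData.

    str.index raises ValueError if the marker or the opening brace is absent.
    """
    marker = script_text.index("window._sharedData")
    start = script_text.index("{", marker)
    # raw_decode parses one JSON value starting at `start` and ignores
    # whatever follows it (here the ';' and the closing script tag).
    data, _end = json.JSONDecoder().raw_decode(script_text, start)
    return data


shared = extract_shared_data(SAMPLE_SCRIPT)
print(shared["activity_counts"]["likes"])
```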

Ebay API – Scraping a custom ebay search with BeautifulSoup. How to handle pagination?

June 1, 2018
kir.pir
3 Answers

I am trying to scrape a custom eBay search that shows 200 items on a single page. I need to get the title of the item, the price and the link to the said item. So far so good. But…

How to crawl href – Python & beautifulsoup – SEO

April 25, 2018
Serious Ruffy
2 Answers

I am currently crawling a web page (https://www.klook.com/city/30-kyoto/?p=1) using Python 3.4 and bs4 in order to collect the deeplinks of the respective activities. I found that the links are located in the html source like this: <a class="j_activity_item_link" href="/activity/1031-arashiyama-rickshaw-tour-kyoto/" class="j_activity_item_link"…

Combining multiple generated dataframes into a single dataframe – Ebay API

November 29, 2017
Charlie Frankum
3 Answers

I want to construct a dataframe by taking data from each page of an API (limited to 100 rows per page). Currently the code below returns all the data, but it is structured incorrectly. There are 17 headers, therefore I require…

How to click on dynamic button links ("#") with Selenium and Splinter? – Facebook api

September 27, 2017
Aaditya Ura
2 Answers

I am trying to scrape something from a website (for example Facebook; I am not using the Graph API, just doing this for learning). I successfully log in and land on the front page, where I want to scrape some data, but the problem is when I land…
