I want to build a DataFrame by collecting data from each page of an API (limit of 100 rows per page). The code below returns all the data, but it is structured incorrectly.
There are 17 headers, so I need the data in 17 columns. However, the output is a DataFrame of [100 rows x 1700 columns], whereas I need [10000 rows x 17 columns].
I'm not sure how to achieve this; any help would be greatly appreciated.
from ebaysdk.finding import Connection as finding
from bs4 import BeautifulSoup
import pandas as pd

x = []
for i in range(1,101):
    print(type(i))
    api = finding(siteid='EBAY-GB',appid='some_id',config_file=None)
    response = api.execute('findItemsByKeywords', {'keywords': 'phone', 'outputSelector' : 'SellerInfo',
                           'paginationInput': {'entriesPerPage': '2','pageNumber': ' '+str(i)}})
    soup = BeautifulSoup(response.content, 'lxml')
    items = soup.find_all('item')
    headers = ['itemid','title','categoryname','categoryid','postalcode','location','sellerusername','feedbackscore','positivefeedbackpercent','topratedseller','shippingservicecost','buyitnowavailable','currentprice','starttime','endtime','watchcount','conditionid']
    for object in headers:
        values = [element.text for element in soup.find_all(object)]
        x.append(values)

df = pd.DataFrame(x)
df = df.T
print(x)
#[['152668959069', '252999725410'], ['Samsung GALAXY Ace GT-S5830i (Unlocked) Smartphone Android Phone- ALL COLOURS UK', '8GB 3G Unlocked Android 5.1 Quad Core Smartphone Mobile Phone 2 SIM GPS qHD'], ['Mobile & Smart Phones', 'Mobile & Smart Phones'], ['9355', '9355'], ['RM137PP'], ['Rainham,United Kingdom', 'United Kingdom'], ['deals4u_shop', 'smartlife2017'], ['15700', '456'], ['99.9', '98.5'], ['true', 'true'], ['0.0', '0.0'], ['false', 'false'], ['32.49', '48.9'], ['2017-08-18T18:36:28.000Z', '2017-06-19T09:04:40.000Z'], ['2017-12-16T18:36:28.000Z', '2017-12-16T09:04:40.000Z'], ['272', '134'], ['1000', '1000']]
print(df)
              0                                                  1
0  152668959069  Samsung GALAXY Ace GT-S5830i (Unlocked) Smartp...
1  252999725410  8GB 3G Unlocked Android 5.1 Quad Core Smartpho...

                       2     3        4                       5
0  Mobile & Smart Phones  9355  RM137PP  Rainham,United Kingdom
1  Mobile & Smart Phones  9355     None          United Kingdom

               6      7     8     9  ...   24    25    26   27     28    29
0   deals4u_shop  15700  99.9  true  ...  456  98.5  true  0.0  false  48.9
1  smartlife2017    456  98.5  true  ...  456  98.5  true  0.0  false  48.9

                         30                        31   32    33
0  2017-06-19T09:04:40.000Z  2017-12-16T09:04:40.000Z  214  1000
1  2017-06-19T09:04:40.000Z  2017-12-16T09:04:40.000Z  182  1000
Edit: added more code, and printed x for the first 2 entries from the first page and df for the first 2 entries from 2 pages.
3 Answers
This is the answer I came to if anyone's interested. It works but the column order gets changed for some reason and I'm not sure why. Thanks for all your help!
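On the column order: a minimal sketch, assuming the rows are collected as one dict per item (the `rows` data and the shortened `headers` list below are made up for illustration). Older Python and pandas versions did not necessarily keep dict keys in insertion order, which may be what reordered the columns. Passing the header list as `columns=`, or indexing an existing frame with it, pins the order explicitly.

```python
import pandas as pd

headers = ['itemid', 'title', 'currentprice']  # shortened header list

# Made-up rows, one dict per item; keys are deliberately out of order.
rows = [
    {'title': 'Phone A', 'currentprice': '32.49', 'itemid': '1'},
    {'currentprice': '48.9', 'itemid': '2', 'title': 'Phone B'},
]

# Option 1: fix the column order when the frame is built.
df = pd.DataFrame(rows, columns=headers)

# Option 2: reorder the columns of a frame that already exists.
df_unordered = pd.DataFrame(rows)
df_reordered = df_unordered[headers]

print(list(df.columns))
```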
This should work better.
Dictionary comprehension version:
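A minimal sketch of what a dictionary-comprehension approach could look like. It builds one dict per item rather than one list per header, so a field missing from one item (such as `postalcode`) becomes `None` instead of shifting later values. To stay runnable offline it parses a made-up XML snippet with `xml.etree.ElementTree`; `SAMPLE_PAGE` and `field_text` are hypothetical names, and with BeautifulSoup the lookup would be `item.find(name)` instead.

```python
import xml.etree.ElementTree as ET
import pandas as pd

# Made-up stand-in for one page of results (not real eBay output).
SAMPLE_PAGE = (
    "<response>"
    "<item><itemid>1</itemid><title>Phone A</title>"
    "<postalcode>RM137PP</postalcode></item>"
    "<item><itemid>2</itemid><title>Phone B</title></item>"
    "</response>"
)
headers = ['itemid', 'title', 'postalcode']  # shortened header list

def field_text(item, name):
    """Hypothetical helper: text of the first matching descendant, or None."""
    node = item.find('.//' + name)
    return node.text if node is not None else None

rows = []
for page in range(2):  # pretend two pages came back with the same content
    items = ET.fromstring(SAMPLE_PAGE).iter('item')
    # One dict per item, so every row has exactly len(headers) fields.
    rows.extend({name: field_text(item, name) for name in headers}
                for item in items)

df = pd.DataFrame(rows, columns=headers)
print(df.shape)  # (4, 3)
```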
Loop version:
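A minimal sketch of an explicit-loop approach, assuming each page has already been parsed into a list of per-item dicts (the `pages` data below is made up). It keeps one growing list per header and appends exactly one value per item, so all lists stay the same length across pages.

```python
import pandas as pd

headers = ['itemid', 'title', 'postalcode']  # shortened header list

# Made-up parsed pages: each page is a list of items, each item a dict of
# the fields that were present (item 2 has no postal code).
pages = [
    [{'itemid': '1', 'title': 'Phone A', 'postalcode': 'RM137PP'},
     {'itemid': '2', 'title': 'Phone B'}],
    [{'itemid': '3', 'title': 'Phone C', 'postalcode': 'N1 9GU'}],
]

# One list per header, shared across all pages.
columns = {}
for name in headers:
    columns[name] = []

for page in pages:
    for item in page:
        for name in headers:
            # dict.get returns None when the field is missing from this item.
            columns[name].append(item.get(name))

df = pd.DataFrame(columns, columns=headers)
print(df.shape)  # (3, 3)
```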
Consider iteratively appending to a list of dataframes with final concatenation:
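A minimal sketch of that idea, assuming each page can be turned into its own small DataFrame. `fetch_page` is a hypothetical stand-in for the API call plus parsing, and it returns made-up items so the example runs offline.

```python
import pandas as pd

headers = ['itemid', 'title', 'currentprice']  # shortened header list

def fetch_page(page_number):
    """Hypothetical stand-in for the API call plus parsing of one page."""
    return [
        {'itemid': f'{page_number}-{n}', 'title': f'Phone {n}',
         'currentprice': '9.99'}
        for n in range(2)  # two made-up items per page
    ]

frames = []
for page_number in range(1, 4):
    # One DataFrame per page, all with the same columns.
    frames.append(pd.DataFrame(fetch_page(page_number), columns=headers))

# ignore_index=True renumbers the rows 0..n-1 instead of repeating 0, 1 per page.
df = pd.concat(frames, ignore_index=True)
print(df.shape)  # (6, 3)
```

Concatenating once at the end avoids copying the accumulated data on every iteration, which is what happens when a DataFrame is grown row by row inside the loop.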