python sliceCsv.py
Total no. of rows: 329375
Field names are:
Note : Handle is the first column
Handle, Command, Title, Body (HTML), Vendor, Type, Tags, Published, Option1 Name, Option1 Value, Option2 Name, Option2 Value, Option3 Name, Option3 Value, Variant SKU, Variant Grams, Variant Inventory Tracker, Variant Inventory Qty, Variant Inventory Policy, Variant Fulfillment Service, Variant Price, Variant Compare At Price, Variant Requires Shipping, Variant Taxable, Variant Barcode, Image Src, Image Position, Image Alt Text, Gift Card, Google Shopping / MPN, Google Shopping / Age Group, Google Shopping / Gender, Google Shopping / Google Product Category, SEO Title, SEO Description, Google Shopping / AdWords Grouping, Google Shopping / AdWords Labels, Google Shopping / Condition, Google Shopping / Custom Product, Google Shopping / Custom Label 0, Google Shopping / Custom Label 1, Google Shopping / Custom Label 2, Google Shopping / Custom Label 3, Google Shopping / Custom Label 4, Variant Image, Variant Weight Unit, Variant Tax Code, Cost per item, Metafield: theme.Applications [string], Metafield: theme.Applications with Dilutions [string], Metafield: theme.Background [string], Metafield: theme.Notes [string], Metafield: theme.Category [string], Metafield: theme.Clonality [string], Metafield: theme.Clone Number [string], Metafield: theme.Concentration [string], Metafield: theme.Conjugation [string], Metafield: theme.Cross Reactive Species [string], Metafield: theme.Predicted Reactivity [string], Metafield: theme.Datasheet URL [string], Metafield: theme.Excitation Emission [string], Metafield: theme.Gene ID [string], Metafield: theme.Gene ID Link [string], Metafield: theme.Host [string], Metafield: theme.images [string], Metafield: theme.Immunogen Sequence [string], Metafield: theme.Isotype [string], Metafield: theme.Lead Time [string], Metafield: theme.Modification [string], Metafield: theme.Modification Site [string], Metafield: theme.Product URL [string], Metafield: theme.publications [string], Metafield: theme.Purification [string], Metafield: theme.Size [string], Metafield: theme.Source [string], Metafield: theme.Specificity [string], Metafield: theme.Storage [string], Metafield: theme.Storage Buffer [string], Metafield: theme.Storage Condition [string], Metafield: theme.Subcellular Locations [string], Metafield: theme.Swiss Prot [string], Metafield: theme.Swiss Prot Link [string], Metafield: theme.Synonyms [string], Metafield: theme.Target Protein [string], Metafield: theme.AllConjugations,
and
First row are:
bs-0637R MERGE P38 MAPK Polyclonal Antibody Company Primary Antibody, P38 MAPK, target-protein_P38 MAPK, RK, p38, CSBP, EXIP, Mxi2, CSBP1, CSBP2, CSPB1, PRKM14, PRKM15, SAPK2A, p38ALPHA, Mitogen-activated protein kinase 14, MAP kinase 14, MAPK 14, Cytokine suppressive anti-inflammatory drug-binding protein, CSAID-binding protein, MAP kinase MXI2, MAX-interacting protein 2, Mitogen-activated protein kinase p38 alpha, MAP kinase p38 alpha, Stress-activated protein kinase 2a, MAPK14, Human, cross-reactive-species_Human, Mouse, cross-reactive-species_Mouse, Rat, cross-reactive-species_Rat, Rabbit, cross-reactive-species_Rabbit, Others, cross-reactive-species_Others, Dog, predicted-reactivity_Dog, Unmodified, modification_Unmodified, Rabbit, host_Rabbit, Polyclonal, clonality_Polyclonal, IgG, isotype_IgG, WB, applications_WB, ELISA, applications_ELISA, FCM, applications_FCM, IHC-P, applications_IHC-P, IHC-F, applications_IHC-F, IF(IHC-P), applications_IF(IHC-P), IF(IHC-F), applications_IF(IHC-F), IF(ICC), applications_IF(ICC), ICC, applications_ICC, Unconjugated, conjugation_Unconjugated TRUE Volume 100ul bs-0637R 45 shopify 100 continue manual 340 TRUE TRUE FALSE lb WB, ELISA, FCM, IHC-P, IHC-F, IF(IHC-P), IF(IHC-F), IF(ICC), ICC WB(1:300-5000), ELISA(1:500-1000), FCM(1:20-100), IHC-P(1:200-400), IHC-F(1:100-500),...
As you can see, there are a lot of columns. How can I list only specific columns? Handle
I’ve tried (sliceCsv.py)
# importing csv module
import sys
import csv
# Fix this error : _csv.Error: field larger than field limit (131072)
csv.field_size_limit(sys.maxsize)
# csv file name
filename = "/Users/code/Desktop/shopify.csv"
keepColumns = ['Handle']
# initializing the titles and rows list
fields = []
rows = []
print('n====================================================================================n')
# reading csv file
with open(filename, 'r') as csvfile:
# creating a csv reader object
csvreader = csv.reader(csvfile)
# extracting field names through first row
fields = next(csvreader)
# extracting each data row one by one
for row in csvreader:
# print(row)
# return False # return False # SyntaxError: 'return' outside function
# break
rows.append(row)
# get total number of rows
print("Total no. of rows: %d"%(csvreader.line_num))
print('n-------------------------------------------------------------------n')
# printing the field names
print('Field names are: nn' + ', '.join(field for field in fields))
print('n-------------------------------------------------------------------n')
# printing first 5 rows
print('nFirst row are:nn')
for row in rows[:1
]:
# parsing each column of a row
for col in row:
print("%10s"%col),
print('n')
My ideal check would be sth like this… I’m not sure how to do it in Python or CSV parser.
if(row.header == "Handle") {
rows.append(row)
}
3
Answers
In a similar project, I applied the below strategy to get a specific column. I have changed the code a little bit as per your needs. The loop goes over each row and adds the specific column value to the handle_data list. You can access any row in this loop or multiple rows as well.
You don’t need to save the data in
rows
if you are not planning to write it somewhere else. You can useDictReader
if you need access to columns by their names.PS: This is pretty basic stuff, I suggest you to go over the fundamentals again to have a stronger grasp.
If all you need is a general tool for slicing up a CSV, and don’t want or need to write a program, I recommend downloading one of GoCSV’s prebuilt binaries and using its select command:
You can use any valid column name or column number,
gocsv select -c 1 ...
will get you the same result.You can direct the output to a file,
... -c 'Handle' YOUR-FILE.csv > just_handle.csv
, or pipe the output to:... YOUR-FILE.csv | gocsv view
... | gocsv stats
It’s fast, and usually has low memory usage.
I’ve written many custom one-off CSV Python scripts and I’m very grateful I found this tool.