skip to Main Content

python sliceCsv.py

Total no. of rows: 329375

Field names are:

Note : Handle is the first column

Handle, Command, Title, Body (HTML), Vendor, Type, Tags, Published, Option1 Name, Option1 Value, Option2 Name, Option2 Value, Option3 Name, Option3 Value, Variant SKU, Variant Grams, Variant Inventory Tracker, Variant Inventory Qty, Variant Inventory Policy, Variant Fulfillment Service, Variant Price, Variant Compare At Price, Variant Requires Shipping, Variant Taxable, Variant Barcode, Image Src, Image Position, Image Alt Text, Gift Card, Google Shopping / MPN, Google Shopping / Age Group, Google Shopping / Gender, Google Shopping / Google Product Category, SEO Title, SEO Description, Google Shopping / AdWords Grouping, Google Shopping / AdWords Labels, Google Shopping / Condition, Google Shopping / Custom Product, Google Shopping / Custom Label 0, Google Shopping / Custom Label 1, Google Shopping / Custom Label 2, Google Shopping / Custom Label 3, Google Shopping / Custom Label 4, Variant Image, Variant Weight Unit, Variant Tax Code, Cost per item, Metafield: theme.Applications [string], Metafield: theme.Applications with Dilutions [string], Metafield: theme.Background [string], Metafield: theme.Notes [string], Metafield: theme.Category [string], Metafield: theme.Clonality [string], Metafield: theme.Clone Number [string], Metafield: theme.Concentration [string], Metafield: theme.Conjugation [string], Metafield: theme.Cross Reactive Species [string], Metafield: theme.Predicted Reactivity [string], Metafield: theme.Datasheet URL [string], Metafield: theme.Excitation Emission [string], Metafield: theme.Gene ID [string], Metafield: theme.Gene ID Link [string], Metafield: theme.Host [string], Metafield: theme.images [string], Metafield: theme.Immunogen Sequence [string], Metafield: theme.Isotype [string], Metafield: theme.Lead Time [string], Metafield: theme.Modification [string], Metafield: theme.Modification Site [string], Metafield: theme.Product URL [string], Metafield: theme.publications [string], Metafield: theme.Purification [string], Metafield: theme.Size [string], Metafield: theme.Source [string], Metafield: theme.Specificity [string], Metafield: theme.Storage [string], Metafield: theme.Storage Buffer [string], Metafield: theme.Storage Condition [string], Metafield: theme.Subcellular Locations [string], Metafield: theme.Swiss Prot [string], Metafield: theme.Swiss Prot Link [string], Metafield: theme.Synonyms [string], Metafield: theme.Target Protein [string], Metafield: theme.AllConjugations, 

and

First row are:


  bs-0637R      MERGE P38 MAPK Polyclonal Antibody                 Company            Primary Antibody, P38 MAPK, target-protein_P38 MAPK, RK, p38, CSBP, EXIP, Mxi2, CSBP1, CSBP2, CSPB1, PRKM14, PRKM15, SAPK2A, p38ALPHA, Mitogen-activated protein kinase 14, MAP kinase 14, MAPK 14, Cytokine suppressive anti-inflammatory drug-binding protein, CSAID-binding protein, MAP kinase MXI2, MAX-interacting protein 2, Mitogen-activated protein kinase p38 alpha, MAP kinase p38 alpha, Stress-activated protein kinase 2a, MAPK14, Human, cross-reactive-species_Human, Mouse, cross-reactive-species_Mouse, Rat, cross-reactive-species_Rat, Rabbit, cross-reactive-species_Rabbit, Others, cross-reactive-species_Others, Dog, predicted-reactivity_Dog, Unmodified, modification_Unmodified, Rabbit, host_Rabbit, Polyclonal, clonality_Polyclonal, IgG, isotype_IgG, WB, applications_WB, ELISA, applications_ELISA, FCM, applications_FCM, IHC-P, applications_IHC-P, IHC-F, applications_IHC-F, IF(IHC-P), applications_IF(IHC-P), IF(IHC-F), applications_IF(IHC-F), IF(ICC), applications_IF(ICC), ICC, applications_ICC, Unconjugated, conjugation_Unconjugated       TRUE     Volume      100ul                                               bs-0637R         45    shopify        100   continue     manual        340                  TRUE       TRUE                                                  FALSE                                                                                                                                                                                         lb                       WB, ELISA, FCM, IHC-P, IHC-F, IF(IHC-P), IF(IHC-F), IF(ICC), ICC WB(1:300-5000), ELISA(1:500-1000), FCM(1:20-100), IHC-P(1:200-400), IHC-F(1:100-500),...

As you can see, there are a lot of columns. How can I list only specific columns? Handle

enter image description here

I’ve tried (sliceCsv.py)

# importing csv module
import sys
import csv

# Fix this error : _csv.Error: field larger than field limit (131072)
csv.field_size_limit(sys.maxsize)
    
# csv file name
filename = "/Users/code/Desktop/shopify.csv"

keepColumns = ['Handle']


# initializing the titles and rows list
fields = []
rows = []


print('n====================================================================================n')



# reading csv file
with open(filename, 'r') as csvfile:
    # creating a csv reader object
    csvreader = csv.reader(csvfile)

    # extracting field names through first row
    fields = next(csvreader)

    # extracting each data row one by one
    for row in csvreader:


        # print(row)
        # return False # return False # SyntaxError: 'return' outside function
        # break
        rows.append(row)

    # get total number of rows
    print("Total no. of rows: %d"%(csvreader.line_num))


print('n-------------------------------------------------------------------n')

# printing the field names
print('Field names are: nn' + ', '.join(field for field in fields))

print('n-------------------------------------------------------------------n')


#  printing first 5 rows
print('nFirst row are:nn')
for row in rows[:1

]:
    # parsing each column of a row
    for col in row:
        print("%10s"%col),
    print('n')

My ideal check would be sth like this… I’m not sure how to do it in Python or CSV parser.

if(row.header == "Handle") {
    rows.append(row)
}

3

Answers


  1. In a similar project, I applied the below strategy to get a specific column. I have changed the code a little bit as per your needs. The loop goes over each row and adds the specific column value to the handle_data list. You can access any row in this loop or multiple rows as well.

    import pandas as pd
    df = pd.read_csv(f)
    handle_data = []
    for id, row in df.iterrows():
        handle_data.append(row['Handle'])
    
    Login or Signup to reply.
  2. You don’t need to save the data in rows if you are not planning to write it somewhere else. You can use DictReader if you need access to columns by their names.

    import csv
    
    with open("example.csv") as csv_file:
        csv_reader = csv.DictReader(csv_file)
        for row in csv_reader:
            print(row["Handle"])
    
      bs-0637R      MERGE P38 MAPK Polyclonal Antibody                 Company            Primary Antibody
      bs-0637R      MERGE P38 MAPK Polyclonal Antibody                 Company            Primary Antibody
    ...
    

    PS: This is pretty basic stuff, I suggest you to go over the fundamentals again to have a stronger grasp.

    Login or Signup to reply.
  3. If all you need is a general tool for slicing up a CSV, and don’t want or need to write a program, I recommend downloading one of GoCSV’s prebuilt binaries and using its select command:

    gocsv select -c 'Handle' YOUR-FILE.csv
    

    You can use any valid column name or column number, gocsv select -c 1 ... will get you the same result.

    You can direct the output to a file, ... -c 'Handle' YOUR-FILE.csv > just_handle.csv, or pipe the output to:

    • its built-in table viewer to visually inspec, ... YOUR-FILE.csv | gocsv view
    • its stats command to get an overview of the kind of data in the column, ... | gocsv stats

    It’s fast, and usually has low memory usage.

    I’ve written many custom one-off CSV Python scripts and I’m very grateful I found this tool.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search