skip to Main Content

I’m newer to Python. I’m using openpyxl for a SEO project for my brother and I’m trying to get a number of rows that contain a specific value in them.

I have a spreadsheet that looks something like this:
sample spreadsheet

I want to write a program that will get the keywords and parse them to a string by state, so like:
Missouri = "search item 1, search item 2, search item 5, search item 6"
Illinois = "search item 3, search item 4"

I have thus far created a program like this:

    #first, import openpyxl
    import openpyxl

    #next, give location of file
    path = "testExcel.xlsx"

    #Open workbook by creating object
    wb_object = openpyxl.load_workbook(path)

    #Get workbook active sheet object
    sheet_object = wb_object.active

    #Getting the value of maximum rows
    #and column
    row = sheet_object.max_row
    column = sheet_object.max_column
    print("Total Rows:", row)
    print("Total Columns:", column)

    #printing the value of forth column, state
    #Loop will print all values
    #of first column
    print("nValue of fourth column")
    for i in range(4, row + 1):
        cell_object = sheet_object.cell(row=i, column=4)
        split_item_test = cell_object.value.split(",")
        split_item_test_result = split_item_test[0]
        state = split_item_test_result
        print(state)
        if (state == 'Missouri'):
            print(state.count('Missouri'))
    print("All good")

The problem is after doing this, I see that it prints 1 repeatedly, but not a total number for Missouri. I would like a total number of mentions of the state, and then eventually get it to a string with each search criteria.

Is this possible with openpyxl? Or will I need a different library?

3

Answers


  1. ranemirusG is right, there are several ways to obtain the same result. Here’s another option…I attempted to preserve your thought process, good luck.

    print("nValue of fourth column")
    
    missouri_list = [] # empty list
    illinois_list = [] # empty list
    
    for i in range(2, row+1): # It didn't look like "4, row+1" captured the full sheet, try (2, row+1)
        cell_object = sheet_object.cell(row=i, column=4)
        keyword = sheet_object.cell(row=i, column=1)
        keyword_fmt = keyword.value # Captures values in Keyword column
        split_item_test = cell_object.value.split(",")
        split_item_test_result = split_item_test[1] # 1 captures states
        state = split_item_test_result
        print(state)
    
        # simple if statement to capture results in a list
        if 'Missouri' in state:
            missouri_list.append(keyword_fmt)
        if 'Illinois' in state:
            illinois_list.append(keyword_fmt)
    print(missouri_list)
    print(len(missouri_list)) # Counts the number of occurances
    print(illinois_list)
    print(len(illinois_list)) # Counts the number of occurances
    print("All good")
    
    Login or Signup to reply.
  2. Yes, it’s possible with openpyxl.
    To achieve your real goal try something like this:

    states_and_keywords  = {}
    for i in range(4, row + 1):
        cell_object = sheet_object.cell(row=i, column=4)
        split_item_test = cell_object.value.split(",")
        split_item_test_result = split_item_test[1] #note that the element should be 1 for the state
        state = split_item_test_result.strip(" ") #trim whitespace (after comma)
        keyword = cell_object.offset(0,-3).value #this gets the value of the keyword for that row
        if state not in states_and_keywords:
            states_and_keywords[state] = [keyword]
        else:
            states_and_keywords[state].append(keyword) 
    print(states_and_keywords)
    
    Login or Signup to reply.
  3. Ok another option
    This will create a dictionary ‘state_dict’ in the format per your question

    Missouri = "search item 1, search item 2, search item 5, search item
    6"
    Illinois = "search item 3, search item 4"

    ...
    print("nValue of fourth column")
    state_dict = {}
    for row in sheet_object.iter_rows(min_row=2, max_row=sheet_object.max_row):
        k = row[3].value.split(',')[1].strip()
        v = row[0].value
        if k in state_dict:
            state_dict[k] += [v]
        else:
            state_dict[k] = [v]
    
    ### Print values
    for key, value in state_dict.items():
        print(f'{key}, Total {len(value)}', end='; ')
        for v in value:
            print(f'{v}', end=', ')
        print('')
    

    Will create the dictionary ‘state_dict’ as so;

    'Missouri' = {list: 4} ['search item 1', 'search item 2', 'search item 5', 'search item 6']
    'Illinois' = {list: 2} ['search item 3', 'search item 4']
    'Alabama' = {list: 1} ['search item 7']
    'Colorado' = {list: 1} ['search item 8']
    

    Print output

    Value of fourth column
    Missouri = Total 4; search item 1, search item 2, search item 5, search item 6, 
    Illinois = Total 2; search item 3, search item 4, 
    Alabama = Total 1; search item 7, 
    Colorado = Total 1; search item 8, 
    

    ###————–Additional Information ———————–###
    Updated the state_dict to include the rank details for each item.
    The output display now shows each items for each state in rank order. You have two options on how the data may be restricted;
    Maximum rank to show, the variable rank_max = 100 determines the highest rank the output will display so if it’s set to 5 then only ranks 1, 2, 3, 4 and 5 will be displayed if the State has items with those ranks.
    The total_ranks_to_display determines number of ranks to display. So regardless of the rank_max value this will restrict the number of ranks shows with rank 1 as top. Example; if you set rank_max to 10 and this would show 8 rows for Missouri then setting the total_ranks_to_display to 4 will mean only the top 4 ranks will show.
    You can use either or both to achieve what you need I think.

    ...
    print("nValue of fourth column")
    state_dict = {}
    for row in sheet_object.iter_rows(min_row=2, max_row=sheet_object.max_row):
        k = row[3].value.split(',')[1].strip()
        v = row[0].value
        r = row[2].value
        if k in state_dict:
            if r in state_dict[k]:
                state_dict[k][r] += [v]
            else:
                state_dict[k].update({r: [v]})
        else:
            state_dict[k] = {r: [v]}
    
    
    rank_max = 10
    total_ranks_to_display = 4
    for key, value in state_dict.items():
        print(f'{key}')
        top_count = 0
        for i in range(1, rank_max):
            if i in state_dict[key]:
                top_count += 1
                cur_rank = state_dict[key][i]
                total = len(cur_rank)
                print(f'Rank: {i} Total: {total} {cur_rank}')
            if top_count == total_ranks_to_display:
                break
    

    Example Output
    Max rank is 10 and the total ranks to display is 4

    Value of fourth column
    Missouri
    Rank: 1 Total: 1 ['search item 19']
    Rank: 2 Total: 3 ['search item 5', 'search item 13', 'search item 18']
    Rank: 3 Total: 2 ['search item 14', 'search item 20']
    Rank: 4 Total: 1 ['search item 22']
    Alabama
    Rank: 1 Total: 2 ['search item 26', 'search item 28']
    Rank: 2 Total: 2 ['search item 3', 'search item 12']
    Rank: 3 Total: 1 ['search item 6']
    Rank: 5 Total: 1 ['search item 15']
    Illinois
    Rank: 1 Total: 2 ['search item 11', 'search item 17']
    Rank: 2 Total: 1 ['search item 24']
    Rank: 3 Total: 1 ['search item 4']
    Rank: 6 Total: 1 ['search item 23']
    Colorado
    Rank: 2 Total: 1 ['search item 21']
    Rank: 3 Total: 3 ['search item 25', 'search item 27', 'search item 29']
    Rank: 4 Total: 1 ['search item 30']
    Rank: 6 Total: 2 ['search item 8', 'search item 16']
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search