
Hi, I’m fairly new to Python and need help extracting strings from a list. I am using Python in Visual Studio.

I have hundreds of similar lists of strings, and I need to extract specific information from them so I can add it to a table in columns; the aim is to automate this task using Python. I would like to extract the data between the headers ‘Names’, ‘Ages’ and ‘Jobs’. The issue I am facing is that the number of names, ages and jobs varies a lot between the lists, so I would like to write generic code that works for all of them.

list_x = ['Names', 'Ashley', 'Lee', 'Poonam', 'Ages', '25', '35', '42', 'Jobs', 'Doctor', 'Teacher', 'Nurse']

For example, I am struggling to extract the names:

['Ashley', 'Lee', 'Poonam'] 

I have tried the following:

for x in list_x:
    if x == 'Names':
        for y in list_x:
            if y == 'Ages':
                print(list_x[x:y])

This however comes up with the following error:
"Exception has occurred: TypeError
slice indices must be integers or None or have an __index__ method"

Is there a way of doing this without specifying exact indices?
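
The TypeError arises because for x in list_x yields the list's elements (strings such as 'Names'), not their positions, so list_x[x:y] tries to slice with two strings. A minimal sketch of the same nested-loop idea, using enumerate() to get integer positions instead, might look like this; it still assumes each header appears exactly once and that 'Names' comes before 'Ages':

```python
list_x = ['Names', 'Ashley', 'Lee', 'Poonam', 'Ages', '25', '35', '42',
          'Jobs', 'Doctor', 'Teacher', 'Nurse']

names = []
# enumerate() yields (position, element) pairs, so i and j are integers
for i, x in enumerate(list_x):
    if x == 'Names':
        for j, y in enumerate(list_x):
            if y == 'Ages':
                # Integer positions are valid slice indices; i + 1 skips the header
                names = list_x[i + 1:j]

print(names)  # ['Ashley', 'Lee', 'Poonam']
```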


Answers


  1. As the comment suggested, editing the data is the easiest way to go, but if you have to:

    # The + 1 skips the 'Names' header itself; the slice stops just before 'Ages'
    newList = oldList[oldList.index('Names') + 1:oldList.index('Ages')]
    

    It finds the positions of "Names" and "Ages" in the list and slices out the items between them.
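
    Using the example list, the two index() calls return 0 and 4, so the slice is list_x[1:4]:

    ```python
    list_x = ['Names', 'Ashley', 'Lee', 'Poonam', 'Ages', '25', '35', '42',
              'Jobs', 'Doctor', 'Teacher', 'Nurse']

    start = list_x.index('Names') + 1  # 0 + 1 = 1, the first item after the header
    end = list_x.index('Ages')         # 4; slice ends are exclusive, so 'Ages' is left out
    names = list_x[start:end]

    print(start, end, names)  # 1 4 ['Ashley', 'Lee', 'Poonam']
    ```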

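    Lots can (and will) go wrong with this method, though: if a value happens to match a header (someone named "Ages", say), index() may pick up the wrong position, and if a header is misspelt or missing, index() raises a ValueError.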
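
    One way to make the missing-header case explicit rather than a crash is to catch the ValueError before slicing. A minimal sketch, where items_between is a hypothetical helper name (not a built-in) that returns None when either header is missing or the end header does not come after the start header; it does not protect against a value that happens to equal a header:

    ```python
    def items_between(items, start_header, end_header):
        """Return the items strictly between two headers, or None if that isn't possible."""
        try:
            start = items.index(start_header)
            # Search for the end header only after the start header
            end = items.index(end_header, start + 1)
        except ValueError:
            # list.index raises ValueError when a header is missing or misspelt
            return None
        return items[start + 1:end]


    list_x = ['Names', 'Ashley', 'Lee', 'Poonam', 'Ages', '25', '35', '42',
              'Jobs', 'Doctor', 'Teacher', 'Nurse']
    print(items_between(list_x, 'Names', 'Ages'))  # ['Ashley', 'Lee', 'Poonam']
    print(items_between(list_x, 'Names', 'Agse'))  # None (misspelt header)
    ```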

  2. For completeness’ sake, it might not be a bad idea to use an approach similar to the one below.

    First, build a list of indices of each of the desired headers:

    list_x = ['Names', 'Ashley', 'Lee', 'Poonam', 'Ages', '25', '35', '42', 'Jobs', 'Doctor', 'Teacher', 'Nurse']
    headers = ('Names', 'Ages', 'Jobs')
    
    header_indices = [list_x.index(header) for header in headers]
    print('indices:', header_indices)  # [0, 4, 8]
    

    Then, create a list of values for each header by slicing from just after that header up to the next header (or to the end of the list for the last one):

    values = {}
    for i in range(len(header_indices)):
        header = headers[i]
        start = header_indices[i] + 1
        try:
            values[header] = list_x[start:header_indices[i + 1]]
        except IndexError:
            values[header] = list_x[start:]
    

    And finally, we can display it for debugging purposes:

    print('values:', values)
    # {'Names': ['Ashley', 'Lee', 'Poonam'], 'Ages': ['25', '35', '42'], 'Jobs': ['Doctor', 'Teacher', 'Nurse']}
    
    assert values['Names'] == ['Ashley', 'Lee', 'Poonam']
    

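    The index-based version above also assumes the headers appear in the same order as in headers. If some lists might order them differently, a sketch that sorts the header positions first and pairs each one with the next (using zip, so no IndexError handling is needed) could look like this; it still assumes every header appears exactly once:

    ```python
    list_x = ['Ages', '25', '35', '42', 'Names', 'Ashley', 'Lee', 'Poonam',
              'Jobs', 'Doctor', 'Teacher', 'Nurse']  # headers deliberately out of order
    headers = ('Names', 'Ages', 'Jobs')

    # (position, header) pairs, sorted by where each header actually occurs
    positions = sorted((list_x.index(h), h) for h in headers)

    # Each group ends where the next header starts; len(list_x) closes the last group
    ends = [pos for pos, _ in positions[1:]] + [len(list_x)]

    values = {}
    for (pos, header), end in zip(positions, ends):
        values[header] = list_x[pos + 1:end]

    print(values)
    # {'Ages': ['25', '35', '42'], 'Names': ['Ashley', 'Lee', 'Poonam'], 'Jobs': ['Doctor', 'Teacher', 'Nurse']}
    ```
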
    Each list.index() call scans the list, so the approach above takes O(H·N) time for H headers in a list of N items. For a single O(N) pass, we can instead loop over the list once and build a dict of values as we go:

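    from collections import defaultdict
    
    values = defaultdict(list)
    current_header = None  # no header seen yet
    
    for x in list_x:
        if x in headers:
            # A header starts a new group; remember which one
            current_header = x
        elif current_header is not None:
            # Values are filed under the most recent header (items before any header are skipped)
            values[current_header].append(x)
    
    print('values:', values)
    # defaultdict(<class 'list'>, {'Names': ['Ashley', 'Lee', 'Poonam'], 'Ages': ['25', '35', '42'], 'Jobs': ['Doctor', 'Teacher', 'Nurse']})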
    
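    Since the goal is a table with one column per header, the per-header lists can be turned into rows with zip. A minimal sketch, assuming each list has the same number of names, ages and jobs; parse_record is a hypothetical helper wrapping the single-pass loop above, records stands in for your real lists (the second one is made up), and the CSV goes to an in-memory buffer so the example has no side effects:

    ```python
    import csv
    import io
    from collections import defaultdict

    HEADERS = ('Names', 'Ages', 'Jobs')


    def parse_record(items, headers=HEADERS):
        """Group the items that follow each header (the single-pass idea above)."""
        values = defaultdict(list)
        current = None
        for x in items:
            if x in headers:
                current = x
            elif current is not None:
                values[current].append(x)
        return values


    # Stand-ins for the real input lists; the second one is made up for illustration
    records = [
        ['Names', 'Ashley', 'Lee', 'Poonam', 'Ages', '25', '35', '42',
         'Jobs', 'Doctor', 'Teacher', 'Nurse'],
        ['Names', 'Sam', 'Ages', '30', 'Jobs', 'Engineer'],
    ]

    rows = []
    for items in records:
        values = parse_record(items)
        # zip(*columns) turns the Names/Ages/Jobs columns into (name, age, job) rows;
        # note that it silently stops at the shortest column if the counts differ
        rows.extend(zip(*(values[h] for h in HEADERS)))

    # Write to an in-memory buffer; swap in open('people.csv', 'w', newline='') for a real file
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(HEADERS)
    writer.writerows(rows)
    print(buffer.getvalue())
    ```

    Because zip truncates silently, it may be worth checking that all three columns have the same length before extending rows if your data could be inconsistent.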