
I am extracting hashtags from data stored in JSON files, each of which loads as a list. My code works for some of the files, but it fails for others where the list also contains a dict. Is there any way I can modify my code to handle this? I have included one example where it works and one where it doesn't.

import json

file_name = 'twitter1.json'
with open(file_path + file_name) as json_file:
    data = json.load(json_file)
data
['http://b8nicktof280.com/skoex/po2.php?l=deof', 
'http://dwillow100bc.com/skoex/po2.php?l=deof',
'#ursnif', '#malspam']

type(data)
list

#Extract the tags for use in api post assignment
tags = [tag for tag in data if tag.startswith('#')]
tags
['#ursnif','#malspam']

This extracts the tags with no problem.

In the next example the data is also a list, but it contains a dict ({}), which causes an error: AttributeError: 'dict' object has no attribute 'startswith'

file_name = 'twitter2.json'
with open(file_path + file_name) as json_file:
    data = json.load(json_file)
data
['t.co', '', '103.126.6.93', '#twitter', {'Address': '103.126.6.93'}]

type(data)
list

#Extract the tags for use in api post assignment
tags = [tag for tag in data if tag.startswith('#')]
AttributeError: 'dict' object has no attribute 'startswith'

2 Answers


  1. The simplest solution is to ignore any item in data that isn’t a string:

    tags = [tag for tag in data if isinstance(tag, str) and tag.startswith('#')]
    
  2. Check the type of each item in a first list comprehension: keep strings as they are, and replace each dict with its first value.

    tags = [tag if isinstance(tag, str) else list(tag.values())[0] for tag in data]
    

    Then use startswith() on the tags list:

    li = [tag for tag in tags if tag.startswith('#')]
    

    Here I assume each dictionary holds a single string value. If that isn't the case, we can build one string by joining all of dict.values() before checking it.
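
If a dictionary can hold several values, one alternative to joining them into a single string is to check each value on its own. The following is only a minimal sketch of that idea: the helper name `extract_tags` and the extra `'Tag'` key in the sample data are hypothetical and do not appear in the question.

```python
def extract_tags(items):
    """Return every string starting with '#', looking inside dict values too."""
    tags = []
    for item in items:
        # A dict contributes all of its values; any other item is checked directly.
        candidates = item.values() if isinstance(item, dict) else [item]
        for value in candidates:
            # Skip anything that is not a string (numbers, None, nested objects).
            if isinstance(value, str) and value.startswith('#'):
                tags.append(value)
    return tags


# Sample data modelled on twitter2.json; the 'Tag' entry is made up for illustration.
data = ['t.co', '', '103.126.6.93', '#twitter',
        {'Address': '103.126.6.93', 'Tag': '#malspam'}]
print(extract_tags(data))  # ['#twitter', '#malspam']
```

Because every value is type-checked before startswith() is called, this version should not raise on empty dictionaries or non-string values. It does not descend into nested lists or dicts.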
