I am extracting hashtags from data stored in JSON files, each of which loads as a list. This works for some of my files, but it fails for others where the list contains a ‘dict’. Is there any way I can modify my code to handle this? I have included one example where it works and one where it doesn’t.
import json

file_name = 'twitter1.json'
with open(file_path + file_name) as json_file:
    data = json.load(json_file)
data
['http://b8nicktof280.com/skoex/po2.php?l=deof',
'http://dwillow100bc.com/skoex/po2.php?l=deof',
'#ursnif', '#malspam']
type(data)
list
#Extract the tags for use in api post assignment
tags = [tag for tag in data if tag.startswith('#')]
tags
['#ursnif','#malspam']
This extracts the tags with no problem.
In the next example the data is also a list, but it contains a dict ({}), which causes an error: AttributeError: 'dict' object has no attribute 'startswith'
file_name = 'twitter2.json'
with open(file_path + file_name) as json_file:
    data = json.load(json_file)
data
['t.co', '', '103.126.6.93', '#twitter', {'Address': '103.126.6.93'}]
type(data)
list
#Extract the tags for use in api post assignment
tags = [tag for tag in data if tag.startswith('#')]
AttributeError: 'dict' object has no attribute 'startswith'
2 Answers
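The simplest solution is to ignore any item in data that isn’t a string, for example by adding an isinstance(tag, str) check to the list comprehension before calling startswith().

Check the data type of each item in the list comprehension and handle it accordingly.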
Then use startswith() on the resulting tags list.
For the tags, I assume each dictionary holds a single value; if that isn’t the case, we can build one string by joining all of dict.values().
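
The two suggestions above could be sketched roughly as follows. This is a minimal sketch, not the answerers' original code: the function names extract_tags and extract_tags_with_dicts are hypothetical. The second function checks each string value of a dict on its own, rather than joining the values into one string, so that a tag stored as a later value is not missed.

```python
def extract_tags(items):
    """Return the hashtag strings in items, skipping anything that is not a str."""
    return [item for item in items
            if isinstance(item, str) and item.startswith('#')]


def extract_tags_with_dicts(items):
    """Like extract_tags, but also inspect string values held inside dict items.

    Assumption: dict values are plain strings; other value types are ignored.
    """
    candidates = []
    for item in items:
        if isinstance(item, str):
            candidates.append(item)
        elif isinstance(item, dict):
            # Check each value separately instead of joining them.
            candidates.extend(v for v in item.values() if isinstance(v, str))
    return [c for c in candidates if c.startswith('#')]


# Sample data taken from the twitter2.json example in the question.
data = ['t.co', '', '103.126.6.93', '#twitter', {'Address': '103.126.6.93'}]
print(extract_tags(data))
print(extract_tags_with_dicts(data))
```

For the twitter2.json data in the question, both functions return only '#twitter', because the dict's single value is an IP address rather than a tag.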