For context, I’m looking at a dataset of data scientist job titles and job descriptions, and I’m trying to identify how often each degree level is cited in those job descriptions.
I was able to get the code to work on one particular job description, but now I need a “for loop” or equivalent to iterate through the ‘description’ column and cumulatively count the number of times each level of education was cited.
# Unique whitespace-separated tokens of a single job description
sentence = set(data_scientist_filtered.description.iloc[30].split())
degree_level = {'level_1': {'bachelors', 'bachelor', 'ba'},
                'level_2': {'masters', 'ms', 'm.s', "master's", 'master of science'},
                'level_3': {'phd', 'p.h.d'}}
results = {}
# Count how many keywords of each level appear among the tokens
for key, words in degree_level.items():
    results[key] = len(words.intersection(sentence))
results
Sample string would be something like this:
data_scientist_filtered.description.iloc[30]=
'the team: the data science team is a newly formed applied research team within s&p global ratings that will be responsible for building and executing a bold vision around using machine learning, natural language processing, data science, knowledge engineering, and human computer interfaces for augmenting various business processes.\n\nthe impact: this role will have a significant impact on the success of our data science projects ranging from choosing which projects should be undertaken, to delivering highest quality solution, ultimately enabling our business processes and products with ai and data science solutions.\n\nwhat’s in it for you: this is a high visibility team with an opportunity to make a very meaningful impact on the future direction of the company. you will work with senior leaders in the organization to help define, build, and transform our business. you will work closely with other senior scientists to create state of the art augmented intelligence, data science and machine learning solutions.\n\nresponsibilities: as a data scientist you will be responsible for building ai and data science models. you will need to rapidly prototype various algorithmic implementations and test their efficacy using appropriate experimental design and hypothesis validation.\n\nbasic qualifications: bs in computer science, computational linguistics, artificial intelligence, statistics, or related field with 5+ years of relevant industry experience.\n\npreferred qualifications:\nms in computer science, statistics, computational linguistics, artificial intelligence or related field with 3+ years of relevant industry experience.\nexperience with financial data sets, or s&p’s credit ratings process is highly preferred.'
Sample dataframe:
position        company         description             location
data scientist  Xpert Staffing  this job is for..       Atlanta, GA
data scientist  Cotiviti        great opportunity of..  Atlanta, GA
2 Answers
I’d suggest using the isin() method here, then getting the sum.
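As a rough sketch of what that could look like (the small DataFrame below is made up and only stands in for data_scientist_filtered; the per-description set() mirrors the code in the question): split every description into unique tokens, explode them into one long Series, then use isin() and sum() per degree level.

```python
import pandas as pd

# Made-up stand-in for data_scientist_filtered (assumption, not the real data)
df = pd.DataFrame({
    "description": [
        "bachelor or ba in statistics required",
        "masters or phd in computer science",
        "ms preferred, phd a plus",
    ]
})

degree_level = {
    "level_1": {"bachelors", "bachelor", "ba"},
    "level_2": {"masters", "ms", "m.s", "master's", "master of science"},
    "level_3": {"phd", "p.h.d"},
}

# One row per unique token per description, like set(...split()) in the question
tokens = (
    df["description"]
    .str.split()
    .apply(lambda words: sorted(set(words)))
    .explode()
)

# isin() flags tokens belonging to a level; sum() adds them up over all rows
results = {}
for key, words in degree_level.items():
    results[key] = int(tokens.isin(words).sum())

print(results)
```

Note that tokens such as "ms," (with trailing punctuation) and multi-word entries such as 'master of science' will not match with plain whitespace splitting, so the text may need some cleaning first.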
Edit
The for loop can be replaced by a comprehension, just FYI.
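For example, assuming the same degree_level dictionary as in the question and a made-up description string, the loop could be written as a dict comprehension along these lines:

```python
degree_level = {
    "level_1": {"bachelors", "bachelor", "ba"},
    "level_2": {"masters", "ms", "m.s", "master's", "master of science"},
    "level_3": {"phd", "p.h.d"},
}

# Made-up description, standing in for one row of the 'description' column
description = "ms or phd in statistics preferred"
sentence = set(description.split())

# Same logic as the for loop: size of the overlap between keywords and tokens
results = {key: len(words & sentence) for key, words in degree_level.items()}

print(results)
```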
Edit 2
With you showing what the df looks like, now I see what the issue is.
You need to filter() the df, then get the count(). Something like that should work.
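Here is a minimal sketch of that idea, with some assumptions on my side: the sample rows are invented, and the filtering is done with a boolean mask from str.contains() rather than DataFrame.filter() (which selects by label, not by content). It counts how many descriptions mention each level, which is slightly different from counting matched keywords.

```python
import re
import pandas as pd

# Invented sample rows shaped like the dataframe in the question
df = pd.DataFrame({
    "position": ["data scientist"] * 3,
    "description": [
        "bachelor or master of science required",
        "phd preferred, ms considered",
        "no degree needed; handles many items",
    ],
})

degree_level = {
    "level_1": {"bachelors", "bachelor", "ba"},
    "level_2": {"masters", "ms", "m.s", "master's", "master of science"},
    "level_3": {"phd", "p.h.d"},
}

counts = {}
for key, words in degree_level.items():
    # Match any keyword as a whole term, so 'ms' does not match inside 'items'
    alternatives = "|".join(re.escape(w) for w in sorted(words))
    pattern = r"(?<!\w)(?:" + alternatives + r")(?!\w)"
    # Filter the rows whose description mentions this level, then count them
    mask = df["description"].str.contains(pattern, regex=True, na=False)
    counts[key] = int(df.loc[mask, "description"].count())

print(counts)
```

Matching on the full text instead of split tokens also lets multi-word entries like 'master of science' be found.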