The code below is from "Using word2vec to classify words in categories", and I need some help with reading the input and saving the returned output. Any help would be greatly appreciated.
import numpy as np

# Category -> words
data = {
    'Names': ['john','jay','dan','nathan','bob'],
    'Colors': ['yellow', 'red','green', 'oragne', 'purple'],
    'Places': ['tokyo','bejing','washington','mumbai'],
}
# Words -> category
categories = {word: key for key, words in data.items() for word in words}

# Load the whole embedding matrix
embeddings_index = {}
with open('glove.6B.100d.txt', encoding='utf-8') as f:
    for line in f:
        values = line.split()
        word = values[0]
        embed = np.array(values[1:], dtype=np.float32)
        embeddings_index[word] = embed
print('Loaded %s word vectors.' % len(embeddings_index))

# Embeddings for available words
data_embeddings = {key: value for key, value in embeddings_index.items() if key in categories.keys()}

# Processing the query
def process(query):
    query_embed = embeddings_index[query]
    scores = {}
    for word, embed in data_embeddings.items():
        category = categories[word]
        # Dot product with each category word, averaged over the category size
        dist = query_embed.dot(embed)
        dist /= len(data[category])
        scores[category] = scores.get(category, 0) + dist
    return scores

# Testing
print(process('jonny'))
print(process('green'))
print(process('park'))
And the return looks like:
Loaded 400000 word vectors.
{'Names': 7.965438079833984, 'Places': -0.3282392770051956, 'Colors': 1.803783965110779}
{'Names': 11.360316085815429, 'Places': 3.536876901984215, 'Colors': 21.82199630737305}
{'Names': 10.234728145599364, 'Places': 8.739515662193298, 'Colors': 10.761297225952148}
Below are the changes I want to make to this script, but I keep failing 🙁 Please help.
Question 1: The order of the categories in data is Names, Colors, Places. Why does the returned dict have the order Names, Places, Colors instead? This is not important, but I was wondering why.
Question 2: Instead of using print(process('jonny')), how can I input a list of words from a text file?
Question 3: Let's suppose the name of the input text file is TEST.txt. How can I save the output in a TEST.json or TEST.csv file? Basically, the input and output should have the same name.
Thank you so much!
2 Answers
Thanks a lot, @Driftr95.
The code below takes multiple text files as input and saves the output for each one in its own JSON file.
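The answer's original code did not survive on this page, so what follows is only a minimal sketch of the idea, not the poster's actual script. The `process` function here is a hypothetical stand-in for the one in the question, and `process_files` is a name made up for this example. It assumes one query per line in each input file.

```python
import json
from pathlib import Path

def process(query):
    # Hypothetical stand-in for the question's process(); returns dummy scores
    return {'Names': 0.0, 'Colors': 0.0, 'Places': 0.0}

def process_files(inp_files):
    """Score every line of each input file and write <name>.json next to it."""
    written = []
    for inpf in inp_files:
        path = Path(inpf)
        # One query per line, skipping blank lines
        lines = path.read_text(encoding='utf-8').splitlines()
        queries = [ln.strip() for ln in lines if ln.strip()]
        # float() turns NumPy scalars into plain floats that json can serialize
        results = {q: {k: float(v) for k, v in process(q).items()}
                   for q in queries}
        out_path = path.with_suffix('.json')  # TEST.txt -> TEST.json
        with open(out_path, 'w', encoding='utf-8') as f:
            json.dump(results, f, indent=2)
        written.append(out_path)
    return written
```

Calling `process_files(['TEST.txt'])` would then write `TEST.json` in the same folder. With the real `process`, a word missing from the GloVe vocabulary raises a `KeyError`, so you may want to catch that per query.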
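Question 1: It's probably because of how the contents of 'glove.6B.100d.txt' are ordered/arranged. data_embeddings is built by iterating over embeddings_index, so its words follow the file's order, and scores gets each category key the first time a word from that category is reached.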
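A small illustration of that effect, assuming Python 3.7+ where dicts keep insertion order. The word order in `file_order` is made up for the example and is not the real order of the GloVe file:

```python
# Hypothetical order in which the words appear in the embeddings file
file_order = ['john', 'washington', 'red', 'green', 'tokyo', 'bob']
categories = {'john': 'Names', 'bob': 'Names',
              'red': 'Colors', 'green': 'Colors',
              'washington': 'Places', 'tokyo': 'Places'}

scores = {}
for word in file_order:
    category = categories[word]
    # A category key is created the first time one of its words is seen
    scores[category] = scores.get(category, 0) + 1

print(list(scores))  # ['Names', 'Places', 'Colors']
```

If the order matters, you could rebuild the result in the order of `data`, for example with `{k: scores[k] for k in data if k in scores}`.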
Questions 2 and 3: Assuming 'TEST.txt' has one input per line, you could read the lines into a list of strings to loop through and apply process to.
To save as CSV, you could use pandas .to_csv, and to save as JSON, you can use json.dump.
Added EDIT: Using f'{inpf[:-4]}.json' assumes all file names in inpFiles end with '.txt'.
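The answer's own snippets are missing here, so the following is a hedged sketch of the read-then-save flow rather than the original code. It uses the standard-library `csv` module in place of pandas to stay dependency-free. `process` is a hypothetical stand-in for the function in the question, and `save_scores_csv` is a name invented for this example.

```python
import csv
import os

def process(query):
    # Hypothetical stand-in for the question's process(); returns dummy scores
    return {'Names': 1.0, 'Colors': 2.0, 'Places': 3.0}

def save_scores_csv(inpf):
    """Read one query per line from inpf and write the scores to <base>.csv."""
    with open(inpf, encoding='utf-8') as f:
        queries = [line.strip() for line in f if line.strip()]
    # splitext drops whatever extension is present, unlike inpf[:-4]
    out_name = os.path.splitext(inpf)[0] + '.csv'
    # One row per query: the query itself plus one column per category
    rows = [{'query': q, **process(q)} for q in queries]
    with open(out_name, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(
            f, fieldnames=['query', 'Names', 'Colors', 'Places'])
        writer.writeheader()
        writer.writerows(rows)
    return out_name
```

`save_scores_csv('TEST.txt')` would write `TEST.csv`. If you prefer pandas, `pd.DataFrame(rows).to_csv(out_name, index=False)` should produce an equivalent file from the same `rows` list.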