Word2vec is an integral part of Python NLTK. The underlying idea is clear enough:
one uses a numerical representation for words.
At its simplest, a one-hot representation assigns a 1 to our word and a 0 to all
others. Elegant, but impossible to scale: the vectors grow as wide as the vocabulary, and no two words ever look similar to one another.
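(As a minimal sketch of that idea, assuming nothing but numpy and a toy vocabulary of my own choosing:)

    import numpy as np

    # Toy vocabulary chosen for illustration; a real corpus has hundreds of
    # thousands of distinct words, which is why one-hot vectors don't scale.
    vocab = ["king", "queen", "man", "woman", "city", "banana"]
    index = {word: i for i, word in enumerate(vocab)}

    def one_hot(word):
        """A 1 at the word's own position, 0 everywhere else."""
        vec = np.zeros(len(vocab))
        vec[index[word]] = 1.0
        return vec

    print(one_hot("city"))  # [0. 0. 0. 0. 1. 0.]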
So instead we will define a word by the contextual words most commonly found around it.
Interestingly, one is then free to perform vector math on our words. Taking the
offset between 'man' and 'king' in our (training) corpus, one then applies that same offset
starting from 'woman'. Any guesses!?
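(A sketch of that arithmetic, assuming the gensim library and one of its small downloadable pretrained models rather than the lecture's own training corpus:)

    import gensim.downloader as api

    # Pretrained 50-dimensional GloVe vectors; any word2vec-style model works here.
    model = api.load("glove-wiki-gigaword-50")

    # Take the offset from 'man' to 'king' and apply it starting at 'woman'.
    print(model.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
    # the top hit is typically 'queen'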
* * *
Small issues can sometimes cause a lot of problems. The lecture code was impossible
to run until I changed the file-opening line in vectors.py to this:
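(The one-liner itself didn't survive the copy into this post; purely as a guess at the kind of change involved, the fix that usually sorts out file-opening errors for me is spelling out the mode and encoding explicitly:)

    # Hypothetical sketch, not the actual line from the lecture's vectors.py:
    # the filename is a placeholder, and the real fix may have been different.
    with open("vectors.txt", "r", encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word, values = parts[0], [float(x) for x in parts[1:]]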
Below, the vector for 'city':
And for 'banana':
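(The raw numbers didn't survive either; with the gensim model loaded in the sketch above, they can be reproduced along these lines:)

    # Look up the learned vectors directly; with the model assumed above,
    # each lookup returns a 50-dimensional array of floats.
    print(model["city"])
    print(model["banana"])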