Monday, February 7, 2022

Vectors

 

Word2vec has become an integral part of the Python NLP toolbox (it ships with gensim and is often used alongside NLTK). The underlying idea is clear enough: one uses a numerical representation for words.

At its simplest, a one-hot representation assigns a 1 to our word and 0 to all others. Elegant, but impossible to scale: with a realistic vocabulary every vector is enormous, and no two distinct words come out any more similar than any other pair.
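
A minimal sketch of the idea (toy vocabulary, purely illustrative):

# One-hot encoding over a tiny made-up vocabulary.
vocabulary = ["king", "queen", "man", "woman", "banana", "city"]

def one_hot(word):
    # 1 in the word's own slot, 0 everywhere else
    return [1 if w == word else 0 for w in vocabulary]

print(one_hot("king"))    # [1, 0, 0, 0, 0, 0]
print(one_hot("banana"))  # [0, 0, 0, 0, 1, 0]
# With ~100,000 real words, each vector has ~100,000 entries and tells us
# nothing about which words are related -- hence the scaling problem.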


So instead we will define a word by the contextual words most commonly found alongside it.


Yes, a neural network is used.
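
Concretely, the training data such a network sees is just (target, context) pairs pulled from a sliding window over the text. A toy sketch (made-up sentence, window of 2):

# Sketch: the (target, context) pairs a word2vec-style network trains on.
sentence = "the king spoke to the queen in the castle".split()
window = 2

pairs = []
for i, target in enumerate(sentence):
    for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
        if j != i:
            pairs.append((target, sentence[j]))

print(pairs[:5])
# [('the', 'king'), ('the', 'spoke'), ('king', 'the'), ('king', 'spoke'), ('king', 'to')]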

                                       

Interestingly, one is then free to perform vector math on our words. Taking the offset between 'man' and 'king' in our (training) corpus, one then applies that same offset to 'woman'. Any guesses!?
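
With pretrained vectors the whole experiment takes a few lines. Here's a sketch using gensim's downloader (the model name below is just one convenient choice, not necessarily the lecture's):

# king - man + woman, using pretrained vectors.
import gensim.downloader as api

model = api.load("glove-wiki-gigaword-100")   # downloads on first use
print(model.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
# 'queen' typically comes out on top.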

*     *     *

Small issues can sometimes cause a lot of problems. The lecture code was impossible to run until I changed the file-opening line in vectors.py to this:
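
(The original snippet isn't reproduced here; the usual fix for this kind of failure is passing an explicit encoding when opening the word-vector file, roughly as below. The filename is only a guess.)

# Hypothetical reconstruction of the fix: an explicit encoding is the usual
# cure when open() trips over a UnicodeDecodeError on some platforms.
f = open("words.txt", encoding="utf-8")   # filename assumed, not the lecture's exact line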


Below, the vector for 'city':


And 'banana':
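
(The printed vectors themselves aren't shown here; with pretrained vectors loaded via gensim as above, pulling them up looks roughly like this:)

# Sketch: peeking at individual word vectors (model choice assumed, as above).
import gensim.downloader as api

model = api.load("glove-wiki-gigaword-100")
print(model["city"])      # a 100-dimensional numpy array
print(model["banana"])    # same length, pointing in a rather different direction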


 

