# Lecture 2 | Word Vector Representations: word2vec

This lecture from CS224N/Ling 284 covers foundational concepts of natural language processing with deep learning, focusing on word vectors and the word2vec model introduced by Google in 2013. It begins with the limitations of traditional taxonomic resources such as WordNet, which struggle to capture word meaning, and motivates the shift toward continuous word representations in modern NLP. One-hot vector encodings are similarly limited: they treat every pair of words as orthogonal and so capture no relationships between words.

The lecture then introduces distributional similarity: the meaning of a word is represented by the contexts in which it appears. word2vec builds on this idea, learning low-dimensional neural word embeddings by predicting the context of a center word; training adjusts the word representations to minimize the loss incurred in predicting the surrounding words.

The skip-gram model makes this concrete. Given a center word, it predicts a probability distribution over context words, computed by applying the Softmax function to dot products of word vectors. Each word has two vector representations: one for when it appears as a center word and another for when it appears as a context word. The lecture walks through the objective function and the training process, highlighting the central role word vectors play in natural language processing.
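As a minimal sketch of the skip-gram probability described above (the vocabulary size, embedding dimension, and variable names here are illustrative, not from the lecture):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a vector of scores."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

# Toy setup: 5 vocabulary words, embedding dimension 3.
# V holds each word's "center" vector v_c; U holds its "context"
# vector u_o -- the two representations per word mentioned above.
rng = np.random.default_rng(0)
V = rng.normal(size=(5, 3))
U = rng.normal(size=(5, 3))

def context_probs(center_idx):
    """P(o | c) for every candidate context word o:
    softmax over the dot products u_o . v_c."""
    scores = U @ V[center_idx]
    return softmax(scores)

p = context_probs(2)  # a proper distribution over the 5 words
```

Training then pushes probability mass toward the words actually observed in the context window.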
The model is presented as a generative model: given a center word, it assigns a probability to each word appearing in the context, and the objective is to learn the parameters (the word vectors) that maximize this prediction performance. Several questions and considerations about the model are addressed along the way.

The lecture also discusses "A Simple but Tough-to-Beat Baseline for Sentence Embeddings," a paper from Princeton that introduces a simple unsupervised method for sentence representation using a weighted bag of words with special directions removed. The method computes a weighted average of a sentence's word vectors, with a separate weight for each word, and then subtracts each sentence vector's projection onto the first principal component. Despite its simplicity, the approach outperforms more sophisticated models on tasks such as sentence similarity and classification, and the paper gives the method a probabilistic interpretation.

Returning to training, the lecture reviews the chain rule and its application to backpropagation in neural networks, working through the derivatives needed to optimize the model's parameters and emphasizing how the chain rule makes these computations efficient. The gradient of the softmax objective is interpreted in terms of expectation over a probability distribution: the observed context vector minus the expected context vector under the model.

Finally, the lecture explains the dot product as a similarity measure between word vectors and the use of stochastic gradient descent for optimization in word vector learning.
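A minimal sketch of one stochastic gradient descent step on the naive-softmax skip-gram loss, using the gradient form discussed above (observed context vector minus expected context vector); the learning rate and toy dimensions are assumptions, not values from the lecture:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a vector of scores."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def sgd_step(V, U, center, context, lr=0.1):
    """One SGD update on J = -log P(context | center).

    V[i] is word i's vector as a center word, U[i] as a context word.
    Returns the loss before the update."""
    p = softmax(U @ V[center])              # P(o | center) for all o
    # Gradient w.r.t. the center vector:
    # expected context vector minus the observed one.
    grad_v = U.T @ p - U[context]
    # Gradient w.r.t. every context vector:
    # (p_w - 1[w == context]) * v_center.
    grad_U = np.outer(p, V[center])
    grad_U[context] -= V[center]
    V[center] -= lr * grad_v
    U -= lr * grad_U
    return -np.log(p[context])
```

Repeated steps on an observed (center, context) pair drive the predicted probability of the context word up and the loss down.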