Abstract
Neural probabilistic language models are an emerging class of natural language processing algorithms. By learning from a training corpus, such a model obtains word vectors and a probability density function. Word vectors are multi-dimensional real-valued vectors that encode the semantic and syntactic relations of natural language: the cosine distance between two word vectors reflects how closely the corresponding words are related, and algebraic addition and subtraction of word vectors amounts to the computer "choosing words and building phrases". Neural probabilistic language models have developed rapidly in recent years, and Word2vec is a collection of the latest techniques. This paper first introduces Word2vec's core architectures, CBOW and Skip-gram; next, it trains a Word2vec model on an English corpus and compares the two architectures; finally, it explores the application of Word2vec to Chinese-language corpora.
Word2vec is a collection of neural probabilistic language models which, in terms of architecture, includes the CBOW model and the Skip-gram model. This paper introduces the technology of Word2vec. Firstly, the paper elaborates the theory behind the Word2vec architectures; secondly, an English corpus extracted from Wikipedia is used to train the model, and a set of results is shown; lastly, the application of Word2vec to the Chinese language is explored, and a result is also presented.
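The abstract states that cosine distance between word vectors measures word relatedness, and that vector addition and subtraction captures analogy-style reasoning. A minimal sketch of both operations, using hand-picked toy vectors rather than embeddings trained by the paper's model (the vectors and the king/queen analogy are illustrative assumptions, not results from this paper):

```python
import numpy as np

def cosine_similarity(u, v):
    # Cosine of the angle between two word vectors;
    # values near 1 indicate closely related words.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy 3-dimensional vectors, purely illustrative; real Word2vec vectors
# are typically hundreds of dimensions and learned from a corpus.
king  = np.array([0.9, 0.8, 0.1])
queen = np.array([0.9, 0.1, 0.8])
man   = np.array([0.5, 0.9, 0.0])
woman = np.array([0.5, 0.2, 0.7])

# Word-vector arithmetic: king - man + woman should land near queen.
analogy = king - man + woman
print(cosine_similarity(analogy, queen))  # close to 1
print(cosine_similarity(analogy, man))    # noticeably smaller
```

In a trained model the same idea applies to the learned vectors; the toy numbers above are merely chosen so the analogy works out.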
Source
Journal of Nanjing Normal University (Engineering and Technology Edition) 《南京师范大学学报(工程技术版)》
CAS
2015, Issue 1, pp. 43-48 (6 pages)