Journal Article

A Method for Computing Word Embeddings of Polysemous Words (Cited by: 7)

Polysemous Word Multi-embedding Calculation
Abstract: Semantic similarity calculation plays a very important role in natural language processing. In recent years, with the rise of deep learning, computing semantic similarity with word embeddings has been widely adopted. Many models and methods for computing word embeddings have been proposed, but in these models each word corresponds to a single embedding, while natural language contains many polysemous words, so such models cannot properly represent the semantic features of polysemous words. This paper proposes a method for computing embeddings of polysemous words that combines a topic model with a standard word embedding model. First, a topic model is used to semantically annotate polysemous words in the corpus; the annotated words are then treated as new words and fed to a standard word embedding method, yielding multiple embeddings for each polysemous word. Experiments on both Chinese and English corpora show that the method accurately computes embeddings for the different senses of a polysemous word and significantly improves the accuracy of semantic similarity calculation.
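The pipeline described in the abstract (annotate each occurrence of a polysemous word with a topic label, treat the labeled tokens as new words, then train embeddings) can be sketched as follows. This is a minimal illustrative stand-in, not the paper's implementation: a toy seed-set scorer replaces the topic model (the paper uses an LDA-style topic model), and window-based co-occurrence counts replace trained word embeddings; the corpus, seed sets, and `bank#k` tag format are all hypothetical.

```python
from collections import defaultdict

# Toy corpus: "bank" is polysemous (finance sense vs. river sense).
corpus = [
    "i deposit money at the bank".split(),
    "the bank approved my loan money".split(),
    "we sat on the river bank".split(),
    "the river bank was muddy water".split(),
]

# Step 1: sense annotation. The paper infers a topic per occurrence with a
# topic model (e.g. LDA); this toy stand-in picks the sense whose seed set
# overlaps the sentence most. Seed sets are purely illustrative.
seeds = {0: {"money", "deposit", "loan"}, 1: {"river", "water", "muddy"}}

def annotate(sent, word="bank"):
    scores = {t: len(seeds[t] & set(sent)) for t in seeds}
    tag = max(scores, key=scores.get)
    # Relabel each occurrence as a new token, e.g. "bank#0" / "bank#1".
    return [f"{w}#{tag}" if w == word else w for w in sent]

tagged = [annotate(s) for s in corpus]

# Step 2: treat each annotated token as a new word and compute embeddings.
# A window-2 co-occurrence count vector stands in for word2vec training.
vocab = sorted({w for s in tagged for w in s})
index = {w: i for i, w in enumerate(vocab)}
vectors = defaultdict(lambda: [0] * len(vocab))
for sent in tagged:
    for i, w in enumerate(sent):
        for j in range(max(0, i - 2), min(len(sent), i + 3)):
            if j != i:
                vectors[w][index[sent[j]]] += 1

# "bank" now has one distinct vector per annotated sense.
print(sorted(w for w in vectors if w.startswith("bank#")))
# → ['bank#0', 'bank#1']
```

In the actual method the two stand-ins would be replaced by a trained topic model and a standard embedding model (e.g. word2vec) run over the relabeled corpus; the key idea is only that relabeling makes each sense a separate vocabulary item.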
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), CSCD, Peking University Core, 2016, Issue 7, pp. 1417-1421 (5 pages)
Funding: Supported by the General Program of the Open Fund of the State Key Laboratory of Mathematical Engineering and Advanced Computing (2013A02)
Keywords: word embedding; polysemous words; topic model; semantic similarity

