A sentence similarity computation method based on word multi-prototype vector representation (一种基于词语多原型向量表示的句子相似度计算方法)

Cited by: 4
Abstract: To address the problem of representing words as vectors, this paper follows the idea of word embeddings and uses an external polysemy dictionary to obtain a multi-prototype vector representation for each word, built by K-means clustering of the context representations of polysemous words. For a polysemous word in a sentence, word sense disambiguation is performed by computing the similarity between the word's multi-prototype vectors and its context representation. Based on the sense-level similarity of the shared words and the differing words in the word sets of two sentences, a sentence similarity computation method based on word multi-prototype vector representation is proposed. Experimental results show the effectiveness of the method.
Authors: GUO Hongqi (郭鸿奇); LI Guojia (李国佳) (School of Electric Power, North China University of Water Resources and Electric Power, Zhengzhou 450045, China; School of Software, North China University of Water Resources and Electric Power, Zhengzhou 450045, China)
Source: Intelligent Computer and Applications (《智能计算机与应用》), 2018, No. 2, pp. 38-42 (5 pages)
Funding: North China University of Water Resources and Electric Power 2017 Innovation and Entrepreneurship Program project (2017XB136)
Keywords: word multi-prototype vector representation; word sense disambiguation; sentence similarity
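The three steps named in the abstract (clustering context representations into sense prototypes, choosing a sense by similarity to the context, and scoring two sentences from their shared and differing words) can be sketched roughly as follows. This is only an illustrative sketch, not the paper's implementation: the abstract gives no formulas, so the hand-written K-means loop, the use of cosine similarity, and the averaging rule in `sentence_similarity` are all assumptions, and every function name here is hypothetical.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0


def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's K-means over context vectors (rows of X).

    Returns the k centroids, used here as the sense prototypes of one
    polysemous word. Assumption: the paper's clustering setup may differ.
    """
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign every context vector to its nearest centroid (Euclidean).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids


def disambiguate(prototypes, context_vec):
    """Return the prototype most similar to the word's context vector."""
    sims = [cosine(p, context_vec) for p in prototypes]
    return prototypes[int(np.argmax(sims))]


def sentence_similarity(vecs_a, vecs_b):
    """Score two sentences given dicts of word -> disambiguated sense vector.

    Hypothetical scoring rule (not taken from the paper): a shared word
    contributes the cosine of its two sense vectors, a word found in only
    one sentence contributes its best cosine match in the other sentence,
    and the result is the mean of all contributions.
    """
    common = sorted(set(vecs_a) & set(vecs_b))
    only_a = sorted(set(vecs_a) - set(vecs_b))
    only_b = sorted(set(vecs_b) - set(vecs_a))
    scores = [cosine(vecs_a[w], vecs_b[w]) for w in common]
    scores += [max((cosine(vecs_a[w], v) for v in vecs_b.values()), default=0.0)
               for w in only_a]
    scores += [max((cosine(vecs_b[w], v) for v in vecs_a.values()), default=0.0)
               for w in only_b]
    return sum(scores) / len(scores) if scores else 0.0
```

In use, one would collect a context vector for each occurrence of a polysemous word (for example, the mean embedding of its neighbouring words), pass the stacked vectors to `kmeans` with `k` set to the number of senses listed in the polysemy dictionary, and then call `disambiguate` on each sentence occurrence before comparing sentences.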
