一种多义词词向量计算方法 (cited by: 7)

Polysemous Word Multi-embedding Calculation

Abstract: Semantic similarity calculation plays an important role in natural language processing. In recent years, with the rise of deep learning, techniques that use word embeddings to compute semantic similarity have been widely adopted. Many models and methods for computing word embeddings have been proposed, but in these models each word corresponds to a single embedding. Because natural language contains a large number of polysemous words, such models cannot represent the semantic features of polysemous words well. This paper proposes a method for computing word embeddings of polysemous words that combines a topic model with a standard word embedding model. A topic model is first used to semantically annotate the polysemous words in the corpus; each annotated word is then treated as a new word during embedding computation, which yields multiple embeddings for a single polysemous word. Experiments on Chinese and English corpora show that the method accurately computes embeddings for the different senses of polysemous words and that the accuracy of semantic similarity calculation improves markedly.
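The annotation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the per-token topic assignments are hand-written here (in practice they would come from a topic model such as LDA), the `word#topic` tag format is an assumption, and sparse co-occurrence counts stand in for a trained word embedding model. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def tag_with_topics(docs, topic_assignments):
    """Rewrite each token as 'word#topic' so every sense becomes its own vocabulary item."""
    return [
        [f"{word}#{topic}" for word, topic in zip(doc, topics)]
        for doc, topics in zip(docs, topic_assignments)
    ]


def cooccurrence_vectors(docs, window=2):
    """Toy stand-in for an embedding model: sparse context-count vectors."""
    vectors = defaultdict(Counter)
    for doc in docs:
        for i, target in enumerate(doc):
            lo, hi = max(0, i - window), min(len(doc), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[target][doc[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


# Hypothetical corpus; topic 0 ~ technology, topic 1 ~ food (assumed labels).
docs = [
    ["apple", "releases", "new", "phone"],
    ["apple", "pie", "tastes", "sweet"],
    ["fresh", "apple", "juice", "tastes", "sweet"],
]
topic_assignments = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1, 1],
]

tagged_docs = tag_with_topics(docs, topic_assignments)
vectors = cooccurrence_vectors(tagged_docs)

# The single surface form "apple" now has one vector per annotated sense.
apple_senses = sorted(w for w in vectors if w.startswith("apple#"))
print(apple_senses)
print(cosine(vectors["apple#0"], vectors["apple#1"]))
```

In a fuller pipeline the tagged corpus would be passed to an ordinary embedding trainer unchanged, which is the appeal of the design: the sense distinction lives entirely in the vocabulary, so the embedding model itself needs no modification.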

Authors: 曾琦 (Zeng Qi), 周刚 (Zhou Gang), 兰明敬 (Lan Mingjing), 王濛 (Wang Meng)

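Affiliations: School of Cyberspace Security, Information Engineering University; State Key Laboratory of Mathematical Engineering and Advanced Computing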

Source: 《小型微型计算机系统》 (Journal of Chinese Computer Systems), indexed in CSCD and the Peking University Core Journals list, 2016, No. 7, pp. 1417-1421 (5 pages)

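Funding: Supported by the Open Fund General Program of the State Key Laboratory of Mathematical Engineering and Advanced Computing (2013A02)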

Keywords: word embedding; polysemous words; topic model; semantic similarity

Classification code: TP311 [Automation and Computer Technology: Computer Software and Theory]
