
Text Semantic Enhancement Method Combining LDA and Word2vec (Cited by: 22)
Abstract: Text semantic representation is a difficult problem in natural language processing and machine learning. To address the loss of semantics in current text representations, this paper proposes a new text semantic enhancement method, the Sem2vec (semantic to vector) model, built on the LDA topic model and the Word2vec model. The LDA model supplies each word's topic distribution, from which the topic similarity between a word and its context words is computed; this topic-semantic information is fused into the word representation, which replaces the one-hot vector as input to the Sem2vec model. The model's parameters are trained under a maximized log-likelihood objective, yielding semantically enhanced word vectors and, from them, an enhanced semantic representation of the text. Experiments on different datasets show that, compared with other classic models, the Sem2vec model computes semantic similarity between word vectors more accurately. Moreover, across several text classification algorithms, the text semantic vectors produced by Sem2vec improve classification results by 0.58% to 3.5% over other classic models while also improving time performance.
Authors: TANG Huanling, WEI Hongmin, WANG Yulin, ZHU Hui, DOU Quansheng (School of Computer Science and Technology, Shandong Technology and Business University, Yantai, Shandong 264005, China; Co-innovation Center of Shandong Colleges and Universities: Future Intelligent Computing, Yantai, Shandong 264005, China; Key Laboratory of Intelligent Information Processing in Universities of Shandong (Shandong Technology and Business University), Yantai, Shandong 264005, China; School of Information and Electronic Engineering, Shandong Technology and Business University, Yantai, Shandong 264005, China; Shanghai Conversation Intelligence Co., Ltd., Shanghai 200120, China)
Source: Computer Engineering and Applications, 2022, No. 13, pp. 135-145 (11 pages). Indexed: CSCD, Peking University Core.
Funding: National Natural Science Foundation of China (61976124, 61976125, 62176140, 61873177, 61972235, 82001775).
Keywords: LDA topic model; Word2vec model; semantic word vector; semantic similarity; text categorization
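The abstract's core step, obtaining each word's topic distribution from LDA and scoring its topic similarity to context words before fusing that signal into the word representation, can be illustrated with a short sketch. This is not the authors' implementation: the use of gensim's LdaModel, the toy corpus, the number of topics, the `topic_vector`/`topic_similarity` helpers, and the choice of cosine similarity are all assumptions beyond what the record states.

```python
# Minimal sketch of the topic-similarity step described in the abstract.
# Assumes gensim for LDA; corpus, topic count, and helpers are illustrative.
import numpy as np
from gensim import corpora
from gensim.models import LdaModel

# Toy tokenized corpus standing in for the paper's datasets.
docs = [["machine", "learning", "text", "classification"],
        ["word", "vector", "semantic", "representation"],
        ["topic", "model", "text", "semantic"]]

dictionary = corpora.Dictionary(docs)
bow = [dictionary.doc2bow(d) for d in docs]
lda = LdaModel(bow, id2word=dictionary, num_topics=2, passes=20,
               random_state=0)

def topic_vector(word, num_topics=2):
    """Dense topic distribution for a word, from LDA's term-topic scores."""
    vec = np.zeros(num_topics)
    word_id = dictionary.token2id[word]
    for topic_id, score in lda.get_term_topics(word_id,
                                               minimum_probability=0.0):
        vec[topic_id] = score
    total = vec.sum()
    return vec / total if total > 0 else vec

def topic_similarity(w1, w2):
    """Cosine similarity of two words' topic distributions -- the
    topic-semantic score the paper fuses into the word representation."""
    a, b = topic_vector(w1), topic_vector(w2)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

print(topic_similarity("text", "semantic"))
```

In the pipeline the abstract describes, such scores for a word and its context words would be fused into the input representation that replaces the one-hot vector during Word2vec-style training; that fusion and the log-likelihood training loop are not reproduced here.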
