
Text Representation Model Based on LF-LDA and Word2vec

Cited by: 4
Abstract: LDA (Latent Dirichlet Allocation) does not incorporate word vectors during training, whereas LF-LDA (Latent Feature LDA) uses Word2vec word vectors during training to improve the topic distributions of documents. However, a document represented only by its topic distribution carries no contextual information about its feature words. This paper therefore proposes representing text by combining the topic vectors generated by LF-LDA with Word2vec word vectors. It also proposes a second representation that uses only the topic vectors generated by LF-LDA. Classification results on a Stack Overflow short-text data set show that the LF-LDA plus Word2vec representation outperforms both the LDA plus Word2vec representation and the representation based on LF-LDA topic distributions, and that the topic-vector-based representation outperforms the LDA model.
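The abstract does not say how the topic vectors and word vectors are combined, so the following is only a minimal sketch under stated assumptions: the word part is the mean of the document's word vectors, the topic part is the topic vectors weighted by the document's topic proportions, and the two parts are concatenated. The function names, the toy vectors, and the combination rule are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]], dim: int) -> Vector:
    """Element-wise mean; returns a zero vector when no vectors are given."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def weighted_topic_vector(
    theta: Sequence[float],
    topic_vectors: Sequence[Sequence[float]],
    dim: int,
) -> Vector:
    """Mix topic vectors using the document-topic proportions theta."""
    return [sum(p * t[i] for p, t in zip(theta, topic_vectors)) for i in range(dim)]


def represent_document(
    tokens: Sequence[str],
    word_vectors: Dict[str, Vector],
    theta: Sequence[float],
    topic_vectors: Sequence[Sequence[float]],
    dim: int,
) -> Vector:
    """Assumed combination rule: concatenate the word part and the topic part."""
    # Tokens without a word vector are skipped.
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    word_part = mean_vector(known, dim)
    topic_part = weighted_topic_vector(theta, topic_vectors, dim)
    return word_part + topic_part  # list concatenation, length 2 * dim


# Toy inputs standing in for Word2vec word vectors and LF-LDA outputs.
word_vectors = {"python": [1.0, 0.0], "list": [0.0, 1.0]}
topic_vectors = [[1.0, 1.0], [0.0, 2.0]]  # one vector per topic
theta = [0.5, 0.5]                        # topic proportions of one document

doc_vec = represent_document(
    ["python", "list", "unknown"], word_vectors, theta, topic_vectors, dim=2
)
print(doc_vec)
```

The resulting vector could then be passed to any standard classifier. Averaging and concatenation are chosen here only because they are simple; the paper may use a different scheme.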
Authors: Chen Lei (陈磊), Li Jun (李俊)
Source: Electronic Technology (《电子技术(上海)》), 2017, No. 7, pp. 1-5 (5 pages)
Keywords: text representation; LDA; Word2vec; LF-LDA; text categorization
