Short Text Feature Extension Method Based on Bayesian Networks

Cited by: 2
Abstract: To address the sparse features and limited representational power of short texts, this paper proposes a feature extension method based on Bayesian networks. First, a semantic Bayesian network is constructed from the dependencies among the feature words in short texts, and a correlation degree between a feature word and a short text is defined. The correlation degree is then computed by Bayesian network inference, and the feature words most closely related to a short text are added to it, which reduces noise and alleviates feature sparsity. Finally, short text classification is used as the basic text analysis task to assess the feasibility and effectiveness of the proposed method. Experiments on an Amazon review dataset show that the proposed method is feasible and effective.
Authors: LIU Hui-qing; GUO Yan-bu; LI Hong-ling; LI Wei-hua (School of Information, Yunnan University, Kunming 650500, China)
Source: Computer Science (计算机科学), indexed in CSCD and the Peking University Core Journals list, 2019, Issue S11, pp. 66-71 (6 pages)
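Funding: Key Project of the Applied Basic Research Program of Yunnan Province (2016FA026); National Natural Science Foundation of China (61762090); Yunnan University Graduate Research and Innovation Fund Project (2018226)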
Keywords: text analysis; short text; feature extension; Bayesian network
