Research on application of word correlation in Naive Bayes text classification

Original title (Chinese): 词间相关性在贝叶斯文本分类中的应用研究. Cited by: 4
Abstract: To address the weakness of the attribute independence assumption in Naive Bayes classification, this paper discusses the concepts of correlation and multivariate correlation and defines the correlation degree between terms. Building on the analysis of inter-term correlation in the TAN classifier, the authors propose a formula for estimating the correlation degree between document feature words, together with an algorithm that applies it in an improved Naive Bayes classification model. Experiments on the Reuters-21578 text collection show that the improved algorithm is simple to implement and effectively improves Bayesian classification performance.
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2009, No. 16, pp. 159-161 (3 pages)
Keywords: text classification; Naive Bayes; event correlation; correlation degree; Tree Augmented Naive Bayes (TAN) classifier
