
An Improved Bayesian Text Classification Method (Cited 7 times)

Improved Naive Bayes Text Classification Algorithm
Abstract: Naive Bayes classification rests on an "independence assumption": given the class label of an instance, the occurrence of each attribute is assumed to be independent of the occurrence of every other attribute in that instance. In practice this condition is rarely satisfied, and because of the particular nature of text, correlated feature terms may carry new semantic information. Therefore, when training on text, the feature set produced by feature selection is examined with a practical measure of the correlation between features, and features with a high degree of correlation are merged. Experimental results show that this improved method raises the classification accuracy of the naive Bayes algorithm.
Source: Journal of Guangxi Normal University (Natural Science Edition) (indexed in CAS; Peking University Core Journal), 2007, No. 2: 206-209 (4 pages).
Funding: Natural Science Foundation of Chongqing (CSTC2006BB2021).
Keywords: text classification; independence assumption; correlation
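The feature-merging idea described in the abstract can be made concrete with a small sketch. The snippet below is an illustrative assumption rather than the authors' exact procedure: it uses Pearson correlation between term-count columns as the relatedness measure, a hypothetical threshold of 0.8, and merges a pair of highly correlated features by summing their count columns before training an ordinary multinomial naive Bayes classifier (Python with numpy and scikit-learn).

    # Illustrative sketch only: the correlation measure, threshold, and merge
    # rule are assumptions, not the procedure from the paper.
    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    docs = ["machine learning for text",
            "text classification with bayes",
            "football match results",
            "basketball game scores"]
    labels = [0, 0, 1, 1]

    # Document-term count matrix (a real system would apply feature selection first).
    X = CountVectorizer().fit_transform(docs).toarray().astype(float)

    # Pairwise Pearson correlation between feature (term) columns.
    corr = np.corrcoef(X, rowvar=False)

    # Greedily assign each feature to the first earlier feature it is strongly
    # correlated with; "merging" means summing the two count columns.
    threshold = 0.8
    parent = list(range(X.shape[1]))
    for j in range(X.shape[1]):
        for i in range(j):
            if parent[i] == i and corr[i, j] > threshold:
                parent[j] = i
                break

    # Collapse columns that share the same representative feature.
    groups = {}
    for j, rep in enumerate(parent):
        groups.setdefault(rep, []).append(j)
    X_merged = np.column_stack([X[:, cols].sum(axis=1) for cols in groups.values()])

    # Train a standard multinomial naive Bayes model on the merged features.
    clf = MultinomialNB().fit(X_merged, labels)
    print(clf.predict(X_merged))

The paper's own correlation measure and merging rule may differ; the point of the sketch is only that collapsing strongly correlated terms into a single feature weakens the violation of the independence assumption.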

References (7)

  • 1 SEBASTIANI F. Machine learning in automated text categorization [J]. ACM Computing Surveys, 2002, 34(1): 11-12, 32-33.
  • 2 WANG Hao, HUANG Hou-kuan, TIAN Sheng-feng. Implementation techniques for text classification [J]. Journal of Guangxi Normal University (Natural Science Edition), 2003, 21(A01): 173-179. (Cited 15 times)
  • 3 McCALLUM A, NIGAM K. A comparison of event models for naive Bayes text classification [J]. Information Processing and Management, 1998, 24(5): 513-523.
  • 4 KONONENKO I. Semi-naive Bayesian classifiers [C]//Proceedings of the European Conference on Artificial Intelligence. Berlin: Springer-Verlag, 1991: 206-219.
  • 5 FRIEDMAN N, GEIGER D, GOLDSZMIDT M. Bayesian network classifiers [J]. Machine Learning, 1997, 29(2/3): 131-163.
  • 6 SHI Hong-bo, WANG Zhi-hai, HUANG Hou-kuan. A text classification method based on TAN [J]. Journal of Guangxi Normal University (Natural Science Edition), 2003, 21(1): 81-85. (Cited 4 times)
  • 7 YANG Yi-ming. An evaluation of statistical approaches to text categorization [J]. Information Retrieval, 1999, 1(1/2): 69-90.


