
Selective Bayes Classifiers for Incomplete Data
(用于不完整数据的选择性贝叶斯分类器) · Cited: 11
Abstract: Selective classifiers can effectively improve the accuracy and efficiency of classification by deleting irrelevant and redundant attributes from a data set, and several such classifiers have been proposed. However, owing to the complexity of processing incomplete data, most of them handle only complete data. Yet real-world data sets are often incomplete, for a variety of reasons, and contain many redundant or irrelevant attributes. As with complete data, the irrelevant or redundant attributes of an incomplete data set can sharply reduce the accuracy of a classifier built on it, so constructing selective classifiers for incomplete data is an important research problem. After analyzing the main existing methods of processing incomplete data for classification, this paper presents two selective Bayes classifiers for incomplete data, SRBC and CBSRBC. SRBC is built on a robust Bayes classifier, while CBSRBC extends SRBC with chi-squared statistics. Experiments on twelve benchmark incomplete data sets show that both algorithms greatly reduce the number of attributes while significantly improving the accuracy and stability of classification. On the whole, CBSRBC outperforms SRBC in both classification accuracy and running efficiency, but SRBC requires fewer pre-specified thresholds.
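The abstract does not give pseudocode for SRBC or CBSRBC, but it names their two ingredients: chi-squared statistics for attribute selection and a robust Bayes classifier that tolerates missing values. The sketch below is a minimal, hypothetical illustration of that combination (the names `chi2_score`, `RobustNaiveBayes`, the `MISSING` marker, and the selection threshold are all assumptions, not the paper's actual algorithm): the chi-squared score is computed only over non-missing entries, and the naive Bayes model skips missing attribute values instead of imputing them.

```python
# Hypothetical sketch of a selective naive Bayes classifier for incomplete
# data. NOT the paper's exact SRBC/CBSRBC algorithms; it only illustrates
# the two ideas the abstract combines:
#   1) chi-squared feature selection computed over non-missing entries only,
#   2) a "robust" naive Bayes that skips missing values during both
#      probability estimation and classification.
from collections import Counter, defaultdict

MISSING = None  # marker for a missing attribute value


def chi2_score(rows, labels, attr):
    """Chi-squared statistic of attribute `attr` vs. the class label,
    ignoring rows where the attribute is missing."""
    pairs = [(r[attr], y) for r, y in zip(rows, labels) if r[attr] is not MISSING]
    n = len(pairs)
    if n == 0:
        return 0.0
    val_counts = Counter(v for v, _ in pairs)   # marginal over attribute values
    cls_counts = Counter(y for _, y in pairs)   # marginal over classes
    cell = Counter(pairs)                       # observed contingency cells
    score = 0.0
    for v in val_counts:
        for c in cls_counts:
            expected = val_counts[v] * cls_counts[c] / n
            score += (cell[(v, c)] - expected) ** 2 / expected
    return score


class RobustNaiveBayes:
    """Naive Bayes over a chosen attribute subset; missing values are
    skipped rather than imputed."""

    def __init__(self, attrs):
        self.attrs = attrs

    def fit(self, rows, labels):
        self.cls_counts = Counter(labels)
        self.cond = defaultdict(Counter)  # (attr, class) -> value counts
        self.values = defaultdict(set)    # attr -> observed value set
        for r, y in zip(rows, labels):
            for a in self.attrs:
                if r[a] is not MISSING:
                    self.cond[(a, y)][r[a]] += 1
                    self.values[a].add(r[a])
        return self

    def predict(self, row):
        best, best_p = None, -1.0
        total = sum(self.cls_counts.values())
        for c, nc in self.cls_counts.items():
            p = nc / total
            for a in self.attrs:
                if row[a] is MISSING:
                    continue  # robust handling: ignore missing attributes
                counts = self.cond[(a, c)]
                # Laplace smoothing over the observed value set
                p *= (counts[row[a]] + 1) / (sum(counts.values()) + len(self.values[a]))
            if p > best_p:
                best, best_p = c, p
        return best


# Toy data: attribute 0 predicts the class, attribute 1 is noise.
rows = [(1, 0), (1, 1), (0, MISSING), (0, 1), (1, 0), (0, 0)]
labels = ["a", "a", "b", "b", "a", "b"]

# Keep attributes whose chi-squared score exceeds a (hypothetical) threshold.
selected = [a for a in range(2) if chi2_score(rows, labels, a) > 1.0]
clf = RobustNaiveBayes(selected).fit(rows, labels)
print(selected, clf.predict((1, MISSING)))
```

On this toy data only the informative attribute survives selection, and the classifier still predicts for a row whose remaining attribute is missing, since missing values are simply excluded from the product of conditionals.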
Source: Journal of Computer Research and Development (《计算机研究与发展》), 2007, No. 8, pp. 1324-1330 (7 pages). Indexed in EI and CSCD; Peking University Core journal.
Funding: National Natural Science Foundation of China (60503017, 60673089).
Keywords: Bayesian method; classification; feature selection; incomplete data; chi-squared (χ²) statistics

References (17)

1. P. Langley, S. Sage. Induction of selective Bayesian classifiers. In: Proc. of the 10th Conf. on Uncertainty in Artificial Intelligence. San Francisco: Morgan Kaufmann, 1994. 399-406.
2. M. Singh, G. M. Provan. Efficient learning of selective Bayesian network classifiers. In: Proc. of the 13th Int'l Conf. on Machine Learning. San Francisco: Morgan Kaufmann, 1996.
3. Shang Wenqian, Huang Houkuan, Liu Yuling, Lin Yongmin, Qu Youli, Dong Hongbin. Research on feature selection algorithms based on the Gini index for text categorization. Journal of Computer Research and Development, 2006, 43(10): 1688-1694. (in Chinese)
4. J. R. Quinlan. C4.5: Programs for Machine Learning. San Francisco: Morgan Kaufmann, 1993.
5. R. Kohavi, B. Becker, D. Sommerfield. Improving simple Bayes. In: M. van Someren, G. Widmer, eds. Poster Papers of ECML-97. Prague: Charles University, 1997. 78-87.
6. N. Friedman, D. Geiger, M. Goldszmidt. Bayesian network classifiers. Machine Learning, 1997, 29(2-3): 131-163.
7. S. L. Lauritzen. The EM algorithm for graphical association models with missing data. Computational Statistics and Data Analysis, 1995, 19(2): 191-201.
8. S. Russell, J. Binder, D. Koller, et al. Local learning in probabilistic networks with hidden variables. In: Proc. of IJCAI-95. San Francisco: Morgan Kaufmann, 1995. 1146-1151.
9. S. Geman, D. Geman. Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Trans. on Pattern Analysis and Machine Intelligence, 1984, 6(6): 721-741.
10. R. J. A. Little, D. B. Rubin. Statistical Analysis with Missing Data. New York: Wiley, 1987.


