
Bayesian Classifier Based on Frequent Item Sets Mining (基于频繁项集挖掘的贝叶斯分类算法) · Cited by 12
Abstract: A Naive Bayesian classifier provides a simple and effective approach to classification learning, but its assumption of attribute independence is often violated in real-world applications. To relax this assumption and improve the generalization ability of the Naive Bayesian classifier, researchers have done a great deal of work: AODE ensembles several one-dependence Bayesian classifiers, and LB selects and combines long item sets that provide new evidence for computing the class probability. Both achieve good performance, but higher-order dependence relations may contain information useful for classification, and limiting the number of item sets used by the classifier may restrict the benefit obtained from item sets. With this in mind, a frequent item sets mining-based Bayesian classifier, FISC (frequent item sets classifier), is proposed. At the training stage, FISC finds all frequent item sets satisfying the minimum support threshold min_sup and computes all the probabilities that may be needed at classification time. At the test stage, FISC constructs a classifier for each frequent item set contained in the test instance and then classifies the instance by ensembling all these classifiers. Experiments validate the effectiveness of FISC and show how its performance varies with different values of min_sup; based on the experimental results, an empirical choice of min_sup is suggested.
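The abstract describes the FISC procedure only at a high level. Below is a minimal, illustrative Python sketch of that idea, assuming an AODE-style estimator per frequent item set (P(c, s) · ∏ P(x_j | c, s)) with Laplace smoothing, and brute-force item-set enumeration in place of a real frequent-pattern miner such as Apriori. The class name FISCSketch, the max_len cap, and the smoothing scheme are hypothetical conveniences, not details taken from the paper.

```python
from collections import Counter
from itertools import combinations

class FISCSketch:
    """Sketch of the FISC idea: one Bayesian estimator per frequent
    item set contained in the test instance, ensembled by averaging."""

    def __init__(self, min_sup=0.2, max_len=2):
        self.min_sup = min_sup   # minimum support threshold (min_sup in the abstract)
        self.max_len = max_len   # cap on item-set size, a simplification for brevity

    def fit(self, X, y):
        # Items are (attribute_index, value) pairs; an item set is a frozenset of items.
        self.n_ = len(X)
        self.classes_ = sorted(set(y))
        self.class_count_ = Counter(y)
        itemset_count = Counter()
        self.itemset_class_ = Counter()              # counts of (item set, class)
        self.item_itemset_class_ = Counter()         # counts of (item, item set, class)
        for xi, yi in zip(X, y):
            items = [(j, v) for j, v in enumerate(xi)]
            for r in range(1, self.max_len + 1):
                for s in map(frozenset, combinations(items, r)):
                    itemset_count[s] += 1
                    self.itemset_class_[(s, yi)] += 1
                    for item in items:
                        if item not in s:
                            self.item_itemset_class_[(item, s, yi)] += 1
        # Keep only the item sets meeting the minimum support threshold.
        self.frequent_ = {s for s, c in itemset_count.items() if c / self.n_ >= self.min_sup}
        return self

    def predict(self, xi):
        items = [(j, v) for j, v in enumerate(xi)]
        contained = [s for r in range(1, self.max_len + 1)
                     for s in map(frozenset, combinations(items, r))
                     if s in self.frequent_]
        scores = {}
        for c in self.classes_:
            if not contained:
                # Fall back to the class prior when no frequent item set applies.
                scores[c] = self.class_count_[c] / self.n_
                continue
            total = 0.0
            for s in contained:
                # One estimator per item set: P(c, s) * prod_j P(x_j | c, s),
                # with Laplace smoothing; the paper's exact estimator may differ.
                p = (self.itemset_class_[(s, c)] + 1) / (self.n_ + len(self.classes_))
                for item in items:
                    if item not in s:
                        num = self.item_itemset_class_[(item, s, c)] + 1
                        den = self.itemset_class_[(s, c)] + 2
                        p *= num / den
                total += p
            scores[c] = total / len(contained)   # average ("ensemble") the per-item-set estimators
        return max(scores, key=scores.get)


# Tiny usage example on a toy categorical data set.
X = [["sunny", "hot"], ["rain", "mild"], ["sunny", "mild"], ["rain", "hot"]]
y = ["no", "yes", "yes", "no"]
print(FISCSketch(min_sup=0.25, max_len=2).fit(X, y).predict(["sunny", "mild"]))
```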
Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI / CSCD / Peking University Core, 2007, No. 8, pp. 1293-1300 (8 pages)
Funding: National Natural Science Foundation of China (60635030); Natural Science Foundation of Jiangsu Province (BK2005412)
Keywords: machine learning; Bayesian classification; semi-naive Bayesian classification; frequent item sets mining; ensemble learning

References (22 in total; first 10 listed)

  • 1 P Domingos, M Pazzani. Beyond independence: Conditions for the optimality of the simple Bayesian classifier[C]. The 13th Int'l Conf on Machine Learning, San Francisco, CA, 1996.
  • 2 G I Webb, J R Boughton, Z J Wang. Not so naive Bayes: Aggregating one-dependence estimators[J]. Machine Learning, 2005, 58(1): 5-24.
  • 3 D Meretakis, B Wuthrich. Extending naive Bayes classifiers using long itemsets[C]. The 5th ACM SIGKDD Int'l Conf on Knowledge Discovery and Data Mining, San Diego, CA, 1999.
  • 4 R Agrawal, R Srikant. Fast algorithms for mining association rules in large databases[R]. IBM Almaden Research Center, Tech Rep: RJ9839, 1994.
  • 5 R Agrawal, R Srikant. Fast algorithms for mining association rules[C]. The 20th Int'l Conf on Very Large Data Bases, Santiago, Chile, 1994.
  • 6 H Mannila, H Toivonen, A I Verkamo. Efficient algorithms for discovering association rules[C]. The AAAI'94 Workshop on Knowledge Discovery in Databases, Seattle, WA, 1994.
  • 7 P Langley, S Sage. Induction of selective Bayesian classifiers[C]. The 10th Conf on Uncertainty in Artificial Intelligence, Seattle, WA, 1994.
  • 8 M J Pazzani. Constructive induction of Cartesian product attributes[C]. The Conf on Information, Statistics and Induction in Science '96, Singapore, 1996.
  • 9 G I Webb, M J Pazzani. Adjusted probability naive Bayesian induction[C]. The 11th Australian Joint Conf on Artificial Intelligence, Brisbane, Australia, 1998.
  • 10 P Langley. Induction of recursive Bayesian classifiers[C]. The 6th European Conf on Machine Learning, Vienna, Austria, 1993.


