
Bayesian Classifier Based on Frequent Item Sets Mining (基于频繁项集挖掘的贝叶斯分类算法) · Cited by 12
Abstract: A Naive Bayesian classifier provides a simple and effective approach to classification learning, but its assumption of attribute independence is often violated in real-world applications. To relax this assumption and improve the generalization ability of the Naive Bayesian classifier, researchers have done a great deal of work: AODE ensembles several one-dependence Bayesian classifiers, and LB selects and combines long item sets that provide new evidence for computing the class probability. Both achieve good performance, but higher-order dependence relations may contain information useful for classification, and limiting the number of item sets used by the classifier may restrict the benefit obtained from item sets. With this in mind, a frequent item sets mining-based Bayesian classifier, FISC (frequent item sets classifier), is proposed. At the training stage, FISC finds all frequent item sets satisfying the minimum support threshold min_sup and computes all the probabilities that may be needed at classification time. At the test stage, FISC constructs a classifier for each frequent item set contained in the test instance and then classifies the instance by ensembling all these classifiers. Experiments validate the effectiveness of FISC and show how its performance varies with different values of min_sup; based on the experimental results, an empirical choice of min_sup is suggested.
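The abstract describes the FISC procedure only at a high level. Below is a minimal, illustrative Python sketch of that idea, assuming an AODE-style estimator per frequent item set (P(c, s) · ∏ P(x_j | c, s)) with Laplace smoothing, and brute-force item-set enumeration in place of a real frequent-pattern miner such as Apriori. The class name FISCSketch, the max_len cap, and the smoothing scheme are hypothetical conveniences, not details taken from the paper.

```python
from collections import Counter
from itertools import combinations

class FISCSketch:
    """Sketch of the FISC idea: one Bayesian estimator per frequent
    item set contained in the test instance, ensembled by averaging."""

    def __init__(self, min_sup=0.2, max_len=2):
        self.min_sup = min_sup   # minimum support threshold (min_sup in the abstract)
        self.max_len = max_len   # cap on item-set size, a simplification for brevity

    def fit(self, X, y):
        # Items are (attribute_index, value) pairs; an item set is a frozenset of items.
        self.n_ = len(X)
        self.classes_ = sorted(set(y))
        self.class_count_ = Counter(y)
        itemset_count = Counter()
        self.itemset_class_ = Counter()              # counts of (item set, class)
        self.item_itemset_class_ = Counter()         # counts of (item, item set, class)
        for xi, yi in zip(X, y):
            items = [(j, v) for j, v in enumerate(xi)]
            for r in range(1, self.max_len + 1):
                for s in map(frozenset, combinations(items, r)):
                    itemset_count[s] += 1
                    self.itemset_class_[(s, yi)] += 1
                    for item in items:
                        if item not in s:
                            self.item_itemset_class_[(item, s, yi)] += 1
        # Keep only the item sets meeting the minimum support threshold.
        self.frequent_ = {s for s, c in itemset_count.items() if c / self.n_ >= self.min_sup}
        return self

    def predict(self, xi):
        items = [(j, v) for j, v in enumerate(xi)]
        contained = [s for r in range(1, self.max_len + 1)
                     for s in map(frozenset, combinations(items, r))
                     if s in self.frequent_]
        scores = {}
        for c in self.classes_:
            if not contained:
                # Fall back to the class prior when no frequent item set applies.
                scores[c] = self.class_count_[c] / self.n_
                continue
            total = 0.0
            for s in contained:
                # One estimator per item set: P(c, s) * prod_j P(x_j | c, s),
                # with Laplace smoothing; the paper's exact estimator may differ.
                p = (self.itemset_class_[(s, c)] + 1) / (self.n_ + len(self.classes_))
                for item in items:
                    if item not in s:
                        num = self.item_itemset_class_[(item, s, c)] + 1
                        den = self.itemset_class_[(s, c)] + 2
                        p *= num / den
                total += p
            scores[c] = total / len(contained)   # average ("ensemble") the per-item-set estimators
        return max(scores, key=scores.get)


# Tiny usage example on a toy categorical data set.
X = [["sunny", "hot"], ["rain", "mild"], ["sunny", "mild"], ["rain", "hot"]]
y = ["no", "yes", "yes", "no"]
print(FISCSketch(min_sup=0.25, max_len=2).fit(X, y).predict(["sunny", "mild"]))
```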
Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI / CSCD / Peking University Core, 2007, No. 8, pp. 1293-1300 (8 pages)
Funding: National Natural Science Foundation of China (60635030); Natural Science Foundation of Jiangsu Province (BK2005412)
Keywords: machine learning; Bayesian classification; semi-naive Bayesian classification; frequent item sets mining; ensemble learning

References (22 in total; first 10 listed)

  • 1 P Domingos, M Pazzani. Beyond independence: Conditions for the optimality of the simple Bayesian classifier[C]. The 13th Int'l Conf on Machine Learning, San Francisco, CA, 1996.
  • 2 G I Webb, J R Boughton, Z J Wang. Not so naive Bayes: Aggregating one-dependence estimators[J]. Machine Learning, 2005, 58(1): 5-24.
  • 3 D Meretakis, B Wuthrich. Extending naive Bayes classifiers using long itemsets[C]. The 5th ACM SIGKDD Int'l Conf on Knowledge Discovery and Data Mining, San Diego, CA, 1999.
  • 4 R Agrawal, R Srikant. Fast algorithms for mining association rules in large databases[R]. IBM Almaden Research Center, Tech Rep: RJ9839, 1994.
  • 5 R Agrawal, R Srikant. Fast algorithms for mining association rules[C]. The 20th Int'l Conf on Very Large Data Bases, Santiago, Chile, 1994.
  • 6 H Mannila, H Toivonen, A I Verkamo. Efficient algorithms for discovering association rules[C]. The AAAI'94 Workshop on Knowledge Discovery in Databases, Seattle, WA, 1994.
  • 7 P Langley, S Sage. Induction of selective Bayesian classifiers[C]. The 10th Conf on Uncertainty in Artificial Intelligence, Seattle, WA, 1994.
  • 8 M J Pazzani. Constructive induction of Cartesian product attributes[C]. The Conf on Information, Statistics and Induction in Science '96, Singapore, 1996.
  • 9 G I Webb, M J Pazzani. Adjusted probability naive Bayesian induction[C]. The 11th Australian Joint Conf on Artificial Intelligence, Brisbane, Australia, 1998.
  • 10 P Langley. Induction of recursive Bayesian classifiers[C]. The 6th European Conf on Machine Learning, Vienna, Austria, 1993.


