A new dynamic sample recognition algorithm based on rough sets

Cited by: 8

Abstract: Sample identification is the final application of knowledge acquisition and an important topic in data mining research. Many data mining algorithms exist, and choosing one with strong generalization ability and a high recognition rate has become a research focus. This paper exploits the ability of rough sets to handle incomplete and imprecise data and combines them with support vector machines (SVM) and decision trees. By analyzing the characteristics of the data, it proposes selecting the best classification method dynamically, using the coverage of a sample over the rule set together with a threshold, so that a relatively good classifier is chosen for each sample at the outset.

The algorithm has four steps. First, rough set methods are used to obtain the rule set. Second, the relation between a sample and the rule set is analyzed, and the coverage of the sample over the rule set is used to judge whether rough sets are suitable for identifying the sample. The coverage reflects the number of rules that match the sample. When the coverage is greater than 1/n (where n is the number of rules obtained), more than one rule matches the sample and it may be misidentified; when the coverage is less than 1/n, no rule matches and the sample is rejected. In either case the sample needs further analysis. Third, for each sample left over from step 2, the distance between the sample and the support vectors is computed; when the distance is greater than a given threshold, the SVM can classify the sample well, so the SVM is used. Fourth, if the distance in step 3 is smaller than the threshold, the decision tree algorithm is used to identify the sample.

To verify the algorithm, eight data sets from the UCI repository were tested. For each data set, 50 percent of the data was selected at random as the training set and the remaining 50 percent was used as the test set. The results show that the algorithm achieves a recognition rate on par with that of the current best algorithm, which verifies its effectiveness.
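The four-step selection described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The rule format (a dict with hypothetical `"if"` and `"then"` keys), the function names, and the use of the minimum Euclidean distance to any support vector are all assumptions. The abstract does not say how the distance is measured or what happens when it equals the threshold, so this sketch sends the equal case to the decision tree.

```python
from math import dist  # Euclidean distance, Python 3.8+


def matching_rules(sample, rules):
    """Return the rules whose conditions are all satisfied by the sample."""
    return [
        rule for rule in rules
        if all(sample.get(attr) == val for attr, val in rule["if"].items())
    ]


def select_classifier(sample, rules, support_vectors, features, threshold):
    """Pick a classifier name for one sample; also return its coverage.

    Coverage is taken here as (number of matching rules) / n, where n is
    the number of rules, so coverage == 1/n means exactly one rule matches.
    """
    matched = matching_rules(sample, rules)
    coverage = len(matched) / len(rules)

    # Step 2: exactly one matching rule, so the rough-set rule decides.
    if len(matched) == 1:
        return "rough_set", coverage

    # Step 3: conflicting rules or no rule. Measure how far the sample lies
    # from the support vectors (assumption: nearest one, Euclidean).
    point = [sample[f] for f in features]
    nearest = min(dist(point, sv) for sv in support_vectors)
    if nearest > threshold:
        return "svm", coverage

    # Step 4: the sample is close to the support vectors; use the tree.
    return "decision_tree", coverage


# Hypothetical toy data: three rules over two numerically coded attributes.
RULES = [
    {"if": {"a": 1}, "then": "yes"},
    {"if": {"a": 0, "b": 1}, "then": "no"},
    {"if": {"b": 1}, "then": "yes"},
]
SUPPORT_VECTORS = [(0.0, 0.0)]

choice, cov = select_classifier(
    {"a": 1, "b": 1}, RULES, SUPPORT_VECTORS, ("a", "b"), threshold=1.0
)
```

In a real pipeline the returned name would index into already trained models, and the rule set and support vectors would come from step 1 and from SVM training on the 50 percent training split.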

Authors: Yi Xinghui, Wang Guoyin, Hu Feng

Affiliation: Institute of Computer Science and Technology, Chongqing University of Posts and Telecommunications

Source: Journal of Nanjing University (Natural Science), 2010, No. 5, pp. 501-506 (6 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.

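Funding: National Natural Science Foundation of China (60573068, 60773113); Natural Science Foundation of Chongqing (2008BA201, 2008BA2041); Science and Technology Research Project of the Chongqing Municipal Education Commission (KJ090512)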

Keywords: rough sets; support vector machine; decision tree; sample identification

Classification code: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]


