Multi-Representatives Learning Algorithm for RSKNN Classification (基于多代表点学习的RSKNN分类算法)


Abstract: RSKNN is an improved k-nearest neighbor (kNN) algorithm based on variable precision rough set theory. While maintaining a given level of classification accuracy, it effectively reduces the computational cost of classification and improves classification efficiency. However, RSKNN simply partitions the samples of each class into a single core region and a boundary region, without taking the characteristics of the data set itself into account, which severely limits it. To address this problem, a multi-representatives learning algorithm is proposed. Structural risk minimization theory is used to analyze the factors that affect the expected risk of the classification model, and an unsupervised local clustering algorithm is used to learn an optimized set of representatives. Experiments on UCI public data sets show that the proposed algorithm achieves higher classification accuracy than RSKNN.
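The general idea of describing each class by several representatives, rather than a single core region, can be illustrated with a minimal sketch. This is not the algorithm from the paper: plain k-means stands in for the unsupervised local clustering step as an assumption, and the variable precision rough set regions and the structural risk analysis are not modeled. All function names (`fit_representatives`, `predict`, and so on) are hypothetical.

```python
import math
from collections import defaultdict


def dist(a, b):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point initialization.

    Assumption: used here only as a stand-in for the paper's
    unsupervised local clustering, whose details are not given.
    """
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    for _ in range(iters):
        groups = defaultdict(list)
        for p in points:
            j = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            groups[j].append(p)
        new_centers = [mean(groups[j]) if groups[j] else centers[j]
                       for j in range(len(centers))]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def fit_representatives(X, y, reps_per_class=2):
    """Cluster each class separately; return (center, label) representatives."""
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(tuple(xi))
    reps = []
    for label, pts in by_class.items():
        for center in kmeans(pts, min(reps_per_class, len(pts))):
            reps.append((center, label))
    return reps


def predict(reps, x):
    """Assign x the label of its nearest representative."""
    return min(reps, key=lambda r: dist(r[0], x))[1]


# Toy data: class "a" forms two separate clusters on either side of class "b",
# so a single representative per class could not tell the classes apart.
X = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 0), (5, 1)]
y = ["a", "a", "a", "a", "b", "b"]
reps = fit_representatives(X, y, reps_per_class=2)
print(predict(reps, (1, 0.5)), predict(reps, (5, 0.4)), predict(reps, (9, 0.5)))
```

In this toy data the centroid of class "a" coincides with that of class "b", which is why one representative per class is not enough and several per class are needed.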

Authors: Yu Yong (余勇), Guo Gongde (郭躬德), Chen Lifei (陈黎飞)

Affiliation: School of Mathematics and Computer Science, Fujian Normal University

Source: Computer Systems & Applications (《计算机系统应用》), 2014, No. 11, pp. 92-98 (7 pages)

Funding: National Natural Science Foundation of China (61175123)

Keywords: nearest neighbor classification; variable precision rough set; representative; classification model; upper and lower approximation

Classification code: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]



