Abstract
To mitigate the effect of non-uniform sampling, a Leave-One-Out Ensemble Learning Classification Algorithm (LOOELCA) for symbolic data classification was proposed, building on the representative-based classification algorithm. First, n small training sets were obtained with the leave-one-out method, where n is the size of the initial training set. Then an independent representative-based classifier was built on each training set, and the classifiers that made misclassifications, together with the misclassified objects, were marked. Finally, the marked classifiers and the original classifier formed a committee to classify the objects of the test set. If the committee voted unanimously, the test object was directly assigned that class label; otherwise, it was classified by the k-Nearest Neighbor (kNN) algorithm using the marked objects. Experimental results on UCI standard datasets show that, compared with Representative-Based Classification through Covering-Based Neighborhood Rough Set (RBC-CBNRS), LOOELCA improves accuracy by 0.35 to 2.76 percentage points on average; LOOELCA also achieves higher classification accuracy than ID3, J48, Naïve Bayes, OneR and other methods.
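The abstract outlines the LOOELCA pipeline but specifies neither the internals of the representative-based base classifier (RBC-CBNRS) nor the exact marking rule. The Python sketch below shows the ensemble workflow only, under stated assumptions: (1) a leave-one-out classifier is marked when it misclassifies the object that was left out, and that object is marked too; (2) a plain 1-nearest-neighbour rule under Hamming (overlap) distance stands in for the representative-based classifier. All names (`fit_1nn`, `knn_vote`, `looelca_sketch`) are hypothetical and not the authors' code.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Row = Tuple[str, ...]  # one symbolic object: a tuple of categorical attribute values
Classifier = Callable[[Row], str]


def hamming(a: Row, b: Row) -> int:
    """Overlap distance: number of attributes on which two objects differ."""
    return sum(x != y for x, y in zip(a, b))


def fit_1nn(X: Sequence[Row], y: Sequence[str]) -> Classifier:
    """HYPOTHETICAL stand-in for the representative-based base classifier:
    a plain 1-nearest-neighbour rule under Hamming distance (ties -> first object)."""
    data = list(zip(X, y))

    def predict(q: Row) -> str:
        return min(data, key=lambda pair: hamming(pair[0], q))[1]

    return predict


def knn_vote(q: Row, X: Sequence[Row], y: Sequence[str], k: int) -> str:
    """Majority label among the (at most) k marked objects nearest to q."""
    nearest = sorted(zip(X, y), key=lambda pair: hamming(pair[0], q))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def looelca_sketch(X: List[Row], y: List[str], X_test: Sequence[Row], k: int = 3,
                   fit: Callable[[Sequence[Row], Sequence[str]], Classifier] = fit_1nn,
                   ) -> List[str]:
    original = fit(X, y)  # classifier built on the full training set
    marked_clfs: List[Classifier] = []
    marked_X: List[Row] = []
    marked_y: List[str] = []

    # Leave-one-out: n training sets, each missing exactly one object.
    for i in range(len(X)):
        clf = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        # ASSUMPTION: mark the classifier (and the held-out object)
        # when it misclassifies the object that was left out.
        if clf(X[i]) != y[i]:
            marked_clfs.append(clf)
            marked_X.append(X[i])
            marked_y.append(y[i])

    committee = [original] + marked_clfs
    predictions = []
    for q in X_test:
        votes = {clf(q) for clf in committee}
        if len(votes) == 1:  # unanimous committee: take its label directly
            predictions.append(votes.pop())
        else:  # disagreement implies >= 1 marked object: kNN over marked objects
            predictions.append(knn_vote(q, marked_X, marked_y, k))
    return predictions


if __name__ == "__main__":
    X = [("a", "a", "a"), ("a", "a", "b"), ("b", "b", "b"), ("b", "b", "a"), ("a", "a", "c")]
    y = ["P", "P", "N", "N", "N"]
    print(looelca_sketch(X, y, [("a", "a", "c"), ("b", "b", "b"), ("a", "a", "a")]))
    # ['N', 'N', 'P']
```

In this toy data only the last training object is misclassified by its leave-one-out classifier, so the committee is the original classifier plus one marked classifier. For the query `("a", "a", "c")` the two disagree, and the kNN fallback over the single marked object yields `N`; the other two queries are decided unanimously.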
Authors
WANG Xuan, ZHANG Lin, GAO Lei, JIANG Haokun (School of Computer Science, Southwest Petroleum University, Chengdu, Sichuan 610500, China)
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journal
2018, No. 10, pp. 2772-2777 (6 pages)
Funding
National Natural Science Foundation of China (61379089, 41604114)
Keywords
representative
rough set
neighborhood
leave-one-out
ensemble learning