Multiple Imputation-Revision Ensemble Classification with Neighborhood Information

Cited by: 1
Abstract Incomplete datasets must have their missing values imputed before classification. Classic imputation algorithms such as mean imputation and K-nearest-neighbor (KNN) imputation each have their strengths, but their estimates are easily disturbed by data that are only weakly related to the missing value. This degrades imputation quality and, in turn, subsequent classification performance. To address this issue, this paper proposes a multiple imputation-revision ensemble classification method for incomplete data that uses neighborhood information. The method embeds an imputation-revision module into the imputation process: it uses neighborhood purity and neighborhood radius to select the neighboring samples used for revision, then revises the imputed values based on those neighbors, which further improves imputation accuracy. The method also combines the strengths of several classic imputation algorithms and exploits the diversity of the multiply imputed datasets through ensemble learning to improve classification accuracy. Experimental results show that the method outperforms the compared methods in both imputation quality and classification accuracy on benchmark datasets, and that it also achieves better classification accuracy on real-world incomplete datasets.
Authors ZHU Xianyuan (朱先远); YAN Yuanting (严远亭); ZHANG Yanping (张燕平) (School of Information and Artificial Intelligence, Anhui Business College of Vocational Technology, Wuhu, Anhui 241002, China; School of Computer Science and Technology, Anhui University, Hefei 230601, China)
Source Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list, 2023, No. 23, pp. 125-135 (11 pages)
Funding National Natural Science Foundation of China (61872002, 62272001); Key Natural Science Research Projects of Anhui Universities (KJ2021A1483, 2022AH052740, 2023AH052296).
Keywords incomplete data classification; imputation-revision; local information; ensemble learning
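
The pipeline the abstract describes (several classic imputers, a neighborhood-based revision of each imputed value, then an ensemble vote) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not define purity or radius, so the sketch assumes purity means the fraction of in-radius neighbors that share the sample's class label, and it substitutes a 1-nearest-neighbor base classifier with majority voting. All function names, the `radius` and `min_purity` defaults, and the toy data are hypothetical.

```python
import math
from collections import Counter

def mean_impute(X):
    """Replace each None with the column mean of the observed values."""
    means = []
    for col in zip(*X):
        obs = [v for v in col if v is not None]
        means.append(sum(obs) / len(obs))
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in X]

def partial_dist(a, b):
    """Euclidean distance over the features observed in both rows."""
    pairs = [(u, v) for u, v in zip(a, b) if u is not None and v is not None]
    if not pairs:
        return math.inf
    return math.sqrt(sum((u - v) ** 2 for u, v in pairs))

def knn_impute(X, k=2):
    """Replace each None with the mean of that feature over the k nearest donors."""
    filled = [list(row) for row in X]
    for i, row in enumerate(X):
        for j, v in enumerate(row):
            if v is None:
                donors = sorted(
                    (partial_dist(row, other), other[j])
                    for t, other in enumerate(X)
                    if t != i and other[j] is not None
                )[:k]
                filled[i][j] = sum(val for _, val in donors) / len(donors)
    return filled

def revise(X, filled, y, radius=3.0, min_purity=0.6):
    """Assumed revision rule: re-estimate an imputed entry from same-class
    neighbors inside `radius`, only if the neighborhood is pure enough."""
    revised = [list(row) for row in filled]
    for i, row in enumerate(X):
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            continue
        near = [t for t in range(len(X))
                if t != i and math.dist(filled[i], filled[t]) <= radius]
        same = [t for t in near if y[t] == y[i]]
        if same and len(same) / len(near) >= min_purity:
            for j in missing:
                revised[i][j] = sum(filled[t][j] for t in same) / len(same)
    return revised

def predict_1nn(train_X, train_y, x):
    """Return the label of the training row nearest to x."""
    best = min(range(len(train_X)), key=lambda t: math.dist(train_X[t], x))
    return train_y[best]

def ensemble_predict(X, y, x, imputers=(mean_impute, knn_impute)):
    """Majority vote over one 1-NN classifier per imputed-and-revised dataset."""
    votes = [predict_1nn(revise(X, impute(X), y), y, x) for impute in imputers]
    return Counter(votes).most_common(1)[0][0]

# Toy data: two well-separated classes, one missing entry in each.
X = [[1.0, 1.0], [1.2, None], [0.9, 1.1], [5.0, 5.0], [5.2, 4.8], [None, 5.1]]
y = [0, 0, 0, 1, 1, 1]
print(revise(X, mean_impute(X), y))
print(ensemble_predict(X, y, [1.1, 1.0]), ensemble_predict(X, y, [5.0, 5.2]))
```

On this toy data, mean imputation alone places the missing entries near the global column means, between the two clusters; the revision step pulls them back toward the sample's own class, which is the kind of interference from weakly related data that the abstract says the method targets.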
