
Mixed-Sampling Algorithm Combining Cluster Boundary Movement and Adaptive Synthesis (cited by 4)
Abstract: To address the intra-class sub-clustering and class-overlap problems of the pseudo-negative sampling (PNS) algorithm, this paper proposes an improved mixed-sampling algorithm, ICBNMS (Improved Cluster Boundary Negative Movement Strategy), which combines a cluster boundary negative movement strategy (CBNMS) with an adaptive positive synthesis technology (ADPST) to improve both the overall classification performance on imbalanced data and the recognition accuracy of the positive class. The CBNMS strategy uses agglomerative hierarchical clustering (AGNES) to partition the positive and negative samples, and uses the similarity relationships among local samples to identify cluster boundary negative samples, i.e., samples in the potential negative class that are strongly correlated with the positive class; this improves the local accuracy and time efficiency of sampling. To further strengthen the ability of CBNMS to identify regions where positive samples overlap, ICBNMS introduces ADPST on top of the balancing achieved by moving the cluster boundary negative samples: it weights samples by a composite of sparsity and distance factors to adaptively determine the optimal region for generating new samples, which effectively reduces sample overlap and enriches sample diversity. Experimental results show that, compared with other sampling algorithms, ICBNMS achieves the best G-mean, F-measure and other metric values in multiple experiments on 10 imbalanced data sets, and its time efficiency is 32.27% and 27.88% higher than that of the CDSMOTE and PNS algorithms respectively, demonstrating superior robustness and generalization.
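The abstract describes ICBNMS only at a high level, so the Python below is a loose, hypothetical sketch of the two stages, not the authors' algorithm. Every rule in it is an assumption not stated in the abstract: AGNES uses average linkage with a fixed cluster count; a negative counts as a "cluster boundary negative" when it shares an AGNES cluster with at least one positive; "movement" pushes that negative away from the centroid of those positives by a factor `step`; and each positive seed's weight is the product of its sparsity (mean distance to its k nearest positives) and its distance to the nearest negative, with SMOTE-style interpolation adding positives until the classes are balanced. All function names (`agnes`, `boundary_negatives`, `move_boundary_negatives`, `adaptive_synthesis`, `mixed_sampling_sketch`) are hypothetical.

```python
import math
import random


def agnes(points, n_clusters):
    """Naive average-linkage agglomerative (AGNES-style) clustering.

    Returns a list of clusters, each a list of indices into `points`.
    Cubic cost per run; suitable for toy illustrations only.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Average pairwise Euclidean distance between clusters a and b.
        return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        _, x, y = min(
            (linkage(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is still valid
    return clusters


def boundary_negatives(X, y, n_clusters):
    """HYPOTHETICAL rule: a negative (label 0) is a cluster boundary negative if
    AGNES puts it in the same cluster as at least one positive (label 1)."""
    found = {}  # negative index -> indices of the positives sharing its cluster
    for cluster in agnes(X, n_clusters):
        pos = [i for i in cluster if y[i] == 1]
        for i in cluster:
            if pos and y[i] == 0:
                found[i] = pos
    return found


def move_boundary_negatives(X, y, n_clusters, step=0.5):
    """HYPOTHETICAL movement: push each boundary negative away from the centroid
    of the positives in its cluster by a factor `step`. Returns a new list."""
    moved = [list(x) for x in X]
    for i, pos in boundary_negatives(X, y, n_clusters).items():
        centre = [sum(c) / len(pos) for c in zip(*(X[j] for j in pos))]
        moved[i] = [xi + step * (xi - ci) for xi, ci in zip(X[i], centre)]
    return moved


def adaptive_synthesis(X, y, k=3, seed=0):
    """SMOTE-style oversampling of the positive class until it matches the
    negative class. Seed weights use a HYPOTHETICAL composite factor:
    sparsity (mean distance to k nearest positives) * distance to nearest negative."""
    rng = random.Random(seed)
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    if len(pos) < 2 or not neg:
        raise ValueError("need >= 2 positives and >= 1 negative")
    k = min(k, len(pos) - 1)
    neighbours, weights = [], []
    for i, p in enumerate(pos):
        others = sorted((pos[j] for j in range(len(pos)) if j != i),
                        key=lambda q: math.dist(p, q))[:k]
        sparsity = sum(math.dist(p, q) for q in others) / k
        margin = min(math.dist(p, n) for n in neg)
        neighbours.append(others)
        weights.append(sparsity * margin)
    if sum(weights) == 0:
        weights = [1.0] * len(pos)  # fall back to uniform seed selection
    synthetic = []
    for _ in range(len(neg) - len(pos)):
        s = rng.choices(range(len(pos)), weights=weights)[0]
        q = rng.choice(neighbours[s])
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([pi + u * (qi - pi) for pi, qi in zip(pos[s], q)])
    return [list(x) for x in X] + synthetic, list(y) + [1] * len(synthetic)


def mixed_sampling_sketch(X, y, n_clusters=3, k=3, step=0.5, seed=0):
    """Two stages: boundary-negative movement, then adaptive positive synthesis."""
    return adaptive_synthesis(move_boundary_negatives(X, y, n_clusters, step), y, k, seed)


if __name__ == "__main__":
    negatives = [[a, b] for a in range(5) for b in range(4)]      # 20 negatives on a grid
    positives = [[3.5, 3.5], [4.0, 4.5], [4.5, 4.0], [5.0, 5.0]]  # 4 positives in one corner
    X_bal, y_bal = mixed_sampling_sketch(negatives + positives, [0] * 20 + [1] * 4)
    print(y_bal.count(0), y_bal.count(1))  # prints: 20 20
```

In this reading, the product weight sends more synthetic points to positives that are both isolated from other positives (more diversity) and far from negatives (less overlap); the paper's actual composite factor may be defined differently.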
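The results are reported as G-mean and F-measure. The sketch below uses the standard definitions, G-mean = sqrt(TPR × TNR) and the F-measure of the positive (minority) class; that the paper uses exactly these variants (for example β = 1) is an assumption, and the function names are hypothetical.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FN, FP, TN), treating `positive` as the minority class."""
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of the true positive rate and the true negative rate."""
    tp, fn, fp, tn = confusion_counts(y_true, y_pred, positive)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(tpr * tnr)


def f_measure(y_true, y_pred, positive=1, beta=1.0):
    """F-measure of the positive class (F1 when beta == 1)."""
    tp, fn, fp, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    y_true = [1] * 4 + [0] * 16                # 4 positives, 16 negatives
    y_pred = [1, 1, 1, 0] + [0] * 14 + [1, 1]  # 1 missed positive, 2 false alarms
    print(f"G-mean = {g_mean(y_true, y_pred):.4f}, F-measure = {f_measure(y_true, y_pred):.4f}")
    # prints: G-mean = 0.8101, F-measure = 0.6667
```

On this toy split, plain accuracy is 17/20 = 0.85, while both minority-aware scores are lower, which is why imbalanced-classification studies report them instead of accuracy alone.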
Authors: GAO Lei-fu, ZHANG Meng-yao, ZHAO Shi-jie (Institute for Optimization and Decision Analytics, Liaoning Technical University, Fuxin, Liaoning 123000, China; Institute of Optimization and Decision, Liaoning Technical University, Fuxin, Liaoning 123000, China)
Source: Acta Electronica Sinica (《电子学报》), 2022, No. 10, pp. 2517-2529 (13 pages). Indexed in EI, CAS, CSCD and the Peking University Core Journals list.
Funding: Key Research Project of the Department of Education of Liaoning Province (No. LJ2019ZL001).
Keywords: imbalanced data classification; agglomerative hierarchical clustering; cluster boundary negative sample movement; adaptive positive sample synthesis; mixed-sampling