Abstract
In industrial monitoring data, normal and abnormal samples are usually imbalanced. Traditional over-sampling methods often fail to give satisfactory classification results when the imbalanced data are nonlinear, high-dimensional and noisy. This paper exploits the nonlinear dimensionality reduction of manifold learning to propose a manifold-embedded over-sampling method. The method provides a unified framework for combining manifold learning with over-sampling in imbalanced pattern classification. Some methods map the generated samples from the manifold space back to the original observation space. The proposed method instead trains the classifier directly on the low-dimensional embedding of the data balanced by over-sampling, which avoids the computational and classification cost of that inverse mapping. Moreover, manifold learning preserves the structure of the original data. Over-sampling in the embedded space therefore performs a nonlinear interpolation that better matches the characteristics of the original data. Experiments used two imbalanced industrial monitoring data sets with different sizes and characteristics: the TE (Tennessee Eastman) process and mine micro-seismic data. The proposed method increased the average F1 value by 21.94% and 37.34% and the average AUC value by 37.85% and 10.64%, respectively. These results indicate that the method gives stable, good classification performance on relatively large-scale imbalanced pattern classification problems in industrial settings.
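The two steps described above can be sketched as follows. First, SMOTE-style interpolation is applied between minority-class neighbours inside the manifold embedding. Second, a classifier is trained directly on the balanced embedded data, with no inverse mapping to the observation space. This is a minimal, stdlib-only sketch and not the authors' implementation. It assumes the embedding has already been computed by some manifold learner, such as Isomap or LLE. The abstract does not say which learner is used. The function names are hypothetical, and the nearest-centroid classifier is an illustrative stand-in for whatever classifier the paper uses.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points in the embedding space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote_in_embedding(minority: List[Point], n_new: int, k: int = 5,
                       seed: int = 0) -> List[Point]:
    """SMOTE-style interpolation on already-embedded minority points.

    Assumption: `minority` holds the low-dimensional manifold embedding
    (e.g. from Isomap or LLE) of the minority-class samples. Synthetic
    points stay in the embedding; nothing is mapped back.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # k nearest minority neighbours of every minority point.
    neighbours = []
    for i, p in enumerate(minority):
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: _dist(p, minority[j]))
        neighbours.append(others[:k])
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbours[i])
        gap = rng.random()  # position on the segment base -> neighbour, in [0, 1)
        base, nb = minority[i], minority[j]
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return synthetic


def balance(majority: List[Point], minority: List[Point], k: int = 5,
            seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Oversample the minority class (label 1) up to the majority size (label 0)."""
    extra = smote_in_embedding(minority, len(majority) - len(minority), k, seed)
    X = list(majority) + list(minority) + extra
    y = [0] * len(majority) + [1] * (len(minority) + len(extra))
    return X, y


def nearest_centroid_fit(X: List[Point], y: List[int]) -> Dict[int, Point]:
    """Hypothetical stand-in classifier, trained directly in the embedding."""
    centroids = {}
    for label in set(y):
        pts = [x for x, t in zip(X, y) if t == label]
        centroids[label] = tuple(sum(c) / len(pts) for c in zip(*pts))
    return centroids


def nearest_centroid_predict(centroids: Dict[int, Point], x: Point) -> int:
    """Assign x to the class whose centroid is closest in the embedding."""
    return min(centroids, key=lambda label: _dist(x, centroids[label]))


if __name__ == "__main__":
    # Toy 2-D "embedded" data: six majority points, two minority points.
    majority = [(0.0, 0.0), (0.5, 0.2), (-0.3, 0.4),
                (0.2, -0.5), (-0.4, -0.1), (0.1, 0.3)]
    minority = [(3.0, 3.0), (3.2, 2.8)]
    X, y = balance(majority, minority, k=1)
    model = nearest_centroid_fit(X, y)
    print(y.count(0), y.count(1))                      # 6 6
    print(nearest_centroid_predict(model, (2.9, 3.1)))  # 1
```

The interpolation is linear in the embedding. When the embedding is a nonlinear function of the observations, each synthetic point presumably corresponds to a curved path in the original space. This is one way to read the abstract's claim of "nonlinear interpolation".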
Authors
CHENG Jian
YANG Lingkai
CUI Ning
GUO Yinan
School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China
Source
Journal of China University of Mining & Technology
EI
CAS
CSCD
Peking University Core Journal
2018, No. 6, pp. 1325-1333 (9 pages)
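Funding
National Key R&D Program of China (2016YFC0801406)
National Natural Science Foundation of China (61573361)
Six Talent Peaks Project of Jiangsu Province (2017-DZXX-046)
Special Project for Frontier Disciplinary Research of China University of Mining and Technology (2015XKQY19)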
Keywords
manifold learning
over-sampling
imbalanced data
pattern classification
TE process
mine micro-seism