Abstract
Many existing learning algorithms yield classifiers biased toward the majority class, which lowers classification accuracy for the minority class in imbalanced online sequential data. To address this, a weighted online sequential extreme learning machine based on imbalanced sample reconstruction is proposed. The algorithm starts from the distribution characteristics of the online sequential data and comprises an offline stage and an online stage. In the offline stage, a principal curve is used to construct a confidence region for the minority-class samples; over-sampling within this region produces a balanced sample set that follows the distribution trend of the samples, from which the initial model is built. In the online stage, each sequentially arriving sample is assigned a weight according to its training error, and the network weights are updated dynamically. The method was evaluated on UCI benchmark datasets and on measured meteorological data from Macao. Compared with the Extreme Learning Machine (ELM), the Online Sequential Extreme Learning Machine (OS-ELM) and the Meta-Cognitive Online Sequential Extreme Learning Machine (MCOS-ELM), the proposed algorithm identifies minority-class samples more reliably, while its training time differs little from that of the other three algorithms. These results indicate that the proposed algorithm effectively improves minority-class classification accuracy without materially affecting algorithmic complexity.
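The online stage described in the abstract can be illustrated with a minimal sketch of a weighted online sequential ELM, written here as recursive weighted least squares. This is not the authors' implementation: the abstract does not give the weighting formula, so `error_based_weights` (weight = 1 + absolute pre-update error) is a hypothetical placeholder, and the sigmoid activation, the ridge term `reg`, and all class and function names are assumptions made for illustration. The principal-curve over-sampling of the offline stage is not shown.

```python
import numpy as np


class WeightedOSELM:
    """Sketch of a weighted online sequential ELM (assumed formulation)."""

    def __init__(self, n_inputs, n_hidden, n_outputs, reg=1e-2, seed=0):
        rng = np.random.default_rng(seed)
        # Random hidden-layer parameters, fixed after initialisation as in ELM.
        self.A = rng.normal(size=(n_inputs, n_hidden))
        self.b = rng.normal(size=n_hidden)
        self.reg = reg  # assumed ridge term keeping the initial inverse well posed
        self.P = None   # running inverse of (H^T W H + reg * I)
        self.beta = np.zeros((n_hidden, n_outputs))  # output weights

    def _hidden(self, X):
        # Sigmoid hidden-layer output matrix H (activation is an assumption).
        return 1.0 / (1.0 + np.exp(-(X @ self.A + self.b)))

    def fit_initial(self, X, T, w):
        """Offline stage: weighted ridge solution on the initial sample set."""
        H = self._hidden(X)
        HtW = H.T * w  # equals H^T W for W = diag(w)
        self.P = np.linalg.inv(HtW @ H + self.reg * np.eye(H.shape[1]))
        self.beta = self.P @ HtW @ T

    def partial_fit(self, X, T, w):
        """Online stage: recursive update with per-sample weights w > 0."""
        H = self._hidden(X)
        K = np.diag(1.0 / w) + H @ self.P @ H.T
        self.P = self.P - self.P @ H.T @ np.linalg.solve(K, H @ self.P)
        self.beta = self.beta + self.P @ (H.T * w) @ (T - H @ self.beta)

    def predict(self, X):
        return self._hidden(X) @ self.beta


def error_based_weights(model, X, T):
    """Hypothetical rule: samples the current model fits badly weigh more."""
    err = np.abs(T - model.predict(X)).sum(axis=1)
    return 1.0 + err


# Usage on synthetic data: initialise on a first batch, then stream chunks.
rng = np.random.default_rng(42)
X_all = rng.normal(size=(60, 3))
T_all = (X_all[:, :1] > 1.0).astype(float)  # minority class where x0 > 1

model = WeightedOSELM(n_inputs=3, n_hidden=8, n_outputs=1)
model.fit_initial(X_all[:30], T_all[:30], np.ones(30))
for start in range(30, 60, 10):
    Xc, Tc = X_all[start:start + 10], T_all[start:start + 10]
    model.partial_fit(Xc, Tc, error_based_weights(model, Xc, Tc))
```

With weights held fixed, the recursive update reproduces the batch weighted ridge solution over all data seen so far, so the only approximation introduced by streaming is that each chunk's weights are computed from the model available when the chunk arrives.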
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journals (北大核心)
2015, Issue 6, pp. 1605-1610 (6 pages)
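Funding
National Natural Science Foundation of China (U1204609)
China Postdoctoral Science Foundation (2014M550508)
Basic and Frontier Technology Research Program of Henan Province (132300410430)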
Keywords
sample-reconstruction
Extreme Learning Machine (ELM)
principal curve
over-sampling
imbalanced data