Abstract
To address the poor performance of existing algorithms on sequentially arriving, density-based imbalanced data, an online sequential extreme learning machine based on granular division is proposed. In the offline stage, the majority-class samples are partitioned into granules according to the data distribution, each granule is replaced by its centre, and the initial model is built. In the online stage, the majority-class boundary data are re-partitioned into granules according to the updated distribution, the original boundary data are replaced by the new granule centres, and the network weights are updated dynamically. Theoretical analysis proves that the information loss of the algorithm has an upper bound. Experimental results show that, compared with some state-of-the-art algorithms, the proposed method effectively improves overall generalization performance and classification efficiency on sequential imbalanced data.
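The two stages described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm. It makes three assumptions: plain k-means stands in for the granular division, whose actual rule the abstract does not specify; every incoming majority chunk is compressed, rather than only the boundary data; and the model is a regularised extreme learning machine updated one sample at a time by recursive least squares. All names (`granule_centres`, `OSELM`, `demo`) and all parameter values are hypothetical.

```python
import math
import random


def granule_centres(points, k, iters=10, seed=0):
    """Assumed stand-in for granular division: k-means, returning k centres."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centre (squared Euclidean distance).
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centres[c])))
            groups[j].append(p)
        # Recompute centres; an empty granule keeps its previous centre.
        centres = [[sum(col) / len(g) for col in zip(*g)] if g else centres[i]
                   for i, g in enumerate(groups)]
    return centres


class OSELM:
    """Single-output ELM with random sigmoid hidden nodes and RLS updates."""

    def __init__(self, n_inputs, n_hidden, reg=1e-2, seed=0):
        rng = random.Random(seed)
        self.w = [[rng.uniform(-1, 1) for _ in range(n_inputs)]
                  for _ in range(n_hidden)]
        self.b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.beta = [0.0] * n_hidden
        # P starts as (reg * I)^-1, i.e. a ridge-regularised prior.
        self.P = [[1.0 / reg if i == j else 0.0 for j in range(n_hidden)]
                  for i in range(n_hidden)]

    def hidden(self, x):
        return [1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
                for w, b in zip(self.w, self.b)]

    def update(self, x, t):
        """One recursive least-squares step for sample x with target t."""
        h = self.hidden(x)
        Ph = [sum(pij * hj for pij, hj in zip(row, h)) for row in self.P]
        denom = 1.0 + sum(hi * phi for hi, phi in zip(h, Ph))
        err = t - sum(hi * bi for hi, bi in zip(h, self.beta))
        n = len(h)
        # P is symmetric, so P h h^T P equals the outer product of Ph with itself.
        self.P = [[self.P[i][j] - Ph[i] * Ph[j] / denom for j in range(n)]
                  for i in range(n)]
        # The updated P times h equals Ph / denom.
        self.beta = [bi + phi / denom * err for bi, phi in zip(self.beta, Ph)]

    def predict(self, x):
        return sum(hi * bi for hi, bi in zip(self.hidden(x), self.beta))


def demo(seed=0):
    rng = random.Random(seed)

    def cloud(cx, cy, n):
        return [[rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)] for _ in range(n)]

    model = OSELM(n_inputs=2, n_hidden=10)
    # Offline stage: 200 majority samples are replaced by 20 granule centres.
    centres = granule_centres(cloud(0, 0, 200), k=20)
    for x in centres:
        model.update(x, -1.0)
    for x in cloud(3, 3, 20):          # minority samples are kept as they are
        model.update(x, 1.0)
    # Online stage: compress the majority part of each new chunk, then update.
    for _ in range(3):
        for x in granule_centres(cloud(0, 0, 50), k=5):
            model.update(x, -1.0)
        for x in cloud(3, 3, 5):
            model.update(x, 1.0)
    return model, centres
```

Calling `demo()` returns the trained model and the offline granule centres. A positive `model.predict(x)` would be read as the minority class and a negative value as the majority class. Replacing majority samples by centres both balances the classes and shrinks the number of updates, which is the efficiency argument the abstract makes.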
Source
Control and Decision (《控制与决策》)
Indexed in: EI, CSCD, Peking University Core Journals (北大核心)
2016, No. 12, pp. 2147-2154 (8 pages)
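Funding
National Natural Science Foundation of China (U1204609)
China Postdoctoral Science Foundation (2014M550508)
Program for Science and Technology Innovation Talents in Universities of Henan Province (15HASTIT022)
Outstanding Youth Fund of Henan Normal University (14YQ007)
Young Backbone Teachers Program for Universities of Henan Province (2014GGJS-046)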
Keywords
extreme learning machine
granular division
sequential imbalanced data
under-sampling