Active data stream mining method based on two-layer sampling
Abstract: Traditional data stream classification algorithms struggle to cope with concept change and sample labeling in dynamic data stream environments. To address these problems, an active data stream mining method based on two-layer sampling is proposed, following the principles of active learning. Its sampling strategy is designed around two criteria, the expected change of the learning model and error reduction, and selects representative, informative unlabeled samples; once these samples are labeled by experts, the learning model is updated incrementally. A clustering method is used to detect concept drift with local awareness, which strengthens the effectiveness of the sampling strategy. Experimental results show that the proposed method reduces the cost of sample labeling while improving the classification ability of the model and its adaptability to concept drift, and that it has certain advantages over other data stream mining methods.
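The abstract names the two sampling criteria but not their formulas, so the following is only a minimal sketch of one plausible reading of the two-layer sampling step, not the authors' implementation. It assumes an online binary logistic-regression base learner (the hypothetical `OnlineLogReg`), uses expected gradient length as the "expected model change" criterion for layer 1, and a one-step expected error reduction over the current chunk for layer 2. The names `two_layer_select`, `run_stream`, `k1`, `k2`, and the synthetic drift-free stream are all illustrative assumptions; the clustering-based local drift detection is not sketched.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


class OnlineLogReg:
    """Hypothetical base learner: binary logistic regression updated one labeled sample at a time."""

    def __init__(self, dim, lr=0.5):
        self.w = [0.0] * (dim + 1)  # last entry is the bias
        self.lr = lr

    def prob(self, x):
        # zip stops at len(x), so the bias w[-1] is added separately
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.w[-1])

    def update(self, x, y):
        # One SGD step on the log loss; its gradient is (p - y) * [x, 1]
        g = self.prob(x) - y
        for i, xi in enumerate(x):
            self.w[i] -= self.lr * g * xi
        self.w[-1] -= self.lr * g

    def copy(self):
        m = OnlineLogReg(len(self.w) - 1, self.lr)
        m.w = list(self.w)
        return m


def expected_model_change(model, x):
    # Layer 1 (assumed criterion): expected gradient length E_y ||grad loss(x, y)||.
    # For log loss: p*(1-p)*||[x,1]|| + (1-p)*p*||[x,1]|| = 2p(1-p)*||[x,1]||.
    p = model.prob(x)
    return 2.0 * p * (1.0 - p) * math.sqrt(sum(xi * xi for xi in x) + 1.0)


def expected_future_error(model, x, pool):
    # Layer 2 (assumed criterion): expected error on the unlabeled pool after
    # retraining on x, averaged over both labels weighted by the current p(y|x).
    p = model.prob(x)
    risk = 0.0
    for y, py in ((1, p), (0, 1.0 - p)):
        m = model.copy()
        m.update(x, y)
        err = sum(1.0 - max(m.prob(u), 1.0 - m.prob(u)) for u in pool) / len(pool)
        risk += py * err
    return risk


def two_layer_select(model, batch, k1, k2):
    # Cheap coarse filter by expected model change, then costly refinement by error reduction
    layer1 = sorted(batch, key=lambda x: expected_model_change(model, x), reverse=True)[:k1]
    return sorted(layer1, key=lambda x: expected_future_error(model, x, batch))[:k2]


def run_stream(chunks, oracle, dim, k1=10, k2=2):
    model, queried = OnlineLogReg(dim), 0
    for batch in chunks:
        for x in two_layer_select(model, batch, k1, k2):
            model.update(x, oracle(x))  # "expert" label, then incremental update
            queried += 1
    return model, queried


# Usage on a hypothetical synthetic stream without drift: y = 1 iff x0 + x1 > 0
rng = random.Random(0)
oracle = lambda x: 1 if x[0] + x[1] > 0 else 0
chunks = [[[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(50)] for _ in range(30)]
model, queried = run_stream(chunks, oracle, dim=2)
test = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(500)]
acc = sum((model.prob(x) > 0.5) == (oracle(x) == 1) for x in test) / len(test)
print(f"labels queried: {queried} of {50 * 30}, test accuracy: {acc:.3f}")
```

Because the current model's error on a chunk is the same for every candidate, ranking by lowest expected future error is equivalent to ranking by largest expected error reduction. Running the expensive second layer only on the `k1` survivors of the cheap first layer keeps the per-chunk cost bounded; the values of `k1` and `k2` here are arbitrary choices.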
Authors: ZHANG Kuangyan, LIU Sanmin, LI Jingyang (School of Computer and Information, Anhui Polytechnic University, Wuhu 241000, China)
Source: Journal of Tianjin University of Technology (天津理工大学学报), 2022, No. 6, pp. 52-57 (6 pages)
Funding: Natural Science Foundation of Anhui Province (1608085MF147); Major Natural Science Research Project of Universities in Anhui Province (KJ2019ZD15).
Keywords: data stream mining; active learning; cluster analysis; concept drift; sample labeling