Abstract
To adapt quickly to distribution changes in industrial data streams generated in non-stationary environments, concept drift must be detected accurately and in real time in unstructured, noisy data. This paper proposes a concept drift detection algorithm for industrial data streams based on multivariate region set partition (Concept Drift detection-Multivariate region set Partition, CDMP). CDMP first performs multivariate region set partition based on the fuzzy density of data instances, and uses the resulting set of fuzzy partitions to identify the regions in which concept drift occurs. Persistent concept drift can significantly degrade the classification performance of models built on multivariate region sets. To address this, CDMP builds a multivariate historical model pool that retains diverse historical models, reducing the performance loss caused by model adjustment or retraining while preserving the accuracy of concept drift detection. CDMP's performance is tested on different datasets. Experimental results show that CDMP preserves and reuses diverse historical models, and can accurately detect multiple types of concept drift, including recurring and sudden drift, in industrial IoT environments with different noise levels.
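The abstract does not specify how the historical model pool is maintained, so the following is only a minimal sketch of the general idea of reusing a stored model when an old concept recurs. The names `ModelPool`, `select`, `reuse_threshold`, and `capacity` are hypothetical, and the oldest-first eviction is a placeholder assumption; it does not implement the diversity-based retention or the fuzzy-density partitioning that CDMP describes.

```python
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[float]], int]      # maps a feature vector to a class label
Window = List[Tuple[Sequence[float], int]]    # recent labelled instances (x, y)


def accuracy(model: Model, window: Window) -> float:
    """Fraction of instances in the window that the model labels correctly."""
    if not window:
        return 0.0
    return sum(model(x) == y for x, y in window) / len(window)


class ModelPool:
    """Hypothetical pool of historical models (an assumption, not the paper's design)."""

    def __init__(self, capacity: int = 5, reuse_threshold: float = 0.8):
        self.capacity = capacity
        self.reuse_threshold = reuse_threshold
        self.models: List[Model] = []

    def add(self, model: Model) -> None:
        self.models.append(model)
        if len(self.models) > self.capacity:
            # Placeholder eviction: drop the oldest model.
            # A diversity-aware policy would go here instead.
            self.models.pop(0)

    def select(self, window: Window,
               train: Callable[[Window], Model]) -> Tuple[Model, bool]:
        """After a drift alarm, return (model, reused).

        Reuse the stored model that scores best on the recent window if it
        reaches the threshold; otherwise train a new model and store it.
        """
        best = max(self.models, key=lambda m: accuracy(m, window), default=None)
        if best is not None and accuracy(best, window) >= self.reuse_threshold:
            return best, True
        new_model = train(window)
        self.add(new_model)
        return new_model, False
```

In this sketch, a recurring concept is handled by the reuse branch of `select`, which avoids retraining, while a previously unseen concept falls through to `train`.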
Authors
HAN Guang-jie (韩光洁)
ZHAO Teng-fei (赵腾飞)
LIU Li (刘立)
ZHANG Fan (张帆)
XU Zheng-wei (徐政伟)
College of Internet of Things Engineering, Hohai University, Changzhou, Jiangsu 213022, China
Source
Acta Electronica Sinica (《电子学报》)
Indexed in: EI, CAS, CSCD, PKU Core (北大核心)
2023, No. 7, pp. 1906-1916 (11 pages)
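Funding
National Natural Science Foundation of China (No.62002099)
Natural Science Foundation of Jiangsu Province (No.BK20200184)
Changzhou Science and Technology Project (No.CJ20220052)
Joint Open Fund of the State Key Laboratory of Robotics (No.2022-KF-22-10)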
Keywords
industrial Internet of things
concept drift
multivariate region sets
instance fuzzy partition
diverse history models