概念漂移处理大多采用集成学习策略,然而这些方法多数不能及时提取漂移发生后新分布数据的关键信息,导致模型性能较差。针对这个问题,本文提出一种基于串行交叉混合集成的概念漂移检测及收敛方法(Concept drift detection and convergen...概念漂移处理大多采用集成学习策略,然而这些方法多数不能及时提取漂移发生后新分布数据的关键信息,导致模型性能较差。针对这个问题,本文提出一种基于串行交叉混合集成的概念漂移检测及收敛方法(Concept drift detection and convergence method based on hybrid ensemble of serial and cross,SC_ensemble)。在流数据处于平稳状态下,该方法通过构建串行基分类器进行集成,以提取代表数据整体分布的有效信息。概念漂移发生后,在漂移节点附近构建并行的交叉基分类器进行集成,提取代表最新分布数据的局部有效信息。通过串行基分类器和交叉基分类器的混合集成,该方法兼顾了流数据包含的整体分布信息,又强化了概念漂移发生时的重要局部信息,使集成模型中包含了较多“好而不同”的基学习器,实现了漂移发生后学习模型的高效融合。实验结果表明,该方法可使在线学习模型在漂移发生后快速收敛,提高了模型的泛化性能。展开更多
One recent area of interest in computer science is data stream management and processing. By ‘data stream', we refer to continuous and rapidly generated packages of data. Specific features of data streams are imm...One recent area of interest in computer science is data stream management and processing. By ‘data stream', we refer to continuous and rapidly generated packages of data. Specific features of data streams are immense volume, high production rate, limited data processing time, and data concept drift; these features differentiate the data stream from standard types of data. An issue for the data stream is classification of input data. A novel ensemble classifier is proposed in this paper. The classifier uses base classifiers of two weighting functions under different data input conditions. In addition, a new method is used to determine drift, which emphasizes the precision of the algorithm. Another characteristic of the proposed method is removal of different numbers of the base classifiers based on their quality. Implementation of a weighting mechanism to the base classifiers at the decision-making stage is another advantage of the algorithm. This facilitates adaptability when drifts take place, which leads to classifiers with higher efficiency. Furthermore, the proposed method is tested on a set of standard data and the results confirm higher accuracy compared to available ensemble classifiers and single classifiers. In addition, in some cases the proposed classifier is faster and needs less storage space.展开更多
基金国家自然科学基金( the National Natural Science Foundation of China under Grant No.60573174)安徽省自然科学基金( the Natural Science Foundation of Anhui Province of China under Grant No.050420207)
文摘概念漂移处理大多采用集成学习策略,然而这些方法多数不能及时提取漂移发生后新分布数据的关键信息,导致模型性能较差。针对这个问题,本文提出一种基于串行交叉混合集成的概念漂移检测及收敛方法(Concept drift detection and convergence method based on hybrid ensemble of serial and cross,SC_ensemble)。在流数据处于平稳状态下,该方法通过构建串行基分类器进行集成,以提取代表数据整体分布的有效信息。概念漂移发生后,在漂移节点附近构建并行的交叉基分类器进行集成,提取代表最新分布数据的局部有效信息。通过串行基分类器和交叉基分类器的混合集成,该方法兼顾了流数据包含的整体分布信息,又强化了概念漂移发生时的重要局部信息,使集成模型中包含了较多“好而不同”的基学习器,实现了漂移发生后学习模型的高效融合。实验结果表明,该方法可使在线学习模型在漂移发生后快速收敛,提高了模型的泛化性能。
文摘One recent area of interest in computer science is data stream management and processing. By ‘data stream', we refer to continuous and rapidly generated packages of data. Specific features of data streams are immense volume, high production rate, limited data processing time, and data concept drift; these features differentiate the data stream from standard types of data. An issue for the data stream is classification of input data. A novel ensemble classifier is proposed in this paper. The classifier uses base classifiers of two weighting functions under different data input conditions. In addition, a new method is used to determine drift, which emphasizes the precision of the algorithm. Another characteristic of the proposed method is removal of different numbers of the base classifiers based on their quality. Implementation of a weighting mechanism to the base classifiers at the decision-making stage is another advantage of the algorithm. This facilitates adaptability when drifts take place, which leads to classifiers with higher efficiency. Furthermore, the proposed method is tested on a set of standard data and the results confirm higher accuracy compared to available ensemble classifiers and single classifiers. In addition, in some cases the proposed classifier is faster and needs less storage space.