Imbalanced classification techniques are widely applied in machine learning domains where the majority (or positive) class of a dataset contains far more samples than the minority (or negative) class. Feature selection (FS), meanwhile, is a key technique for high-dimensional classification tasks, since it can greatly improve both classification performance and computational efficiency. However, most studies of feature selection and imbalanced classification are restricted to offline batch learning, which is poorly suited to some practical scenarios. In this paper, we aim to solve the high-dimensional imbalanced classification problem accurately and efficiently in an online fashion, using only a small number of active features, and we propose two novel online learning algorithms for this purpose. In our approach, a classifier that involves only a small, fixed number of features is constructed to classify a sequence of imbalanced data received online. We formulate the construction of such an online learner as an optimization problem and solve it iteratively, based on the passive-aggressive (PA) algorithm together with a truncated gradient (TG) method. We evaluate the proposed algorithms on several real-world datasets, and the experimental results demonstrate their effectiveness in comparison with the baselines.
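The idea of an online learner restricted to a small, fixed number of features can be illustrated with a minimal sketch. This is not the paper's algorithm: it combines a standard PA-I update with a simple hard truncation that keeps the k largest-magnitude weights, and it handles imbalance with a larger required margin for the minority class. The function names, the parameters `k`, `C`, `minority_label` and `minority_margin`, and the toy stream are all assumptions made for illustration.

```python
def truncate(w, k):
    """Keep the k largest-magnitude weights and zero out the rest."""
    if k >= len(w):
        return list(w)
    order = sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)
    keep = set(order[:k])
    return [wi if i in keep else 0.0 for i, wi in enumerate(w)]


def pa_truncated_fit(stream, n_features, k, C=1.0,
                     minority_label=1, minority_margin=5.0):
    """One pass over (x, y) pairs with y in {-1, +1}.

    Assumption: the minority class must be classified with a larger
    margin than the majority class, which makes updates on minority
    samples more aggressive. The source does not specify this scheme.
    """
    w = [0.0] * n_features
    for x, y in stream:
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        required = minority_margin if y == minority_label else 1.0
        loss = max(0.0, required - margin)
        if loss == 0.0:
            continue  # passive step: the sample is classified well enough
        norm_sq = sum(xi * xi for xi in x)
        if norm_sq == 0.0:
            continue  # an all-zero sample carries no direction to update in
        tau = min(C, loss / norm_sq)  # PA-I step size, capped by C
        w = [wi + tau * y * xi for wi, xi in zip(w, x)]
        w = truncate(w, k)  # enforce at most k active features
    return w


# Toy stream (hypothetical data): only feature 0 is informative.
stream = [
    ([1.0, 0.0, 0.0, 0.0, 0.0], 1),
    ([-1.0, 0.1, 0.0, 0.0, 0.0], -1),
]
w = pa_truncated_fit(stream, n_features=5, k=2)
active = [i for i, wi in enumerate(w) if wi != 0.0]
```

Truncating after every update keeps the model sparse at all times, so the cost of each prediction depends on `k` rather than on the full dimensionality.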
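Anomalous-data identification and data quality in dispatching automation systems underpin accurate dispatching and forecasting in power systems and provide a reliable guarantee for data analysis. To address the impact of anomalous data on dispatching automation systems, this paper proposes an anomalous-data identification model for the channels between master stations and substations of a dispatching automation system. First, a Parameter Adaptation-Density Based Spatial Clustering of Applications With Noise (PADBSCAN) algorithm is constructed to identify anomalous data. In addition, based on autocorrelation theory and the periodicity of data transmission, the latent patterns in the transmitted data and in electricity consumption behavior are analyzed and mined, and an electricity-consumption similarity criterion is used to eliminate the effect of time offsets so that pseudo-anomalous data can be identified. The method is validated on measured master-substation channel data from the dispatching automation system of a city in Shandong Province; the results show that it identifies anomalous data effectively and meets practical engineering needs.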
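A rough sketch of density-based anomaly identification with a data-driven neighborhood radius follows. It is not the PADBSCAN algorithm itself, whose adaptation rule the abstract does not describe: here the radius `eps` is set by a k-distance heuristic (a multiple of the median distance to the k-th nearest neighbor), which is an assumption, as are the function names, the `factor` parameter, and the one-dimensional toy readings.

```python
import statistics


def adaptive_eps(points, k, factor=2.0):
    """Assumed heuristic: eps = factor * median k-th-nearest-neighbor distance."""
    k_dists = []
    for i, p in enumerate(points):
        dists = sorted(abs(p - q) for j, q in enumerate(points) if j != i)
        k_dists.append(dists[k - 1])
    return factor * statistics.median(k_dists)


def region_query(points, i, eps):
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j, p in enumerate(points) if abs(p - points[i]) <= eps]


def dbscan_1d(points, eps, min_pts):
    """Plain DBSCAN on scalar values; label -1 marks noise (anomalies)."""
    UNSEEN, NOISE = None, -1
    labels = [UNSEEN] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point joins the cluster
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
        cluster += 1
    return labels


# Hypothetical channel readings with one obvious outlier at index 5.
readings = [10.0, 10.1, 10.2, 9.9, 10.05, 50.0, 10.15, 9.95]
eps = adaptive_eps(readings, k=3)
labels = dbscan_1d(readings, eps, min_pts=3)
anomalies = [i for i, label in enumerate(labels) if label == -1]
```

Deriving `eps` from the data avoids hand-tuning the radius per channel; the periodicity and similarity checks for pseudo-anomalous data described above are not covered by this sketch.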
Funding: This research was supported by the Guangzhou Key Laboratory of Robotics and Intelligent Software under Grant No. 15180007, the Fundamental Research Funds for the Central Universities of China under Grant Nos. D215048w and 2015ZZ029, and the National Natural Science Foundation of China under Grant Nos. 61005061 and 61502177.