Abstract
Traditional outlier detection algorithms suffer from low accuracy and high complexity, which makes them unsuitable for electric power big data. To address this, outlier detection in electric power big data was studied using the density peak clustering algorithm. The clustering process of the density peak clustering algorithm was analyzed. Following the principle of cluster center selection, the degree of difference of each candidate point was measured by the normalized product of its neighbor distance and its density, and the group of points with the largest values was selected as cluster centers according to the statistical characteristics and changing trend of the difference degree. Based on the z space-filling curve and the fact that the z-value of a high-dimensional data point carries location information, a z-value-based distributed density peak clustering algorithm was proposed, which reduces the complexity of anomaly detection and meets the requirements of outlier detection for electric power big data. The optimized density peak clustering algorithm was then used to detect outliers in electric power big data: a power data point is regarded as an outlier when its local density exceeds a threshold and its distance also exceeds a threshold. Distance-based and density-based detection algorithms were tested for comparison. The results show that the abnormal power data points found by the proposed algorithm are consistent with the actual situation and that, unlike the other two algorithms, the proposed algorithm produced no false detections or missed detections. The proposed algorithm is therefore suitable for outlier detection in electric power big data and yields highly accurate detection results.
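The quantities named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract gives no formulas, so the cutoff-count density, the min-max normalization, the function names `dpc_scores` and `z_value`, and the parameters `d_c` and `bits` are all assumptions chosen for illustration. The first function computes a local density, the distance to the nearest denser point, and their normalized product for a small in-memory data set; the second interleaves coordinate bits to obtain a z-value (Morton code), which is one common way a z space-filling curve encodes location. The distributed partitioning and the outlier thresholds are not shown.

```python
import numpy as np


def dpc_scores(points, d_c):
    """Return (rho, delta, gamma) for a small in-memory data set.

    rho   : number of other points closer than the cutoff d_c (assumed density)
    delta : distance to the nearest point with strictly higher rho;
            points with no denser neighbour get their largest distance
    gamma : product of min-max normalized rho and delta (assumed normalization)
    """
    pts = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances via broadcasting.
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    # Subtract 1 so a point does not count itself.
    rho = (dist < d_c).sum(axis=1) - 1

    n = len(pts)
    delta = np.empty(n)
    for i in range(n):
        higher = rho > rho[i]
        delta[i] = dist[i, higher].min() if higher.any() else dist[i].max()

    def minmax(v):
        v = v.astype(float)
        span = v.max() - v.min()
        return (v - v.min()) / span if span > 0 else np.zeros_like(v)

    gamma = minmax(rho) * minmax(delta)
    return rho, delta, gamma


def z_value(coords, bits=8):
    """Interleave the low `bits` bits of non-negative integer grid coordinates."""
    z = 0
    dims = len(coords)
    for b in range(bits):
        for d, c in enumerate(coords):
            z |= ((c >> b) & 1) << (b * dims + d)
    return z


# Four tightly grouped points plus one isolated point.
sample = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10]]
rho, delta, gamma = dpc_scores(sample, d_c=1.5)
```

In this sample the isolated point has zero density and a large distance to the dense group. How such a point is finally classified depends on the thresholds, which the abstract does not specify.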
Authors
陆春光
叶方彬
赵羚
姜驰
董伟
LU Chun-guang; YE Fang-bin; ZHAO Ling; JIANG Chi; DONG Wei (State Grid Zhejiang Electric Power Co., Ltd. Institute of Electric Power Science, Hangzhou 310007, China; Zhejiang Huayun Information Technology Co., Ltd., Hangzhou 310002, China)
Source
《科学技术与工程》
Peking University Core Journal (北大核心)
2020, No. 2, pp. 654-658 (5 pages)
Science Technology and Engineering
Funding
Science and Technology Project of State Grid Corporation of China (521101180017).
Keywords
density peak clustering
electric power big data
outlier
detection