Abstract
Existing abnormal data recognition methods suffer from a high false recognition rate and long cleaning times. To address these problems, an abnormal data recognition method based on a power big data cleaning model is proposed. Power big data is read from a distributed file system, and a parallel CURE clustering algorithm is used to obtain the normal power big data. On this basis, the boundary characteristics of the normal data are analyzed and boundary samples are selected. With these boundary samples as the basis for recognition, abnormal data recognition rules are set and the recognition algorithm is executed. The power big data cleaning model then cleans the identified abnormal data, yielding accurate power big data and completing the recognition of abnormal data. Test results show that, compared with two existing abnormal data recognition methods, the proposed method lowers the false recognition rate and shortens the cleaning time, indicating better recognition performance.
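The pipeline outlined in the abstract (cluster to find normal data, take boundary samples, apply a recognition rule, clean) can be illustrated with a minimal sketch. This is not the paper's implementation. A simple gap-based grouping of one-dimensional readings stands in for the parallel CURE algorithm, the largest cluster is assumed to be the normal data, and abnormal readings are replaced with the median of the normal data. The function names, the `gap` parameter, and the sample readings are all hypothetical.

```python
from statistics import median


def cluster_1d(values, gap):
    """Group sorted values, starting a new cluster wherever two neighbours
    differ by more than `gap`. A stand-in for CURE, not CURE itself.
    Assumes `values` is non-empty."""
    ordered = sorted(values)
    clusters = [[ordered[0]]]
    for v in ordered[1:]:
        if v - clusters[-1][-1] > gap:
            clusters.append([v])
        else:
            clusters[-1].append(v)
    return clusters


def normal_boundary(values, gap):
    """Treat the largest cluster as normal data (an assumption) and return
    its two boundary samples together with the cluster itself."""
    normal = max(cluster_1d(values, gap), key=len)
    return min(normal), max(normal), normal


def identify_abnormal(values, low, high):
    """Recognition rule: a reading outside the boundary samples is abnormal."""
    return [i for i, v in enumerate(values) if v < low or v > high]


def clean(values, abnormal_idx, fill):
    """Replace each abnormal reading with `fill` (a hypothetical cleaning rule)."""
    bad = set(abnormal_idx)
    return [fill if i in bad else v for i, v in enumerate(values)]


# Hypothetical load readings with two obvious outliers.
readings = [50.1, 49.8, 50.4, 120.0, 50.0, 49.9, -3.0, 50.2]
low, high, normal = normal_boundary(readings, gap=5.0)
abnormal = identify_abnormal(readings, low, high)
cleaned = clean(readings, abnormal, median(normal))
print(abnormal)
print(cleaned)
```

In one dimension the boundary samples reduce to the minimum and maximum of the normal cluster. For multi-dimensional power data, the boundary selection and the recognition rule would need to follow the definitions given in the paper itself.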
Author
XU Wen-jing (许文婧)
Yili Electric Power Supply Company, State Grid Xinjiang Electric Power Co., Ltd., Yining, Xinjiang 835000, China
Source
New Generation of Information Technology (《新一代信息技术》)
2019, No. 17, pp. 41-46 (6 pages)
Funding
Supported by the Science and Technology Project of State Grid Corporation of China Headquarters (project No. B3441617K005).
Keywords
Power big data cleaning model
Abnormal data
Recognition
Cleaning