Abstract
To automatically diagnose data problems in power grid equipment archive data for which no explicit error rules can be derived, and thereby improve data quality, this paper applies big data machine learning techniques to detect such problems automatically. Using Spark's distributed in-memory computing, the K-Means clustering algorithm is trained on the archive data, and the clustered results are then analyzed and processed. Experiments show that an automatic diagnosis tool built on this method can substantially reduce the manual effort, workload, and cost of data governance work, and can produce more detailed and more accurate results than manual screening.
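The clustering-then-screening idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the paper runs K-Means on Spark, whereas the sketch below swaps in a single-machine, pure-Python K-Means so that it stays self-contained. The function names `kmeans` and `flag_anomalies`, the explicit initial centroids, the distance threshold, and the sample feature values are all hypothetical. The abstract does not say how clustered records are judged abnormal, so "far from the nearest centroid" is only one plausible rule.

```python
import math


def kmeans(points, initial_centroids, max_iterations=50):
    """Plain K-Means (Lloyd's algorithm) on tuples of numeric features.

    Initial centroids are passed in explicitly (an assumption of this
    sketch) so that the result is deterministic.
    """
    centroids = [tuple(c) for c in initial_centroids]
    k = len(centroids)
    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                mean = tuple(sum(col) / len(members) for col in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids


def flag_anomalies(points, centroids, threshold):
    """Return the points whose distance to the nearest centroid exceeds threshold.

    The threshold is a hypothetical tuning parameter, not a value from the paper.
    """
    return [
        p for p in points
        if min(math.dist(p, c) for c in centroids) > threshold
    ]


if __name__ == "__main__":
    # Hypothetical two-feature records: two dense groups plus one stray record.
    records = [
        (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
        (5.0, 30.0),
    ]
    centers = kmeans(records, [(0.0, 0.0), (10.0, 10.0)])
    print(flag_anomalies(records, centers, threshold=10.0))
```

In practice the features would typically be scaled to comparable ranges before clustering, since K-Means relies on Euclidean distance and a large-valued field would otherwise dominate.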
Authors
龙婧
刘伟
殷胜
LONG Jing; LIU Wei; YIN Sheng (Hubei Huazhong Electric Power Technology Development Co., Ltd., Wuhan 430000, China)
Source
《电力信息与通信技术》
2018, Issue 7, pp. 21-27 (7 pages)
Electric Power Information and Communication Technology
Keywords
big data
machine learning
grid equipment archive data
abnormal data
automatic diagnosis