Abstract
To reduce data caching costs and improve the usability of load data in distribution network planning, design, and intelligent analysis, large-scale, heterogeneous, and imprecise monitored or collected load data must be cleaned online effectively, so that the time series of every cycle receives consistent deviation detection and accurate repair. Based on an analysis of the causes and distribution characteristics of different types of abnormal load data, this paper proposes an online cleaning and repair method for large-scale distribution network load data. The method comprises a density-based anomaly identification method for load data streams and a load data repair method based on a collaborative filtering recommendation algorithm. To overcome the performance bottleneck of online analysis of distribution network load big data, a corresponding distributed parallel solution on the Hadoop platform is also given. Verification with load data from actual distribution network operation shows that the proposed algorithm and framework can effectively preprocess distribution network load data and have practical application value.
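The two stages named in the abstract can be illustrated with a minimal single-machine sketch. This is not the paper's algorithm: the abstract gives no parameters or formulas, so the neighbourhood radius `eps`, the threshold `min_pts`, the use of cosine similarity between days, and all function names below are assumptions chosen for illustration. It also assumes non-negative load values and ignores the Hadoop parallelization entirely.

```python
import math

def flag_low_density(window, eps, min_pts):
    """Return indices of readings with fewer than min_pts neighbours within eps.

    A simple density test over one window of a load stream (assumed form).
    """
    flagged = []
    for i, x in enumerate(window):
        neighbours = sum(
            1 for j, y in enumerate(window) if j != i and abs(x - y) <= eps
        )
        if neighbours < min_pts:
            flagged.append(i)
    return flagged

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zero)."""
    dot = sum(p * q for p, q in zip(a, b))
    na = math.sqrt(sum(p * p for p in a))
    nb = math.sqrt(sum(q * q for q in b))
    return dot / (na * nb) if na and nb else 0.0

def repair(today, history, bad):
    """Fill today[i] for i in bad with a similarity-weighted mean of past days.

    Collaborative-filtering analogy (assumed): days play the role of users,
    time slots the role of items, and similarity is computed on trusted slots.
    """
    good = [i for i in range(len(today)) if i not in bad]
    weights = [
        cosine([today[i] for i in good], [day[i] for i in good])
        for day in history
    ]
    total = sum(weights)
    repaired = list(today)
    for i in bad:
        if total > 0:
            repaired[i] = sum(w * day[i] for w, day in zip(weights, history)) / total
    return repaired

# Hypothetical readings: one spike at index 2.
today = [10.0, 11.0, 95.0, 12.0, 11.5, 10.5]
history = [
    [10.0, 11.0, 11.0, 12.0, 11.5, 10.5],
    [10.0, 11.0, 13.0, 12.0, 11.5, 10.5],
]
bad = flag_low_density(today, eps=2.0, min_pts=2)
fixed = repair(today, history, bad)
```

In this toy run the spike is the only reading without close neighbours, and it is replaced by a weighted average of the same time slot on the two historical days. A real load curve varies over the day, so a practical density test would likely operate on features such as deviation from a typical profile rather than on raw values.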
Source
《电网技术》
EI
CSCD
Peking University Core Journals
2015, Issue 11, pp. 3134-3140 (7 pages)
Power System Technology
Funding
Science and Technology Project of State Grid Corporation of China (No. EPRIPDKJ[2014]3763)
Keywords
data cleaning
stream data
large-scale distribution network
online cleaning