Abstract
Within the Energy Internet architecture, power marketing big data underpins many advanced smart grid applications, which makes data cleaning especially important. Missing data, however, is unavoidable in actual grid operation and seriously impairs data analysis and use. To address this problem, this paper builds on the Spark online big data processing platform and proposes an online data cleaning framework and method that combines similar-user clustering with singular value thresholding theory. Singular value decomposition is first used to show that power marketing data is approximately low-rank. On this basis, and taking differences in users' electricity consumption into account, the framework integrates an improved K-nearest neighbor algorithm with singular value thresholding. To overcome the slow computation of the singular value thresholding model, a sliding time window online recovery strategy is proposed to speed up recovery and improve its accuracy. Finally, the proposed algorithm is validated on power marketing data from Hebei Province; the experimental results show that the online recovery algorithm repairs large-scale missing power marketing data faster and more effectively.
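For readers unfamiliar with singular value thresholding, the following is a minimal sketch of the generic textbook iteration for filling in a partially observed, approximately low-rank matrix. It is not the paper's implementation: the function name `svt_complete`, the default threshold `tau`, the step size `delta`, and the stopping rule are all assumptions chosen for illustration, and the similar-user clustering, sliding time window, and Spark components described in the abstract are omitted.

```python
import numpy as np


def svt_complete(M_obs, mask, tau=None, delta=1.5, tol=1e-4, max_iter=5000):
    """Fill missing entries of an approximately low-rank matrix (generic SVT sketch).

    M_obs : 2-D array; values where mask is False are ignored (may be NaN).
    mask  : boolean array, True where an entry was actually observed.
    tau, delta, tol, max_iter are illustrative defaults, not values from the paper.
    """
    mask = np.asarray(mask, dtype=bool)
    # Zero out unobserved entries so NaNs never enter the arithmetic.
    M_obs = np.where(mask, np.asarray(M_obs, dtype=float), 0.0)
    n1, n2 = M_obs.shape
    if tau is None:
        tau = 5.0 * np.sqrt(n1 * n2)  # a common heuristic choice (assumption)

    norm_obs = np.linalg.norm(M_obs)
    if norm_obs == 0.0:
        return np.zeros_like(M_obs)

    Y = np.zeros_like(M_obs)  # auxiliary (dual) variable
    X = np.zeros_like(M_obs)  # current low-rank estimate
    for _ in range(max_iter):
        # Soft-threshold the singular values of Y to obtain a low-rank estimate.
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        X = (U * np.maximum(s - tau, 0.0)) @ Vt
        # Mismatch is measured on the observed entries only.
        residual = np.where(mask, M_obs - X, 0.0)
        if np.linalg.norm(residual) <= tol * norm_obs:
            break
        Y += delta * residual
    return X
```

In a setting like the one the abstract describes, each row of `M_obs` might hold one user's readings over a time window and `mask` would mark the readings that actually arrived; that layout is likewise an assumption rather than something stated in the source.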
Authors
马红明
马浩
杨迪
吴宏波
刘家丞
李骥
MA Hongming; MA Hao; YANG Di; WU Hongbo; LIU Jiacheng; LI Ji (State Grid Hebei Marketing Service Center, Shijiazhuang 050021, China; State Grid Hebei Electric Power Co., Ltd., Shijiazhuang 050021, China; Ministry of Education Key Lab for Intelligent Networks and Network Security, Xi'an Jiaotong University, Xi'an 710049, China)
Source
《电测与仪表》
Peking University Core Journal (北大核心)
2024, No. 9, pp. 120-126 (7 pages)
Electrical Measurement & Instrumentation
Funding
Supported by the National Natural Science Foundation of China (61773308).
Keywords
data cleaning
power marketing data
missing data recovery
singular value thresholding algorithm