摘要
智能电网的发展带来了海量能源数据,数据质量是开展数据价值挖掘等任务的基础。然而,多源海量光伏能源数据的采集与传输过程中不可避免地存在异常数据,因此需要进行数据清洗。目前,基于传统统计机器学习的数据清洗模型存在一定的局限性。文中提出了一种基于Transformer自编码结构的改进型K-means聚类模型,用于能源大数据清洗。该模型通过肘部法则自适应地确定聚类簇数,并利用自编码网络对聚类内数据进行压缩和重构,从而实现异常数据的检测和恢复。同时,模型利用Transformer的多头注意力机制学习数据间的相关特征,提高了对异常数据的筛查能力。在光伏发电公开数据集上的实验证明,与其他方法相比,该模型具有更好的异常数据检测效果,筛查准确率可达96%以上。此外,所提模型能在一定程度上恢复异常数据,为能源大数据应用提供了有效的支持。
The development of smart grids has brought about a massive amount of energy data,and data quality is the foundation for tasks such as data value mining.However,during the collection and transmission process of large-scale photovoltaic energy data from multiple sources,it is inevitable to encounter abnormal data,thus requiring data cleaning.Currently,traditional statistical machine learning-based data cleaning models have certain limitations.This paper proposes an improved K-means clustering model based on the Transformer autoencoder structure for energy big data cleaning.It adaptively determines the number of clusters using the elbow method and utilizes autoencoder networks to compress and reconstruct data within clusters,thereby detecting and recovering abnormal data.Additionally,the proposed model employs the multi-head attention mechanism of Transformer to learn the relevant features among the data,enhancing the screening capability for abnormal data.Experimental results on a publicly available photovoltaic power generation dataset demonstrate that,compared to other methods,the proposed model achieves better performance in detecting abnormal data,with a screening accuracy of over 96%.Moreover,it is capable of recovering abnormal data to a certain extent,providing effective support for the application of energy big data.
作者
彭勃
李耀东
龚贤夫
PENG Bo;LI Yaodong;GONG Xianfu(The Grid Planning and Research Center of Guangdong Power Grid Corporation Limited,Guangzhou 510080,China)
出处
《计算机科学》
CSCD
北大核心
2024年第S01期713-717,共5页
Computer Science
基金
中国南方电网有限责任公司科技项目037700KK52220042(GDKJXM20220906)。