Missing values are one of the main causes of dirty data. Without high-quality data, analysis results are unreliable and decisions imprecise, so a data warehouse needs to integrate high-quality data consistently. In power systems, the electricity consumption data of some large users cannot be collected normally, which produces missing data, distorts the calculation of power supply, and eventually leads to a large error in the daily power line loss rate. To address missing electricity consumption data, this study proposes a data interpolation method for distribution power networks based on the group method of data handling (GMDH) and applies it to actually collected electricity data. First, the dependent and independent variables are defined from the original data, and the upper and lower limits of the missing values are determined from prior knowledge or existing data. All missing values are then filled with random values within these limits. Next, a GMDH network is built to obtain the optimal-complexity model, which predicts the missing values; the predictions replace the previously imputed electricity consumption data. Finally, this process is repeated until the imputed values no longer change. Under a relatively small noise level (α = 0.25), the proposed approach achieves a maximum error of no more than 0.605%. The experimental results demonstrate the efficacy and feasibility of the proposed approach, which turns incomplete data into complete data. The approach also provides a strong basis for electricity theft diagnosis and metering fault analysis in electricity enterprises.
Funding: This research was funded by the National Natural Science Foundation of China (Grant No. 42250410321).
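The iterative procedure described in the abstract (random initial fill within limits, model fit, re-prediction of the missing values, repeat until stable) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `fit_gmdh_layer` and `impute_iteratively` are hypothetical, the model is a simplified single-layer stand-in for a full GMDH network (one quadratic Ivakhnenko polynomial per input pair, selected by error on a held-out split), and clipping predictions to the limits, the tolerance, and the iteration cap are all assumptions.

```python
import itertools
import numpy as np


def _design(a, b):
    # Terms of the quadratic Ivakhnenko polynomial for one pair of inputs.
    return np.column_stack([np.ones_like(a), a, b, a * b, a**2, b**2])


def fit_gmdh_layer(X, y, train, valid):
    """Simplified single-layer GMDH stand-in (assumption, not a full network).

    Fits one quadratic polynomial per input pair on the training rows and
    keeps the pair with the lowest error on the validation rows, which plays
    the role of GMDH's external selection criterion.
    """
    best = None
    for i, j in itertools.combinations(range(X.shape[1]), 2):
        coef, *_ = np.linalg.lstsq(_design(X[train, i], X[train, j]),
                                   y[train], rcond=None)
        pred = _design(X[valid, i], X[valid, j]) @ coef
        err = np.mean((pred - y[valid]) ** 2)
        if best is None or err < best[0]:
            best = (err, i, j, coef)
    _, i, j, coef = best
    return lambda Z: _design(Z[:, i], Z[:, j]) @ coef


def impute_iteratively(X, y, lo, hi, max_iter=50, tol=1e-6, seed=0):
    """Fill NaNs in y (dependent variable) from complete inputs X.

    lo / hi are the lower and upper limits for the missing values.
    """
    rng = np.random.default_rng(seed)
    missing = np.isnan(y)
    filled = y.astype(float).copy()
    if not missing.any():
        return filled

    # Step 1: random initial fill within the limits.
    filled[missing] = rng.uniform(lo, hi, size=missing.sum())

    # Fixed train/validation split so the selection criterion is stable
    # across iterations (a design choice of this sketch).
    idx = rng.permutation(len(y))
    train, valid = idx[: len(y) // 2], idx[len(y) // 2:]

    for _ in range(max_iter):
        # Step 2: fit the model on the currently completed data.
        model = fit_gmdh_layer(X, filled, train, valid)
        # Step 3: re-predict the missing values; clipping is an assumption.
        new = np.clip(model(X[missing]), lo, hi)
        converged = np.max(np.abs(new - filled[missing])) < tol
        filled[missing] = new
        # Step 4: stop once the imputed values no longer change.
        if converged:
            break
    return filled
```

Calling `impute_iteratively(X, y, lo, hi)` with a two-dimensional array `X` of at least two complete input columns and a vector `y` containing `NaN` at the missing positions returns a copy of `y` in which only the missing entries have been replaced. The iteration cap guards against cases where the imputed values oscillate instead of settling.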