Abstract
Research on the preprocessing of air conditioning system operating data is still relatively scarce. This paper first analyzes the data quality problems of air conditioning system operating data and discusses why identifying and cleaning noise in these data is important. It then describes the implementation of the K-Means clustering algorithm from machine learning and analyzes a K-Means-based method for identifying noise in operating data: (1) if a data object does not belong to any cluster, it is considered a noise value; (2) if a data object lies far from the center of its nearest cluster, it is considered a noise value; (3) if a data object belongs to a small or sparse cluster, all objects in that cluster are considered noise values. Building on this noise identification, the paper constructs a noise cleaning technique for air conditioning system operating data. In general, if the noise data belong to no cluster, or to a cluster that is small relative to the others, they can be deleted directly. If the noise data belong to a cluster but lie far from its center, they are smoothed instead: K-Means is applied again within that cluster, and each noise value is replaced with the center of its sub-cluster. Finally, the algorithm is applied to real data. Taking 180 groups of actual operating data from an air conditioning system as the sample, K-Means clustering is used to identify the noise data among the 180 groups, the noise data are cleaned, and each step of the noise handling is described in detail. The results show that the K-Means clustering algorithm can effectively identify and clean outliers and noise values in air conditioning system operating data, laying a sound data foundation for subsequent data mining.
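The identification and cleaning rules described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the sample points, the function names (`kmeans`, `flag_noise`, `clean`), and the thresholds `min_size` and `dist_factor` are all hypothetical, and the abstract does not say how "small cluster" or "far from the center" are quantified. Plain K-Means assigns every point to some cluster, so rule (1) does not arise here. As a simplification, a far-from-center value is replaced with the mean of the non-flagged members of its cluster rather than with a sub-cluster center from a second K-Means pass.

```python
import math
from statistics import median

def kmeans(points, k, max_iter=100):
    """Plain K-Means with a deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centers

def flag_noise(points, labels, centers, min_size=3, dist_factor=3.0):
    """Return {index: reason} for points treated as noise (assumed thresholds)."""
    flags = {}
    for j, center in enumerate(centers):
        idx = [i for i, l in enumerate(labels) if l == j]
        if not idx:
            continue
        if len(idx) < min_size:
            # Rule (3): every member of a small cluster is noise.
            for i in idx:
                flags[i] = "small_cluster"
            continue
        # Rule (2): far from the cluster center, relative to the median distance.
        dists = {i: math.dist(points[i], center) for i in idx}
        limit = dist_factor * median(dists.values())
        for i, d in dists.items():
            if d > limit:
                flags[i] = "far_from_center"
    return flags

def clean(points, labels, flags):
    """Delete small-cluster noise; smooth far-from-center noise."""
    cleaned = []
    for i, p in enumerate(points):
        if flags.get(i) == "small_cluster":
            continue  # deleted outright
        if flags.get(i) == "far_from_center":
            good = [q for n, q in enumerate(points)
                    if labels[n] == labels[i] and n not in flags]
            p = tuple(sum(x) / len(good) for x in zip(*good))
        cleaned.append(p)
    return cleaned

# Hypothetical two-variable operating records (e.g. two temperature readings).
points = [
    (7.0, 12.0), (7.1, 12.1), (6.9, 11.9), (7.0, 12.1), (7.1, 11.9),
    (10.0, 15.0), (10.1, 15.1), (9.9, 14.9), (10.0, 15.1), (10.1, 14.9),
    (7.0, 13.5),   # belongs to the first group but sits far from its center
    (30.0, 40.0),  # isolated record that ends up in a cluster of its own
]
labels, centers = kmeans(points, k=3)
flags = flag_noise(points, labels, centers)
cleaned = clean(points, labels, flags)
print(flags)
```

The median-based distance limit is only one possible choice; a fixed physical tolerance per variable, or scaling the variables before clustering, may suit real operating data better.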
Authors
曹勇
崔治国
武根峰
刘辉
李冉
CAO Yong; CUI Zhi-guo; WU Gen-feng; LIU Hui; LI Ran (China Academy of Building Research, Beijing 100013, China)
Source
《建筑节能》
CAS
2018, No. 5, pp. 79-83 (5 pages)
BUILDING ENERGY EFFICIENCY
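Funding
National Key R&D Program of China during the 13th Five-Year Plan: Research, Development and Demonstration of High-Efficiency Energy-Use Systems and Intelligent Control Technologies for Public Institutions (2016YFB0601700)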