Abstract
Water resource management systems store massive amounts of water intake and consumption data, and screening these data for outliers to locate abnormal water intake behavior is an important means of water resource regulation. Outliers in water intake and consumption data generally lack a clear definition, and traditional outlier detection algorithms fall short in real-time performance and stability. After summarizing the types and characteristics of abnormal water intake and consumption data at the present stage, the method first applies average interpolation to pre-process the outliers that can be identified by inspection, then trains on random samples drawn from the pre-processed data to build multiple isolation binary trees that together form an isolation forest, which serves as the tool for detecting outliers in the data samples. An empirical analysis of two consecutive years of daily water intake monitoring data from a water supply company shows that the outlier detection method based on the isolation forest algorithm stores the characteristics of the data samples in the forest through unsupervised learning and therefore has higher stability; it accurately detects the outliers in the data samples and achieves a higher detection rate than the traditional least squares fitting method, making it suitable for real-time monitoring of water resources data.
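The two-stage procedure described in the abstract (average interpolation of obvious outliers, then an isolation forest trained on random samples) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the rule for an "obvious" outlier (a missing or negative reading), the function names, the parameter defaults, and the synthetic daily intake series are all assumptions made for the example. The scoring formula follows the standard isolation forest definition, s(x) = 2^(-E[h(x)] / c(psi)).

```python
import math
import random


def mean_interpolate(series, is_obvious_outlier):
    """Replace flagged readings with the mean of the nearest valid neighbours.

    Assumes at least one reading in the series is valid.
    """
    cleaned = list(series)
    valid = [i for i, v in enumerate(series) if not is_obvious_outlier(v)]
    for i, v in enumerate(series):
        if is_obvious_outlier(v):
            left = max((j for j in valid if j < i), default=None)
            right = min((j for j in valid if j > i), default=None)
            neighbours = [series[j] for j in (left, right) if j is not None]
            cleaned[i] = sum(neighbours) / len(neighbours)
    return cleaned


def c(n):
    """Average path length of an unsuccessful binary-search-tree lookup."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(values, depth, max_depth, rng):
    """Recursively split the sample at a random value until points are isolated."""
    if depth >= max_depth or len(values) <= 1 or min(values) == max(values):
        return {"size": len(values)}  # leaf node
    split = rng.uniform(min(values), max(values))
    return {
        "split": split,
        "left": build_tree([v for v in values if v < split], depth + 1, max_depth, rng),
        "right": build_tree([v for v in values if v >= split], depth + 1, max_depth, rng),
    }


def path_length(tree, x, depth=0):
    """Depth at which x lands in a leaf, plus c(size) for the unsplit remainder."""
    if "split" not in tree:
        return depth + c(tree["size"])
    branch = "left" if x < tree["split"] else "right"
    return path_length(tree[branch], x, depth + 1)


def fit_forest(values, n_trees=100, sample_size=256, seed=0):
    """Train each tree on a random subsample (assumes at least two readings)."""
    rng = random.Random(seed)
    psi = min(sample_size, len(values))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(values, psi), 0, max_depth, rng) for _ in range(n_trees)]
    return trees, psi


def anomaly_score(forest, x):
    """Score in (0, 1]; values close to 1 indicate a likely outlier."""
    trees, psi = forest
    mean_depth = sum(path_length(t, x) for t in trees) / len(trees)
    return 2.0 ** (-mean_depth / c(psi))


# Synthetic daily intake series (made-up numbers, for illustration only).
data_rng = random.Random(42)
daily = [100 + data_rng.gauss(0, 3) for _ in range(200)]
daily[50] = -1.0    # obvious outlier: a negative reading
daily[120] = 180.0  # abnormal intake that is not obvious from a simple rule

cleaned = mean_interpolate(daily, lambda v: v is None or v < 0)
forest = fit_forest(cleaned)
ranked = sorted(range(len(cleaned)), key=lambda i: anomaly_score(forest, cleaned[i]), reverse=True)
print("most anomalous day index:", ranked[0])
```

Because each tree isolates points with random splits, a reading far from the bulk of the data is separated after very few splits and receives a score close to 1, while typical readings reach the depth limit and score near 0.5. The threshold used to flag a day as abnormal is left open here, since the abstract does not state one.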
Authors
赵臣啸
薛惠锋
王磊
万毅
ZHAO Chenxiao; XUE Huifeng; WANG Lei; WAN Yi (China Aerospace Academy of Systems Science and Engineering, Beijing 100048, China; Water Resources Management Center, The Ministry of Water Resources of the People's Republic of China, Beijing 100053, China)
Source
Journal of China Institute of Water Resources and Hydropower Research (《中国水利水电科学研究院学报》)
Peking University Core Journal (北大核心)
2020, Issue 1, pp. 31-39 (9 pages)
Funding
Key Program of the National Natural Science Foundation of China (U1501253).
Keywords
water resources monitoring
abnormal data
average interpolation
isolation forest
least squares fitting