Water Consumption Abnormal Data Detection Method Based on Isolation Forest (cited by: 25)
Abstract: Water resource management systems store massive amounts of water withdrawal and consumption data, and locating abnormal withdrawal behavior by screening the outliers in these data is an important means of water resource regulation. Outliers in water consumption data generally lack a clear definition, and traditional outlier detection algorithms fall short in real-time performance and stability. After summarizing the types and characteristics of abnormal water consumption data at the present stage, the method first applies average interpolation to preprocess the outliers that can be identified by direct inspection. It then trains on random samples drawn from the preprocessed data, building multiple isolation binary trees that together form an isolation forest, which serves as the tool for detecting outliers in the data samples. An empirical analysis of two consecutive years of daily water intake monitoring data from a water supply company shows that the isolation-forest-based method stores the characteristics of the data samples in the forest through unsupervised learning and has higher stability. It accurately detects the outliers in the data samples, achieves a higher detection rate than the traditional least squares fitting method, and is suitable for real-time monitoring of water resources data.
Authors: ZHAO Chenxiao (赵臣啸); XUE Huifeng (薛惠锋); WANG Lei (王磊); WAN Yi (万毅) (China Aerospace Academy of Systems Science and Engineering, Beijing 100048, China; Water Resources Management Center, The Ministry of Water Resources of the People's Republic of China, Beijing 100053, China)
Source: Journal of China Institute of Water Resources and Hydropower Research (《中国水利水电科学研究院学报》), Peking University Core Journal, 2020, No. 1, pp. 31-39 (9 pages)
Funding: Key Program of the National Natural Science Foundation of China (U1501253).
Keywords: water resources monitoring; abnormal data; average interpolation; isolation forest; least squares fitting
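
The two-stage pipeline described in the abstract (average interpolation of obvious outliers, then isolation-forest scoring) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the rule that treats missing or negative readings as "obvious" outliers, the one-dimensional input, and the defaults of 100 trees with 64 samples each are all hypothetical choices made for the example. It uses only the Python standard library.

```python
import math
import random


def mean_interpolate(series):
    """Replace obviously invalid readings with the mean of the nearest valid neighbours.

    Assumption: a reading is "obviously invalid" if it is None or negative,
    and the series contains at least one valid reading.
    """
    valid = [i for i, v in enumerate(series) if v is not None and v >= 0]
    out = list(series)
    for i, v in enumerate(series):
        if v is None or v < 0:
            left = max((j for j in valid if j < i), default=None)
            right = min((j for j in valid if j > i), default=None)
            neighbours = [series[j] for j in (left, right) if j is not None]
            out[i] = sum(neighbours) / len(neighbours)
    return out


def c(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(values, depth, max_depth, rng):
    """Recursively split a 1-D sample at random cut points until points are isolated."""
    if len(values) <= 1 or depth >= max_depth:
        return ("leaf", len(values))
    lo, hi = min(values), max(values)
    if lo == hi:
        return ("leaf", len(values))
    split = rng.uniform(lo, hi)
    left = [v for v in values if v < split]
    right = [v for v in values if v >= split]
    return ("node", split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(x, tree, depth=0):
    """Depth at which x lands in a leaf, plus a correction for unsplit leaf points."""
    if tree[0] == "leaf":
        return depth + c(tree[1])
    _, split, left, right = tree
    return path_length(x, left if x < split else right, depth + 1)


def anomaly_scores(values, n_trees=100, sample_size=64, seed=0):
    """Return a score in (0, 1] per value; scores near 1 suggest an outlier.

    Assumption: values holds at least two readings.
    """
    rng = random.Random(seed)
    size = min(sample_size, len(values))
    max_depth = math.ceil(math.log2(size))
    trees = [build_tree(rng.sample(values, size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = n_trees * c(size)
    return [2.0 ** (-sum(path_length(x, t) for t in trees) / norm) for x in values]


# Synthetic daily volumes (made-up data): a weekly pattern, two invalid
# readings to interpolate, and one spike that the forest should isolate quickly.
daily = [100 + (i % 7) for i in range(60)]
daily[10], daily[45], daily[30] = None, -5, 400
cleaned = mean_interpolate(daily)
scores = anomaly_scores(cleaned)
print(max(range(len(scores)), key=scores.__getitem__))
```

The score follows the usual isolation-forest form, $s(x) = 2^{-E[h(x)]/c(n)}$, where $h(x)$ is the path length of $x$ in a tree and $c(n)$ normalizes by the expected path length for a sample of size $n$. Points that are separated after very few random splits get short paths and therefore scores close to 1.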
