
Detecting Abnormal Water Consumption Pattern of Enterprise Based on Isolation Forest Sampling (cited by: 1)
Abstract To address the low-frequency, short time-series data and the class imbalance encountered when detecting abnormal water consumption patterns of enterprises, this paper proposes a binary classification method based on an Isolation Forest sampling strategy. First, volatility features and statistical features of water consumption are constructed. The Isolation Forest algorithm is then used to compute the degree of isolation of each sample in the majority class as a measure of its representativeness; the samples are ranked by representativeness, and the most representative ones are sampled first. The extracted samples are merged with the minority class to form a more balanced training set. Finally, an XGBoost classifier is trained on this more balanced dataset and used for prediction. On a dataset covering 13 months of water consumption for 7,604 enterprises in one city, the method reaches an AUC of 0.927 and a recall of 0.891 when predicting abnormal water consumption patterns, compared with 0.885 and 0.733 for XGBoost with random undersampling, which corresponds to relative improvements of 4.7% and 21.6% respectively.
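The sampling step described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: it uses a simplified pure-Python isolation forest in place of a library one, it assumes that the least isolated majority samples are the most "representative", it draws as many majority samples as there are minority samples (the abstract does not give the ratio), and it runs on synthetic data. The names `isolation_scores` and `isolation_forest_undersample` are hypothetical. The feature construction and the final XGBoost training step are left out.

```python
import math
import random


def avg_path_length(n):
    """Expected path length c(n) of an unsuccessful binary-search-tree lookup."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Recursively split on a random feature at a random threshold."""
    if depth >= max_depth or len(rows) <= 1:
        return len(rows)  # leaf: number of points left unsplit
    dim = rng.randrange(len(rows[0]))
    lo = min(r[dim] for r in rows)
    hi = max(r[dim] for r in rows)
    if lo == hi:
        return len(rows)
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[dim] < split]
    right = [r for r in rows if r[dim] >= split]
    return (dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(x, tree):
    """Depth at which x lands, plus the adjustment for unsplit leaf points."""
    depth = 0
    while isinstance(tree, tuple):
        dim, split, left, right = tree
        tree = left if x[dim] < split else right
        depth += 1
    return depth + avg_path_length(tree)


def isolation_scores(rows, n_trees=100, sample_size=256, seed=0):
    """Anomaly score in (0, 1]; values closer to 1 mean easier to isolate."""
    rng = random.Random(seed)
    psi = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(psi)) if psi > 1 else 0
    trees = [build_tree(rng.sample(rows, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = avg_path_length(psi) or 1.0  # avoid division by zero for tiny inputs
    return [2.0 ** (-sum(path_length(x, t) for t in trees) / (n_trees * norm))
            for x in rows]


def isolation_forest_undersample(X, y, majority_label=0, seed=0):
    """Keep all minority rows plus an equal number of majority rows,
    taking the least isolated (assumed most representative) ones first."""
    majority = [i for i, label in enumerate(y) if label == majority_label]
    minority = [i for i, label in enumerate(y) if label != majority_label]
    scores = isolation_scores([X[i] for i in majority], seed=seed)
    ranked = [i for _, i in sorted(zip(scores, majority))]  # low score first
    keep = ranked[:len(minority)] + minority
    return [X[i] for i in keep], [y[i] for i in keep]


# Synthetic stand-in for the feature matrix: 300 "normal" and 30 "abnormal" rows.
rng = random.Random(42)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(300)]
X += [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(30)]
y = [0] * 300 + [1] * 30

X_bal, y_bal = isolation_forest_undersample(X, y)
print(len(X_bal), sum(y_bal))  # 60 30
```

The balanced `X_bal`, `y_bal` would then be passed to a classifier such as XGBoost. With scikit-learn, `IsolationForest.score_samples` offers a comparable score with the opposite sign convention (lower means more abnormal), so the ranking direction would need to be flipped accordingly.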
Authors LIN Qingxuan; GUO Qiang; DENG Chunyan; WANG Yajing; LIU Jianguo (Research Center for Complex Systems Science, University of Shanghai for Science & Technology, Shanghai 200093, China; Institute of Accounting and Finance, Shanghai University of Finance and Economics, Shanghai 200433, China)
Source Complex Systems and Complexity Science (EI, CSCD), 2020, No. 3, pp. 47-51 (5 pages)
Funding National Natural Science Foundation of China (61773248, 71771152); Major Program of the National Social Science Fund of China (18ZDA088, 20ZDA060).
Keywords abnormal water consumption pattern detection; unbalanced classification; isolation forest; XGBoost