iLOF*: An Optimized Local Outlier Detection Algorithm (cited by: 7)
Abstract: Outlier detection is one of the basic problems in data mining and has been widely applied in weather forecasting, network intrusion detection, and telecommunications and credit card fraud detection. The density-based outlier detection algorithm LOF offers good detection performance and applicability, but it is computationally expensive and not efficient enough, and when computing the distance between objects it ignores the different influence that different attributes have on outlier values. To address these shortcomings, this paper proposes iLOF*, an efficient improvement of LOF. The algorithm uses a grid to reduce the data, which improves running efficiency. In addition, when computing the distance between objects, it introduces information entropy to assign different weights to different attributes, which improves accuracy. The iLOF* algorithm is also parallelized with the MapReduce computing framework, which further improves its efficiency on large-scale data sets. Experimental results verify the effectiveness and efficiency of iLOF*.
Author: Wang Fei (王飞)
Source: Computer Systems & Applications (《计算机系统应用》), 2015, No. 12, pp. 233-238 (6 pages)
Keywords: data mining; outlier detection; local outlier factor; information entropy; parallelization
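
The abstract states that iLOF* uses information entropy to weight attributes in the distance computation, but this record does not give the formula. The sketch below is therefore only an illustration of the general idea, not the paper's method. It assumes equal-width binning, weights proportional to each attribute's Shannon entropy, and a weighted Euclidean distance. The function names (`attribute_entropy`, `entropy_weights`, `weighted_distance`) and the `bins` parameter are hypothetical.

```python
import math
from collections import Counter


def attribute_entropy(values, bins=4):
    """Shannon entropy (in bits) of one attribute after equal-width binning."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant attribute carries no information
    width = (hi - lo) / bins
    # Assign each value to a bin; the maximum value is clamped into the last bin.
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_weights(data, bins=4):
    """One weight per attribute, proportional to its entropy.

    Assumption: the proportional rule is chosen for illustration only;
    the paper may define the weights differently.
    """
    columns = list(zip(*data))
    entropies = [attribute_entropy(col, bins) for col in columns]
    total = sum(entropies)
    if total == 0:
        # All attributes are constant: fall back to uniform weights.
        return [1.0 / len(columns)] * len(columns)
    return [e / total for e in entropies]


def weighted_distance(p, q, weights):
    """Weighted Euclidean distance between two objects p and q."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, p, q)))


if __name__ == "__main__":
    # Toy data: the second attribute is constant, so it receives weight 0.
    data = [(0.0, 5.0), (1.0, 5.0), (2.0, 5.0), (3.0, 5.0)]
    weights = entropy_weights(data)
    print(weights)
    print(weighted_distance(data[0], data[3], weights))
```

Under these assumptions, a distance like this could replace the plain Euclidean distance wherever LOF computes k-nearest-neighbor and reachability distances; the grid-based data reduction and the MapReduce parallelization described in the abstract are not shown here.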