
An Outlier Mining Algorithm for High-Dimensional Data with Mixed Attributes (cited by 3)

New approach for outlier detection in high dimensional dataset with mixed attributes
Abstract: Outlier detection is one of the most fundamental problems in data mining research, with wide applications in fraud detection, weather forecasting, customer segmentation, and intrusion detection. Motivated by the requirements of network intrusion detection, this paper proposes a new outlier mining algorithm based on clustering data with mixed attributes. Building on the essential property that outliers are rare points in a dataset, it also gives new definitions of data similarity and of the degree of outlyingness. The proposed algorithm has linear time complexity. Experiments on the KDDCUP99 and Wisconsin Prognostic Breast Cancer datasets show that the algorithm detects the outliers in a dataset well while offering near-linear running time and good scalability.

Outlier detection has important applications in fraud detection, weather prediction, customer segmentation, and intrusion detection. Many recent algorithms use notions of proximity to find outliers based on their relationship to the rest of the data. In this paper we propose a new clustering-based algorithm for detecting outliers in high-dimensional domains with mixed attributes, together with a new method for measuring the similarity and outlyingness of objects. The proposed algorithm achieves near-linear performance. Experimental results on the KDDCUP99 and Wisconsin Breast Cancer datasets show that the algorithm is effective and scalable and achieves reasonably good accuracy.
Source: Journal of Computer Applications (《计算机应用》; indexed in CSCD and the Peking University Core Journals list), 2005, No. 6, pp. 1353-1356 (4 pages)
Funding: National Natural Science Foundation of China (grant no. 60273075)
Keywords: outlier detection; clustering; data mining
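The abstract does not spell out the paper's similarity or outlier-degree definitions, so the following is only a minimal Python sketch of the general idea it describes: cluster records with mixed numeric and categorical attributes in a single pass, then treat members of small clusters as outliers, since outliers are rare. Everything concrete here is an illustrative assumption, not the authors' method: the similarity measure (range-normalised distance to the cluster mean for numeric attributes, value frequency within the cluster for categorical ones), the leader-style one-pass clustering, the hypothetical `threshold` parameter, and the score `1 - |C|/n`.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Running summary of one cluster: size, numeric sums, categorical value counts."""
    size: int = 0
    sums: dict = field(default_factory=dict)     # numeric attribute index -> sum of values
    counts: dict = field(default_factory=dict)   # categorical attribute index -> Counter
    members: list = field(default_factory=list)  # indices of the records in this cluster

    def add(self, idx, rec, numeric):
        self.size += 1
        self.members.append(idx)
        for j, v in enumerate(rec):
            if j in numeric:
                self.sums[j] = self.sums.get(j, 0.0) + v
            else:
                self.counts.setdefault(j, Counter())[v] += 1


def similarity(rec, c, numeric, ranges):
    """Assumed mixed-attribute similarity in [0, 1] between a record and a cluster.

    Numeric attribute: 1 - |x - cluster mean| / attribute range (clipped at 0).
    Categorical attribute: fraction of cluster members that share the value.
    """
    total = 0.0
    for j, v in enumerate(rec):
        if j in numeric:
            span = ranges[j] or 1.0  # constant attribute: avoid division by zero
            total += max(0.0, 1.0 - abs(v - c.sums[j] / c.size) / span)
        else:
            total += c.counts[j][v] / c.size  # Counter yields 0 for unseen values
    return total / len(rec)


def outlier_scores(records, numeric, threshold=0.7):
    """One-pass leader clustering, then score each record by how small its cluster is."""
    if not records:
        return []
    n = len(records)
    ranges = {j: max(r[j] for r in records) - min(r[j] for r in records) for j in numeric}
    clusters = []
    for idx, rec in enumerate(records):
        best, best_sim = None, -1.0
        for c in clusters:  # compare against every existing cluster summary
            s = similarity(rec, c, numeric, ranges)
            if s > best_sim:
                best, best_sim = c, s
        if best is None or best_sim < threshold:  # too dissimilar: open a new cluster
            best = Cluster()
            clusters.append(best)
        best.add(idx, rec, numeric)
    scores = [0.0] * n
    for c in clusters:
        for idx in c.members:
            scores[idx] = 1.0 - c.size / n  # smaller clusters give scores closer to 1
    return scores


# Toy mixed-attribute records: (duration, protocol, service); attribute 0 is numeric.
data = [
    (1.0, "tcp", "http"),
    (1.2, "tcp", "http"),
    (0.9, "tcp", "http"),
    (1.1, "tcp", "http"),
    (9.5, "icmp", "ecr_i"),
]
scores = outlier_scores(data, numeric={0})
print(max(range(len(data)), key=scores.__getitem__))  # → 4
```

Each record is compared with every current cluster, so the cost is O(n·k·d) for n records, k clusters and d attributes. This is linear in n only while k stays small, and a poorly chosen `threshold` can inflate k.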

References (16)

  • [1] HAWKINS D. Identification of Outliers [M]. London: Chapman and Hall, 1980.
  • [2] BARNETT V, LEWIS T. Outliers in Statistical Data [M]. John Wiley, 1994.
  • [3] BICKEL D R. Robust estimators of the mode and skewness of continuous data [J]. Computational Statistics and Data Analysis, 2002, 39(2): 153-163.
  • [4] ARNING A, AGRAWAL R, RAGHAVAN P. A linear method for deviation detection in large databases [A]. Proc 2nd Int Conf on Knowledge Discovery and Data Mining [C]. Portland, OR: AAAI Press, 1996: 164-169.
  • [5] SARAWAGI S, AGRAWAL R, MEGIDDO N. Discovery-driven exploration of OLAP data cubes [A]. Proc 6th Int Conf on Extending Database Technology [C]. Valencia: Springer-Verlag, 1998: 168-182.
  • [6] HE Z Y, XU X F, DENG S C. Discovering cluster-based local outliers [J]. Pattern Recognition Letters, 2003, 24(9-10): 1651-1660.
  • [7] KNORR E M, NG R T. A unified approach for mining outliers [A]. Proceedings of the 7th CASCON [C]. 1997: 236-248.
  • [8] KNORR E M. Outliers and data mining: finding exceptions in data [D]. PhD thesis, The University of British Columbia (Canada), 2002.
  • [9] BREUNIG M M, KRIEGEL H P, NG R T, et al. LOF: identifying density-based local outliers [A]. Proceedings of SIGMOD'00 [C]. Dallas, Texas, 2000: 427-438.
  • [10] PAPADIMITRIOU S, KITAGAWA H, GIBBONS P B, et al. LOCI: fast outlier detection using the local correlation integral [R]. Technical Report IRP-TR-02-09, 2002.
