一种面向高维混合属性数据的异常挖掘算法 (An Outlier Mining Algorithm for High-Dimensional Data with Mixed Attributes)
Cited by: 3

New approach for outlier detection in high dimensional dataset with mixed attributes

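Abstract (translated from the Chinese): Outlier detection is one of the most fundamental problems in data mining, with wide application in fraud detection, weather forecasting, customer classification, and intrusion detection. Motivated by the needs of network intrusion detection, this paper proposes a new outlier mining algorithm based on clustering of mixed-attribute data. Building on the observation that outliers are by nature the rare points of a data set, it also gives new definitions of data similarity and of the degree of outlyingness. The proposed algorithm has linear time complexity. Experiments on the KDDCUP99 and Wisconsin Prognosis Breast Cancer data sets show that the algorithm finds the outliers in a data set well while offering near-linear time complexity and good scalability.

Abstract (English, as published): The outlier detection problem has important applications in fraud detection, weather prediction, customer segmentation, and intrusion detection. Many recent algorithms use concepts of proximity to find outliers based on their relationship to the rest of the data. In this paper we propose a new clustering-based algorithm for detecting outliers in high-dimensional domains with mixed attributes, together with a new method for measuring the similarity and outlyingness of objects. The proposed algorithm gives near-linear performance. Experimental results on the KDDCUP99 and Wisconsin Breast Cancer data sets show that the algorithm is effective and scalable and achieves reasonably good accuracy.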
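The abstract does not state the paper's similarity or outlyingness definitions, so the following is only an illustrative sketch of the general idea it describes: cluster mixed-attribute records in a single pass, then treat records in rare (small) clusters as outlier candidates. Everything here is an assumption made for illustration and not the authors' method: the `Cluster` summary, the averaged per-attribute distance (absolute difference from the cluster mean for numeric attributes, one minus relative frequency for categorical ones), the `radius` threshold, and the `max_fraction` cut-off. Numeric attributes are assumed to be pre-scaled to [0, 1], and the sample records are made up.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Running summary of one cluster (hypothetical structure)."""
    size: int = 0
    num_sums: list = field(default_factory=list)    # per-attribute sums of numeric values
    cat_counts: list = field(default_factory=list)  # per-attribute Counter of categorical values
    members: list = field(default_factory=list)     # indices of assigned records

    def add(self, idx, nums, cats):
        if self.size == 0:
            self.num_sums = [0.0] * len(nums)
            self.cat_counts = [Counter() for _ in cats]
        for j, v in enumerate(nums):
            self.num_sums[j] += v
        for j, v in enumerate(cats):
            self.cat_counts[j][v] += 1
        self.size += 1
        self.members.append(idx)

    def distance(self, nums, cats):
        """Average per-attribute dissimilarity between a record and this cluster."""
        d = 0.0
        for j, v in enumerate(nums):
            d += abs(v - self.num_sums[j] / self.size)    # distance to numeric mean
        for j, v in enumerate(cats):
            d += 1.0 - self.cat_counts[j][v] / self.size  # rarity of the category
        return d / (len(nums) + len(cats))


def one_pass_cluster(records, radius):
    """Assign each record to its nearest cluster, or open a new one if none is within radius."""
    clusters = []
    for idx, (nums, cats) in enumerate(records):
        best = min(clusters, key=lambda c: c.distance(nums, cats), default=None)
        if best is None or best.distance(nums, cats) > radius:
            best = Cluster()
            clusters.append(best)
        best.add(idx, nums, cats)
    return clusters


def flag_outliers(clusters, n_records, max_fraction):
    """Flag members of clusters that hold at most max_fraction of the data."""
    flagged = []
    for c in clusters:
        if c.size / n_records <= max_fraction:
            flagged.extend(c.members)
    return sorted(flagged)


if __name__ == "__main__":
    # Made-up records: (numeric attributes, categorical attributes).
    records = [
        ((0.10, 0.20), ("tcp", "http")),
        ((0.12, 0.18), ("tcp", "http")),
        ((0.11, 0.22), ("tcp", "http")),
        ((0.09, 0.21), ("tcp", "http")),
        ((0.95, 0.90), ("udp", "private")),
    ]
    clusters = one_pass_cluster(records, radius=0.2)
    print(flag_outliers(clusters, len(records), max_fraction=0.2))
```

Each record is compared only against the current cluster summaries, so the cost grows with the number of records times the number of clusters. That is one way a clustering-based detector can stay close to linear in the data size when the cluster count remains small.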

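Authors: Li Qinghua (李庆华), Li Xin (李新), Jiang Shengyi (蒋盛益)

Affiliation: School of Computer Science and Technology, Huazhong University of Science and Technology

Source: 《计算机应用》 (Journal of Computer Applications), indexed in CSCD and the Peking University Core Journals list, 2005, No. 6, pp. 1353-1356 (4 pages)

Funding: National Natural Science Foundation of China (Grant No. 60273075)

Keywords: outlier detection; clustering; data mining

Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory]; TP18 [Automation and Computer Technology: Control Theory and Control Engineering]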
