An Outlier Detection Algorithm Based on Similarity Measurement (一种基于相似度量的离群点检测方法), cited by: 2

Abstract: Outlier detection is an important area of data mining and is widely applied to tasks such as credit card fraud detection and network intrusion detection. Combining hierarchical clustering with similarity, this paper introduces a similarity measurement function for high-dimensional data and the concept of class density, redefines outliers in high-dimensional data in terms of class density, and on that basis proposes an outlier detection algorithm based on similarity measurement. Experiments indicate that the algorithm has some value for detecting outliers in high-dimensional data.
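
The abstract does not give the paper's formulas, so the following is only an illustrative sketch of the general idea (cluster by similarity, then flag points in low-density classes), not the authors' algorithm. Everything in it is an assumption made for illustration: the per-dimension similarity `1 / (1 + |x_i - y_i|)` averaged over dimensions, single-linkage merging cut at `sim_threshold`, a "class density" taken as cluster size divided by the total number of points, and the hypothetical parameter names and default values.

```python
from itertools import combinations


def similarity(x, y):
    """Assumed similarity in (0, 1]: mean of 1 / (1 + |x_i - y_i|) over dimensions."""
    return sum(1.0 / (1.0 + abs(a - b)) for a, b in zip(x, y)) / len(x)


def cluster_by_similarity(points, sim_threshold):
    """Single-linkage agglomerative clustering cut at sim_threshold (via union-find)."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the classes of any two points that are similar enough.
    for i, j in combinations(range(len(points)), 2):
        if similarity(points[i], points[j]) >= sim_threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


def detect_outliers(points, sim_threshold=0.6, min_density=0.2):
    """Return sorted indices of points whose class density is below min_density.

    Density is a hypothetical stand-in here: cluster size / number of points.
    """
    clusters = cluster_by_similarity(points, sim_threshold)
    n = len(points)
    return sorted(i for c in clusters if len(c) / n < min_density for i in c)


# Two tight groups of three points each, plus one isolated point (index 6).
data = [
    (0.0, 0.1, 0.0), (0.1, 0.0, 0.1), (0.0, 0.0, 0.2),
    (5.0, 5.1, 5.0), (5.1, 5.0, 4.9), (5.0, 4.9, 5.1),
    (20.0, -7.0, 3.0),
]
outliers = detect_outliers(data)
print(outliers)
```

With these assumed thresholds the isolated point forms a class of its own, with density 1/7, and is the only index reported. Both thresholds would need tuning on real data, and the paper's own similarity function and class-density definition may differ from the ones assumed here.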

Authors: SUN Qilin (孙启林), FANG Hongbin (方宏彬), ZHANG Jian (张健), LIU Mingshu (刘明术)

Affiliation: School of Mathematical Sciences, Anhui University

Source: Journal of Chongqing Technology and Business University (Natural Science Edition) (《重庆工商大学学报（自然科学版）》), 2012, No. 10, pp. 96-100 (5 pages)

Funding: Natural Science Foundation of the Anhui Provincial Department of Education (05010428)

Keywords: outlier; network intrusion; data mining; hierarchical clustering; similarity measurement

Classification: TP390 [Automation and Computer Technology: Computer Application Technology]
