Improved SOD Outlier Detection Algorithm (改进的SOD孤立点检测算法)


Abstract: To address the problems of the traditional SOD outlier detection algorithm on high-dimensional data, this paper proposes an improved algorithm. It quantifies the degree of aggregation in each dimension to determine that dimension's reference value, which reduces the sensitivity of the results to parameter settings. It uses relative distance to express each point's deviation from the center value, which makes the method better suited to detecting outliers in subspaces of different densities. Simulation experiments show that the improved algorithm achieves higher detection accuracy than the traditional SOD algorithm.
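For orientation, the following is a minimal sketch of the baseline SOD idea that the abstract sets out to improve, as it is commonly described: a point is scored by its distance to the mean of a reference set, measured only in the attributes where that reference set has low variance. It is not the paper's improved method; the per-dimension weighting and relative distance mentioned in the abstract are not reproduced, because the abstract gives no formulas for them. The function name `sod_scores`, the parameters `k` and `alpha`, and the sample data are all illustrative assumptions, and the reference set is simplified to plain k nearest neighbours.

```python
import math

def sod_scores(data, k=3, alpha=0.8):
    """Baseline SOD-style score per point (simplified, illustrative sketch)."""
    d = len(data[0])
    scores = []
    for i, p in enumerate(data):
        # Assumption: the reference set is the k nearest neighbours by
        # Euclidean distance; published SOD descriptions typically use a
        # shared-nearest-neighbour similarity instead.
        others = [q for j, q in enumerate(data) if j != i]
        others.sort(key=lambda q: math.dist(p, q))
        ref = others[:k]
        # Per-attribute mean and variance of the reference set.
        mean = [sum(q[a] for q in ref) / len(ref) for a in range(d)]
        var = [sum((q[a] - mean[a]) ** 2 for q in ref) / len(ref)
               for a in range(d)]
        expected = sum(var) / d
        # Keep attributes whose variance is clearly below the average:
        # these span the subspace in which the reference set is "tight".
        relevant = [a for a in range(d) if var[a] < alpha * expected]
        if not relevant:
            scores.append(0.0)
            continue
        # Deviation from the reference mean, measured only in that subspace
        # and normalised by the number of relevant attributes.
        dev = math.sqrt(sum((p[a] - mean[a]) ** 2 for a in relevant))
        scores.append(dev / len(relevant))
    return scores

# Hypothetical data: attribute 0 is tightly clustered near 1.0, attribute 1
# is spread out, and the last point deviates strongly in attribute 0.
data = [(1.0, 0.0), (1.1, 10.0), (0.9, 20.0),
        (1.0, 30.0), (1.05, 40.0), (5.0, 20.0)]
scores = sod_scores(data)
print(scores.index(max(scores)))  # index of the highest-scoring point
```

The hard threshold `alpha` in this sketch is the kind of parameter whose influence the abstract says the improved algorithm reduces by assigning each dimension a quantified reference value.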

Authors: Liu Wenyuan (刘文远), Zhang Liang (张亮), Sun Dejie (孙德杰), Chen Zijun (陈子军)

Affiliation: School of Information Science and Engineering, Yanshan University

Source: Computer Engineering (《计算机工程》), 2011, No. 9, pp. 93-94, 97 (3 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.

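Funding: Hebei Province Major Technology Innovation Fund project "Integrated Information System for Production Management of the Hebei Port Group" (09213562Z)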

Keywords: high-dimensional data; subspace; outlier detection; data mining

Classification: TP311.52 [Automation and Computer Technology: Computer Software and Theory]


