Density-based Outlier Detection on Uncertain Data (cited by: 6)
Abstract: To detect outliers in uncertain datasets, this paper designs a density-based Uncertain Local Outlier Factor (ULOF) algorithm. A possible world model is first built for the uncertain data to determine the probability of each uncertain object in the possible worlds. The ULOF formula is then derived by combining this model with the traditional LOF algorithm, and the local outlier degree of each uncertain object is judged by its ULOF value. The efficiency and accuracy of ULOF are analyzed in detail, and a grid-based pruning strategy and k-nearest-neighbor query optimization are proposed to reduce the candidate set. Finally, experiments on synthetic data demonstrate the feasibility and efficiency of the approach: the optimized ULOF algorithm improves outlier detection accuracy, reduces the time complexity, and improves the performance of outlier detection on uncertain data.
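The possible world model mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulation, so the code below assumes a simple discrete model: each uncertain object has a finite list of candidate instances with probabilities summing to 1, objects are independent, and a possible world picks exactly one instance per object. The function name `possible_worlds` and the sample data are hypothetical.

```python
from itertools import product


def possible_worlds(objects):
    """Enumerate possible worlds for independent uncertain objects.

    `objects` maps an object id to a list of (instance, probability) pairs.
    Assumption (not taken from the paper): probabilities per object sum to 1
    and objects are mutually independent.
    Yields (world, probability), where `world` maps id -> chosen instance.
    """
    ids = list(objects)
    for choice in product(*(objects[i] for i in ids)):
        world = {i: inst for i, (inst, _) in zip(ids, choice)}
        prob = 1.0
        for _, p in choice:
            prob *= p  # independence: world probability is the product
        yield world, prob


# Hypothetical data: two uncertain 2-D objects with two instances each.
objects = {
    "a": [((0.0, 0.0), 0.5), ((1.0, 0.0), 0.5)],
    "b": [((5.0, 5.0), 0.25), ((6.0, 5.0), 0.75)],
}
worlds = list(possible_worlds(objects))
```

The number of worlds grows exponentially with the number of objects, which is one reason pruning and query optimization of the kind the abstract describes matter in practice.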
Source: Computer Science (《计算机科学》), CSCD, PKU Core; 2015, No. 5, pp. 230-233, 264 (5 pages)
Funding: National Natural Science Foundation of China (61173131); Chongqing Natural Science Foundation (CSTS2010BD2061)
Keywords: Uncertain data; Local outlier detection; Possible world model; k-nearest neighbor
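For orientation, the traditional LOF computation that ULOF builds on can be sketched as follows. This is the classic LOF on certain (deterministic) points, not the paper's ULOF formula, which the abstract does not state. It is simplified to take exactly k neighbours per point (ties at the k-distance are not expanded) and assumes no duplicate points; the function names are illustrative.

```python
import math


def knn(points, i, k):
    """Return the k nearest neighbours of points[i] as (distance, index)."""
    dists = sorted(
        (math.dist(points[i], q), j) for j, q in enumerate(points) if j != i
    )
    return dists[:k]


def lof_scores(points, k):
    """Classic LOF score per point; values well above 1 suggest outliers."""
    n = len(points)
    neigh = [knn(points, i, k) for i in range(n)]
    k_dist = [nb[-1][0] for nb in neigh]  # distance to the k-th neighbour

    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(p, o) = max(k-distance(o), d(p, o)).
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neigh[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [
        sum(lrd[j] for _, j in neigh[i]) / (len(neigh[i]) * lrd[i])
        for i in range(n)
    ]


# Four points on a unit square plus one far-away point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = lof_scores(points, k=2)
```

In this example the four square corners score about 1 (as dense as their neighbours), while the isolated point scores far higher.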
