Research on Subspace Characteristic of High Dimension Outlier Dataset (高维数据集离群子空间特性研究)

Cited by: 2
Abstract: This paper examines effective methods for explaining and analyzing outliers once they have been mined from a dataset. Building on the attribute reduction technique of rough set theory, it defines concepts such as the degree of outlying contribution of an attribute in order to describe the outlying characteristics of a high-dimensional dataset quantitatively. It proposes the ideas of outlying partition and outlying reduction, together with a method for analyzing the key attribute subspace of outliers, and it presents an outlying reduction algorithm and analyzes its complexity. Experiments show that the approach can effectively reveal where outliers originate and supports a fuller understanding of the whole dataset, and that the proposed algorithm adapts well to problem size.
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal (北大核心), 2006, No. 9, pp. 147-149 (3 pages)
Funding: National Natural Science Foundation of China (Grant No. 60403009); Natural Science Foundation of Chongqing (Grant No. 2005BB2224)
Keywords: outlying partition; key attribute subspace; degree of outlying contribution; outlying reduction
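
The abstract names attribute reduction from rough set theory as the basis of the method but does not give the definitions. As a rough illustration of that underlying idea only, the sketch below computes the standard rough-set dependency degree and a greedy reduct over a toy table. The table, the attribute names `a`, `b`, `c`, the boolean `outlier` column, and the greedy search are all assumptions made for illustration; this is not the paper's outlying reduction algorithm, and it does not reproduce the paper's degree of outlying contribution.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence

Row = Dict[str, Hashable]


def partition(rows: Sequence[Row], attrs: Sequence[str]) -> List[List[int]]:
    """Group row indices into indiscernibility classes over `attrs`."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows: Sequence[Row], cond: Sequence[str], decision: str) -> float:
    """Share of rows whose class over `cond` carries a single decision value."""
    if not rows:
        return 0.0
    positive = 0
    for block in partition(rows, cond):
        if len({rows[i][decision] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)


def greedy_reduct(rows: Sequence[Row], cond: Sequence[str], decision: str) -> List[str]:
    """Greedily add attributes until the dependency of the full set is reached.

    This is a heuristic: the result preserves the dependency degree but is
    not guaranteed to be a minimal reduct.
    """
    target = dependency(rows, cond, decision)
    chosen: List[str] = []
    while dependency(rows, chosen, decision) < target:
        best = max(
            (a for a in cond if a not in chosen),
            key=lambda a: dependency(rows, chosen + [a], decision),
        )
        chosen.append(best)
    return chosen


# Hypothetical toy table: `outlier` stands in for labels from a prior
# outlier-mining step (an assumption, not data from the paper).
rows = [
    {"a": 0, "b": 0, "c": 1, "outlier": False},
    {"a": 0, "b": 1, "c": 1, "outlier": False},
    {"a": 1, "b": 0, "c": 0, "outlier": False},
    {"a": 1, "b": 1, "c": 0, "outlier": True},
    {"a": 1, "b": 1, "c": 1, "outlier": True},
]
reduct = greedy_reduct(rows, ["a", "b", "c"], "outlier")
print(reduct)
```

In this toy table the attribute `c` adds nothing once `a` and `b` are known, which is the kind of redundancy attribute reduction is meant to remove.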

References (4)

  • 1. S Ramaswamy, R Rastogi, K Shim. Efficient Algorithms for Mining Outliers from Large Data Sets [C]. In: Proc of the ACM SIGMOD International Conference on Management of Data, ACM Press, 2000: 427-438
  • 2. S Papadimitriou, H Kitagawa, P B Gibbons. LOCI: fast outlier detection using the local correlation integral [C]. In: Proc of the 19th International Conference on Data Engineering, IEEE Computer Society, 2003: 315-326
  • 3. Wei Li, Gong Xueqing, Qian Weining, Zhou Aoying. Outlier detection in high-dimensional space (高维空间中的离群点发现) [J]. Journal of Software (软件学报), 2002, 13(2): 280-290. Cited by: 44
  • 4. E M Knorr, R T Ng. Finding intensional knowledge of distance-based outliers [C]. In: Proceedings of the 25th International Conference on Very Large Data Bases. New York: Morgan Kaufmann, 1999: 211-222
