
Linear manifold clustering for high dimensional data based on line manifold searching and fusing (Cited by: 1)

Abstract: High dimensional data clustering, with the inherent sparsity of data and the existence of noise, is a serious challenge for clustering algorithms. A new linear manifold clustering method is proposed to address this problem. The basic idea is to search for the line manifold clusters hidden in a data set and then fuse some of these line manifold clusters to construct higher dimensional manifold clusters. The orthogonal distance and the tangent distance are considered together as the linear manifold distance metrics. Spatial neighbor information is used to construct the original line manifold and to optimize line manifolds during the line manifold cluster search. Experiments on real and synthetic data sets show that the proposed method outperforms some competing clustering methods in both accuracy and computation time. The method obtains high clustering accuracy on data sets of different sizes, manifold dimensions and noise ratios, which supports its noise robustness for high dimensional data.
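The abstract names two distance metrics but does not define them. As an illustration only, the sketch below computes both for a line manifold under a common geometric reading: the orthogonal distance is the length of the component of a point perpendicular to the line, and the tangent distance is how far the point's projection lies from the line's origin along the line. The function name `line_manifold_distances`, its parameters, and these definitions are assumptions, not taken from the paper.

```python
import math


def line_manifold_distances(x, origin, direction):
    """Return (orthogonal, tangent) distances from point x to a line.

    The line is {origin + t * direction}. Both definitions are assumed
    for illustration; the paper's exact formulas may differ.
    """
    # Normalize the direction so that t is measured in data units.
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]

    # Vector from the line's origin to the point.
    diff = [xi - oi for xi, oi in zip(x, origin)]

    # Signed coordinate of the projection along the line.
    t = sum(di * ui for di, ui in zip(diff, unit))

    # Component of diff perpendicular to the line.
    residual = [di - t * ui for di, ui in zip(diff, unit)]

    orthogonal = math.sqrt(sum(r * r for r in residual))
    tangent = abs(t)
    return orthogonal, tangent


# A point 3 units off the x-axis and 3 units along it from the origin (1, 0, 0).
orth, tang = line_manifold_distances((4.0, 3.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0))
```

Under this reading, a point can lie exactly on the infinite line (orthogonal distance zero) and still be far from a cluster that occupies only a short segment of it, which is one plausible reason for using the two metrics together.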
Source: Journal of Central South University (English Edition), indexed in SCIE, EI and CAS, 2010, Issue 5, pp. 1058-1069 (12 pages)
Funding: Project (60835005) supported by the National Natural Science Foundation of China
Keywords: linear manifold; high dimensional data; data clustering; line search; data sets; clustering algorithm; anti-noise capability; inherent noise; subspace clustering; line manifold; data mining; data fusing
