
Using the Triangle Inequality to Accelerate the TTSAS Clustering Algorithm (cited by: 1)
Abstract: Sequential clustering is a straightforward and fast approach that does not require the number of clusters to be specified in advance. On very large datasets, however, its running time still leaves room for improvement. TTSAS is a two-threshold sequential clustering scheme. Building on it, this paper applies the triangle inequality to obtain the TI_TTSAS algorithm, which avoids redundant distance computations. Experiments show that TI_TTSAS is substantially faster than TTSAS, and that the speedup becomes more pronounced as the dataset grows in size and dimensionality and as the number of clusters increases. The clustering results retain the accuracy of the TTSAS algorithm.
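Source: Computer Engineering (《计算机工程》), 2006, No. 17, pp. 97-99, 125 (4 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding: Natural Science Foundation of Gansu Province (3ZS051-A25-035); Gansu Meteorological Bureau Innovation Fund (2005).
Keywords: sequential clustering; triangle inequality; two-threshold sequential clustering algorithm (TTSAS); triangle-inequality sequential clustering (TI_TTSAS)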
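
The abstract does not spell out how TI_TTSAS prunes distance computations, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes Euclidean distance, a fixed representative per cluster (the cluster's first point, a simplification of TTSAS, which usually updates representatives), and the standard bound that if d(c_best, c_j) >= 2 * d(x, c_best) then c_j cannot be closer to x than c_best. The names `ti_ttsas`, `theta1`, `theta2`, and `prune` are hypothetical.

```python
import math


def ti_ttsas(points, theta1, theta2, prune=True):
    """Two-threshold sequential clustering with optional triangle-inequality pruning.

    Returns (labels, representatives, number_of_distance_computations).
    Assumption: each cluster is represented by its first point, which never moves.
    """
    assert theta1 <= theta2
    reps = []       # one fixed representative per cluster
    rep_dist = []   # rep_dist[i][j] = distance between representatives i and j
    labels = [None] * len(points)
    calls = 0       # counts every distance evaluation, including rep-to-rep

    def add_cluster(idx):
        nonlocal calls
        new = points[idx]
        row = [math.dist(new, r) for r in reps]
        calls += len(row)
        for i, d in enumerate(row):
            rep_dist[i].append(d)
        rep_dist.append(row + [0.0])
        reps.append(new)
        labels[idx] = len(reps) - 1

    def nearest(x):
        nonlocal calls
        best, best_d = 0, math.dist(x, reps[0])
        calls += 1
        for j in range(1, len(reps)):
            # d(x, c_j) >= d(c_best, c_j) - d(x, c_best) >= d(x, c_best): skip c_j.
            if prune and rep_dist[best][j] >= 2 * best_d:
                continue
            d = math.dist(x, reps[j])
            calls += 1
            if d < best_d:
                best, best_d = j, d
        return best, best_d

    pending = list(range(len(points)))
    while pending:
        deferred = []
        progressed = False
        for idx in pending:
            if not reps:
                add_cluster(idx)
                progressed = True
                continue
            k, d = nearest(points[idx])
            if d < theta1:        # clearly belongs to the nearest cluster
                labels[idx] = k
                progressed = True
            elif d > theta2:      # clearly far from every cluster
                add_cluster(idx)
                progressed = True
            else:                 # gray zone: decide in a later pass
                deferred.append(idx)
        if deferred and not progressed:
            # No decision was made in this pass: force a new cluster to guarantee termination.
            add_cluster(deferred.pop(0))
        pending = deferred
    return labels, reps, calls
```

Because the bound only skips representatives that cannot beat the current best, calling the function with `prune=False` yields the same labels and simply performs more distance evaluations, which mirrors the abstract's claim that accuracy is preserved while redundant computations are avoided.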