期刊文献+

基于最优投影和动态阈值的最近邻搜索算法 被引量:2

An Optimal Projection and Dynamic Threshold Based Nearest Neighbor Search Algorithm
下载PDF
导出
摘要 作者在前人工作成果的基础上,提出并实现了一种基于最优投影和动态阈值调整的最近邻搜索算法DTA(Dynamic Threshold Algorithm);证明了最优投影线定理和投影邻域定理;并分析了DTA算法与SNN算法相比在算法性能上的优势.实验结果表明,当数据规模增大时,DTA算法的运行时间增加相对缓慢,在大规模数据集上DTA算法的运行时间可达传统算法的10%以下;DTA算法对阈值的变化不敏感,能适应不同分布的数据集合. The problem of nearest neighborsearch in high dimensional data set frequently appears in the field of data mining, especially in classificationand cluster analysis. It' s also an important toolfor traditional Chinese medicine analysis. This paper absorbs ideas of previous researches and proposes a novel nearest neighbor search algorithm based on dynamic threshold. The main contribution includes: (1) Proving optimal projection line theoremand projection neighbor theorem; (2) Proposing an optimal projection line and dynamic threshold based nearest neighbor search algorithm: DTA (Dynamic Threshold Algorithm) ; (3) Comparing DTA with SNN (Search Nearest Neighbor), and analyzing advantages of DTA. Experiments show that: as data scale grows, the running time of DTA grows relatively slowly, and on large data sets, the running time of DTAis10% less than that of the traditional algorithm; DTA is insensitive to threshold and adaptive to data sets of various distributions.
出处 《四川大学学报(自然科学版)》 CAS CSCD 北大核心 2006年第4期777-782,共6页 Journal of Sichuan University(Natural Science Edition)
基金 国家自然科学基金(60473071) 国家自然科学基金(90409007) 高等学校博士学科点专项科研基金(SRFDP20020610007)
关键词 最近邻搜索 最优投影线 数据挖掘 nearest neighbor search optimal projection line data mining
  • 相关文献

参考文献5

  • 1LI C, TANG C J, PENG J, et al. TCMiner: A High Performance Data Mining System for Multi-dimensional Data Analysis of Traditional Chinese Medicine Prescriptions[C]. Berling :Springer-Verlag Berling Heidelberg ,2004.
  • 2Han J W, Kambr M. Data Mining Concepts and Techniques[M]. Beijing: Higher Education Press, 2001.
  • 3PerezJ C, Vidal E. An Approximate Nearest Neighbours Search Algorithm based on the Extended General Spacefilling Curves Heuristic[C]. Australia: [s. n. ], 1998.
  • 4胡建军,唐常杰,李川,彭京,元昌安,陈安龙,蒋永光.基于最近邻优先的高效聚类算法[J].四川大学学报(工程科学版),2004,36(6):93-99. 被引量:24
  • 5边肇旗 张学工.模式识别[M].北京:清华大学出版社,2000..

二级参考文献8

  • 1Han J W, Kambr M. Data mining concepts and techniques[M]. Beijing: Higher Education Press, 2001. 145~176.[2]Kaufan L, Rousseeuw P J. Finding groups in data: an introduction to cluster analysis[M]. New York: John Wiley & Sons, 1990.
  • 2Guha S, Rastogi R, Shim K. CURE: an efficient clustering algorithm for large databases[A]. Haas L M, Tiwary A. Proceedings of the ACM SIGMOD International Conference on Management of Data[C]. Seattle: ACM Press, 1998. 73~84.
  • 3Ester M, Kriegel H P, Sander J, et al. A density based algorithm for discovering clusters in large spatial databases with noise[A]. Simoudis E, Han J W, Fayyad U M. Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining[C].
  • 4Agrawal R, Gehrke J, Gunopolos D, et al. Automatic subspace clustering of high dimensional data for data mining application[A]. Haas L M, Tiwary A. Proceedings of the ACM SIGMOD International Conference on Management of Data[C]. Seattle: ACM Press, 1998.
  • 5Zhang T,Ramakrishnan R,Livny M. BIRCH:an efficient data clustering method for very large database[R].Computer Sciences Dept,Univ of Wisconsin-Madison,1995.
  • 6Zhang T,Ramakrishnan R,Livny M. BIRCH:an efficient data clustering method for very large databases[A]. Jagadish H V, Mumick I S. Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data[C]. Quebec: ACM Press, 1996.103~114.
  • 7Beyer K S,Goldstein J,Ramakrishnan R,et al. When is 'nearest neighbor' meaningful?[A].Beeri C,Buneman P.Proceedings of the 7th International Conference on Data Theory[C].ICDT'99. LNCS1540,Jerusalem, Israel: Springer, 1999.217~235.
  • 8Karypis G,Han E H,Kumar V. CHAMELEON: a hierarchical clustering algorithm using dynamic modeling[J].IEEE Computer,1999,32(8):68-75.

共引文献24

同被引文献10

引证文献2

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部