摘要
作者在前人工作成果的基础上,提出并实现了一种基于最优投影和动态阈值调整的最近邻搜索算法DTA(Dynamic Threshold Algorithm);证明了最优投影线定理和投影邻域定理;并分析了DTA算法与SNN算法相比在算法性能上的优势.实验结果表明,当数据规模增大时,DTA算法的运行时间增加相对缓慢,在大规模数据集上DTA算法的运行时间可达传统算法的10%以下;DTA算法对阈值的变化不敏感,能适应不同分布的数据集合.
The problem of nearest neighborsearch in high dimensional data set frequently appears in the field of data mining, especially in classificationand cluster analysis. It' s also an important toolfor traditional Chinese medicine analysis. This paper absorbs ideas of previous researches and proposes a novel nearest neighbor search algorithm based on dynamic threshold. The main contribution includes: (1) Proving optimal projection line theoremand projection neighbor theorem; (2) Proposing an optimal projection line and dynamic threshold based nearest neighbor search algorithm: DTA (Dynamic Threshold Algorithm) ; (3) Comparing DTA with SNN (Search Nearest Neighbor), and analyzing advantages of DTA. Experiments show that: as data scale grows, the running time of DTA grows relatively slowly, and on large data sets, the running time of DTAis10% less than that of the traditional algorithm; DTA is insensitive to threshold and adaptive to data sets of various distributions.
出处
《四川大学学报(自然科学版)》
CAS
CSCD
北大核心
2006年第4期777-782,共6页
Journal of Sichuan University(Natural Science Edition)
基金
国家自然科学基金(60473071)
国家自然科学基金(90409007)
高等学校博士学科点专项科研基金(SRFDP20020610007)
关键词
最近邻搜索
最优投影线
数据挖掘
nearest neighbor search
optimal projection line
data mining