Abstract
Similarity computation is a key problem in lazy learning research, including case-based reasoning (CBR) and k-nearest neighbor (k-NN) methods. This paper studies ways to reduce the cost of computing similarity and, taking k-NN as the example, introduces two algorithms: a similarity algorithm based on partial features and a similarity algorithm based on projection. Both improve efficiency by reducing the number of features involved in each distance computation. Experiments show a clear gain: the partial-feature-based k-NN algorithm improves efficiency by 26% to 28%, and the projection-based k-NN algorithm by 48% to 83%. The authors have also applied the algorithms in engineering practice.
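The abstract does not spell out either algorithm, so the following is only a minimal sketch of the general idea of touching fewer features per distance computation. It uses a generic early-abandonment ("partial distance") scheme, which is an assumption and not necessarily the paper's partial-feature method. The names `squared_distance`, `knn_full`, and `knn_partial` are hypothetical.

```python
import heapq
import math


def squared_distance(x, query):
    """Squared Euclidean distance over all features."""
    d2 = 0.0
    for a, b in zip(x, query):
        d2 += (a - b) ** 2
    return d2


def knn_full(train, query, k):
    """Baseline k-NN: every distance uses every feature."""
    scored = sorted((squared_distance(x, query), i) for i, x in enumerate(train))
    return [i for _, i in scored[:k]]


def knn_partial(train, query, k):
    """k-NN that stops summing a distance once it cannot beat the current
    k-th best. Assumes k >= 1. Returns (neighbor indices, features touched)."""
    heap = []  # entries are (-squared_distance, -index); heap[0] is the worst kept neighbor
    features_used = 0
    for i, x in enumerate(train):
        bound = -heap[0][0] if len(heap) == k else math.inf
        d2 = 0.0
        abandoned = False
        for a, b in zip(x, query):
            d2 += (a - b) ** 2
            features_used += 1
            if d2 >= bound:  # the remaining features can only increase d2
                abandoned = True
                break
        if abandoned:
            continue
        if len(heap) == k:
            heapq.heapreplace(heap, (-d2, -i))
        else:
            heapq.heappush(heap, (-d2, -i))
    neighbors = sorted((-nd, -ni) for nd, ni in heap)
    return [i for _, i in neighbors], features_used
```

In this sketch the saving comes from the inner loop ending early, so the count of features touched is a rough proxy for the cost reduction. How closely this matches the reported 26% to 28% gain depends on the paper's actual algorithm and data, which are not given here.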
Source
Journal of Computer Research and Development (《计算机研究与发展》)
Indexed in: EI, CSCD, Peking University Core Journals (北大核心)
2000, No. 10, pp. 1166-1172 (7 pages)
Funding
National Natural Science Foundation of China (Grant No. 69803010)
National "863" High-Tech Research and Development Program of China (Grant Nos. 863-511-946, 863-818-07)
Keywords
similarity, computation methods, high-dimensional data, data reduction, nearest neighbor, lazy learning, data mining, KDD, databases