Abstract
This paper proposes a k-nearest-neighbor (k-NN) query method for Chinese calligraphic character databases in a data grid environment. When a user submits a query character and a value of k at the query node, the data nodes first filter the characters within a small query radius using the hybrid distance metric (HDM) index. The candidate characters that survive filtering are then transferred to the executing nodes in "package" mode, where the distance (refinement) computation is carried out in parallel, and the resulting answer set is returned to the query node. If the answer set contains fewer than k characters, the query radius is enlarged and the procedure is repeated until the k nearest neighbor characters are obtained. Theoretical analysis and experimental results show that the method performs well in reducing network communication cost, increasing I/O and CPU parallelism, and lowering response time.
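The filter, refine, and radius-expansion loop described in the abstract can be sketched roughly as follows. This is a minimal single-process illustration, not the paper's implementation: the names `cheap_lower_bound`, `exact_distance`, and `knn_by_range_expansion` are hypothetical, a one-coordinate gap stands in for the hybrid distance metric filter, Euclidean distance stands in for the real character similarity measure, and the data nodes are plain Python lists, so network transfer and parallel refinement are only indicated in comments.

```python
import math


def cheap_lower_bound(query, point):
    # Assumed stand-in for the HDM-based filter: the gap on one coordinate
    # never exceeds the Euclidean distance, so it is a safe lower bound.
    return abs(query[0] - point[0])


def exact_distance(query, point):
    # Assumed stand-in for the expensive refinement distance.
    return math.dist(query, point)


def knn_by_range_expansion(query, data_nodes, k, radius=1.0, growth=2.0):
    """Return the k nearest (distance, point) pairs by repeated range queries."""
    if radius <= 0 or growth <= 1:
        raise ValueError("radius must be positive and growth must exceed 1")
    total = sum(len(node) for node in data_nodes)
    k = min(k, total)  # cannot return more points than are stored
    if k <= 0:
        return []
    while True:
        # Filtering step: each data node discards points whose lower bound
        # already exceeds the radius and "packages" the surviving candidates.
        packages = [
            [p for p in node if cheap_lower_bound(query, p) <= radius]
            for node in data_nodes
        ]
        # Refinement step: in a grid setting each package would be refined
        # on an executing node in parallel; here it runs sequentially.
        answers = []
        for package in packages:
            for p in package:
                d = exact_distance(query, p)
                if d <= radius:
                    answers.append((d, p))
        if len(answers) >= k:
            answers.sort()
            return answers[:k]
        # Too few results: enlarge the radius and repeat the query.
        radius *= growth
```

Because every point within the current radius is returned by the range query, the k closest of those are the true k nearest neighbors as soon as at least k points are found. Starting from a small radius keeps the candidate packages small, at the price of extra rounds when the initial guess is too tight.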
Source
《软件学报》
EI
CSCD
PKU Core Journal (北大核心)
2006, Issue 11, pp. 2289-2301 (13 pages)
Journal of Software
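Funding
National Natural Science Foundation of China, No. 60533090
National Science Fund for Distinguished Young Scholars, No. 60525108
International Cooperation Program for the Digitization of Chinese and English Books in Higher Education Institutions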
Keywords
Chinese calligraphic character
k-nearest neighbor query
cluster hypersphere
data grid