
Word Order Similarity Algorithm Based on Vector Distance (cited by: 10)
Abstract: Mobile POI (point of interest) search has become one of the main applications of mobile search. Taking into account the characteristics of mobile search input and the structured nature of POI data, this paper uses Jianpin (the initials of the pinyin syllables) for mobile POI search to improve the user experience. Since word order similarity is the main factor affecting the ranking of Jianpin search results, the paper proposes an algorithm that computes word order similarity from vector distance. The algorithm represents Jianpin with the vector space model, extracts the common Jianpin of the two strings, maps it into position vectors, and then computes word order similarity from the distance between those position vectors. Theoretical analysis shows that, compared with the word order similarity algorithm based on inversion numbers, the proposed algorithm reduces the time complexity from O(n log n) to O(n) and the space complexity from O(n) to O(1). Experimental results show that the vector-distance algorithm preserves accuracy, meets the application requirements of mobile POI Jianpin search, and improves the efficiency of word order similarity computation by 16.88%.
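The abstract does not spell out the exact distance formula, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes Jianpin strings of lowercase initials (e.g. 北京大学 → `bjdx`), matches a repeated initial at its first occurrence in the target, and uses hypothetical function names. `vector_order_similarity` walks the common initials in query order and counts adjacent steps whose target positions go backwards (a distance between the step directions of the two position vectors), using a fixed 26-entry table and a few counters, so it runs in O(n) time with O(1) extra space for a fixed alphabet. `inversion_order_similarity` is an inversion-count baseline in the spirit of the comparison in the abstract, using merge sort in O(n log n) time and O(n) space.

```python
"""Hypothetical sketch: word-order similarity between two Jianpin strings.

Not the paper's exact formulas. Assumptions: Jianpin strings are lowercase a-z,
and a repeated initial in the target is matched at its first occurrence.
"""

from typing import List, Tuple


def _letter_index(ch: str) -> int:
    """Map an initial 'a'..'z' to 0..25."""
    if not ("a" <= ch <= "z"):
        raise ValueError(f"expected a lowercase initial a-z, got {ch!r}")
    return ord(ch) - ord("a")


def _first_positions(target: str) -> List[int]:
    """Fixed 26-slot table: first index of each initial in target, or -1."""
    table = [-1] * 26
    for j, ch in enumerate(target):
        k = _letter_index(ch)
        if table[k] == -1:
            table[k] = j
    return table


def vector_order_similarity(query: str, target: str) -> float:
    """Single pass over the common initials (in query order) with O(1) extra state.

    Along the query, positions of the common initials always increase; we count
    the adjacent steps whose target positions decrease instead, i.e. the Hamming
    distance between the step directions of the two position vectors.
    """
    table = _first_positions(target)
    prev = -1          # target position of the previous common initial
    steps = 0          # adjacent pairs of common initials seen so far
    disagreements = 0  # pairs whose order in target is reversed
    for ch in query:
        pos = table[_letter_index(ch)]
        if pos == -1:
            continue  # initial not shared with target
        if prev != -1:
            steps += 1
            if pos < prev:
                disagreements += 1
        prev = pos
    # Fewer than two shared initials: no order to compare.
    return 1.0 if steps == 0 else 1.0 - disagreements / steps


def _sort_count(v: List[int]) -> Tuple[List[int], int]:
    """Merge sort that also counts inversions: O(k log k) time, O(k) buffers."""
    if len(v) <= 1:
        return v, 0
    mid = len(v) // 2
    left, inv_left = _sort_count(v[:mid])
    right, inv_right = _sort_count(v[mid:])
    merged: List[int] = []
    i = j = 0
    inv = inv_left + inv_right
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            inv += len(left) - i  # right[j] is smaller than every remaining left item
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inv


def inversion_order_similarity(query: str, target: str) -> float:
    """Baseline: 1 - inversions / (k choose 2) over the full position vector."""
    table = _first_positions(target)
    v = [table[_letter_index(ch)] for ch in query if table[_letter_index(ch)] != -1]
    k = len(v)
    if k < 2:
        return 1.0
    _, inversions = _sort_count(v)
    return 1.0 - inversions / (k * (k - 1) / 2)


if __name__ == "__main__":
    for q, t in [("bjdx", "bjdx"), ("bjdx", "bjxd"), ("bjdx", "xdjb")]:
        print(q, t, round(vector_order_similarity(q, t), 3),
              round(inversion_order_similarity(q, t), 3))
```

On the sample pairs the two scores agree for identical and fully reversed orders (1.0 and 0.0) but differ for `bjxd` (about 0.667 vs 0.833), because one swapped adjacent pair costs one of three steps but only one of six pairs; a real ranking would need the paper's own normalization.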
Source: Journal of Chinese Information Processing (《中文信息学报》), 2009, No. 3, pp. 45-50 (6 pages). Indexed in CSCD; Peking University core journal.
Keywords: computer applications; Chinese information processing; mobile POI search; Jianpin search; word order similarity; vector distance