Improvement on forward backtracking algorithm based on Hash (cited by: 1)
Abstract: Chinese word segmentation has long been one of the key prerequisites for Chinese-language search engines. Classic mechanical (dictionary-based) segmentation relies on string matching and must choose a maximum match length. To address this, the paper proposes a Hash-based dictionary structure that keeps the maximum match length from being too long or too short. To detect ambiguity, it introduces a backtracking mechanism: each time the algorithm finishes looking up a word, it starts a new round of lookup that uses the last character of that word as the first character. Because backtracking roughly doubles the number of lookups, the paper also proposes a test of whether a word's last character can serve as the first character of a word, which reduces both the number of lookups and the time complexity. Compared with other hybrid methods, the proposed method offers faster lookup and better handling of ambiguity.
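The abstract names three components but gives no code or data layout. The Python below is a minimal sketch under our own assumptions, not the authors' implementation. The dictionary is hashed on the first character, and each bucket records the length of its longest word. This is one plausible way to avoid a single fixed global maximum match length. Backtracking reruns the lookup from the last character of each matched word. The tail-character test, here the hypothetical `can_start`, skips that extra lookup when the last character begins no multi-character word. The names `HashDict`, `longest_match` and `segment`, and the toy word list, are all hypothetical.

```python
class HashDict:
    """Dictionary hashed on the first character (hypothetical layout)."""

    def __init__(self, words):
        self.buckets = {}   # first character -> set of words starting with it
        self.max_len = {}   # first character -> length of its longest word
        for w in words:
            first = w[0]
            self.buckets.setdefault(first, set()).add(w)
            self.max_len[first] = max(self.max_len.get(first, 0), len(w))

    def contains(self, word):
        return word in self.buckets.get(word[0], ())

    def can_start(self, ch):
        # Tail-character test (assumed form): backtracking from ch is only
        # worthwhile if ch begins at least one word of two or more characters.
        return self.max_len.get(ch, 0) >= 2


def longest_match(d, text, i):
    """Forward maximum match at position i, bounded by the per-character max length."""
    limit = min(d.max_len.get(text[i], 1), len(text) - i)
    for n in range(limit, 1, -1):
        if d.contains(text[i:i + n]):
            return text[i:i + n]
    return text[i]  # no multi-character word here: emit a single character


def segment(d, text):
    """Segment text; report (tail position, matched word, overlapping word) ambiguities."""
    words, ambiguities = [], []
    i = 0
    while i < len(text):
        w = longest_match(d, text, i)
        words.append(w)
        tail = i + len(w) - 1
        # Backtrack: restart the lookup at the word's last character, but only
        # when more text follows and the tail-character test passes.
        if len(w) > 1 and tail + 1 < len(text) and d.can_start(text[tail]):
            alt = longest_match(d, text, tail)
            if len(alt) > 1:
                ambiguities.append((tail, w, alt))
        i += len(w)
    return words, ambiguities


d = HashDict(["研究", "研究生", "生命", "命", "起源"])
print(segment(d, "研究生命起源"))
print(segment(d, "研究起源"))
```

First input, 研究生命起源: forward matching yields 研究生 / 命 / 起源. The backtracking lookup that starts at 生 then finds 生命, which flags the overlapping (crossing) ambiguity. Second input, 研究起源: no dictionary word begins with 究, so the tail-character test skips the backtracking lookup entirely. That saving is the source of the reduced query count the abstract claims.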
Source: Information Technology (《信息技术》), 2017, Issue 11, pp. 167-171 (5 pages)
Keywords: word segmentation; Hash dictionary; backtracking; tail-character test