A Parallel Query Correction Method for Mixed Language (一种支持混合语言的并行查询纠错方法) Cited by: 1
Abstract Queries submitted to Chinese information retrieval systems mix several forms, including Chinese characters, pinyin, and English, and some queries are too long to correct efficiently. Existing query correction methods handle neither mixed-language input nor long Chinese queries well. To address both problems, this paper proposes a parallel query correction method that supports mixed language. The method encodes the mixed languages uniformly, builds a unified-encoding language model and a heterogeneous character dictionary tree, and applies editing rules tailored to each language so that query terms are processed in a uniform way. For long Chinese queries, it introduces a bidirectional parallel correction model; to process a query in parallel, it adds a reverse character dictionary tree and a reverse language model alongside the forward ones. The training corpus consists of high-quality text extracted from user query logs, web click logs, web link information, and similar files. Experiments show that, compared with one-way query correction, the method improves accuracy by 9%, lowers recall by 3%, and runs about 40% faster.
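The abstract does not give the data structures in detail, so the following is only a minimal sketch of the idea of pairing a forward character dictionary tree with a reverse one and scanning a long query from both ends at once. Everything in it is an assumption made for illustration: the names `Trie`, `greedy_segment`, and `bidirectional_segment`, the greedy longest-match scan, the midpoint split, and the use of a thread pool. The paper's unified encoding, edit rules, and language-model scoring are not reproduced here.

```python
from concurrent.futures import ThreadPoolExecutor


class Trie:
    """Character trie; one node type serves Chinese characters, pinyin, and English letters."""

    def __init__(self, words=()):
        self.root = {}
        for w in words:
            self.insert(w)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        # End-of-word marker; a multi-character key cannot collide with a single character.
        node["$end"] = True

    def longest_match(self, text, start):
        """Length of the longest dictionary word beginning at text[start], or 0 if none."""
        node, best = self.root, 0
        for i in range(start, len(text)):
            node = node.get(text[i])
            if node is None:
                break
            if "$end" in node:
                best = i - start + 1
        return best


def greedy_segment(text, trie):
    """Left-to-right longest-match scan; unknown characters become single tokens."""
    tokens, i = [], 0
    while i < len(text):
        n = trie.longest_match(text, i) or 1
        tokens.append(text[i:i + n])
        i += n
    return tokens


def bidirectional_segment(query, words):
    """Scan the left half forward and the right half backward, concurrently.

    Assumption: the query is split at its midpoint, which may cut a word in two;
    a real system would need a smarter meeting point.
    """
    forward = Trie(words)
    reverse = Trie(w[::-1] for w in words)  # reverse trie stores each word reversed
    mid = len(query) // 2
    left, right = query[:mid], query[mid:]
    with ThreadPoolExecutor(max_workers=2) as pool:
        fwd = pool.submit(greedy_segment, left, forward)
        bwd = pool.submit(greedy_segment, right[::-1], reverse)
        left_tokens = fwd.result()
        # Undo the reversal: restore token order, then each token's characters.
        right_tokens = [t[::-1] for t in reversed(bwd.result())]
    return left_tokens + right_tokens


if __name__ == "__main__":
    vocab = ["xitong", "信息", "检索", "系统", "信息检索"]
    print(bidirectional_segment("xitong信息检索系统", vocab))
```

In this sketch the backward pass never touches the forward trie, which is why a separate reverse structure is needed at all: matching from the end of the query requires words indexed by their last character first. A correction step would replace the plain longest-match lookup with candidate generation under edit rules, scored by forward and reverse language models.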
Source Journal of Chinese Information Processing (《中文信息学报》), CSCD, Peking University Core Journal, 2016, No. 2, pp. 99-106 (8 pages)
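Funding National Basic Research Program of China (973 Program) (2014CB340406, 2012CB316303, 2013CB329602); National Natural Science Foundation of China (61173064, 61300206); National Key Technology R&D Program project (2015BAK20B03); National Key Technology R&D Program sub-project (2011BAH11B02); National 242 Special Project (2013G129); National Science and Technology Support Special Project (2012BAH46B04)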
Keywords query correction; dictionary tree; language model; parallel correction