Abstract
In machine translation, ambiguities frequently arise during part-of-speech tagging and during syntactic and semantic analysis. Statistically based lexical scores and syntactic-semantic scores are used to resolve the ambiguities produced in these two phases. Statistical disambiguation often runs into the data sparseness problem. This paper applies an improved Turing formula to smooth the parameters of the lexical score and the syntactic-semantic score when data are sparse, and describes how the smoothing algorithm is applied to the lexical score. Experimental results are reported for the corpora, the number of parameters, and the accuracy rate.
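The abstract does not spell out the improved Turing formula, so the following is only a minimal sketch of the standard Good-Turing estimate that such formulas build on, not the authors' method. It replaces each observed count r with r* = (r + 1) N_{r+1} / N_r, where N_r is the number of events seen exactly r times, and reserves N_1 / N of the probability mass for unseen events. The function names, the `max_r` cutoff, and the toy counts are assumptions made for illustration.

```python
from collections import Counter


def good_turing_adjusted_counts(counts, max_r=5):
    """Return Good-Turing adjusted counts r* = (r + 1) * N_{r+1} / N_r.

    Counts above max_r, or counts whose N_{r+1} is zero, are left
    unchanged (a common practical fallback, assumed here).
    """
    freq_of_freq = Counter(counts.values())  # N_r: how many events occur r times
    adjusted = {}
    for event, r in counts.items():
        n_r = freq_of_freq[r]
        n_r_plus_1 = freq_of_freq.get(r + 1, 0)
        if r <= max_r and n_r_plus_1 > 0:
            adjusted[event] = (r + 1) * n_r_plus_1 / n_r
        else:
            adjusted[event] = float(r)
    return adjusted


def unseen_probability_mass(counts):
    """Return N_1 / N, the total probability reserved for unseen events."""
    total = sum(counts.values())
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / total


# Hypothetical (word, tag) counts standing in for lexical-score parameters.
toy_counts = {
    ("bank", "NN"): 1,
    ("bank", "VB"): 1,
    ("run", "NN"): 1,
    ("run", "VB"): 2,
    ("book", "NN"): 2,
    ("book", "VB"): 3,
}

adjusted = good_turing_adjusted_counts(toy_counts)
missing_mass = unseen_probability_mass(toy_counts)
```

In this toy data N_1 = 3, N_2 = 2, and N_3 = 1, so singleton counts are discounted below 1 and the mass freed up is what can be assigned to word-tag pairs never seen in the corpus. The adjusted counts are not renormalized here; a full estimator would also smooth the N_r values before using them.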
Source
《计算机应用与软件》
CSCD
PKU Core (Peking University core journal list)
2001, Issue 11, pp. 56-59 (4 pages)
Computer Applications and Software