Error Detection for Statistical Machine Translation Based on Feature Comparison and Maximum Entropy Model Classifier

Original title: 基于特征比较和最大熵模型的统计机器翻译错误检测

Abstract: This paper first introduces three typical word posterior probability (WPP) features used for translation error detection and classification: fixed-position WPP, sliding-window WPP, and alignment-based WPP, and analyzes their impact on error detection performance. Each WPP feature is then combined with linguistic features (word, part of speech (POS), and syntactic features extracted by the LG parser), and a maximum entropy classifier is used to predict translation errors. Experiments and comparisons are carried out on the Chinese-to-English NIST datasets. The results show that the choice of WPP feature significantly affects the classification error rate (CER), and that combining WPP with linguistic features significantly reduces the CER and improves translation error prediction.
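The abstract names the WPP variants and the CER metric but does not give their formulas. As a rough illustration only, the sketch below computes a fixed-position WPP and a sliding-window WPP over an N-best list, plus a word-level classification error rate. The definitions used here (summing normalized hypothesis weights, a window of `t` positions on each side, CER as the fraction of wrongly tagged words) are assumptions based on how these quantities are commonly described, not the paper's exact method. All function names and the toy data are hypothetical. Alignment-based WPP and the maximum entropy classifier are not shown.

```python
def normalize(weights):
    """Turn non-negative hypothesis weights into posterior probabilities."""
    total = sum(weights)
    return [w / total for w in weights]


def fixed_position_wpp(nbest, probs, word, pos):
    """Posterior mass of hypotheses that have `word` exactly at index `pos`."""
    return sum(p for hyp, p in zip(nbest, probs)
               if pos < len(hyp) and hyp[pos] == word)


def sliding_window_wpp(nbest, probs, word, pos, t=1):
    """Posterior mass of hypotheses with `word` within `t` positions of `pos`."""
    total = 0.0
    for hyp, p in zip(nbest, probs):
        lo, hi = max(0, pos - t), min(len(hyp), pos + t + 1)
        if word in hyp[lo:hi]:
            total += p
    return total


def classification_error_rate(predicted, reference):
    """Fraction of words whose predicted tag differs from the reference tag."""
    wrong = sum(1 for a, b in zip(predicted, reference) if a != b)
    return wrong / len(reference)


# Toy N-best list (hypothetical data): tokenized hypotheses with raw weights.
nbest = [["the", "cat", "sat"], ["cat", "the", "sat"], ["the", "dog", "sat"]]
probs = normalize([5.0, 3.0, 2.0])

fixed = fixed_position_wpp(nbest, probs, "cat", 1)
window = sliding_window_wpp(nbest, probs, "cat", 1, t=1)
cer = classification_error_rate(["c", "i", "c"], ["c", "c", "c"])
print(fixed, window, cer)
```

In this toy case the sliding-window score is at least as large as the fixed-position score, because the window also credits hypotheses where the word appears one position away. In a setup like the one the abstract describes, such scores would be one input feature per target word, alongside the linguistic features.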

Authors: Du Jinhua, Wang Sha

Affiliation: School of Automation and Information Engineering, Xi'an University of Technology

Source: Acta Scientiarum Naturalium Universitatis Pekinensis (《北京大学学报（自然科学版）》), 2013, No. 1, pp. 81-87 (7 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.

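Funding: Supported by the National Natural Science Foundation of China (61100085), the Special Scientific Research Program of the Shaanxi Provincial Department of Education (11JK1029), and the Young Scientists Research Program of Xi'an University of Technology (105211017).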

Keywords: error detection; word posterior probability; linguistic features; maximum entropy classifier

Classification number: TP391.2 [Automation and Computer Technology: Computer Application Technology]


