An Ensemble Learning Method for Maximum Entropy Based Phrase Reordering Model

Cited by: 3
Abstract: The maximum entropy based bracketing transduction grammar (BTG) model has become a popular topic in statistical machine translation research in recent years because of its strong translation capability and simple model training. However, the phrase reordering examples used to train this model are unevenly distributed across classes. To address this problem, this paper proposes a training method for the phrase reordering model that introduces ensemble learning. Experimental results on a large-scale dataset show that the method effectively improves the training of the reordering model and significantly improves the performance of the translation system.
Source: Journal of Chinese Information Processing (《中文信息学报》), CSCD, Peking University Core Journal, 2014, Issue 1, pp. 87-93 (7 pages)
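Funding: National Natural Science Foundation of China (61303082, 61005052); National Key Technology R&D Program (2012BAH14F03); Specialized Research Fund for the Doctoral Program of Higher Education (20120121120046)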
Keywords: maximum entropy; phrase reordering; imbalanced classification; ensemble learning
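
The abstract does not say which ensemble scheme the paper uses, so the following is only a minimal sketch of one common way to combine ensemble learning with an imbalanced binary maximum entropy classifier: train several models, each on all minority-class examples plus an equally sized random sample of the majority class, and average their probabilities. The under-sampling scheme, the toy two-dimensional features, the labels (0 for the frequent order, 1 for the rare order), and every function name here are assumptions made for illustration, not details taken from the paper.

```python
import math
import random


def predict_proba(w, x):
    """P(label = 1 | x) under a binary maximum entropy (logistic) model."""
    # w holds one weight per feature followed by a bias term.
    z = w[-1] + sum(wj * xj for wj, xj in zip(w, x))
    z = max(-30.0, min(30.0, z))  # clamp so exp() stays well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def train_maxent(X, y, epochs=200, lr=0.5):
    """Fit the weights by batch gradient ascent on the log-likelihood."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = yi - predict_proba(w, xi)
            for j, v in enumerate(xi):
                grad[j] += err * v
            grad[-1] += err
        w = [wj + lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def train_ensemble(X, y, n_models=5, seed=0):
    """Train n_models classifiers, each on a balanced under-sample."""
    rng = random.Random(seed)
    pos = [i for i, yi in enumerate(y) if yi == 1]
    neg = [i for i, yi in enumerate(y) if yi == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    models = []
    for _ in range(n_models):
        # Keep every minority example; draw a same-sized majority sample.
        idx = minority + rng.sample(majority, len(minority))
        models.append(train_maxent([X[i] for i in idx], [y[i] for i in idx]))
    return models


def ensemble_proba(models, x):
    """Average the member probabilities for label 1."""
    return sum(predict_proba(w, x) for w in models) / len(models)


# Hypothetical toy data: 200 frequent-class and 20 rare-class examples.
data_rng = random.Random(42)
X = [[data_rng.gauss(0.0, 0.5), data_rng.gauss(0.0, 0.5)] for _ in range(200)]
X += [[data_rng.gauss(2.0, 0.5), data_rng.gauss(2.0, 0.5)] for _ in range(20)]
y = [0] * 200 + [1] * 20

models = train_ensemble(X, y, n_models=5, seed=0)
print(ensemble_proba(models, [2.0, 2.0]), ensemble_proba(models, [0.0, 0.0]))
```

Because each member sees a different slice of the majority class, averaging lets the ensemble use most of the frequent-class data without letting it swamp the rare class in any single model. A real reordering model would use sparse lexical features extracted from phrase boundaries rather than two dense numbers.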


