Parse-realize based paraphrasing and SMT corpus enriching (基于分析和生成的复述与SMT语料扩展)

Cited by: 3

Abstract: To address the insufficient coverage of reordering phenomena in statistical machine translation (SMT) training corpora, a paraphrasing method is used to enlarge the corpus. A paraphrasing method based on dependency parsing and sentence realization is proposed: the input sentence is first parsed into a dependency tree, and the tree is then realized into multiple natural language sentences. The generated sentences contain the same words as the original sentence but may differ in word order. Experiments show that, without introducing any extra resources, the method effectively relieves the low-coverage problem of the training corpus and improves translation quality.
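The parse-then-realize idea in the abstract can be illustrated with a toy sketch. This is not the authors' realizer: it assumes a hand-written dependency tree (the names `WORDS`, `ROOT`, `CHILDREN`, `linearize`, and `paraphrase_candidates` are all hypothetical) and simply enumerates every projective word order of that tree. A real system would presumably also score the candidates and keep only the fluent ones.

```python
from itertools import permutations, product

# Hypothetical toy input: a hand-written dependency tree, not parser output.
# CHILDREN maps a head's word index to the indices of its dependents.
WORDS = ["she", "bought", "new", "books"]
ROOT = 1
CHILDREN = {1: [0, 3], 3: [2]}


def linearize(head, children, words):
    """Yield every projective word order of the subtree rooted at `head`.

    Each dependent subtree stays contiguous; only the relative order of the
    head word and its dependents' subtrees is permuted, so no word is added,
    dropped, or replaced.
    """
    deps = children.get(head, [])
    # Collect all linearizations of each dependent subtree.
    sub_options = [list(linearize(d, children, words)) for d in deps]
    for choice in product(*sub_options):
        units = [(words[head],)] + list(choice)
        for order in permutations(units):
            yield tuple(w for unit in order for w in unit)


def paraphrase_candidates(root, children, words):
    """Return the deduplicated candidate sentences in sorted order."""
    return sorted({" ".join(s) for s in linearize(root, children, words)})


if __name__ == "__main__":
    for sentence in paraphrase_candidates(ROOT, CHILDREN, WORDS):
        print(sentence)
```

Every candidate uses exactly the words of the input sentence and differs only in word order, which matches the property the abstract describes for the generated sentences.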

Authors: HE Wei (和为), LIU Ting (刘挺)

Affiliation: School of Computer Science and Technology, Harbin Institute of Technology

Source: Journal of Harbin Institute of Technology (《哈尔滨工业大学学报》), 2013, No. 5, pp. 45-50 (6 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.

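Funding: National Natural Science Foundation of China, General Program (61073126, 61133012); National High Technology Research and Development Program of China, Major Program (2011AA01A207)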

Keywords: paraphrase; statistical machine translation; dependency parsing; sentence realization

Classification number: TP391.2 [Automation and Computer Technology: Computer Application Technology]
