Probing Research Method of Classified Extraction of Parentheses (插入语分类抽取研究方法探讨)

Abstract: The translation of named entities and terms has a growing influence on the performance of natural language processing and machine translation systems, yet existing translation dictionaries rarely cover these translations adequately. This paper proposes a method for automatically acquiring high-quality translation pairs of named-entity phrases from web pages and, for the first time, explores how to automatically fill in the parts that are missing from the alignment in bilingual text. The method exploits the characteristics of bilingual translation pairs found on web pages and uses a statistical discriminative model that combines multiple recognition features to mine bilingual phrase translation triples from websites. Experimental results show that the model handles bilingual translation pairs of named entities effectively, reaching an accuracy of 95.6%.
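The abstract does not give implementation details, so the following is only a minimal sketch of how candidate pairs might be collected from parenthetical text before any model scores them. It assumes that a translation appears as an English phrase in parentheses directly after a run of Chinese characters. The names `PAIR_PATTERN`, `extract_candidates`, and `suffix_candidates` are hypothetical and do not come from the paper. The statistical discriminative model itself is not shown.

```python
import re

# Assumption (not from the paper): a candidate is a run of Chinese characters
# immediately followed by an English phrase in half- or full-width parentheses.
PAIR_PATTERN = re.compile(
    r"([\u4e00-\u9fff]+)\s*[(（]\s*([A-Za-z][A-Za-z .\-]*?)\s*[)）]"
)


def extract_candidates(text):
    """Return (chinese_context, english_phrase) pairs found in text.

    The Chinese side is the whole run of characters before the parenthesis,
    so its left boundary is still undetermined at this stage.
    """
    return [(m.group(1), m.group(2)) for m in PAIR_PATTERN.finditer(text)]


def suffix_candidates(context, max_len):
    """List the suffixes of the Chinese context, shortest first.

    Each suffix is one possible left boundary for the translated phrase;
    a scoring model (hypothetical here) would choose among them.
    """
    limit = min(max_len, len(context))
    return [context[-n:] for n in range(1, limit + 1)]


if __name__ == "__main__":
    sample = "位于苏州的苏州大学(Soochow University)成立于1900年。"
    for context, english in extract_candidates(sample):
        print(english, suffix_candidates(context, 4))
```

In this sketch the regular expression only proposes candidates. Choosing the correct Chinese boundary is the step where a feature-based model of the kind the abstract describes would apply.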

Authors: Zhou Youliang (周宥良), Di Ping (狄萍), Gong Zhengxian (贡正仙), Zhou Guodong (周国栋)

Affiliation: School of Computer Science and Technology, Soochow University (苏州大学计算机科学与技术学院)

Source: Computer Applications and Software (《计算机应用与软件》), CSCD, 2011, No. 4, pp. 33-36 (4 pages)

Funding: National Natural Science Foundation of China (grant 60673041)

Keywords: natural language processing (NLP); Chinese information processing; bilingual translation pairs; named-entity phrases; complement of bilingual alignment

Classification number: TP391.2 [Automation and Computer Technology: Computer Application Technology]
