基于自动句对齐的相似古文句子检索 (Similar Ancient Chinese Sentence Retrieval Based on Automatic Sentence Alignment). Cited by: 15

Ancient Sentence Search Based on Sentence AutoAlignment in Parallel Corpus of Ancient and Modern Chinese
Abstract: With the rise of corpus linguistics, example-based machine translation (EBMT) has attracted growing research attention. EBMT between ancient and modern Chinese must solve two problems: (1) building a large-scale parallel corpus of ancient and modern Chinese quickly and accurately, and (2) retrieving, from a large set of sentence-level aligned examples, the source sentence most similar to an input sentence. This paper first proposes a mutual-translation model for ancient and modern Chinese sentences that jointly considers three factors: sentence length, Chinese character forms, and punctuation. Automatic sentence alignment is then implemented with a genetic algorithm and dynamic programming. Next, a full-text index is built over the ancient sentences, and an efficient algorithm for retrieving the most similar ancient sentence is designed and implemented based on the information entropy of Chinese characters. Finally, experimental results are given for both automatic sentence alignment and most-similar-sentence retrieval; the authors report that the methods achieved good performance.
Source: Journal of Chinese Information Processing (《中文信息学报》), CSCD, Peking University Core Journal, 2008, No. 2, pp. 87-91, 105 (6 pages)
Funding: National Social Science Fund of China project (05BYY022)
Keywords: computer application; Chinese information processing; parallel corpus of ancient and modern Chinese; sentence alignment; similar sentence; example-based machine translation (EBMT)
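
The two steps named in the abstract, sentence alignment by dynamic programming and entropy-weighted retrieval over a character index, can be illustrated with a minimal Python sketch. It is not the paper's implementation: the cost function, its equal weighting, the expected length ratio `len_ratio`, the `skip_penalty`, the use of per-character self-information as the weight, and the weighted-overlap similarity are all assumptions made for illustration, and the genetic-algorithm component mentioned in the abstract is not reproduced here.

```python
import math
from collections import Counter, defaultdict

PUNCT = set(",。;:?!、,.;:?!")
# Allowed beads: (number of ancient sentences, number of modern sentences).
BEADS = [(1, 1), (1, 2), (2, 1), (1, 0), (0, 1)]

def content_chars(s):
    """List the non-punctuation, non-space characters of a sentence."""
    return [c for c in s if c not in PUNCT and not c.isspace()]

def bead_cost(anc, mod, len_ratio=1.6, skip_penalty=3.0):
    """Hypothetical cost (lower is better) combining length, shared characters, punctuation."""
    if not anc or not mod:
        return skip_penalty  # assumed fixed cost for an unaligned sentence
    a, m = set(content_chars(anc)), set(content_chars(mod))
    length_term = abs(math.log(len(mod) / (len_ratio * len(anc))))
    char_term = 1.0 - len(a & m) / len(a) if a else 1.0
    pa = sum(c in PUNCT for c in anc)
    pm = sum(c in PUNCT for c in mod)
    punct_term = abs(pa - pm) / max(pa, pm, 1)
    return length_term + char_term + 0.5 * punct_term

def align(ancient, modern):
    """Dynamic programming over beads; returns a list of (ancient ids, modern ids)."""
    n, m = len(ancient), len(modern)
    inf = float("inf")
    best = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == inf:
                continue
            for di, dj in BEADS:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                cost = best[i][j] + bead_cost("".join(ancient[i:ni]), "".join(modern[j:nj]))
                if cost < best[ni][nj]:
                    best[ni][nj], back[ni][nj] = cost, (i, j)
    beads, i, j = [], n, m
    while (i, j) != (0, 0):  # follow back-pointers from the end to the start
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return beads[::-1]

def build_index(sentences):
    """Inverted index (character -> sentence ids) plus self-information weights."""
    index, counts = defaultdict(set), Counter()
    for sid, s in enumerate(sentences):
        cs = content_chars(s)
        counts.update(cs)
        for c in set(cs):
            index[c].add(sid)
    total = sum(counts.values()) + 1  # +1 keeps every weight strictly positive
    weight = {c: -math.log2(k / total) for c, k in counts.items()}
    return index, weight

def most_similar(query, sentences, index, weight):
    """Return (sentence id, score) of the indexed sentence closest to the query."""
    q = set(content_chars(query))
    unseen = max(weight.values(), default=1.0)  # assumed weight for unseen characters
    candidates = set().union(*(index.get(c, set()) for c in q))
    best_id, best_score = None, 0.0
    for sid in sorted(candidates):
        s = set(content_chars(sentences[sid]))
        shared = sum(weight.get(c, unseen) for c in q & s)
        union = sum(weight.get(c, unseen) for c in q | s)
        if shared / union > best_score:
            best_id, best_score = sid, shared / union
    return best_id, best_score

if __name__ == "__main__":
    ancient = ["学而时习之,", "不亦说乎?"]
    modern = ["学了又时常温习,", "不也是很愉快吗?"]
    print(align(ancient, modern))
    corpus = ["学而时习之,不亦说乎?", "有朋自远方来,不亦乐乎?", "人不知而不愠,不亦君子乎?"]
    index, weight = build_index(corpus)
    print(most_similar("有朋自远方来", corpus, index, weight))
```

Restricting the comparison to sentences that share at least one character with the query is what the index buys: rare characters carry more weight than common ones such as 不, so a match on a rare character moves the score more.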