
A Multilingual Word Alignment Method Based on Parallel Corpora and Translation Probability (基于平行语料和翻译概率的多语种词对齐方法) Cited by: 5

Words Alignment in Parallel Corpus Based on Translation Probability
Abstract: To achieve multilingual word alignment, this paper proposes translation probability, derived from pointwise mutual information (PMI), as an improved measure of the association strength between words across languages. First, it shows that, in the region of ordinary-frequency words obeying Zipf's law, the PMI measure of association strength between words can be simplified to translation probability. Second, Chinese, English, and Korean parallel corpora are preprocessed by sentence alignment, word segmentation, and stop-word removal; the translation probability between words in the parallel corpora is then computed, and the top k words with the highest translation probability are taken as candidate translations. Further optimization improves the word alignment accuracy. Experimental results show that the method does not depend entirely on corpus size and achieves an accuracy above 94% on small-scale corpora, providing a technical basis for word alignment in cross-lingual niche literature and low-resource languages.
Authors: YANG Feiyang (杨飞扬); ZHAO Yahui (赵亚慧); CUI Rongyi (崔荣一); YI Zhiwei (易志伟) (Intelligent Information Processing Laboratory, Department of Computer Science & Technology, Yanbian University, Yanji, Jilin 133002, China)
Source: Journal of Chinese Information Processing (《中文信息学报》), CSCD, Peking University Core Journal, 2019, No. 12, pp. 37-44 (8 pages)
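Funding: State Language Commission "13th Five-Year Plan" Research Project (YB135-76); Yanbian University Research Project for World-Class Discipline Construction in Foreign Languages and Literature (18YLPY13, 18YLPY14)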
Keywords: word alignment; parallel corpus; translation probability; Zipf's law
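
The candidate-selection step described in the abstract (compute translation probabilities over an aligned corpus, then keep the top k words) can be illustrated with a minimal sketch. This record does not give the paper's exact formula, so the code assumes a simple co-occurrence estimate: P(t | s) is the number of aligned sentence pairs containing both s and t, divided by the number of pairs containing s. The function names `translation_probabilities` and `top_k_candidates` and the toy corpus are hypothetical, and the paper's further optimization step is not reproduced.

```python
from collections import Counter, defaultdict


def translation_probabilities(sentence_pairs):
    """Estimate P(t | s) from co-occurrence in aligned sentence pairs.

    Assumed estimator (not taken from the paper):
    P(t | s) = (#pairs containing both s and t) / (#pairs containing s).
    """
    src_count = Counter()             # number of pairs in which s occurs
    pair_count = defaultdict(Counter)  # number of pairs in which s and t co-occur
    for src_tokens, tgt_tokens in sentence_pairs:
        src_set, tgt_set = set(src_tokens), set(tgt_tokens)
        for s in src_set:
            src_count[s] += 1
            for t in tgt_set:
                pair_count[s][t] += 1
    return {
        s: {t: c / src_count[s] for t, c in tgts.items()}
        for s, tgts in pair_count.items()
    }


def top_k_candidates(probs, word, k=3):
    """Return the k target words with the highest P(t | word)."""
    # Sort by descending probability, breaking ties alphabetically.
    ranked = sorted(probs.get(word, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]


# Toy corpus, already sentence-aligned, segmented, and stop-word filtered.
pairs = [
    (["猫", "喝", "水"], ["cat", "drinks", "water"]),
    (["狗", "喝", "水"], ["dog", "drinks", "water"]),
    (["猫", "睡觉"], ["cat", "sleeps"]),
]
probs = translation_probabilities(pairs)
print(top_k_candidates(probs, "猫", k=1))
```

On this toy corpus a word that appears in only one sentence pair (such as 狗) ties across every word in the aligned sentence, which suggests why some additional filtering after the top-k step is useful.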
