Abstract
Statistical word-alignment methods require a large-scale bilingual corpus as input, which makes data sparsity hard to avoid and incurs a high time overhead. To meet the demand for real-time alignment at the sentence or paragraph level, this paper presents an efficient word-alignment algorithm based on a bidirectional dictionary and semantic similarity calculation. By combining dynamic block segmentation and matching, semantic similarity calculation based on HowNet, conflict resolution based on maximum matching, and pruning-based disambiguation, the algorithm effectively handles the alignment of approximate translations that arise from the flexibility and diversity of translation. Experimental results show that the algorithm retains the advantages of dictionary-based word alignment while remedying the shortcomings of traditional dictionary-based methods: it effectively improves the accuracy and recall of word alignment and is better suited to small-scale bilingual corpora and real-time alignment.
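As a rough illustration of the general idea only, the sketch below aligns two tokenized sentences with a bidirectional dictionary lookup, falls back to a similarity score when no dictionary entry matches, and resolves conflicts one-to-one. It is not the paper's algorithm: the function `align_words`, its parameters, the scoring constants, and the toy `similarity` function are all assumptions made for this example; the similarity function merely stands in for a HowNet-based measure, and the greedy conflict resolution is a simpler substitute for the maximum-matching step. Block segmentation and pruning are omitted.

```python
from typing import Callable, Dict, List, Set, Tuple


def align_words(
    src: List[str],
    tgt: List[str],
    src2tgt: Dict[str, Set[str]],
    tgt2src: Dict[str, Set[str]],
    similarity: Callable[[str, str], float],
    threshold: float = 0.8,
) -> List[Tuple[int, int]]:
    """Return one-to-one (source index, target index) links (illustrative only)."""
    candidates: List[Tuple[float, int, int]] = []
    for i, s in enumerate(src):
        for j, t in enumerate(tgt):
            fwd = t in src2tgt.get(s, set())   # source-to-target dictionary hit
            bwd = s in tgt2src.get(t, set())   # target-to-source dictionary hit
            if fwd and bwd:
                score = 2.0                    # assumed score: confirmed in both directions
            elif fwd or bwd:
                score = 1.5                    # assumed score: confirmed in one direction
            else:
                # Fallback: compare t with each dictionary translation of s.
                score = max(
                    (similarity(c, t) for c in src2tgt.get(s, set())),
                    default=0.0,
                )
                if score < threshold:
                    continue
            candidates.append((score, i, j))

    # Greedy conflict resolution: best score first, then closest positions.
    candidates.sort(key=lambda c: (-c[0], abs(c[1] - c[2]), c[1], c[2]))
    used_src: Set[int] = set()
    used_tgt: Set[int] = set()
    links: List[Tuple[int, int]] = []
    for _, i, j in candidates:
        if i in used_src or j in used_tgt:
            continue
        used_src.add(i)
        used_tgt.add(j)
        links.append((i, j))
    return sorted(links)


# Toy data, invented for this example.
SIM_TABLE = {frozenset(("like", "love")): 0.9}


def toy_similarity(a: str, b: str) -> float:
    """Hypothetical stand-in for a HowNet-based similarity measure."""
    return 1.0 if a == b else SIM_TABLE.get(frozenset((a, b)), 0.0)


src2tgt = {"我": {"I", "me"}, "喜欢": {"like"}, "猫": {"cat"}}
tgt2src = {"I": {"我"}, "cats": {"猫"}}
links = align_words(["我", "喜欢", "猫"], ["I", "love", "cats"],
                    src2tgt, tgt2src, toy_similarity)
print(links)  # [(0, 0), (1, 1), (2, 2)]
```

In this toy run, "喜欢" and "love" are linked only through the similarity fallback (the dictionary lists "like"), which is the kind of approximate translation the abstract refers to; "猫" and "cats" are linked through the target-to-source direction alone.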
Source
《沈阳航空航天大学学报》
2015, No. 2, pp. 67-74 (8 pages)
Journal of Shenyang Aerospace University
Funding
Liaoning Province BaiQianWan Talents Program (Grant No. 04021401)
Keywords
word-alignment
bidirectional dictionary
dynamic block segmentation and matching
semantic similarity calculation