Abstract
The electronic dictionary is the component of a machine translation (MT) system that holds the most information, and its quality and size directly limit the quality and scope of application of the system. Unlike a general-purpose electronic dictionary, an MT dictionary must additionally record, for each entry, part-of-speech information, semantic category information, and idioms. This paper uses word frequency statistics and frequency distribution statistics as the criteria for selecting entries for a Uighur-Chinese MT dictionary, and counts the number of commonly used words in Uighur text. It then discusses the design principles of the dictionary, describes the information each entry should contain using BNF notation and Jackson diagrams, and finally presents the construction of the dictionary, the ordering of entries, the data structures of the index table and attribute library, and the method for looking up dictionary information. Experiments indicate that the dictionary is fairly effective at resolving lexical and structural ambiguity in Uighur and at improving the accuracy of the Chinese translation.
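As a rough illustration of the ideas named in the abstract (frequency-based entry selection, a sorted index table, and a separate attribute library), the following minimal Python sketch may help. It is not the paper's implementation: the abstract does not give the actual record layout, so the `Entry` fields, the `select_headwords` threshold, the `MTDictionary` class, and the sample words are all hypothetical and chosen only for illustration.

```python
from bisect import bisect_left, bisect_right
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical attribute record; field names are assumptions, not the paper's."""
    headword: str
    pos: str                 # part of speech
    semantic_class: str      # semantic category
    translations: list       # candidate Chinese translations
    idioms: dict = field(default_factory=dict)  # idiom -> translation


def select_headwords(tokens, min_count):
    """Keep words whose corpus frequency reaches an assumed threshold."""
    counts = Counter(tokens)
    return sorted(w for w, c in counts.items() if c >= min_count)


class MTDictionary:
    """Sorted index table whose positions point into an attribute library."""

    def __init__(self, entries):
        ordered = sorted(entries, key=lambda e: e.headword)
        self.index = [e.headword for e in ordered]   # index table (sorted keys)
        self.attributes = ordered                    # attribute library

    def lookup(self, word):
        """Binary search; returns every record sharing the headword."""
        lo = bisect_left(self.index, word)
        hi = bisect_right(self.index, word)
        return self.attributes[lo:hi]


# Toy sample data in Latin transcription, not taken from the paper.
entries = [
    Entry("kitab", "noun", "artifact", ["书"]),
    Entry("at", "noun", "animal", ["马"]),
    Entry("at", "noun", "abstract", ["名字"]),
]
dictionary = MTDictionary(entries)
senses = dictionary.lookup("at")
```

Returning all records for a headword, rather than a single one, leaves the choice between senses to a later disambiguation step, which is where the part-of-speech and semantic category fields would be consulted.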
Source
《计算机工程与应用》
CSCD
PKU Core Journal (北大核心)
2006, No. 20, pp. 76-78 (3 pages)
Computer Engineering and Applications
Funding
National Natural Science Foundation of China (Grant No. 60263004)