Abstract
Dependency grammar analysis of a corpus yields the grammatical and semantic dependency relations between Chinese words, including words that are far apart in a sentence. Statistical methods then give the probability that an ordered pair of Chinese words stands in a dependency relation, and applying this probability during pinyin input lets the input method generate candidate sentences that are grammatically and semantically more plausible. Combined with the widely used N-gram language model, the method partly compensates for the N-gram model's inability to describe relations between words more than N words apart. Experiments show that the combined model improves whole-sentence pinyin input accuracy by 14.68%.
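The idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the corpus format, the add-alpha smoothing, the "best earlier non-adjacent word" rule, the interpolation weight, and all function names (`train_dependency_probs`, `dependency_prob`, `score_candidate`) are assumptions made only for illustration. The sketch estimates how often an ordered word pair is linked by a dependency arc when both words occur in the same sentence, then mixes that long-distance evidence with a bigram score supplied by the caller.

```python
import math
from collections import Counter
from itertools import combinations


def train_dependency_probs(parsed_corpus):
    """Count ordered word pairs and how often they are linked by an arc.

    parsed_corpus: list of (tokens, arcs), where arcs are
    (head_index, dependent_index) pairs (an assumed, hypothetical format).
    """
    pair_seen = Counter()    # ordered pair co-occurs in a sentence
    pair_linked = Counter()  # ordered pair co-occurs and is linked by an arc
    for tokens, arcs in parsed_corpus:
        linked = {(min(h, d), max(h, d)) for h, d in arcs}
        for i, j in combinations(range(len(tokens)), 2):
            pair = (tokens[i], tokens[j])  # left word, right word
            pair_seen[pair] += 1
            if (i, j) in linked:
                pair_linked[pair] += 1
    return pair_seen, pair_linked


def dependency_prob(pair, pair_seen, pair_linked, alpha=1.0):
    """Add-alpha estimate of P(linked | ordered pair); smoothing is an assumption."""
    return (pair_linked[pair] + alpha) / (pair_seen[pair] + 2 * alpha)


def score_candidate(tokens, bigram_logprob, pair_seen, pair_linked, weight=0.5):
    """Interpolate a bigram log-score with a long-distance dependency term."""
    ngram = sum(bigram_logprob(a, b) for a, b in zip(tokens, tokens[1:]))
    dep = 0.0
    for j in range(2, len(tokens)):
        # Only non-adjacent earlier words; adjacent pairs are left to the bigram.
        best = max(
            dependency_prob((tokens[i], tokens[j]), pair_seen, pair_linked)
            for i in range(j - 1)
        )
        dep += math.log(best)
    return (1 - weight) * ngram + weight * dep
```

A candidate ranker would call `score_candidate` on each candidate sentence produced for a pinyin string and keep the highest-scoring one. The weight of 0.5 is arbitrary and would need tuning on held-out data.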
Source
Communications Technology (《通信技术》)
2013, No. 3, pp. 83-86 (4 pages)
Funding
National Natural Science Foundation of China (Grant No. 61272441)
Keywords
dependency probability
long-distance language model
pinyin input method