Abstract
Chinese input methods are one of the key challenges in Chinese information processing, and their importance grows as the number of Chinese Internet users keeps increasing. Based on observations of long-distance word dependencies within sentences, we extract word collocations from a large-scale corpus to capture long-distance features, and use them to build a collocation-based pinyin input method system. We further apply the collocation idea to the user model (the personalization module) of the pinyin input method, so that the system helps users enter Chinese text more efficiently. Experiments show that the collocation-based improvements have a positive effect on the accuracy of the input method.
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
PKU Core Journal (北大核心)
2007, Issue 4, pp. 105-110 (6 pages)
Keywords
computer application
Chinese information processing
Chinese input method
statistical language model
collocations
long-distance dependency
user model