Abstract
To address data sparseness in Chinese language modeling, this paper uses word semantic information to combine word similarity with back-off smoothing, yielding a similarity-based smoothing technique for Chinese language models. It also presents an iterative algorithm that automatically optimizes the model's parameters. The technique is then extended from low-order to higher-order language models and applied to pinyin-to-character conversion. Experiments show that it considerably improves language model performance and effectively reduces the error rate of the pinyin-to-character conversion system.
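The abstract does not give the paper's formulas, so the following is only an illustrative sketch of the general idea of combining word similarity with back-off smoothing in a bigram model. It assumes absolute discounting for seen bigrams and a hand-written similarity table; the paper itself derives similarity from HowNet and tunes its parameters iteratively, neither of which is reproduced here. The names `build_model`, `similar`, `discount`, and `lam` are hypothetical.

```python
from collections import Counter

def build_model(tokens, similar, discount=0.5, lam=0.8):
    """Bigram model whose back-off distribution borrows from similar contexts.

    similar: hypothetical table {word: [(neighbour, weight), ...]};
    discount and lam are fixed by hand here, not optimized.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    context = Counter(tokens[:-1])  # how often each word occurs as a left context
    total = len(tokens)
    vocab = sorted(unigrams)

    def backoff_score(w, prev):
        # Mix a similarity-weighted bigram estimate with the unigram estimate.
        p_uni = unigrams[w] / total
        neighbours = [(v, s) for v, s in similar.get(prev, []) if context[v] > 0]
        norm = sum(s for _, s in neighbours)
        if norm == 0:
            return p_uni  # no usable neighbours: plain unigram back-off
        p_sim = sum(s * bigrams[(v, w)] / context[v] for v, s in neighbours) / norm
        return lam * p_sim + (1 - lam) * p_uni

    def prob(w, prev):
        if context[prev] == 0:
            return backoff_score(w, prev)  # context never observed
        if bigrams[(prev, w)] > 0:
            return (bigrams[(prev, w)] - discount) / context[prev]
        # Mass freed by discounting is shared among unseen words in proportion
        # to their back-off score (assumes at least one unseen word exists).
        seen = sum(1 for u in vocab if bigrams[(prev, u)] > 0)
        alpha = discount * seen / context[prev]
        unseen_mass = sum(backoff_score(u, prev)
                          for u in vocab if bigrams[(prev, u)] == 0)
        return alpha * backoff_score(w, prev) / unseen_mass

    return prob, vocab

# Toy data: "drink" and "sip" are declared similar by hand.
tokens = "i drink tea i drink water you sip tea you sip wine".split()
similar = {"drink": [("sip", 1.0)], "sip": [("drink", 1.0)]}
prob, vocab = build_model(tokens, similar)
```

In this toy corpus the bigram "drink wine" never occurs, but "sip wine" does, so `prob("wine", "drink")` receives more of the back-off mass than an unrelated unseen word such as `prob("i", "drink")`, while the distribution for each context still sums to one.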
Source
《高技术通讯》
CAS
CSCD
Peking University Core Journals (北大核心)
2006, No. 2, pp. 127-132 (6 pages)
Chinese High Technology Letters
Funding
Supported by the National Natural Science Foundation of China (60435020) and the 863 Program (2002AA117010-09).
Keywords
data sparseness, language model, smoothing, pinyin-to-character conversion, HowNet