A similarity-based smoothing algorithm for Chinese language modeling and its application in pinyin-to-character conversion
Abstract: To address the data sparseness problem in Chinese language modeling, this paper uses word semantic information to combine word similarity with back-off smoothing, and proposes a word-similarity-based smoothing technique for Chinese language models. It also presents an iterative algorithm that automatically optimizes the parameters of the model. The smoothing technique is then extended from low-order to higher-order language models and applied to pinyin-to-character conversion. Experiments show that the technique substantially improves language model performance and effectively reduces the error rate of the pinyin-to-character conversion system.
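Source: Chinese High Technology Letters (《高技术通讯》), 2006, No. 2, pp. 127-132 (6 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
Funding: Supported by the National Natural Science Foundation of China (60435020) and the 863 Program (2002AA117010-09).
Keywords: data sparseness, language model, smoothing, pinyin-to-character conversion, HowNet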
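
The abstract does not state the paper's formulas, so the following is only a minimal sketch of the general idea it describes: a back-off bigram model in which the probability mass reserved for unseen bigrams is shared out according to what similar histories predict. Everything here is an assumption made for illustration: the class name `SimilarityBackoffBigram`, the absolute-discounting scheme, the fixed `discount` and `lam` values (the paper tunes its parameters with an iterative algorithm), and the hand-written similarity scores (the keywords suggest the paper derives similarity from HowNet).

```python
from collections import Counter


class SimilarityBackoffBigram:
    """Toy bigram model: absolute discounting for seen bigrams and a
    similarity-weighted back-off distribution for unseen ones."""

    def __init__(self, sentences, similarity, discount=0.5, lam=0.7):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for sent in sentences:
            self.unigrams.update(sent)
            self.bigrams.update(zip(sent, sent[1:]))
        self.vocab = sorted(self.unigrams)
        self.total = sum(self.unigrams.values())
        # history[h] = number of bigrams whose first word is h
        self.history = Counter()
        for (h, _), c in self.bigrams.items():
            self.history[h] += c
        self.similarity = similarity  # hypothetical: (word, word) -> score in [0, 1]
        self.discount = discount      # assumed fixed; must be below the smallest count
        self.lam = lam                # assumed fixed weight on the neighbor estimate

    def unigram_prob(self, w):
        # Add-one smoothing keeps every vocabulary word above zero.
        return (self.unigrams[w] + 1) / (self.total + len(self.vocab))

    def backoff_prob(self, w, h):
        # Average P_ML(w | h2) over histories h2, weighted by similarity(h, h2).
        num = den = 0.0
        for h2 in self.vocab:
            if h2 == h or self.history[h2] == 0:
                continue
            s = self.similarity.get((h, h2), 0.0)
            num += s * self.bigrams[(h2, w)] / self.history[h2]
            den += s
        if den == 0.0:
            return self.unigram_prob(w)  # no similar history: plain unigram back-off
        return self.lam * num / den + (1 - self.lam) * self.unigram_prob(w)

    def prob(self, w, h):
        """P(w | h) for a word w in the training vocabulary."""
        n_h = self.history[h]
        if n_h == 0:
            return self.backoff_prob(w, h)
        seen = [v for v in self.vocab if self.bigrams[(h, v)] > 0]
        unseen = [v for v in self.vocab if self.bigrams[(h, v)] == 0]
        count = self.bigrams[(h, w)]
        if not unseen:
            return count / n_h  # nothing to redistribute mass to
        if count > 0:
            return (count - self.discount) / n_h
        # Mass freed by discounting, shared among unseen words.
        reserved = self.discount * len(seen) / n_h
        norm = sum(self.backoff_prob(v, h) for v in unseen)
        return reserved * self.backoff_prob(w, h) / norm


# Toy usage with made-up data and similarity scores.
corpus = [["eat", "apple"], ["eat", "apple"], ["eat", "rice"],
          ["buy", "pear"], ["buy", "apple"]]
sim = {("eat", "buy"): 0.8, ("buy", "eat"): 0.8}
model = SimilarityBackoffBigram(corpus, sim)
print(model.prob("pear", "eat"))  # unseen bigram, supported by the similar history "buy"
print(model.prob("buy", "eat"))   # unseen bigram with no such support
```

In this toy setup the unseen bigram ("eat", "pear") receives more probability than ("eat", "buy") because the similar history "buy" was observed before "pear". A plain unigram back-off would not make that distinction on the basis of history similarity.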
