A similarity-based smoothing algorithm for Chinese language modeling and its application in pinyin-to-character conversion

Abstract: To address the data sparseness problem in Chinese language modeling, this paper uses word semantic information to combine word similarity with back-off smoothing, yielding a similarity-based smoothing technique for Chinese language models. An iterative algorithm is designed to automatically optimize the model's parameters. The technique is then extended from low-order to high-order language models and applied to pinyin-to-character conversion. Experiments show that it substantially improves language model performance and effectively reduces the error rate of the pinyin-to-character conversion system.
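This record does not reproduce the paper's formulas, so the following is only a minimal sketch of the general idea of combining word similarity with back-off smoothing, not the authors' algorithm. It assumes a bigram model with absolute discounting, in which the discounted probability mass for unseen bigrams is redistributed according to what follows contexts similar to the current one. The class name `SimilarityBackoffBigram`, the parameters `discount` and `beta`, and the toy function `toy_similarity` are all hypothetical; a real system would presumably derive similarity from a resource such as HowNet, and the iterative parameter optimization mentioned in the abstract is not shown.

```python
from collections import Counter, defaultdict


class SimilarityBackoffBigram:
    """Toy bigram model: absolute discounting plus similarity-weighted back-off.

    Assumes a closed vocabulary (every queried word occurs in training).
    """

    def __init__(self, sentences, similarity, discount=0.5, beta=0.7):
        self.sim = similarity      # hypothetical word-similarity function in [0, 1]
        self.d = discount          # mass subtracted from each seen bigram count
        self.beta = beta           # weight of the similarity estimate vs. unigram
        self.bigrams = defaultdict(Counter)
        self.unigrams = Counter()
        for sent in sentences:
            self.unigrams.update(sent)
            for w1, w2 in zip(sent, sent[1:]):
                self.bigrams[w1][w2] += 1
        self.vocab = sorted(self.unigrams)
        self.total = sum(self.unigrams.values())

    def p_uni(self, w):
        return self.unigrams[w] / self.total

    def p_ml(self, w2, w1):
        ctx = self.bigrams.get(w1)
        return ctx[w2] / sum(ctx.values()) if ctx else 0.0

    def p_sim(self, w2, w1):
        """Similarity-weighted average of P_ML(w2 | w') over contexts w' similar to w1."""
        num = den = 0.0
        for other in self.bigrams:
            if other == w1:
                continue
            s = self.sim(w1, other)
            if s > 0:
                num += s * self.p_ml(w2, other)
                den += s
        return num / den if den > 0 else None

    def q(self, w2, w1):
        """Back-off distribution: similarity estimate interpolated with the unigram."""
        ps = self.p_sim(w2, w1)
        if ps is None:             # no similar context available
            return self.p_uni(w2)
        return self.beta * ps + (1 - self.beta) * self.p_uni(w2)

    def prob(self, w2, w1):
        ctx = self.bigrams.get(w1)
        if not ctx:                # w1 never seen as a context
            return self.q(w2, w1)
        n = sum(ctx.values())
        if ctx[w2] > 0:            # seen bigram: discounted relative frequency
            return (ctx[w2] - self.d) / n
        alpha = self.d * len(ctx) / n   # total mass freed by discounting
        unseen = sum(self.q(w, w1) for w in self.vocab if ctx[w] == 0)
        return alpha * self.q(w2, w1) / unseen


def toy_similarity(a, b):
    # Hypothetical stand-in for a lexicon-based similarity measure.
    return 0.9 if {a, b} == {"like", "love"} else 0.0


corpus = [
    ["i", "like", "apples"],
    ["i", "love", "bananas"],
    ["i", "like", "tea"],
    ["you", "love", "tea"],
]
model = SimilarityBackoffBigram(corpus, toy_similarity)
baseline = SimilarityBackoffBigram(corpus, lambda a, b: 0.0)
```

In this toy corpus the bigram ("like", "bananas") never occurs, but "bananas" does follow the similar word "love", so `model.prob("bananas", "like")` receives more of the back-off mass than `baseline.prob("bananas", "like")`, which backs off to unigram frequencies alone. Both models still yield a proper distribution over the vocabulary for every context.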

Authors: Xiao Jinghui, Wang Xiaolong, Liu Bingquan

Affiliation: School of Computer Science and Technology, Harbin Institute of Technology

Source: Chinese High Technology Letters (高技术通讯), 2006, No. 2, pp. 127-132 (6 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.

Funding: Supported by the National Natural Science Foundation of China (60435020) and the 863 Program (2002AA117010-09).

Keywords: data sparseness, language model, smoothing, pinyin-to-character conversion, HowNet

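Classification: TP391.41 [Automation and Computer Technology: Computer Application Technology]; TN912.34 [Electronics and Telecommunications: Communication and Information Systems]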

