A Study and Improvement of Minimum Sample Risk Methods for Language Modeling

Abstract: Most mainstream discriminative training methods can only optimize smooth, differentiable loss functions. In natural language processing (NLP), however, the direct evaluation metrics of many applications, such as character error rate (CER), are non-differentiable step functions. To address this problem, this paper studies a newly proposed discriminative training method called minimum sample risk (MSR). Unlike other discriminative training methods, MSR uses a step function directly as its loss function. First, the time and space complexity of MSR is analyzed and improved. Then an improved version, MSR-II, is proposed, which makes the computation of interference (correlation) between features in the feature-selection step more stable. In addition, extensive domain adaptation experiments are conducted to investigate the robustness of MSR-II. Evaluations on Japanese text input show that: (1) MSR/MSR-II significantly outperforms a traditional trigram model, reducing CER by 20.9%; (2) MSR/MSR-II performs comparably to two other state-of-the-art discriminative algorithms, Boosting and Perceptron; (3) MSR-II outperforms MSR not only in time/space complexity but also in the stability of feature selection; (4) the domain adaptation results demonstrate the robustness of MSR-II. In summary, MSR/MSR-II is a highly effective algorithm. Because it uses a step loss function, it can be widely applied to many areas of NLP, such as spelling correction and machine translation.
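To make the step-loss idea concrete, here is a minimal sketch, not the authors' implementation, assuming a linear reranking model over N-best conversion candidates: each candidate carries a feature vector and an error count against the reference, the model picks the top-scoring candidate for each input, and the sample risk is the total error count of those picks. Because the risk changes only when the top candidate changes, it is piecewise constant in any single weight, so it can be minimized exactly along one coordinate by enumerating the points where candidate scores cross. The names `Candidate`, `sample_risk`, `line_search`, and `msr_train` are hypothetical; the exact breakpoint enumeration is borrowed from Och-style minimum error rate training rather than taken from this paper, and MSR's feature-selection and interference computation are not reproduced.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    """One conversion candidate for an input (hypothetical structure)."""
    features: List[float]  # feature vector, e.g. a baseline trigram log-prob plus extra features
    errors: int            # character errors of this candidate against the reference (CER numerator)


def score(weights: List[float], cand: Candidate) -> float:
    """Linear model score: dot product of weights and features."""
    return sum(w * f for w, f in zip(weights, cand.features))


def sample_risk(weights: List[float], samples: List[List[Candidate]]) -> int:
    """Total errors of the top-scoring candidate per input: a step function of the weights."""
    total = 0
    for cands in samples:
        best = max(cands, key=lambda c: score(weights, c))
        total += best.errors
    return total


def line_search(weights: List[float], samples: List[List[Candidate]], d: int) -> Tuple[float, int]:
    """Exactly minimize sample risk over weights[d], holding the other weights fixed."""
    breakpoints = set()
    for cands in samples:
        # Along coordinate d each candidate's score is a line: intercept + slope * lambda_d.
        lines = []
        for c in cands:
            slope = c.features[d]
            intercept = score(weights, c) - weights[d] * slope
            lines.append((intercept, slope))
        # The top candidate can only change where two of these lines cross.
        for i in range(len(lines)):
            for j in range(i + 1, len(lines)):
                (a1, b1), (a2, b2) = lines[i], lines[j]
                if b1 != b2:
                    breakpoints.add((a2 - a1) / (b1 - b2))

    best_val, best_risk = weights[d], sample_risk(weights, samples)
    if not breakpoints:
        return best_val, best_risk

    # Risk is constant between consecutive breakpoints, so probe one point per interval.
    pts = sorted(breakpoints)
    probes = [pts[0] - 1.0] + [(x + y) / 2 for x, y in zip(pts, pts[1:])] + [pts[-1] + 1.0]
    for v in probes:
        trial = list(weights)
        trial[d] = v
        r = sample_risk(trial, samples)
        if r < best_risk:  # strict improvement only, so ties keep the current weight
            best_val, best_risk = v, r
    return best_val, best_risk


def msr_train(weights: List[float], samples: List[List[Candidate]], passes: int = 3) -> List[float]:
    """Coordinate-wise sample-risk minimization; feature selection is omitted in this sketch."""
    w = list(weights)
    for _ in range(passes):
        for d in range(len(w)):
            w[d], _ = line_search(w, samples, d)
    return w
```

Enumerating every pairwise crossing is quadratic in the N-best size, but it is simpler than building the upper envelope of the score lines, and every envelope breakpoint is among the crossings, so probing one point per interval is still exact. Since a weight is replaced only on strict improvement, a coordinate pass never increases the sample risk.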

Authors: Yuan Wei, Gao Jianfeng, Bu Fenglin

Affiliation: Department of Computer Science and Engineering, Shanghai Jiao Tong University; Natural Language Processing Group

Source: Journal of Software (《软件学报》), 2007, No. 2, pp. 196-204 (9 pages). Indexed in: EI, CSCD, Peking University Core Journals

Keywords: language modeling; discriminative training method; input method editor; minimum sample risk; domain adaptation modeling

CLC classification: TP391 [Automation and Computer Technology / Computer Application Technology]
