Linear Interpolated Methods in Statistical Natural Language Processing

Cited by: 4
Abstract: Data sparseness is one of the most difficult problems in statistical natural language processing. Two main families of smoothing techniques address it: back-off models and linear interpolation models. This article analyzes and compares several typical linear interpolation methods, focusing on the relationship between the smoothing parameters and parts of speech, in particular the part-of-speech clustering tendency that these methods give rise to. On this basis, two improved smoothing methods are proposed. Experimental results show that both improved methods smooth better than the original ones.
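The abstract does not give the formulas of the interpolation methods it compares or of the two improved methods. As a rough illustration only, the sketch below shows the generic textbook form of linear interpolation for a bigram model, P(w | w_prev) = λ · P_ML(w | w_prev) + (1 − λ) · P_ML(w). The function names, the toy corpus, and the fixed weight `lam=0.7` are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter


def train_counts(tokens):
    """Collect the counts needed for maximum-likelihood estimates."""
    unigrams = Counter(tokens)                   # c(w)
    histories = Counter(tokens[:-1])             # c(w_prev) as a bigram history
    bigrams = Counter(zip(tokens, tokens[1:]))   # c(w_prev, w)
    return unigrams, histories, bigrams


def interpolated_prob(w_prev, w, unigrams, histories, bigrams, lam=0.7):
    """Mix the bigram and unigram ML estimates with a fixed weight lam.

    lam is a hypothetical constant here; in practice the weight is
    usually estimated on held-out data.
    """
    total = sum(unigrams.values())
    p_uni = unigrams[w] / total
    if histories[w_prev] == 0:
        # Unseen history: no bigram evidence, so rely on the unigram estimate.
        return p_uni
    p_bi = bigrams[(w_prev, w)] / histories[w_prev]
    return lam * p_bi + (1.0 - lam) * p_uni


tokens = "the cat sat on the mat".split()
unigrams, histories, bigrams = train_counts(tokens)

# A bigram seen in training keeps most of its ML probability.
p_seen = interpolated_prob("the", "cat", unigrams, histories, bigrams)
# A bigram never seen in training still receives non-zero probability.
p_unseen = interpolated_prob("cat", "mat", unigrams, histories, bigrams)
print(p_seen, p_unseen)
```

The point of the mixture is that a bigram absent from the training data ("cat mat" above) is no longer assigned probability zero, while the distribution over the vocabulary for a given history still sums to one.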

Authors: Zhang Jingzhi (张敬芝), Gao Qiang (高强), Geng Hua (耿桦), Pan Jingui (潘金贵)

Affiliation: State Key Laboratory for Novel Software Technology, Nanjing University

Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2007, No. 6, pp. 223-225, 244 (4 pages in total)

Keywords: statistical language model, data sparseness problem, smoothing techniques, back-off methods, linear interpolation methods, N-gram

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]


