Language Model in Computational Linguistics (计算语言学中的语言模型)
Cited by: 6
Abstract In computational linguistics, to process natural languages directly by computer, we need to formalize the linguistic problem mathematically, represent it as an algorithm, and establish a language model. A language model is an abstract formal system of objective language. The study of language models has great theoretical significance and application value for computational linguistics. There are three types of language model in computational linguistics: rule-based, statistics-based, and neural-network-based. Rule-based language models mainly include phrase structure grammar and dependency grammar. Based on phrase structure grammar, computational linguists proposed the recursive transition network, augmented transition network, top-down parsing, bottom-up parsing, general syntactic processor, chart parsing, left-corner parsing, CYK parsing, the Earley algorithm, the Tomita algorithm, tree-adjoining grammar, and left-associative grammar. Afterward, they proposed complex-feature-based and unification-based language models such as lexical functional grammar, functional unification grammar, the PATR algorithm, definite clause grammar, generalized phrase structure grammar, head-driven phrase structure grammar, and the multiple-branched and multiple-labeled tree model (MMT model). Based on dependency grammar, computational linguists proposed combinatory categorial grammar, word grammar, valence grammar, etc. Rule-based language models have had some success in certain "sublanguage" application systems of computational linguistics, but they still have great difficulty processing large-scale authentic texts. Statistics-based language models have been very successful in character recognition, speech recognition, speech synthesis, and machine translation. They include the N-gram model, noisy channel model, hidden Markov model, maximum entropy model, conditional random field model, probabilistic context-free grammar, lexicalized probabilistic context-free grammar, dynamic programming algorithm, minimum edit distance algorithm, decision tree model, weighted automata, Viterbi algorithm, forward algorithm, forward-backward algorithm, etc. These models all place great emphasis on the role of statistics in their construction: linguistic knowledge is obtained mainly from large-scale authentic corpora using probabilistic and statistical methods, so the knowledge obtained reflects the true state of natural language more comprehensively and accurately. As a result, statistics-based language models became widely popular in computational linguistics. Neural-network-based language models have emerged since the beginning of the 21st century and now occupy the mainstream of natural language processing research. In a neural network language model, the context of a word is represented by word vectors rather than by the exact, concrete words used in traditional rule-based and statistical language models. This allows the neural network language model to generalize to "unseen data", which makes it superior to the traditional rule-based and statistical language models.
Authors FENG Zhiwei (冯志伟); DING Xiaomei (丁晓梅) (Shandong Key Laboratory of Language Resources Development and Application, Ludong University, Yantai, Shandong 264026, China; Dalian Maritime University, Dalian, Liaoning 116026, China)
Source Technology Enhanced Foreign Language Education (《外语电化教学》), CSSCI, Peking University Core Journal, 2021, No. 6, pp. 17-24, 3 (9 pages in total)
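Funding An interim result of the National Social Science Fund of China project "Research on Compiling a Russian-Chinese Dictionary of Linguistic Terms Based on Parallel Corpora" (Project No. 17BYY220).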
Keywords Computational Linguistics; Language Model; Rule-Based Language Model; Statistics-Based Language Model; Neural-Network-Based Language Model