Abstract
Language models in computational linguistics fall into three types: rule-based, statistics-based, and neural-network-based. Rule-based language models consist mainly of phrase structure grammar models and dependency grammar models; they have achieved some success in computational linguistics application systems for certain "sub-languages", but they still face great difficulty in processing authentic texts. Statistics-based language models place great emphasis on the role of statistics in model construction: linguistic knowledge is acquired mainly through probabilistic and statistical computation over large-scale authentic corpora, and knowledge obtained in this way reflects the true face of natural language more comprehensively and accurately. For this reason, statistics-based language models became widely popular in computational linguistics. Since the beginning of the 21st century, neural-network-based language models have emerged; they outperform statistics-based language models and now occupy the mainstream of natural language processing research.
In computational linguistics, to process natural languages directly by computer, we need to formalize the linguistic problem mathematically, represent it algorithmically, and establish a language model. A language model is an abstract formal system of an objective language, and the study of language models has great theoretical significance and application value for computational linguistics. There are three language models in computational linguistics: the rule-based language model, the statistics-based language model, and the neural-network-based language model. The rule-based language model mainly includes phrase structure grammar and dependency grammar. Based on phrase structure grammar, computational linguists proposed the recursive transition network, augmented transition network, top-down parsing, bottom-up parsing, the general syntactic processor, chart parsing, left-corner parsing, CYK parsing, the Earley algorithm, the Tomita algorithm, tree-adjoining grammar, and left-associative grammar. Afterward, they proposed complex-feature-based and unification-based language models such as lexical functional grammar, functional unification grammar, the PATR algorithm, definite clause grammar, generalized phrase structure grammar, head-driven phrase structure grammar, the multiple-branched and multiple-labeled tree model (MMT model), etc. Based on dependency grammar, computational linguists proposed combinatory categorial grammar, word grammar, valency grammar, etc. The rule-based language model is successful in some sub-language fields of computational linguistics, but it is very difficult for this model to process large-scale authentic texts. The statistics-based language model is very successful in the fields of character recognition, speech recognition, speech synthesis, and machine translation. Statistics-based language models include the N-gram model, the noisy channel model, the hidden Markov model, the maximum entropy model, the conditional random field model, probabilistic context-free grammar, lexicalized probabilistic context-free grammar, the dynamic programming algorithm, the minimum edit distance algorithm, the decision tree model, weighted automata, the Viterbi algorithm, the forward algorithm, the forward-backward algorithm, etc. These statistical language models all place great emphasis on the role of statistics in their construction: linguistic knowledge is obtained mainly from large-scale authentic corpora using probabilistic and statistical approaches, so that the knowledge obtained reflects the true aspects of natural language more comprehensively and accurately. Statistical models have thus become widely popular in computational linguistics. Since the 21st century, the neural network model has been the mainstream of natural language processing. In a neural network language model, the context of a word is represented by word vectors. Representing context with word vectors, rather than with precise, concrete words as in traditional rule-based and statistical language models, allows the neural network language model to generalize to "unseen data", which makes it superior to the traditional rule-based and statistical language models.
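The statistical approach described above — estimating linguistic knowledge from corpus counts — can be illustrated with a minimal bigram N-gram sketch. The toy corpus and function names below are illustrative assumptions, not drawn from the paper:

```python
from collections import Counter

# Toy corpus; in practice the counts come from a large authentic corpus.
corpus = "the cat sat on the mat the cat ate".split()

# Count unigrams and adjacent word pairs (bigrams).
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def bigram_prob(w1: str, w2: str) -> float:
    """Maximum-likelihood estimate P(w2 | w1) = count(w1 w2) / count(w1)."""
    return bigrams[(w1, w2)] / unigrams[w1]

print(bigram_prob("the", "cat"))  # "the" is followed by "cat" in 2 of its 3 occurrences
```

An unseen pair such as ("cat", "on") receives probability zero under this raw estimate — the data-sparseness problem that smoothing techniques in statistical models, and distributed word vectors in neural models, are meant to overcome.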
Authors
冯志伟
丁晓梅
FENG Zhiwei; DING Xiaomei (Shandong Key Laboratory of Language Resources Development and Application, Ludong University, Yantai, Shandong 264026, China; Dalian Maritime University, Dalian, Liaoning 116026, China)
Source
《外语电化教学》
CSSCI
Peking University Core Journal (北大核心)
2021, No. 6, pp. 17-24 and 3 (9 pages in total)
Technology Enhanced Foreign Language Education
Funding
A phased result of the National Social Science Fund of China project "Research on the Compilation of a Russian-Chinese Dictionary of Linguistic Terminology Based on a Parallel Corpus" (Project No. 17BYY220).