The Application of the Python-Based NLTK Toolkit in Corpus Research (基于Python自然语言处理工具包在语料库研究中的运用)

Cited by: 8
Abstract: Corpus-based research in China currently relies mainly on tools such as AntConc and PowerGREP, and few studies use the Python NLTK package for data processing and analysis. Constrained by their own design, those tools cannot flexibly support varied research methods. Using the NLTK package in Python gives the data a uniform standard and avoids the trouble of converting between text-processing formats. It also makes up for the shortcomings of tools such as Range in syntactic analysis, plotting, and regular-expression search. Focusing on the main stages of corpus research (text tokenization, lemmatization, and text retrieval and statistics), this paper briefly introduces the use of the NLTK natural language processing package in corpus research, and takes Jane Austen's novel Emma from the Gutenberg corpus as an example to show how the package can be used to process corpus data.
Author: Liu Xu (刘旭)
Source: Journal of Kunming Metallurgy College (《昆明冶金高等专科学校学报》), CAS, 2015, No. 5, pp. 65-69, 93 (6 pages)
Keywords: Python; NLTK toolkit; corpus research
References (4)

  • 1 Biber D, Conrad S, Reppen R. Corpus linguistics: investigating language structure and use [M]. Cambridge: Cambridge University Press, 1998.
  • 2 Bird S, Klein E, Loper E. Natural language processing with Python [M]. New York: O'Reilly Media Press, 2009.
  • 3 Perkins J. Python text processing with NLTK 2.0 cookbook: Lite edition [M]. Birmingham: Packt Publishing Ltd, 2011.
  • 4 Yan Hua (严华), Wang Lifei (王立非). PowerGREP and corpus processing (PowerGREP与语料库加工) [J]. 外语电化教学, 2010(3): 57-62. Cited by: 7
