Design and Implementation of the Phraseological Sequence Extraction System from Large-scale Corpora (cited by: 3)
Abstract The extraction of phraseological sequences from corpora has long been a key technical step in phraseology research. Because of the computational and operational complexity involved, previous studies mostly relied on a single statistical measure to identify and extract sequences, so the extracted data contained a large amount of noise. Using current big-data processing and computing techniques, this paper implements an extraction method that combines several statistical measures, including frequency, mutual information, and boundary entropy, and develops a corresponding software system. Experimental results show that the system can process large corpora on the scale of tens of millions of word tokens on an ordinary computer and significantly improves the quality of the extracted phraseological sequences.
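To make the combination of measures concrete, the following is a minimal sketch, not the authors' system, of how frequency, pointwise mutual information, and boundary entropy might jointly filter two-word candidates. The function name `extract_bigrams`, the thresholds, and the choice to take the smaller of the left and right boundary entropies are all assumptions made for illustration; the abstract does not specify these details.

```python
import math
from collections import Counter, defaultdict


def extract_bigrams(tokens, min_freq=2, min_pmi=1.0, min_entropy=0.5):
    """Hypothetical sketch: keep bigrams that pass all three filters."""
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))

    # Collect the words seen immediately left and right of each bigram.
    left = defaultdict(Counter)
    right = defaultdict(Counter)
    for i in range(n - 1):
        pair = (tokens[i], tokens[i + 1])
        if i > 0:
            left[pair][tokens[i - 1]] += 1
        if i + 2 < n:
            right[pair][tokens[i + 2]] += 1

    def entropy(counter):
        # Shannon entropy (bits) of the neighbour distribution.
        total = sum(counter.values())
        if total == 0:
            return 0.0
        return -sum(c / total * math.log2(c / total) for c in counter.values())

    results = []
    for pair, freq in bigrams.items():
        if freq < min_freq:
            continue  # frequency filter
        w1, w2 = pair
        p_pair = freq / (n - 1)
        p_w1 = unigrams[w1] / n
        p_w2 = unigrams[w2] / n
        pmi = math.log2(p_pair / (p_w1 * p_w2))  # association strength
        # Assumption: a sequence is "free" at its edges only if both
        # sides show varied neighbours, so take the smaller entropy.
        h = min(entropy(left[pair]), entropy(right[pair]))
        if pmi >= min_pmi and h >= min_entropy:
            results.append((pair, freq, pmi, h))
    # Most frequent candidates first.
    return sorted(results, key=lambda r: -r[1])


tokens = "a in fact b c in fact d e in fact f".split()
results = extract_bigrams(tokens)
for pair, freq, pmi, h in results:
    print(pair, freq, round(pmi, 2), round(h, 2))
```

In this toy input only `in fact` recurs, is strongly associated, and appears between varied neighbours, so it is the only candidate that survives all three filters. A single measure such as raw frequency would not distinguish it from frequent but incomplete fragments, which is the kind of noise the abstract refers to.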
Source Technology Enhanced Foreign Language Education (《外语电化教学》), CSSCI, Peking University Core Journal, 2017, No. 4, pp. 9-16 (8 pages)
Funding Partial research results of National Social Science Fund of China projects (Nos. 13BYY074 and 14CYY049) and a Beijing Social Science Fund project (No. 16JDYYA001)
Keywords corpus-driven approach; phraseological sequence; automatic extraction; design and implementation