
A Study of the Reliability of Automatic POS Tagging for Learners' Spoken Corpus Data (cited 12 times)

On POS Tagging Reliability for EFL Learners' Transcribed Spoken Data
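Abstract: Part-of-speech (POS) tagging is a necessary prerequisite for the syntactic analysis of a corpus and can greatly broaden the scope of corpus research. However, most existing automatic English POS taggers were designed for native speakers' written language, and no study has yet reported whether they can accurately tag learners' spoken data. This study used two POS taggers, Brill and CLAWS7, to automatically tag the spoken data of a high-scoring and a low-scoring group of learners. It then calculated tagging accuracy and compared how well the two taggers handle the features of learner speech. The aim is to assess how suitable a rule-based tagger and a probability-based tagger are for automatically tagging learners' spoken data, and so to provide a basis for research on the syntactic features of spoken interlanguage. The results are broadly consistent with Liang Maocheng's findings on tagging accuracy for Chinese students' written English. The probability-based tagger is better suited to tagging learners' spoken data: its accuracy is higher and more stable, and it is little affected by the learners' spoken proficiency. The rule-based tagger, by contrast, is strongly affected by both the learners' proficiency level and the features of speech, such as pauses, repetitions, and ungrammatical sequences. The study provides an important basis for research on the syntactic features of learner speech and also has some implications for tag editing.

POS tagging is an essential prerequisite for further syntactic analyses and is important for widening the research scope. However, corpus-based studies of the syntactic aspects of second language acquisition have been rare because the reliability of POS taggers designed to tag native speakers' written data is doubtful. In this study, the Brill POS tagger and the CLAWS7 POS tagger were used to assign POS tags to the same set of transcribed spoken data produced by a high-proficiency group and a low-proficiency group of EFL learners. The POS tags were then manually examined, and the success rates of the two taggers were compared. The findings support Liang's conclusion that the probability-based tagger is more viable and less affected by the learners' proficiency level. The taggers were also examined for their ability to handle spoken features in the texts. Pauses, repetitions and sentence fragments in the learners' transcribed spoken data were found to seriously affect the performance of the rule-based tagger. The study sheds important light on the feasibility of corpus-based studies of the syntactic aspects of spoken interlanguage and offers some implications for tag editing.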
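According to the abstract, the tags were checked manually and accuracy was then compared across taggers and proficiency groups. The Python sketch below shows that arithmetic, under stated assumptions. It assumes that each token is stored as a hypothetical (group, tagger, automatic tag, manually verified tag) record. The function names `accuracy_by_group` and `proficiency_gap` are illustrative and are not the authors' procedure. The toy records are invented and are not data from the study.

```python
from collections import defaultdict

# Hypothetical token record: (proficiency group, tagger, automatic tag, manually verified tag).
Record = tuple[str, str, str, str]


def accuracy_by_group(records: list[Record]) -> dict[tuple[str, str], float]:
    """Accuracy = tokens whose automatic tag matches the verified tag / all tokens,
    computed separately for each (tagger, proficiency group) pair."""
    correct: dict[tuple[str, str], int] = defaultdict(int)
    total: dict[tuple[str, str], int] = defaultdict(int)
    for group, tagger, auto_tag, gold_tag in records:
        key = (tagger, group)
        total[key] += 1
        if auto_tag == gold_tag:
            correct[key] += 1
    return {key: correct[key] / total[key] for key in total}


def proficiency_gap(acc: dict[tuple[str, str], float], tagger: str) -> float:
    """High-group accuracy minus low-group accuracy (hypothetical stability measure):
    a smaller gap suggests the tagger is less affected by learner proficiency."""
    return acc[(tagger, "high")] - acc[(tagger, "low")]


if __name__ == "__main__":
    # Toy records for illustration only; these are NOT results from the study.
    toy = [
        ("high", "CLAWS7", "NN1", "NN1"),
        ("high", "CLAWS7", "VVI", "VVI"),
        ("low", "CLAWS7", "AT", "AT"),
        ("low", "CLAWS7", "NN1", "VV0"),
        ("high", "Brill", "VB", "VB"),
        ("high", "Brill", "NN", "NN"),
        ("low", "Brill", "NN", "UH"),
        ("low", "Brill", "VB", "NN"),
    ]
    acc = accuracy_by_group(toy)
    for key in sorted(acc):
        print(key, acc[key])
    print("CLAWS7 gap:", proficiency_gap(acc, "CLAWS7"))
    print("Brill gap:", proficiency_gap(acc, "Brill"))
```

With these toy records, CLAWS7 scores 1.0 (high) and 0.5 (low), and Brill scores 1.0 and 0.0. The resulting gaps of 0.5 and 1.0 show how a "less affected by proficiency level" comparison could be quantified. The study itself may have used a different measure.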
Authors: Wang Li, Liang Maocheng
Source: Foreign Language Education (《外语教学》), 2007, No. 4, pp. 47-51 (5 pages). Indexed in CSSCI and the Peking University core journal list.
Keywords: POS tagging; spoken interlanguage; proficiency level; accuracy

References (12)

  • [1] Brill, E. Some advances in rule-based part of speech tagging[A]. In Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI-94)[C]. Seattle, WA: AAAI Press, 1994.
  • [2] de Haan, P. Tagging non-native English with the TOSCA-ICLE tagger[A]. In Mair, C. & Hundt, M. (eds.). Corpus Linguistics and Linguistic Theory: Papers from the Twentieth International Conference on English Language Research on Computerized Corpora (ICAME 20)[C]. Freiburg im Breisgau, 1999.
  • [3] Garside, R. & Smith, N. A hybrid grammatical tagger: CLAWS4[A]. In Garside, R., Leech, G. & McEnery, T. (eds.). Corpus Annotation: Linguistic Information from Computer Text Corpora[C]. London: Longman, 1997.
  • [4] Granger, S. A bird's-eye view of learner corpus research[A]. In Granger, S., Hung, J. & Petch-Tyson, S. (eds.). Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching[C]. Amsterdam: Benjamins, 2002.
  • [5] Granger, S., Hung, J. & Petch-Tyson, S. (eds.). Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching[C]. Amsterdam: Benjamins, 2002.
  • [6] Jurafsky, D. & Martin, J. H. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition[M]. NJ: Prentice Hall, 2000.
  • [7] Leech, G. Preface[A]. In Granger, S. (ed.). Learner English on Computer[C]. London & New York: Longman, 1998.
  • [8] Meunier, F. Computer tools for the analysis of learner corpora[A]. In Granger, S. (ed.). Learner English on Computer[C]. London & New York: Longman, 1998.
  • [9] Teubert, W. My version of corpus linguistics[J]. International Journal of Corpus Linguistics, 2005, 10(1): 1-13.
  • [10] Thomas, J. & Short, M. (eds.). Using Corpora for Language Research[C]. London: Longman, 1996.
