
A Part-of-Speech Tagging Method for English Essays Written by Chinese Students

A Part-of-Speech Tagging Algorithm for Essay Written by Chinese English Learner
Abstract: A two-layer part-of-speech tagging method based on word embeddings is proposed. It uses only a small number of hand-crafted features; most features are learned automatically from word embeddings and from the tagging vectors produced by the first layer. The tag set is divided into two classes, which serve as the tag sets of the two layers. In the first layer, the categories that are easy to tag are assigned. In the second layer, words labeled as verbs or nouns, which are harder to tag, are assigned a specific verb or noun subclass. On essays written by Chinese learners of English, the method raises tagging accuracy from 95.23% to 95.63%, exceeding the accuracy of existing word-embedding-based part-of-speech taggers on the same corpus.
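The two-layer scheme summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the tag set or the classifiers, so the Penn Treebank noun and verb tags, the helper names (`coarsen`, `two_layer_tag`), and the toy rule-based layers below are all assumptions made for illustration. In the paper, both layers are trained models driven mainly by word embeddings, with the first layer's output fed to the second.

```python
from typing import Callable, List, Sequence

# Assumption: Penn Treebank fine-grained tags grouped under the two "hard" classes.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}


def coarsen(tag: str) -> str:
    """Map a fine-grained tag to its layer-1 label (used to build layer-1 training data)."""
    if tag in NOUN_TAGS:
        return "NOUN"
    if tag in VERB_TAGS:
        return "VERB"
    return tag  # easy categories keep their tag and are final after layer 1


Layer1 = Callable[[Sequence[str]], List[str]]
Layer2 = Callable[[Sequence[str], Sequence[str], int], str]


def two_layer_tag(words: Sequence[str], layer1: Layer1, layer2: Layer2) -> List[str]:
    """Tag easy categories first, then refine only the NOUN/VERB positions."""
    coarse = layer1(words)
    final = list(coarse)
    for i, tag in enumerate(coarse):
        if tag in ("NOUN", "VERB"):
            # Layer 2 sees the whole layer-1 tag sequence as context.
            final[i] = layer2(words, coarse, i)
    return final


# Hypothetical stand-ins for the trained models, for demonstration only.
TOY_LEXICON = {
    "the": "DT", "an": "DT", "he": "PRP", "to": "TO",
    "students": "NOUN", "essays": "NOUN", "essay": "NOUN",
    "wrote": "VERB", "wanted": "VERB", "write": "VERB",
}


def toy_layer1(words: Sequence[str]) -> List[str]:
    """Look up a coarse tag; unknown words default to NOUN."""
    return [TOY_LEXICON.get(w.lower(), "NOUN") for w in words]


def toy_layer2(words: Sequence[str], coarse: Sequence[str], i: int) -> str:
    """Pick a specific noun or verb tag from suffixes and the previous layer-1 tag."""
    w = words[i].lower()
    if coarse[i] == "NOUN":
        return "NNS" if w.endswith("s") else "NN"
    prev = coarse[i - 1] if i > 0 else ""
    if prev in ("TO", "MD"):
        return "VB"
    if w.endswith("ing"):
        return "VBG"
    if w.endswith("ed") or w == "wrote":
        return "VBD"
    return "VBP"


print(two_layer_tag(["The", "students", "wrote", "essays"], toy_layer1, toy_layer2))
# → ['DT', 'NNS', 'VBD', 'NNS']
```

Splitting the tag set this way lets the second layer condition on a complete, already-decided sequence of coarse tags, which is the role the abstract assigns to the first-layer tagging vector.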
Source: Journal of Beijing University of Posts and Telecommunications (《北京邮电大学学报》), indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2017, No. 2, pp. 16-20 (5 pages)
Keywords: part-of-speech tagging; Chinese English learners; essays; word vectors
