
基于篇章结构的英文作文自动评分方法 (cited by: 13)

English Automated Essay Scoring Methods Based on Discourse Structure
Abstract: Automated essay scoring (AES) refers to systems that evaluate and score essays using techniques from statistics, natural language processing, linguistics, and related fields. Discourse structure analysis is an important research direction in natural language processing and an important component of AES systems. AES systems developed outside China are widely used, but their treatment of discourse structure scoring remains insufficient, and they are not well targeted at English essays written by Chinese students. Research in China on automated scoring of English essays is still at an early stage and has overlooked the importance of discourse structure. To address these problems, this paper proposes an automated English essay scoring method based on discourse structure. The method extracts lexical, syntactic, and structural features at the word, sentence, and paragraph levels; classifies discourse elements using support vector machines, random forests, and extreme gradient boosting; and finally builds a linear regression model to score the essay's discourse structure. Experimental results show that the random-forest-based discourse element identification model (Discourse Element Identification based Random Forest, DEI-RF) achieves an accuracy of 94.13%, and that the linear-regression-based discourse structure scoring model (Discourse Structures Scoring based Linear Regression, DSS-LR) achieves mean squared errors of 0.02, 0.11, and 0.08 on the introduction, argumentation, and concession paragraphs, respectively.
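The abstract reports two kinds of metrics: classification accuracy for discourse element identification and mean squared error for the regression-based scores. The sketch below shows, on toy data, how such metrics are conventionally computed, together with a one-feature least-squares fit. It is only an illustration under stated assumptions: the function names, labels, and numbers are hypothetical, the single-feature fit stands in for whatever multi-feature linear regression the paper uses, and none of it is the authors' implementation.

```python
from statistics import mean


def accuracy(gold, predicted):
    """Fraction of items whose predicted label equals the gold label."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def mean_squared_error(gold, predicted):
    """Average squared difference between gold and predicted scores."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return mean((g - p) ** 2 for g, p in zip(gold, predicted))


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must contain at least two distinct values")
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical paragraph labels, for illustration only.
gold_labels = ["introduction", "argumentation", "argumentation", "concession"]
pred_labels = ["introduction", "argumentation", "concession", "concession"]
label_accuracy = accuracy(gold_labels, pred_labels)

# Hypothetical feature values and human scores, for illustration only.
feature = [1, 2, 3, 4]
human_score = [2, 4, 6, 8]
slope, intercept = fit_line(feature, human_score)
model_score = [slope * x + intercept for x in feature]
score_mse = mean_squared_error(human_score, model_score)
```

A real system would replace the toy lists with features extracted from essays and would evaluate on held-out data rather than on the data used for fitting.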
Authors: ZHOU Ming (周明); JIA Yan-ming (贾艳明); ZHOU Cai-lan (周彩兰); XU Ning (徐宁) (School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430070, China; Research Center for Artificial Intelligence and Big Data, Global Wisdom Inc, Beijing 100085, China; Hubei Key Laboratory of Transportation Internet of Things, Wuhan University of Technology, Wuhan 430070, China)
Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list (北大核心), 2019, No. 3, pp. 234-241 (8 pages)
Keywords: Automated essay scoring; Discourse element; Discourse structure analysis; Natural language processing; Random forest; Linear regression