
Research on Key Technology of Automatic Essay Scoring Based on Text Semantic Dispersion

Cited by: 14
Abstract: This paper attempts to improve automatic essay scoring from the perspective of text semantic dispersion. It proposes two representations of text semantic dispersion and gives mathematical formulas for computing them. Building on existing methods, namely the LDA model, paragraph vectors, and word vectors, it extracts four concrete instances of text semantic dispersion and applies them to automatic essay scoring. The paper gives a vector form of text semantic dispersion from a statistical point of view and a matrix form from a decentralized point of view, and compares three models experimentally: multiple linear regression, a convolutional neural network, and a recurrent neural network. On a validation set of 50 essays, adding the text semantic dispersion features reduces the root mean square error (RMSE) between predicted and true scores by up to 10.99% and raises the Pearson correlation coefficient by up to 2.7 times. The representation is general and has no language restriction, so it can be extended to any language.
Source: Journal of Chinese Information Processing (《中文信息学报》; indexed in CSCD and the Peking University Core Journals list), 2016, No. 6, pp. 173-181 (9 pages)
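Funding: National Natural Science Foundation of China (61170189, 61370126, 61202239, U1636211); National 863 Program (2015AA016004, 2014AA015105); Beijing Advanced Innovation Center for Imaging Technology (BAICIT-2016001)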
Keywords: automatic essay scoring; semantic dispersion; neural network
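
The abstract names a vector form and a matrix form of text semantic dispersion, plus two evaluation metrics, but this record does not include the paper's formulas. The following is therefore only a minimal sketch under stated assumptions, not the authors' definition: it assumes each essay is already represented as a list of paragraph (or word) vectors, summarizes their distances to the centroid as a small feature vector, and uses the pairwise distance matrix as a centroid-free view. The function names `dispersion_vector` and `dispersion_matrix` are hypothetical. RMSE and the Pearson correlation coefficient follow their standard definitions.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dispersion_vector(vectors):
    """Assumed 'statistical' view: summary statistics of the distances
    from each paragraph vector to the centroid of the essay."""
    n = len(vectors)
    dim = len(vectors[0])
    centroid = [sum(v[i] for v in vectors) / n for i in range(dim)]
    dists = [euclidean(v, centroid) for v in vectors]
    mean = sum(dists) / n
    variance = sum((d - mean) ** 2 for d in dists) / n
    return [mean, variance, max(dists), min(dists)]


def dispersion_matrix(vectors):
    """Assumed 'decentralized' view: all pairwise distances, with no
    centroid involved. Such a matrix could be fed to a CNN or RNN."""
    return [[euclidean(u, v) for v in vectors] for u in vectors]


def rmse(predicted, gold):
    """Root mean square error between predicted and true scores."""
    squared = sum((p - g) ** 2 for p, g in zip(predicted, gold))
    return math.sqrt(squared / len(gold))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


if __name__ == "__main__":
    # Toy essay: four 2-dimensional "paragraph vectors" (made-up values).
    essay = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
    print(dispersion_vector(essay))
    print(dispersion_matrix(essay))
    # Toy scores, only to show how the two metrics are called.
    print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
    print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

In a setup like the one the abstract describes, the feature vector could be appended to the inputs of a multiple linear regression, and the two metrics computed with and without it to measure the change.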
