Off-topic detection for English essays based on LDA and word2vec

Cited by: 3
Abstract: Computer-assisted English essay marking systems currently used in China lack an accurate and efficient algorithm for detecting off-topic essays. To address this problem, this paper proposes an off-topic detection algorithm that combines LDA and word2vec. The algorithm models the documents with LDA and trains word2vec on them. Using the resulting document topics and the semantic relations between words, it computes a probability-weighted sum over each topic in a document and that topic's feature words, and finally flags off-topic essays by applying a suitable threshold. In the experiments, the optimal number of topics was determined by varying the number of topics and comparing the resulting F values. The results show that the proposed method is more effective than a method based on the vector space model (VSM): it detects more off-topic essays with higher accuracy, reaching an F value above 89%. The method automates off-topic essay detection and can be applied effectively in English writing instruction.
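The abstract does not give the exact scoring formula, so the following is only a minimal Python sketch of one plausible reading of the "probability-weighted sum over topics and feature words" step. It assumes that an essay's LDA topic distribution (theta), the top feature words of each topic with their probabilities, and word2vec embeddings are already available (model training is omitted). It further assumes that relevance to the assigned prompt is measured by cosine similarity between each feature word and a hypothetical list of prompt keywords. The names `relevance_score`, `prompt_words` and `is_off_topic`, the toy vectors, and the 0.5 threshold are illustrative assumptions, not the authors' code or parameters. The "F value" is assumed to be the balanced F1 measure.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two embedding vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def relevance_score(
    theta: Dict[int, float],
    topic_words: Dict[int, List[Tuple[str, float]]],
    prompt_words: List[str],
    vectors: Dict[str, Vector],
) -> float:
    """Probability-weighted sum over an essay's topics and their feature words.

    theta:        P(topic | essay), assumed to come from a trained LDA model
    topic_words:  top feature words of each topic with P(word | topic)
    prompt_words: hypothetical keywords describing the assigned prompt
    vectors:      word -> embedding, assumed to come from word2vec
    """
    prompt_vecs = [vectors[w] for w in prompt_words if w in vectors]
    if not prompt_vecs:
        return 0.0
    score = 0.0
    for topic, p_topic in theta.items():
        for word, p_word in topic_words.get(topic, []):
            if word not in vectors:
                continue  # out-of-vocabulary feature words add nothing
            # Semantic closeness of this feature word to the closest prompt keyword.
            sim = max(cosine(vectors[word], pv) for pv in prompt_vecs)
            score += p_topic * p_word * sim
    return score


def is_off_topic(score: float, threshold: float) -> bool:
    """Flag an essay whose relevance score falls below the chosen threshold."""
    return score < threshold


def f_measure(predicted: List[bool], actual: List[bool]) -> float:
    """Balanced F1 for the 'off-topic' class (True means off-topic)."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy 2-D "embeddings" and LDA outputs, hand-made for illustration only.
    vectors = {
        "environment": [0.9, 0.1], "pollution": [1.0, 0.0], "air": [0.8, 0.2],
        "football": [0.0, 1.0], "goal": [0.1, 0.9],
    }
    topic_words = {
        0: [("pollution", 0.5), ("air", 0.3)],
        1: [("football", 0.6), ("goal", 0.4)],
    }
    prompt = ["environment"]
    on_topic = relevance_score({0: 0.9, 1: 0.1}, topic_words, prompt, vectors)
    off_topic = relevance_score({0: 0.1, 1: 0.9}, topic_words, prompt, vectors)
    preds = [is_off_topic(s, threshold=0.5) for s in (on_topic, off_topic)]
    print(on_topic, off_topic, preds, f_measure(preds, [False, True]))
```

The abstract says the number of topics was chosen by comparing F values. With this sketch, that would mean recomputing `theta` and `topic_words` from an LDA model trained with each candidate topic count, scoring a labeled set of essays, and keeping the count whose `f_measure` is highest. The threshold would be tuned the same way, since the abstract only states that a reasonable threshold is set.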
Authors: Qu Qiang, Cui Rongyi, Zhao Yahui (Laboratory of Intelligent Information Processing, Dept. of Computer Science & Technology, Yanbian University, Yanji, Jilin 133002, China)
Source: Application Research of Computers (indexed in CSCD; Peking University core journal), 2019, No. 2, pp. 415-419 (5 pages)
Funding: 2015 research project under the National Language Commission's 12th Five-Year research plan (YB125-178)
Keywords: off-topic essay detection; vector space model (VSM); latent Dirichlet allocation (LDA); semantic relations between words