Abstract
This paper uses text mining to analyze the lexical and thematic features of ESL writing by students from Chinese universities of different tiers. The data are 2,000 essays, a subset of the texts collected in the nationwide same-topic writing campaign for university students that China Wordnet Company ran in 2016. The students come from 3 key ("985") universities and 8 ordinary second-tier universities in Sichuan Province and Chongqing. The research questions are whether the two groups differ in overall writing scores, vocabulary size, word frequency distribution, lexical richness, recycling index of key content words, co-occurrence of key words, and thematic clusters. The results show that students from the key universities score markedly higher and have a larger vocabulary and higher lexical richness than students from the ordinary universities; that the two groups show similar patterns in word frequency distribution, in the recycling index of key content words, and in the words that co-occur most strongly with the key words; and that the thematic clusters differ, with 3 clusters retrieved from the key-university texts and 5 from the ordinary-university texts.
Authors
WANG Shunyu (汪顺玉)
ZHAO Qing (赵晴)
Source
《英语研究》
CSSCI
2017, No. 1, pp. 118-131 (14 pages)
English Studies
Funding
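An interim result of the 2015 Ministry of Education Humanities and Social Sciences project "A Text Mining-Based Study on Assessing the English Proficiency of Chinese Learners of English" [教社科司函(2015)]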
Keywords
text mining
ESL writing
lexical features
thematic cluster