Abstract
Text mining is the process of analyzing semantically rich texts in order to understand their content and meaning. Using document similarity, correspondence analysis, topic modeling, multidimensional scaling and feature analysis, with R and T-Lab as analysis tools and the national English writing corpus of the 2018 National "百万同题" English Writing Competition as research material, this study examines how Chinese learners of English at five levels (junior high school, senior high school, higher vocational college, regular university and key university) discursively construct the telling of Chinese stories in English. The results indicate that junior and senior high school students and undergraduates produce two distinct text types; that document similarity is low between junior high school and university texts but high between senior high school and university texts; that keywords constructing the Chinese story metaphorically account for 91.1% of all texts, while literal construction accounts for 8.9%; and that all five text categories show three thematic clusters, although the proportions and content of the themes differ across categories.
Source
Journal of Tianjin Foreign Studies University (《天津外国语大学学报》)
2020, No. 4, pp. 2-15, 158 (15 pages in total)
Funding
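Ministry of Education Humanities and Social Sciences Project "A Text-Mining-Based Study on Assessing the English Writing Ability of Chinese Learners of English" (15YJA740040)
National Social Science Key Project "A Text-Mining-Based Study on the International Communication of Chinese Political Discourse" (18AYY006).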