Abstract
This study randomly divided large-scale written texts from the British National Corpus (BNC) and the American National Corpus (ANC) into 60 experimental groups and 41 test groups, totaling 83,864 text pairs. The English vocabulary repeat rate was analyzed dynamically by means of computer programs, and a mathematical model for estimating the vocabulary repeat rate was established and then tested on the 60 experimental groups. The results show that the vocabulary repeat rate curves are fairly regular, with few extreme values; that the repeat rate changes nonlinearly; and that the prediction formula has a small margin of error and can be used to estimate the theoretical vocabulary repeat rate of authentic English texts of different lengths.
Source
Foreign Languages and Their Teaching (《外语与外语教学》)
CSSCI
PKU Core Journal (北大核心)
2016, No. 4, pp. 87-95, 105 (10 pages)
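Funding
An interim result of the 2012 Ministry of Education Humanities and Social Sciences project "A Study of the Dynamic Inter-text Vocabulary Repeat Rate in English" (英语动态篇际词汇重复率研究; Project No. 12YJA740116).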