Abstract
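Differences in the frequency of words or other features between corpora are a fundamental research topic in corpus linguistics, and the chi-square test is the main statistical method used to study them. Every statistical method rests on certain assumptions; because lexical difference studies cannot fully satisfy the requirements of the chi-square test, their results are subject to considerable error. Other statistical methods applicable to lexical difference analysis include the log-likelihood ratio and the rank sum test. Experiments show that the log-likelihood ratio behaves like the chi-square test: in testing lexical differences, both are affected by sample size and sample representativeness and therefore produce statistical bias. The rank sum test can solve these problems to some extent and yields relatively objective statistical results.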
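The sample-size effect described above can be illustrated with a minimal sketch. The abstract does not give the exact formulas used, so this assumes the common 2x2 formulation for a single word (its frequency versus all other tokens in each corpus), Pearson's chi-square without Yates correction, and the G2 form of the log-likelihood ratio. All function names and counts are hypothetical.

```python
import math


def word_table(a, b, n1, n2):
    """2x2 table for one word: a, b = its frequency in corpus 1 and corpus 2;
    n1, n2 = total token counts of the two corpora (hypothetical inputs)."""
    return [[a, n1 - a], [b, n2 - b]]


def expected_counts(observed):
    """Expected cell counts under independence: row total * column total / N."""
    rows = [sum(r) for r in observed]
    cols = [sum(c) for c in zip(*observed)]
    total = sum(rows)
    return [[rows[i] * cols[j] / total for j in range(2)] for i in range(2)]


def chi_square(a, b, n1, n2):
    """Pearson chi-square statistic (no Yates correction), 1 degree of freedom."""
    obs = word_table(a, b, n1, n2)
    exp = expected_counts(obs)
    return sum((obs[i][j] - exp[i][j]) ** 2 / exp[i][j]
               for i in range(2) for j in range(2))


def log_likelihood(a, b, n1, n2):
    """Log-likelihood ratio statistic G2 = 2 * sum(O * ln(O / E)), 1 degree of freedom.
    Cells with O = 0 are skipped (O * ln O tends to 0 as O -> 0)."""
    obs = word_table(a, b, n1, n2)
    exp = expected_counts(obs)
    return 2 * sum(obs[i][j] * math.log(obs[i][j] / exp[i][j])
                   for i in range(2) for j in range(2) if obs[i][j] > 0)


def p_value_df1(stat):
    """Upper-tail p-value of a chi-square variable with 1 df: erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(stat / 2))


if __name__ == "__main__":
    # Hypothetical counts: 50 vs 40 occurrences per 100,000 tokens. Keeping these
    # relative frequencies fixed, make both corpora ten times larger.
    for scale in (1, 10):
        a, b, n = 50 * scale, 40 * scale, 100_000 * scale
        x2 = chi_square(a, b, n, n)
        g2 = log_likelihood(a, b, n, n)
        print(f"corpus size {n}: chi2={x2:.3f} (p={p_value_df1(x2):.4f}), "
              f"G2={g2:.3f} (p={p_value_df1(g2):.4f})")
```

With the relative frequencies held constant, both statistics grow in proportion to corpus size, and the two stay close to each other. The same 5:4 ratio is not significant at the 0.05 level in the smaller corpora but becomes highly significant in corpora ten times larger.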
Lexical differences, or differences in other linguistic features, between corpora are one of the fundamental research topics in corpus linguistics. Because its basic assumptions do not match the conditions of lexical-difference studies, the chi-square test, the main statistical technique for comparing lexical differences between corpora, is likely to produce statistical errors when applied to tasks of this kind. This study therefore also applies other statistical techniques, namely the log-likelihood ratio test and the rank sum test, to lexical differences between corpora. The analysis indicates that the log-likelihood ratio test behaves much like the chi-square test in examining lexical differences between corpora: both tend to produce statistical errors owing to factors such as sample size and sample representativeness. The rank sum test, however, can solve some of these problems and yields relatively objective statistical results.
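The abstract does not say how the rank sum test was set up. One plausible reading, sketched here as an assumption, is to split each corpus into texts, compute the word's relative frequency in every text, and compare the two sets of per-text rates with the Wilcoxon rank-sum (Mann-Whitney U) test, using the normal approximation with tie correction. The function names and data below are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks of values; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation
    with tie correction. Returns (U for x, z, p)."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    n = n1 + n2
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # tie correction: sum of (t^3 - t) over groups of t equal values
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every observation identical: no evidence of a difference
        return u1, 0.0, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return u1, z, p


def per_text_rates(counts, sizes, per=1000):
    """Relative frequency of the word in each text, per `per` tokens."""
    return [c / s * per for c, s in zip(counts, sizes)]


if __name__ == "__main__":
    # Hypothetical data: ten texts of 10,000 tokens in each corpus.
    sizes = [10_000] * 10
    corpus_a = per_text_rates([2, 3, 2, 3, 2, 3, 2, 3, 2, 40], sizes)  # one text uses the word heavily
    corpus_b = per_text_rates([3, 2, 3, 2, 3, 2, 3, 2, 3, 2], sizes)
    u, z, p = rank_sum_test(corpus_a, corpus_b)
    print(f"U={u}, z={z:.3f}, p={p:.3f}")
```

Pooled, corpus A has 62 occurrences against 25 in corpus B, which a frequency-based test on the totals would flag as a large difference. However, 40 of A's occurrences come from a single text. Because the rank sum test compares per-text rates, that one text counts as only one observation, and the test finds no significant difference (p well above 0.05).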
Source
《现代外语》
CSSCI
Peking University Core Journal
2010, No. 3, pp. 249-257 (9 pages)
Modern Foreign Languages