Abstract
Statistical methods for comparing the similarity of corpora across multiple features include the chi-square test, the rank correlation test, and the chi by degrees of freedom (CBDF) test. A statistical experiment across corpora, using 350 high-frequency words as an example, shows that in multi-feature linguistic studies based on relatively large samples, the chi-square test readily concludes that the corpora differ significantly, and rank correlation analysis likewise readily concludes that the styles being compared are significantly correlated. The CBDF test, which uses the relative values of the statistic as its basis for inference, yields a more fine-grained picture of how similar the corpora are to one another. However, this technique requires researchers to have strong skills in computer-based text processing.
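The abstract contrasts three tests without showing how their statistics are computed. The Python below is a minimal sketch under stated assumptions, not the paper's implementation: word frequencies from two corpora are treated as a 2 × k contingency table (k words), CBDF is assumed to mean the chi-square statistic divided by its degrees of freedom (as the name suggests; the paper's exact formulation may differ), and Spearman's rho stands in for "rank correlation". All function names and counts are hypothetical and chosen only for illustration.

```python
def chi_square_2xk(freq_a, freq_b):
    """Pearson chi-square statistic for a 2 x k table of word counts.

    freq_a, freq_b: counts of the same k words in corpus A and corpus B.
    Returns (chi2, df), where df = k - 1.
    """
    if len(freq_a) != len(freq_b):
        raise ValueError("both corpora must be counted over the same word list")
    total_a, total_b = sum(freq_a), sum(freq_b)
    grand = total_a + total_b
    chi2 = 0.0
    for a, b in zip(freq_a, freq_b):
        row = a + b
        if row == 0:
            raise ValueError("every word must occur in at least one corpus")
        # Expected counts if both corpora share one word distribution.
        exp_a = row * total_a / grand
        exp_b = row * total_b / grand
        chi2 += (a - exp_a) ** 2 / exp_a + (b - exp_b) ** 2 / exp_b
    return chi2, len(freq_a) - 1


def cbdf(freq_a, freq_b):
    """Assumed CBDF: chi-square divided by degrees of freedom."""
    chi2, df = chi_square_2xk(freq_a, freq_b)
    return chi2 / df


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    # Hypothetical counts of four words in three corpora (illustration only).
    a = [120, 80, 60, 40]
    b = [100, 90, 70, 40]
    c = [40, 60, 80, 120]
    for name, other in (("B", b), ("C", c)):
        chi2, df = chi_square_2xk(a, other)
        print(f"A vs {name}: chi2={chi2:.2f} df={df} "
              f"CBDF={cbdf(a, other):.2f} rho={spearman_rho(a, other):.3f}")
    # Same proportions, ten times the sample size.
    scaled = chi_square_2xk([10 * n for n in a], [10 * n for n in b])[0]
    print(f"A vs B, counts x10: chi2={scaled:.2f}")
```

Because every expected count grows in proportion to the sample, multiplying all counts by 10 multiplies the chi-square statistic by 10 while the degrees of freedom stay fixed; this is one way to see why large samples push the chi-square test toward "significant difference". In this sketch CBDF is read as a relative value: a smaller CBDF for A vs B than for A vs C suggests A is closer to B, provided the corpus pairs being compared are of comparable size.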
Source
Modern Educational Technology (《现代教育技术》)
CSSCI
2010, No. 8, pp. 83-87 (5 pages)