Abstract
Based on the text of A Dream of Red Mansions, the single-character frequencies of each of the 120 chapters are counted to form a character frequency vector, which serves as the feature vector of that chapter. The cosine similarity coefficients between these feature vectors are then computed. From them, the average similarity level is derived for three groups of chapter pairs: pairs within the first 80 chapters, pairs within the last 40 chapters, and cross pairs between the first 80 and the last 40 chapters. The significance of the differences among the three average levels is examined with two hypothesis tests, the t-test and the Wilcoxon rank sum test, and the differences are all extremely significant. The results indicate that if the first 80 chapters are accepted as the work of Cao Xueqin alone, there is even more reason to believe that the last 40 chapters were also written by a single person, and that the first 80 chapters and the last 40 chapters come from two different authors. At the same time, it cannot be ruled out that the first 80 chapters contain additions by later hands, or that the last 40 chapters contain surviving manuscript fragments left by Cao Xueqin.
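The feature-extraction and grouping steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names (`is_hanzi`, `char_count_vector`, `grouped_similarities`, `group_means`) are hypothetical, only CJK Unified Ideographs (U+4E00 to U+9FFF) are counted as characters, and the chapter texts are assumed to be supplied in reading order as a list of strings. Raw counts are used instead of relative frequencies because cosine similarity does not change when a vector is rescaled.

```python
import math
from collections import Counter
from itertools import combinations
from statistics import mean


def is_hanzi(ch):
    # Assumption: only CJK Unified Ideographs count as "characters";
    # punctuation, digits and whitespace are ignored.
    return "\u4e00" <= ch <= "\u9fff"


def char_count_vector(text, vocab):
    # Count each character of the shared vocabulary in one chapter.
    counts = Counter(ch for ch in text if is_hanzi(ch))
    return [counts[ch] for ch in vocab]


def cosine(u, v):
    # Cosine similarity; returns 0.0 if either vector is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return dot / norm if norm else 0.0


def grouped_similarities(chapters, split=80):
    """Sort every chapter pair into within-first, within-last or cross.

    chapters: chapter texts in reading order; split: index of the first
    chapter of the second part (80 for the 80/40 division).
    """
    vocab = sorted({ch for text in chapters for ch in text if is_hanzi(ch)})
    vecs = [char_count_vector(text, vocab) for text in chapters]
    groups = {"within_first": [], "within_last": [], "cross": []}
    for i, j in combinations(range(len(chapters)), 2):  # always i < j
        s = cosine(vecs[i], vecs[j])
        if j < split:
            groups["within_first"].append(s)
        elif i >= split:
            groups["within_last"].append(s)
        else:
            groups["cross"].append(s)
    return groups


def group_means(groups):
    # Average similarity level of each non-empty group.
    return {name: mean(vals) for name, vals in groups.items() if vals}
```

For 120 chapters with `split=80`, the three groups hold C(80,2) = 3160, C(40,2) = 780 and 80 × 40 = 3200 similarity values respectively.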
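The abstract does not state which t-test variant or which approximations were used, so the following is only a hedged sketch of how two groups of similarity values could be compared with the Python standard library. It assumes Welch's unequal-variance t-test and the large-sample normal approximation to the Wilcoxon rank sum statistic (average ranks for ties, no tie correction to the variance). With samples in the thousands, the normal distribution is also used here as an approximation to the t distribution when turning a statistic into a p-value. All function names are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    # Welch's t statistic and Welch-Satterthwaite degrees of freedom
    # (assumption: unequal-variance variant of the t-test).
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def average_ranks(values):
    # 1-based ranks; tied values share the mean of the ranks they span.
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_z(a, b):
    # Wilcoxon rank sum statistic W (sum of the ranks of sample a in the
    # pooled data) and its large-sample z score; no tie correction.
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    w = sum(ranks[:n1])
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return w, (w - mu) / sigma


def two_sided_p_normal(z):
    # Two-sided p-value under the standard normal distribution:
    # 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))
```

Each of the three pairwise comparisons (within-first vs. within-last, within-first vs. cross, within-last vs. cross) would pass the two corresponding lists of similarity values to both tests. A statistics library such as SciPy offers `scipy.stats.ttest_ind` and `scipy.stats.ranksums` for the same comparisons.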
Authors
WANG Yu-qi (王宇琦)
WANG Xiao-gang (王晓刚)
Shanghai University, Shanghai 200444, China; Yangzhou Polytechnic College, Yangzhou 225009, China
Source
Journal of Yangzhou Polytechnic College (《扬州职业大学学报》)
2022, No. 3, pp. 27-31, 43 (6 pages)
Keywords
author(s) of A Dream of Red Mansions
character frequency vector
cosine similarity coefficient
t-test
Wilcoxon rank sum test