Abstract
This article measures shifts in social sense using word-frequency data obtained by text mining, and models the sequence of sense updates on the basis of the cognitive-behavioral characteristics of such shifts. To detect major shifts in social sense, Monte Carlo experiments are used to compare two nonparametric approaches to change point analysis. The main findings are: 1) In simulations constructed from the cognitive-behavioral characteristics of shifts in social sense, among the three nonparametric tests under the change point model (CPM) framework, the Mann-Whitney test outperforms the Cramér-von Mises and Kolmogorov-Smirnov tests in both power and accuracy of change point detection. 2) E-Divisive, a nonparametric iterative method based on a divergence measure, generally outperforms the three nonparametric methods under CPM in detecting change points, but the latter can be applied to smaller samples. 3) Text mining makes it possible to quantify features of the evolution of social sense that previously could only be described verbally and to present them intuitively through graphical analysis, which makes it a valuable empirical tool for social science. Finally, the article adds a discussion of how statistical validity can be ensured given that text-mining data are a non-random sample.
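To make the first finding more concrete, the following is a minimal Python sketch of a batch change point scan built on the Mann-Whitney statistic. It is not the authors' implementation: the function names `average_ranks` and `mann_whitney_scan` and the `min_segment` parameter are assumptions introduced here, the normal approximation omits the tie correction, and no significance threshold is computed (CPM-style procedures typically calibrate thresholds by simulation).

```python
import math
import random


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def mann_whitney_scan(values, min_segment=5):
    """Scan every admissible split and return (best_split, best_z).

    best_split is the number of observations placed before the candidate
    change point; best_z is the largest absolute standardized
    Mann-Whitney U statistic (normal approximation, no tie correction).
    Hypothetical helper for illustration only.
    """
    n = len(values)
    ranks = average_ranks(values)
    best_split, best_z = None, 0.0
    rank_sum = sum(ranks[:min_segment - 1])
    for k in range(min_segment, n - min_segment + 1):
        rank_sum += ranks[k - 1]           # rank sum of the first k points
        u = rank_sum - k * (k + 1) / 2     # U statistic of the left segment
        mean = k * (n - k) / 2
        var = k * (n - k) * (n + 1) / 12
        z = abs(u - mean) / math.sqrt(var)
        if z > best_z:
            best_split, best_z = k, z
    return best_split, best_z


if __name__ == "__main__":
    # Simulated series with a location shift after the 60th observation.
    rng = random.Random(0)
    series = [rng.gauss(0, 1) for _ in range(60)]
    series += [rng.gauss(2, 1) for _ in range(60)]
    print(mann_whitney_scan(series))
```

Because the ranks of the pooled sample do not depend on the split, the scan ranks the data once and updates a running rank sum, so each candidate split costs constant time after sorting. In practice the returned maximum would be compared against a simulated critical value before a change point is declared.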
Source
《数学的实践与认识》
PKU Core Journal (北大核心)
2016, No. 23, pp. 155-167 (13 pages)
Mathematics in Practice and Theory
Keywords
text mining
χ² distance
purposive sampling
divergence measure