Abstract
【Purpose/significance】Selecting high-frequency words is an important step in co-word analysis, and the chosen high-frequency word threshold directly affects the quality of the analysis. Researchers in library and information science currently determine this threshold mainly by author-set cutoffs, Donohue's high/low-frequency word boundary formula, or Price's formula. The authors construct a threshold-determination method based on the word-frequency g-index and conduct an empirical study of these methods to examine how well each one performs in practice when selecting words.【Method/process】Papers on COVID-19 indexed in CNKI serve as the data source. High-frequency words are selected under each threshold-determination method. Excel is used to compile the statistics and construct the co-word matrices, and SPSS is used to run cluster analysis on the matrices.【Result/conclusion】The method based on the word-frequency g-index yields good co-word clustering results, and the study is a useful attempt at applying the method in practice.
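The three formula-based thresholds named in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the formulas, so the sketch uses the commonly cited forms (Donohue: T = (-1 + sqrt(1 + 8 * I1)) / 2, where I1 is the number of words occurring once; Price: M = 0.749 * sqrt(Nmax), where Nmax is the highest word frequency; word-frequency g-index: the largest rank g whose top-g words have a cumulative frequency of at least g squared). The function names and the toy keyword counts are hypothetical.

```python
import math
from collections import Counter


def donohue_threshold(counts):
    """Donohue's high/low-frequency boundary: T = (-1 + sqrt(1 + 8*I1)) / 2,
    where I1 is the number of words that occur exactly once."""
    i1 = sum(1 for f in counts.values() if f == 1)
    return (-1 + math.sqrt(1 + 8 * i1)) / 2


def price_threshold(counts):
    """Price's formula adapted to word frequency: M = 0.749 * sqrt(Nmax)."""
    return 0.749 * math.sqrt(max(counts.values()))


def word_frequency_g_index(counts):
    """Largest rank g such that the g most frequent words have a
    cumulative frequency of at least g**2."""
    g, cumulative = 0, 0
    for rank, freq in enumerate(sorted(counts.values(), reverse=True), start=1):
        cumulative += freq
        if cumulative >= rank * rank:
            g = rank
    return g


def words_at_or_above(counts, threshold):
    """Words whose frequency is >= threshold (the >= rule is an assumption;
    some studies use a strict > instead)."""
    return sorted(w for w, f in counts.items() if f >= threshold)


# Hypothetical keyword counts, for illustration only.
counts = Counter({"a": 10, "b": 6, "c": 3, "d": 1, "e": 1, "f": 1})

print(donohue_threshold(counts))        # 2.0 (three words occur once)
print(round(price_threshold(counts), 2))  # 2.37
print(word_frequency_g_index(counts))   # 4 (10 + 6 + 3 + 1 = 20 >= 16)
print(words_at_or_above(counts, donohue_threshold(counts)))  # ['a', 'b', 'c']
```

With a g-index of g, the g top-ranked words would be taken as the high-frequency set; how ties at the cutoff rank are broken is left open here, since the abstract does not say.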
Authors
虞秋雨
徐跃权
YU Qiu-yu; XU Yue-quan (Library and Information Centre, Ningbo Tech University, Ningbo 315100, China; School of Information Science and Technology, Northeast Normal University, Changchun 130024, China)
Source
《情报科学》
CSSCI
PKU Core (Peking University Core Journals)
2020, Issue 9, pp. 90-95 (6 pages)
Information Science
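Funding
2020 research project of the Zhejiang Provincial University Library and Information Work Committee, "Research on the Application of Word Selection Methods in 'Hot-Topic' Articles" (2020TKT023).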
Keywords
high-frequency words
co-word analysis
word-frequency g-index
cluster analysis
COVID-19