Abstract
In the literature of a given field, the more frequently two research objects co-occur, the more likely it is usually assumed that an underlying link exists between them. This assumption has driven the popularity of co-occurrence analysis methods such as co-word analysis, co-citation analysis, and co-authorship analysis. Traditional co-occurrence analysis typically proceeds in three steps. However, the first two steps have certain shortcomings, so the resulting co-occurrence clustering may be misleading. To overcome these shortcomings, this paper introduces a new method for co-occurrence clustering analysis from the field of association rule mining: maximal frequent itemset mining. The approach compresses the three steps of traditional co-occurrence analysis into one, which greatly simplifies the process, and it makes full use of all available information. Experimental results show that, with a suitable minimum support threshold, largely satisfactory clustering results can be obtained.
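To make the idea of maximal frequent itemset mining concrete, the following is a minimal sketch, not the authors' implementation. It assumes each document is represented as a set of keywords and that the minimum support is an absolute document count; the function names, the brute-force level-wise search, and the sample data are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search; returns {itemset: support count}."""
    transactions = [frozenset(t) for t in transactions]

    # Count single items first.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Candidates of size k are unions of two frequent (k-1)-itemsets.
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        current = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                current[cand] = support
        frequent.update(current)
        k += 1
    return frequent


def maximal_frequent_itemsets(transactions, min_support):
    """Keep only frequent itemsets that have no frequent proper superset."""
    frequent = frequent_itemsets(transactions, min_support)
    return {
        s: c
        for s, c in frequent.items()
        if not any(s < other for other in frequent)
    }


# Hypothetical keyword sets, one per paper (illustrative data only).
papers = [
    {"co-word analysis", "clustering", "bibliometrics"},
    {"co-word analysis", "clustering", "bibliometrics"},
    {"co-word analysis", "clustering"},
    {"association rules", "frequent itemset"},
    {"association rules", "frequent itemset", "clustering"},
]

clusters = maximal_frequent_itemsets(papers, min_support=2)
for itemset, support in clusters.items():
    print(sorted(itemset), support)
```

With a minimum support of 2, this toy data yields two maximal frequent itemsets, each of which can be read directly as a keyword cluster. Raising the threshold produces fewer and smaller clusters, which is why the choice of minimum support matters.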
Source
《情报学报》
CSSCI
Peking University Core Journals (北大核心)
2012, No. 2, pp. 143-150 (8 pages)
Journal of the China Society for Scientific and Technical Information
Funding
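This research was supported by the following grants:
the National Key Technology R&D Program of the 12th Five-Year Plan, project "Research on Key Technologies of Large-Scale Semantic Computing for Foreign-Language Scientific and Technical Knowledge Organization Systems" (2011BAH10804);
the Institute of Scientific and Technical Information of China pre-research project "Deep Domain Topic Monitoring in Scientific and Technical Literature and Revealing Patterns of Topic Evolution" (YY-201129);
the Jiangsu Province Social Science Foundation project "Research on Automatic Indexing of Digital Newspapers" (09TQC011); and
the Ministry of Education Humanities and Social Sciences Research project "Research on Deep Processing of Electronic Newspaper Content" (09YJC870014).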
Keywords
co-occurrence analysis
co-word analysis
clustering analysis
maximal frequent itemset
hierarchical clustering