Abstract
[Purpose/Significance] Tag clustering can uncover the knowledge and semantic structure contained in a set of tags and thereby alleviate problems such as tag ambiguity and vagueness. It is therefore of great significance for improving resource retrieval efficiency, improving the user experience, and promoting the wider application of social tagging systems. [Method/Process] A tag clustering method based on clustering resource content is proposed. The method first applies a spectral clustering algorithm to the word features of the resources to obtain K feature clusters of the resource content, then uses pointwise mutual information to measure the correlation between each tag and these K feature clusters, and finally assigns all tags to K clusters according to the maximum-correlation principle. [Result/Conclusion] Experimental results show that, because it makes effective use of resource content, the proposed method achieves better clustering results than VSM-based K-Means clustering and VSM-based agglomerative hierarchical clustering.
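The second and third steps of the method (scoring tags against feature clusters with pointwise mutual information, then assigning each tag to its highest-scoring cluster) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the spectral clustering step has already produced a word-to-cluster mapping (the hypothetical `word_cluster` dictionary), and it assumes probabilities are estimated as fractions of resources, with a tag and a cluster co-occurring when a resource carries the tag and contains at least one word from the cluster. The function name `assign_tags_by_pmi` and the toy data are illustrative only.

```python
import math
from collections import defaultdict


def assign_tags_by_pmi(resources, word_cluster):
    """Assign each tag to the word cluster with which it has the highest PMI.

    resources: list of (tags, words) pairs, one pair per resource.
    word_cluster: dict mapping a word to its cluster id (assumed to come
        from a prior spectral clustering of word features).
    """
    n = len(resources)
    tag_count = defaultdict(int)      # resources carrying tag t
    cluster_count = defaultdict(int)  # resources containing a word of cluster c
    joint_count = defaultdict(int)    # resources with both tag t and cluster c

    for tags, words in resources:
        clusters = {word_cluster[w] for w in words if w in word_cluster}
        for c in clusters:
            cluster_count[c] += 1
        for t in set(tags):
            tag_count[t] += 1
            for c in clusters:
                joint_count[(t, c)] += 1

    assignment = {}
    for t in tag_count:
        best_cluster, best_pmi = None, -math.inf
        for c in sorted(cluster_count):
            joint = joint_count.get((t, c), 0)
            if joint == 0:
                continue  # never co-occur: PMI is minus infinity, skip
            # PMI(t, c) = log( p(t, c) / (p(t) * p(c)) ), with counts over n resources
            pmi = math.log(joint * n / (tag_count[t] * cluster_count[c]))
            if pmi > best_pmi:
                best_cluster, best_pmi = c, pmi
        assignment[t] = best_cluster  # maximum-correlation principle
    return assignment


# Toy data (hypothetical): two word clusters, K = 2.
resources = [
    ({"python", "coding"}, ["code", "function", "loop"]),
    ({"python"}, ["code", "loop"]),
    ({"recipe", "cooking"}, ["flour", "oven"]),
    ({"cooking"}, ["oven", "sugar"]),
]
word_cluster = {"code": 0, "function": 0, "loop": 0,
                "flour": 1, "oven": 1, "sugar": 1}

clusters_for_tags = assign_tags_by_pmi(resources, word_cluster)
print(clusters_for_tags)
```

In this toy example each tag co-occurs with only one cluster, so the assignment is unambiguous. On real data a tag typically co-occurs with several clusters, and the PMI comparison decides among them.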
Source
Journal of Intelligence (《情报杂志》)
Indexed in: CSSCI; Peking University Core Journals (北大核心)
2016, No. 11, pp. 141-145, 150 (6 pages)
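Funding
National Natural Science Foundation of China project "Domain-Ontology-Based Data Fusion Methods for Coal Mine Safety and Their Application" (No. 51474007)
Ministry of Education Humanities and Social Sciences Research Project "Research on Methods for Discovering Hierarchical Relations among Tags in Social Tagging Environments" (No. 13YJCZH077)
Anhui Province Key Research Base of Humanities and Social Sciences projects "Research on Coal Mine Safety Management Systems and Methods Based on Multi-Source Heterogeneous Information Fusion" (No. SK2014A042) and "Research on Construction Techniques and Applications of a Domain Ontology for Coal Mine Safety Risk" (No. SK2015A082)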
Keywords
social tagging system
tag clustering
resource contents
pointwise mutual information