Abstract
Based on an in-depth analysis of the correlations among users, tags, and annotated Web resources in social annotation systems, this paper proposes an algorithm that derives semantic descriptions of Web resources from users' tags. Using these descriptions and their correlations with users, an iterative algorithm then clusters the Web resources in a social annotation system by their semantics. By repeatedly reinforcing the agreement information among the resources being clustered, the algorithm can mitigate, to some extent, the data sparseness and performance problems faced by traditional clustering algorithms. The study shows that in-depth analysis of the correlations in a Web resource's environment helps users better understand and work with that resource, and that such analysis is especially important for Web resources whose own features are insufficient or difficult to obtain.
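The abstract does not give the details of either algorithm, so the following is only a rough sketch of the general idea, not the paper's method. It assumes annotations arrive as (user, tag, resource) triples, describes each resource by how many distinct users agreed on each tag, and substitutes a plain greedy agglomerative merge for the paper's iterative clustering. The function names `describe`, `cosine`, and `cluster`, and the `threshold` and `max_rounds` parameters, are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def describe(annotations):
    """Map each resource to {tag: number of distinct users who applied it}."""
    users_by_pair = defaultdict(set)
    for user, tag, resource in annotations:
        users_by_pair[(resource, tag)].add(user)
    desc = defaultdict(dict)
    for (resource, tag), users in users_by_pair.items():
        desc[resource][tag] = len(users)
    return dict(desc)


def cosine(a, b):
    """Cosine similarity between two sparse tag-weight dictionaries."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster(desc, threshold=0.5, max_rounds=10):
    """Assumed stand-in: merge the most similar pair of clusters each round.

    Merged clusters sum their tag weights, so tags shared by the members
    gain weight relative to tags only one member carries.
    """
    clusters = [({r}, dict(d)) for r, d in desc.items()]
    for _ in range(max_rounds):
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cosine(clusters[i][1], clusters[j][1])
                if s >= threshold and (best is None or s > best[0]):
                    best = (s, i, j)
        if best is None:  # no pair is similar enough; stop iterating
            break
        _, i, j = best
        members = clusters[i][0] | clusters[j][0]
        merged = dict(clusters[i][1])
        for t, w in clusters[j][1].items():
            merged[t] = merged.get(t, 0) + w
        rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters = rest + [(members, merged)]
    return [members for members, _ in clusters]


# Made-up example data: (user, tag, resource) triples.
annotations = [
    ("u1", "python", "r1"), ("u2", "python", "r1"), ("u1", "code", "r1"),
    ("u2", "python", "r2"), ("u3", "code", "r2"),
    ("u3", "travel", "r3"), ("u1", "travel", "r3"),
]
groups = cluster(describe(annotations))
```

With this toy data, `r1` and `r2` share the tags `python` and `code` and end up in one group, while `r3` stays on its own. The pairwise search is quadratic per round, so the sketch does not address the performance concerns the abstract raises.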
Source
《高技术通讯》
CAS
CSCD
PKU Core (北大核心)
2012, Issue 1, pp. 48-54 (7 pages)
Chinese High Technology Letters
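Funding
Supported by the 863 Program (2007AA01Z132), the National Natural Science Foundation of China (60435010), the 973 Program (2007CB311004), and the National Key Technology R&D Program (2006BAC08B06).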
Keywords
social annotation, semantic extraction, semantic clustering algorithm, general correlation