Abstract
Besides direct textual information, online news content usually describes the pictures, audio, video and other multimedia it contains with highly semantically condensed tags. As a result, a news item contains content-concept descriptions at different semantic levels and different granularities (direct text features and tag features). Text features are usually high-dimensional, which weakens the role of views with fewer features in clustering. At the same time, different views contribute differently to the cluster structure. To address these two problems, this paper first performs a mixed-granularity unification on each individual view (a unified label-generation process applied across the different granularities). On this basis, it exploits the ability of information entropy to represent uncertainty to weight and fuse the different views, and finally performs clustering. Simulation experiments on different datasets demonstrate the effectiveness and feasibility of the proposed method.
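The entropy-based view weighting described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything here is an assumption: each view is turned into an n x n cosine-similarity matrix (which removes the dependence on how many features a view has), a view's uncertainty is taken as the mean normalized Shannon entropy of its similarity rows, and its weight is proportional to 1 minus that entropy. The mixed-granularity label-generation step is not shown, and the function names (`cosine_similarity_matrix`, `view_entropy`, `entropy_weighted_fusion`) and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def cosine_similarity_matrix(view: Sequence[Sequence[float]]) -> Matrix:
    """Return the n x n cosine-similarity matrix of one view (rows are samples)."""
    norms = [math.sqrt(sum(x * x for x in row)) or 1.0 for row in view]
    n = len(view)
    return [
        [sum(a * b for a, b in zip(view[i], view[j])) / (norms[i] * norms[j])
         for j in range(n)]
        for i in range(n)
    ]


def view_entropy(sim: Matrix) -> float:
    """Mean normalized Shannon entropy of the rows of a similarity matrix.

    Each row (diagonal excluded, negatives clipped to 0) is normalized into a
    distribution over the other samples; its entropy is divided by log(n - 1)
    so the result lies in [0, 1]. Low entropy = sharper cluster structure.
    """
    n = len(sim)
    if n < 3:
        raise ValueError("need at least 3 samples")
    max_h = math.log(n - 1)
    total = 0.0
    for i, row in enumerate(sim):
        mass = [max(row[j], 0.0) for j in range(n) if j != i]
        s = sum(mass)
        if s == 0.0:
            total += 1.0  # no similarity information: maximally uncertain
            continue
        h = -sum((m / s) * math.log(m / s) for m in mass if m > 0)
        total += h / max_h
    return total / n


def entropy_weighted_fusion(
    views: Sequence[Sequence[Sequence[float]]],
) -> Tuple[List[float], Matrix]:
    """Weight each view by (1 - entropy) and fuse its similarity matrix."""
    sims = [cosine_similarity_matrix(v) for v in views]
    # Certainty of a view = 1 - normalized entropy, clipped against rounding error.
    certainty = [max(0.0, 1.0 - view_entropy(s)) for s in sims]
    total = sum(certainty)
    if total == 0.0:
        weights = [1.0 / len(views)] * len(views)  # fall back to equal weights
    else:
        weights = [c / total for c in certainty]
    n = len(sims[0])
    fused = [[sum(w * s[i][j] for w, s in zip(weights, sims)) for j in range(n)]
             for i in range(n)]
    return weights, fused


# Toy data (hypothetical): 6 news items with latent topics {0,1,2} and {3,4,5}.
text_view = [[1, 1, 1, 1, 0, 0]] * 3 + [[0, 0, 1, 1, 1, 1]] * 3  # 6-dim, overlapping words
tag_view = [[1, 0]] * 3 + [[0, 1]] * 3                          # 2-dim, clean tags
weights, fused = entropy_weighted_fusion([text_view, tag_view])
print([round(w, 3) for w in weights])
```

In this toy case the two-dimensional tag view receives the larger weight despite having fewer features, because its similarity rows are less uncertain than those of the overlapping text view. The fused matrix could then be passed to any similarity-based clustering algorithm (for example spectral clustering); which algorithm the paper uses is not stated in the abstract.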
Authors
代劲
胡艳
DAI Jin; HU Yan (School of Software Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China)
Source
《小型微型计算机系统》
CSCD
Peking University Core Journal
2021, No. 4, pp. 719-724 (6 pages)
Journal of Chinese Computer Systems
Funding
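Supported by the National Natural Science Foundation of China (61936001)
Supported by the Basic Research and Frontier Exploration Project of the Natural Science Foundation of Chongqing (cstc2017jcyjAX0408).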
Keywords
mixed-granularity
news data
multi-view clustering
view weight