Abstract
Automatically identifying high-quality content among massive numbers of multimedia articles is a core function of information recommendation systems, search engines, and similar systems. Existing methods rely on large amounts of manually annotated training data and do not consider the social information and visual content available in social media. To address these problems, this paper proposes a graph convolutional model for identifying high-quality article content based on positive and unlabeled (PU) learning, named GCN-PU (graph convolutional network based on positive and unlabeled learning). Within a unified framework, the model uses a heterogeneous network to jointly model the text and social information of social media articles, and applies a graph convolutional network on this network to fuse that information into high-order features. In addition, the global visual layout of each multimedia article is used to capture its overall visual quality, which complements the high-order features output by the graph convolutional network. Finally, PU learning is introduced into the training mechanism and the loss function to make full use of the large number of unlabeled articles in social media. Experiments on real social media datasets show that GCN-PU improves the F-score by more than 3% over existing methods.
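The abstract does not specify the graph convolution used, so the following is only a minimal sketch of one standard propagation step, ReLU(D^-1/2 (A + I) D^-1/2 X W), to illustrate how features of linked nodes in a network can be fused. The function name `gcn_layer`, the toy three-node graph, and the weight values are hypothetical and are not taken from the paper.

```python
import numpy as np


def gcn_layer(adj, features, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    This is the widely used symmetric-normalization rule; whether GCN-PU
    uses exactly this form is an assumption.
    """
    a_hat = adj + np.eye(adj.shape[0])        # add self-loops
    deg = a_hat.sum(axis=1)                   # degrees are >= 1 because of self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    norm_adj = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm_adj @ features @ weight, 0.0)


# Hypothetical toy graph: node 0 (an article) linked to nodes 1 and 2 (users).
adj = np.array([[0, 1, 1],
                [1, 0, 0],
                [1, 0, 0]], dtype=float)
features = np.eye(3)          # one-hot node features, for illustration only
weight = np.ones((3, 2))      # fixed weights; a real model would learn these
hidden = gcn_layer(adj, features, weight)
```

Each row of `hidden` mixes a node's own features with those of its neighbours, which is the sense in which a graph convolution fuses text and social signals placed on the same network.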
Authors
Zhao Quan (赵泉)
Hu Jun (胡骏)
Fang Quan (方全)
Qian Shengsheng (钱胜胜)
Xu Changsheng (徐常胜)
Affiliations: School of Computer and Information, Hefei University of Technology, Hefei 230009; National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190
Source
Journal of Computer-Aided Design & Computer Graphics (《计算机辅助设计与图形学学报》)
Indexed in: EI, CSCD, Peking University Core Journals
2020, Issue 6, pp. 943-949 (7 pages)
Funding
National Natural Science Foundation of China (61432019, 61702509, 61802405, 61720106006).
Keywords
social media
multimedia article
quality identification
positive and unlabeled learning
graph convolutional network