High-Quality Content Recognition in Social Media

Cited by: 2

Abstract: Automatically recognizing high-quality content among massive numbers of multimedia articles is a core function of systems such as information recommendation and search engines. Existing methods rely on large amounts of manually annotated training data and often ignore the social information and visual content available in social media. To address these problems, this paper proposes a graph convolutional model for high-quality article recognition based on positive and unlabeled (PU) learning, named GCN-PU (graph convolutional network based on positive and unlabeled learning). Within a unified framework, GCN-PU uses a heterogeneous network to jointly model the text and social information of social media articles, and applies a graph convolutional network on that network to fuse this information into high-order features. In addition, the global visual layout of each multimedia article is used to capture its overall visual quality, which complements the high-order features output by the graph convolutional network. Finally, PU learning is introduced into the training mechanism and the loss function to exploit the large number of unlabeled articles in social media. Experiments on real social media datasets show that GCN-PU improves the F-score by more than 3% over existing methods.
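The abstract says PU learning enters the training mechanism and the loss function, but it does not give the loss itself. As an illustration only, the sketch below shows one widely used PU formulation, the non-negative PU risk estimator with a sigmoid surrogate loss. This is a swapped-in technique and may differ from the loss used in GCN-PU. The function names (`sigmoid_loss`, `nn_pu_risk`) and the class prior `prior` are hypothetical, and the scores stand in for the outputs of any classifier, such as a graph convolutional network.

```python
import numpy as np


def sigmoid_loss(scores, target):
    """Sigmoid surrogate loss l(z, t) = 1 / (1 + exp(t * z)), with t in {+1, -1}."""
    return 1.0 / (1.0 + np.exp(target * scores))


def nn_pu_risk(scores_pos, scores_unl, prior):
    """Non-negative PU risk estimate (illustrative; not taken from the paper).

    scores_pos: classifier scores for labeled positive (high-quality) articles
    scores_unl: classifier scores for unlabeled articles
    prior:      assumed fraction of positives among the unlabeled articles
    """
    # Risk of misclassifying labeled positives, weighted by the class prior.
    risk_pos = prior * np.mean(sigmoid_loss(scores_pos, +1))
    # Negative-class risk: unlabeled data treated as negative, minus the
    # share of that risk attributable to positives hidden in the unlabeled set.
    risk_neg = np.mean(sigmoid_loss(scores_unl, -1)) - prior * np.mean(
        sigmoid_loss(scores_pos, -1)
    )
    # Clamp at zero so the negative-class term cannot become negative.
    return risk_pos + max(0.0, risk_neg)


if __name__ == "__main__":
    pos = np.array([2.0, 3.0, 1.5])          # hypothetical scores, labeled positives
    unl = np.array([-2.0, 2.0, -3.0, -1.0])  # hypothetical scores, unlabeled articles
    print(nn_pu_risk(pos, unl, prior=0.25))
```

The clamp is the design choice that matters here: without it, a flexible model can drive the estimated negative-class risk below zero by overfitting the labeled positives.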

Authors: Zhao Quan; Hu Jun; Fang Quan; Qian Shengsheng; Xu Changsheng (School of Computer and Information, Hefei University of Technology, Hefei 230009; National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190)

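Affiliations: School of Computer and Information, Hefei University of Technology; National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences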

Source: Journal of Computer-Aided Design & Computer Graphics (indexed in EI, CSCD, and the Peking University Core Journals list), 2020, No. 6, pp. 943-949 (7 pages)

Funding: National Natural Science Foundation of China (61432019, 61702509, 61802405, 61720106006).

Keywords: social media; multimedia article; quality identification; positive and unlabeled learning; graph convolutional network

Classification code: TP391.41 [Automation and Computer Technology: Computer Application Technology]
