Abstract
To overcome the high cost of obtaining large numbers of relation-labeled samples, a semi-supervised image-text relation extraction model based on co-training is proposed, which exploits abundant unlabeled data to improve the accuracy of image-text relation extraction. First, an image view and a text semantic view are constructed from the image and text modalities, and a classifier is trained for each view on the labeled dataset. Then, the data from each view are cross-fed into the classifier of the other view, fully exploiting the information in both the labeled and the unlabeled data to produce more accurate classification results. Finally, the classifiers of the two views predict the unlabeled data so as to output consistent results. Experimental results on the public VRD and VG datasets show that, compared with six recent relationship detection methods, the proposed method improves performance by 2.24% (image view) and 1.41% (text semantic view) on the VRD dataset, and by 3.59% on the VG dataset.
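The abstract describes a standard two-view co-training loop: train one classifier per view on labeled data, let each view pseudo-label its most confident unlabeled examples for the other view, and retrain until the views agree. The sketch below illustrates this loop under stated assumptions; it is not the paper's implementation. The function name `co_train`, the use of `LogisticRegression` as a stand-in for the paper's view-specific classifiers, and the confidence-based selection rule are all hypothetical.

```python
# A minimal two-view co-training sketch (illustrative assumption, not the
# paper's implementation). LogisticRegression stands in for the paper's
# image-view and text-semantic-view classifiers.
import numpy as np
from sklearn.linear_model import LogisticRegression

def co_train(Xi, Xt, y, Ui, Ut, rounds=5, k=10):
    """Xi/Xt: labeled image/text-view features; y: shared labels;
    Ui/Ut: the two views' features for the same unlabeled examples."""
    Xi_tr, yi = Xi, y.copy()   # per-view training sets grow over rounds
    Xt_tr, yt = Xt, y.copy()
    ci = LogisticRegression(max_iter=1000).fit(Xi_tr, yi)
    ct = LogisticRegression(max_iter=1000).fit(Xt_tr, yt)
    for _ in range(rounds):
        if len(Ui) == 0:
            break
        pi, pt = ci.predict_proba(Ui), ct.predict_proba(Ut)
        # Each view selects its k most confident unlabeled examples and
        # hands the pseudo-labels to the OTHER view (the cross-input step).
        top_i = np.argsort(pi.max(axis=1))[-k:]
        top_t = np.argsort(pt.max(axis=1))[-k:]
        Xt_tr = np.vstack([Xt_tr, Ut[top_i]])
        yt = np.concatenate([yt, ci.classes_[pi[top_i].argmax(axis=1)]])
        Xi_tr = np.vstack([Xi_tr, Ui[top_t]])
        yi = np.concatenate([yi, ct.classes_[pt[top_t].argmax(axis=1)]])
        # Drop the consumed examples from the unlabeled pool and retrain.
        keep = np.setdiff1d(np.arange(len(Ui)), np.union1d(top_i, top_t))
        Ui, Ut = Ui[keep], Ut[keep]
        ci = LogisticRegression(max_iter=1000).fit(Xi_tr, yi)
        ct = LogisticRegression(max_iter=1000).fit(Xt_tr, yt)
    return ci, ct
```

At inference, the abstract's final consistency step can be realized by accepting a prediction only when both returned classifiers agree on an unlabeled example.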
Authors
Wang Yaping; Wang Zhiqiang; Wang Yuanlong; Liang Jiye (School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China; Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, China)
Source
Journal of Nanjing University of Science and Technology
CAS
CSCD
Peking University Core Journal (PKU Core)
2024, No. 4, pp. 451-459 (9 pages)
Funding
National Natural Science Foundation of China (61876103, 61906111).
Keywords
co-training
semi-supervised
multimodal
relationship extraction
visual relationship detection