Construction of a Chinese Image-Text Dataset (中文图文数据集构建)

Abstract: To describe images in terms of deeper semantic information, this paper defines a six-tuple consisting of image, text, event text, event graph, image key elements, and text key elements, and builds a Chinese image-text dataset whose records are these six-tuples. Based on an event semantic model and an event semantic annotation guideline, the collected image texts are annotated with event semantic roles and event relations, and their semantics are then formally represented as event graphs. Statistical analysis of the dataset shows that every event semantic role is represented and moderately distributed, that the number of corresponding image-text regions is moderate relative to sentence length, and that the image-text pairs are of high quality.
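The six-tuple named in the abstract can be pictured as a simple record type. The following is a minimal sketch only, not the authors' actual schema: the class names, field names, field types, the relation label, and the placeholder values are all assumptions made for illustration, since the abstract does not specify how the dataset is stored.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class EventGraph:
    """Hypothetical event graph: events are nodes, event relations are labeled edges."""
    events: List[str] = field(default_factory=list)
    # Each relation is (source event index, relation label, target event index).
    relations: List[Tuple[int, str, int]] = field(default_factory=list)


@dataclass
class ImageTextSample:
    """One six-tuple record; every field name here is an illustrative assumption."""
    image: str                     # path or identifier of the image
    text: str                      # text paired with the image
    event_text: List[str]          # event-bearing spans taken from the text
    event_graph: EventGraph        # formal representation of the event semantics
    image_key_elements: List[str]  # key elements identified in the image
    text_key_elements: List[str]   # key elements identified in the text


def make_placeholder_sample() -> ImageTextSample:
    """Build a record from made-up placeholder values (not data from the paper)."""
    graph = EventGraph(
        events=["event_a", "event_b"],
        relations=[(0, "hypothetical_relation", 1)],
    )
    return ImageTextSample(
        image="images/placeholder.jpg",
        text="placeholder sentence",
        event_text=["event_a", "event_b"],
        event_graph=graph,
        image_key_elements=["region_1"],
        text_key_elements=["word_1"],
    )


sample = make_placeholder_sample()
print(len(sample.event_graph.events), len(sample.event_graph.relations))
```

Keeping the event graph as its own type separates the annotated text spans from their formal representation, which mirrors the distinction the abstract draws between event text and event graph.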

Authors: DENG Zhou (邓洲); LIU Maofu (刘茂福); HU Huijun (胡慧君); FENG Wenhe (冯文贺) (College of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430065, Hubei, China; Key Laboratory of Intelligent Information Processing and Real-Time Industrial System in Hubei Province, Wuhan University of Science and Technology, Wuhan 430065, Hubei, China; Language Engineering and Computing Laboratory, Guangdong University of Foreign Studies, Guangzhou 510420, Guangdong, China)

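Affiliations: College of Computer Science and Technology, Wuhan University of Science and Technology; Key Laboratory of Intelligent Information Processing and Real-Time Industrial System in Hubei Province, Wuhan University of Science and Technology; Language Engineering and Computing Laboratory, Guangdong University of Foreign Studies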

Source: Journal of Wuhan University: Natural Science Edition (《武汉大学学报（理学版）》), 2020, No. 3, pp. 253-260 (8 pages). Indexed in: CAS, CSCD, Peking University Core Journals (北大核心)

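Funding: Major Research Program of the National Social Science Fund of China (11&ZD189); Humanities and Social Sciences Research Project of the Hubei Provincial Department of Education (17Y018).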

Keywords: image captioning; event semantic role; image-text dataset; event graph

Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]