Abstract
To describe images in terms of deeper semantic information, this paper defines a six-tuple consisting of image, text, event text, event graph, image key elements, and text key elements, and builds a Chinese image-text dataset whose entries are such six-tuples. Based on an event semantic model and an event semantic annotation guideline, the collected image texts are annotated with event semantic roles and event relations, and event graphs are then used to represent their semantics formally. Statistical analysis of the dataset shows that every event semantic role is represented and that the roles are moderately distributed, that the number of corresponding image-text regions is moderate relative to sentence length, and that the image-text pairs are of high quality.
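As an illustration only, one entry of such a dataset could be modelled as follows. This is a minimal sketch, not the authors' schema: the class names, field names, the edge representation of the event graph, and the sample values are all hypothetical, chosen to mirror the six components named in the abstract.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class EventGraph:
    """Hypothetical event graph: events as nodes, labelled relations as edges."""
    events: List[str] = field(default_factory=list)
    # Each relation is (source event index, relation label, target event index).
    relations: List[Tuple[int, str, int]] = field(default_factory=list)


@dataclass
class ImageTextRecord:
    """One dataset entry; field names are assumptions, not the paper's."""
    image_path: str                # the image
    text: str                      # the text paired with the image
    event_texts: List[str]         # text spans that describe events
    event_graph: EventGraph        # formal representation of the event semantics
    image_key_elements: List[str]  # key elements found in the image
    text_key_elements: List[str]   # key elements found in the text

    def as_tuple(self) -> tuple:
        """Return the six components in the order listed in the abstract."""
        return (
            self.image_path,
            self.text,
            self.event_texts,
            self.event_graph,
            self.image_key_elements,
            self.text_key_elements,
        )


# Invented sample values, for illustration only.
record = ImageTextRecord(
    image_path="images/0001.jpg",
    text="A boy kicks a ball and the ball flies into the net.",
    event_texts=["A boy kicks a ball", "the ball flies into the net"],
    event_graph=EventGraph(
        events=["kick", "fly"],
        relations=[(0, "causes", 1)],
    ),
    image_key_elements=["boy", "ball", "net"],
    text_key_elements=["boy", "ball", "net"],
)
```

Keeping the event graph as a separate type leaves room to attach semantic roles to each event later without changing the six-component record itself.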
Authors
邓洲
刘茂福
胡慧君
冯文贺
DENG Zhou; LIU Maofu; HU Huijun; FENG Wenhe (College of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430065, Hubei, China; Key Laboratory of Intelligent Information Processing and Real-Time Industrial System in Hubei Province, Wuhan University of Science and Technology, Wuhan 430065, Hubei, China; Language Engineering and Computing Laboratory, Guangdong University of Foreign Studies, Guangzhou 510420, Guangdong, China)
Source
《武汉大学学报(理学版)》
CAS
CSCD
PKU Core
2020, No. 3, pp. 253-260 (8 pages)
Journal of Wuhan University: Natural Science Edition
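Funding
Major Research Program of the National Social Science Fund of China (11&ZD189)
Humanities and Social Sciences Research Project of the Hubei Provincial Department of Education (17Y018).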