Abstract
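In real life, most videos contain several actions or objects, and a single-sentence description can hardly convey all the information in a video. Among the various kinds of long videos, instructional videos have clear steps and explicit logic, which makes it easy to extract features from them and to run experiments with deep learning algorithms. Extracting complex information from long videos has become a problem of growing interest to researchers. To this end, this paper collects and organizes a large-scale makeup instructional video dataset named iMakeup. It contains 2000 long videos covering 50 popular categories with a total duration of 256 h, together with 12823 short video clips; each clip is segmented according to the order of the logical steps in the video and annotated with start and end times and a natural-language description. The raw videos were mainly downloaded from video websites, and volunteers manually annotated their detailed content. The paper also analyzes the scale and text content of the dataset statistically and compares it with several datasets from similar research areas. Finally, it presents baseline results for video captioning on this dataset, verifying the dataset's feasibility for the video captioning task. The iMakeup dataset was collected with an emphasis on content diversity and category completeness, and contains rich visual, auditory, and even statistical information. Beyond the basic video captioning task, the dataset can also be used in several frontier areas such as video segmentation, object detection, and intelligent fashion recommendation.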
Automatically describing images or videos with natural language sentences (a.k.a. image/video captioning) has received increasing attention. Most related work focuses on generating a single caption sentence for an image or a short video. However, most videos in daily life contain numerous actions or objects, and it is hard to describe the complicated information in these videos with a single sentence. How to learn information from long videos has therefore become a compelling problem, yet the number of large-scale datasets for this task is limited. Instructional videos are a unique type of video with distinct and attractive characteristics for learning, and makeup instructional videos are very popular on commercial video websites. Hence, we present a large-scale makeup instructional video dataset named iMakeup, containing 2000 videos that are equally distributed over 50 topics. The total duration of the dataset is about 256 hours, and it contains 12,823 video clips in total, which are segmented based on makeup procedures. We describe the collection and annotation process of our dataset, and analyze its scale, text statistics, and diversity in comparison with other video datasets for similar problems. We then present the results of our baseline video captioning models on this dataset. The iMakeup dataset contains information from both visual and auditory modalities with a large coverage and diversity of content. Besides video captioning, it can be used in an extensive range of problems, such as video segmentation, object detection, and intelligent fashion recommendation.
Authors
林霄竹
金琴
陈师哲
Lin Xiaozhu; Jin Qin; Chen Shizhe (Multimedia Computing Laboratory, School of Information, Renmin University of China, Beijing 100872)
Source
《计算机辅助设计与图形学学报》
EI
CSCD
PKU Core (北大核心)
2019, No. 8, pp. 1350-1357 (8 pages)
Journal of Computer-Aided Design & Computer Graphics
Funding
National Natural Science Foundation of China (61772535)
National Key Research and Development Program of China (2016YFB1001202)