Abstract
The prevailing approach to building sample data sets relies on manual annotation, which is time-consuming and labor-intensive and cannot scale to large standard data sets. To improve labeling efficiency by exploiting the accuracy of deep learning models, a semi-automatic data labeling method is proposed. A small amount of data is labeled manually to train a model; the newly trained model is then used to detect and recognize samples in a large data set; the low-confidence results are selected, reviewed manually, and added to the training set. Repeating this cycle gradually yields a large-scale standard data set. Experimental results show that the proposed semi-automatic labeling method greatly shortens manual labeling time, and that each iteration improves the model's detection and recognition accuracy to varying degrees.
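The iterative procedure summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function and parameter names (`semi_automatic_labeling`, `train`, `review`, `threshold`, `max_rounds`) are hypothetical, the confidence threshold of 0.8 is arbitrary, and the toy one-dimensional classifier and simulated reviewer merely stand in for a deep learning model and a human annotator. The abstract does not say how high-confidence predictions are handled; the sketch assumes they are accepted as automatic labels once no low-confidence samples remain.

```python
from typing import Callable, List, Sequence, Tuple

Sample = float   # hypothetical stand-in for an audio/video sample
Label = int
Labeled = List[Tuple[Sample, Label]]
Model = Callable[[Sample], Tuple[Label, float]]  # returns (label, confidence)


def semi_automatic_labeling(
    seed: Labeled,
    pool: Sequence[Sample],
    train: Callable[[Labeled], Model],
    review: Callable[[Sample, Label], Label],
    threshold: float = 0.8,   # assumed cut-off for "low confidence"
    max_rounds: int = 5,      # assumed iteration budget
) -> Tuple[Labeled, int]:
    """Return the labeled data set and the number of manually reviewed samples."""
    labeled = list(seed)
    remaining = list(pool)
    reviewed_count = 0
    model = train(labeled)
    for _ in range(max_rounds):
        predictions = [(x, *model(x)) for x in remaining]
        uncertain = [(x, y) for x, y, conf in predictions if conf < threshold]
        if not uncertain:
            break
        # Low-confidence predictions are checked by a person, then join the training set.
        labeled.extend((x, review(x, y)) for x, y in uncertain)
        reviewed_count += len(uncertain)
        remaining = [x for x, _, conf in predictions if conf >= threshold]
        model = train(labeled)  # retrain on the enlarged training set
    # Assumption: samples still in the pool are labeled by the final model.
    auto = [(x, model(x)[0]) for x in remaining]
    return labeled + auto, reviewed_count


def train_toy(data: Labeled) -> Model:
    """Toy 1-D classifier: the boundary is the midpoint between the class means."""
    zeros = [x for x, y in data if y == 0]
    ones = [x for x, y in data if y == 1]
    boundary = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2

    def model(x: Sample) -> Tuple[Label, float]:
        # Confidence grows with distance from the boundary, capped at 1.0.
        return int(x > boundary), min(1.0, abs(x - boundary) / 2.0)

    return model


def oracle_review(x: Sample, proposed: Label) -> Label:
    """Simulated human reviewer who always knows the true label."""
    return int(x >= 5.0)


dataset, reviewed = semi_automatic_labeling(
    seed=[(1.0, 0), (9.0, 1)],
    pool=[0.5, 2.0, 4.2, 4.9, 5.3, 6.0, 8.0, 9.5],
    train=train_toy,
    review=oracle_review,
)
print(len(dataset), "samples labeled,", reviewed, "reviewed manually")
```

In this toy run only the samples near the decision boundary go to the reviewer; the rest are labeled by the model. That is the source of the time saving the abstract reports.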
Authors
BAI Xuebing
HAN Zhifeng
JIANG Longquan
HUANG Yungang
FENG Rui
(Academy for Engineering & Technology, Fudan University, Shanghai 200243, China; Software School, Fudan University, Shanghai 200243, China; School of Computer Science, Fudan University, Shanghai 200243, China; Shanghai Haichao Institute For New Technologies, Shanghai 200070, China)
Source
Microcomputer Applications (《微型电脑应用》)
2021, No. 8, pp. 9-13, 17 (6 pages in total)
Funding
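Science and Technology Commission of Shanghai Municipality, one-off project (202068400859-80001)
Major project (AWS15J005)
Science and Technology Commission of Shanghai Municipality project (20511101502)
Science and Technology Commission of Shanghai Municipality project (20DZ1100205)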
Keywords
semi-automatic labeling
standard data set
deep learning
audio and video data