We present a lightweight and efficient semisupervised video object segmentation network based on the space-time memory framework.To some extent,our method solves the two difficulties encountered in traditional video o...We present a lightweight and efficient semisupervised video object segmentation network based on the space-time memory framework.To some extent,our method solves the two difficulties encountered in traditional video object segmentation:one is that the single frame calculation time is too long,and the other is that the current frame’s segmentation should use more information from past frames.The algorithm uses a global context(GC)module to achieve highperformance,real-time segmentation.The GC module can effectively integrate multi-frame image information without increased memory and can process each frame in real time.Moreover,the prediction mask of the previous frame is helpful for the segmentation of the current frame,so we input it into a spatial constraint module(SCM),which constrains the areas of segments in the current frame.The SCM effectively alleviates mismatching of similar targets yet consumes few additional resources.We added a refinement module to the decoder to improve boundary segmentation.Our model achieves state-of-the-art results on various datasets,scoring 80.1%on YouTube-VOS 2018 and a J&F score of 78.0%on DAVIS 2017,while taking 0.05 s per frame on the DAVIS 2016 validation dataset.展开更多
基金partially supported by the National Natural Science Foundation of China(Grant Nos.61802197,62072449,and 61632003)the Science and Technology Development Fund,Macao SAR(Grant Nos.0018/2019/AKP and SKL-IOTSC(UM)-2021-2023)+1 种基金the Guangdong Science and Technology Department(Grant No.2020B1515130001)University of Macao(Grant Nos.MYRG2020-00253-FST and MYRG2022-00059-FST).
文摘We present a lightweight and efficient semisupervised video object segmentation network based on the space-time memory framework.To some extent,our method solves the two difficulties encountered in traditional video object segmentation:one is that the single frame calculation time is too long,and the other is that the current frame’s segmentation should use more information from past frames.The algorithm uses a global context(GC)module to achieve highperformance,real-time segmentation.The GC module can effectively integrate multi-frame image information without increased memory and can process each frame in real time.Moreover,the prediction mask of the previous frame is helpful for the segmentation of the current frame,so we input it into a spatial constraint module(SCM),which constrains the areas of segments in the current frame.The SCM effectively alleviates mismatching of similar targets yet consumes few additional resources.We added a refinement module to the decoder to improve boundary segmentation.Our model achieves state-of-the-art results on various datasets,scoring 80.1%on YouTube-VOS 2018 and a J&F score of 78.0%on DAVIS 2017,while taking 0.05 s per frame on the DAVIS 2016 validation dataset.