Funding: Supported by the National Natural Science Funds of China (Nos. 62088102 and 62021002) and the Beijing Natural Science Foundation, China (No. 4222025).
Abstract: Seeing through dense occlusions and reconstructing scene images is an important but challenging task. Traditional frame-based image de-occlusion methods may produce fatal errors under extremely dense occlusions because the limited number of occluded input frames contains too little valid information. Event cameras are bio-inspired vision sensors that record the brightness changes at each pixel asynchronously and with high temporal resolution. However, synthesizing images solely from event streams is ill-posed, since the event stream records only brightness changes and the initial brightness is unknown. In this paper, we propose an event-enhanced multi-modal fusion hybrid network for image de-occlusion, which uses event streams to provide complete scene information and frames to provide color and texture information. An event stream encoder based on a spiking neural network (SNN) is proposed to encode and denoise the event stream efficiently. A comparison loss is proposed to generate clearer results. Experimental results on a large-scale event-based and frame-based image de-occlusion dataset demonstrate that the proposed method achieves state-of-the-art performance.
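The ill-posedness noted in the abstract can be illustrated with a minimal sketch. It is not the paper's method; it assumes the commonly used idealized event model in which each event signals a fixed step in log-brightness at one pixel. The names `integrate_events` and `reconstruct`, the `(x, y, polarity)` event layout, and the `contrast_threshold` value are hypothetical and chosen only for illustration.

```python
def integrate_events(events, width, height, contrast_threshold=0.2):
    """Accumulate signed events per pixel.

    Assumption (idealized model): each event (x, y, polarity) means the
    log-brightness at pixel (x, y) moved by +/- contrast_threshold.
    The result is only the CHANGE in log-brightness, not its absolute value.
    """
    delta = [[0.0] * width for _ in range(height)]
    for x, y, polarity in events:
        delta[y][x] += contrast_threshold * polarity
    return delta


def reconstruct(initial, delta):
    """Add the accumulated change to a candidate initial log-brightness image."""
    return [[i + d for i, d in zip(i_row, d_row)]
            for i_row, d_row in zip(initial, delta)]


# A hypothetical event stream on a 2x1 sensor.
events = [(0, 0, +1), (0, 0, +1), (1, 0, -1)]
delta = integrate_events(events, width=2, height=1)

# Two different candidate initial images are equally consistent with the
# same events, yet they lead to different reconstructions.
image_a = reconstruct([[0.0, 1.0]], delta)
image_b = reconstruct([[0.5, 0.3]], delta)
```

Because both reconstructions explain the same event stream, events alone cannot determine the image; this is the gap that the frame input is said to fill by supplying color and texture information.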