Abstract: Unlike existing fully-supervised approaches, we rethink colorectal polyp segmentation from an out-of-distribution perspective with a simple but effective self-supervised learning approach. We leverage the ability of masked autoencoders (self-supervised vision transformers trained on a reconstruction task) to learn in-distribution representations, here, the distribution of healthy colon images. We then perform out-of-distribution reconstruction and inference, with feature-space standardisation to align the latent distribution of the diverse abnormal samples with the statistics of the healthy samples. We generate per-pixel anomaly scores for each image by calculating the difference between the input and reconstructed images, and use this signal for out-of-distribution (i.e., polyp) segmentation. Experimental results on six benchmarks show that our model achieves excellent segmentation performance and generalises across datasets. Our code is publicly available at https://github.com/GewelsJI/Polyp-OOD.
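The anomaly-scoring step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `standardise` and `anomaly_map` helpers, the per-channel healthy statistics, and the toy 8x8 image are all assumptions introduced for demonstration.

```python
import numpy as np

def standardise(z, mu_healthy, sigma_healthy):
    # Align a latent feature vector with healthy-set statistics
    # (mu/sigma assumed precomputed offline from healthy images).
    return (z - z.mean()) / (z.std() + 1e-8) * sigma_healthy + mu_healthy

def anomaly_map(image, reconstruction):
    # Per-pixel anomaly score: absolute reconstruction error,
    # averaged over colour channels and rescaled to [0, 1].
    err = np.abs(image - reconstruction).mean(axis=-1)
    return (err - err.min()) / (err.max() - err.min() + 1e-8)

# Toy example: a bright patch the in-distribution (healthy-trained)
# autoencoder fails to reconstruct, so it scores as anomalous.
img = np.zeros((8, 8, 3))
img[2:5, 2:5] = 1.0          # "polyp" region
rec = np.zeros((8, 8, 3))    # reconstruction stays "healthy"
scores = anomaly_map(img, rec)
mask = scores > 0.5          # threshold -> out-of-distribution segmentation
```

Thresholding the score map is the simplest way to turn the reconstruction-error signal into a binary polyp mask; the actual pipeline may post-process the scores differently.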
Abstract: Most polyp segmentation methods use convolutional neural networks (CNNs) as their backbone, leading to two key issues when exchanging information between the encoder and decoder: (1) accounting for the differences in contribution between different-level features, and (2) designing an effective mechanism for fusing these features. Unlike existing CNN-based methods, we adopt a transformer encoder, which learns more powerful and robust representations. In addition, considering the influence of image acquisition and the elusive properties of polyps, we introduce three standard modules: a cascaded fusion module (CFM), a camouflage identification module (CIM), and a similarity aggregation module (SAM). Among these, the CFM collects the semantic and location information of polyps from high-level features; the CIM captures polyp information disguised in low-level features; and the SAM extends the pixel features of the polyp area, together with high-level semantic position information, to the entire polyp area, thereby effectively fusing cross-level features. The proposed model, named Polyp-PVT, effectively suppresses noise in the features and significantly improves their expressive capabilities. Extensive experiments on five widely adopted datasets show that the proposed model is more robust to various challenging situations (e.g., appearance changes, small objects, and rotation) than existing representative methods. The proposed model is available at https://github.com/DengPingFan/Polyp-PVT.
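The cascaded-fusion idea, where coarse high-level semantics are progressively upsampled and merged with finer low-level detail, can be sketched as below. This is a schematic under assumed shapes; the `upsample2x` helper, additive fusion, and single-channel features are hypothetical simplifications of the actual CFM.

```python
import numpy as np

def upsample2x(f):
    # Nearest-neighbour upsampling so features from different levels
    # share one spatial resolution before fusion (illustrative helper).
    return f.repeat(2, axis=0).repeat(2, axis=1)

def cascaded_fusion(high, mid, low):
    # Fuse coarse-to-fine: high-level semantic/location cues are
    # progressively upsampled and combined with lower-level detail.
    x = upsample2x(high) + mid
    x = upsample2x(x) + low
    return x

high = np.ones((4, 4))    # coarse semantic/location features
mid = np.ones((8, 8))     # intermediate features
low = np.ones((16, 16))   # fine detail (where camouflaged cues live)
fused = cascaded_fusion(high, mid, low)
```

In the real model the fusion would operate on multi-channel tensors with learned convolutions rather than plain addition; the sketch only shows the cascaded resolution-matching structure.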
Funding: This work was supported by the National Natural Science Foundation of China (No. 62072223) and the Natural Science Foundation of Fujian Province, China (No. 2020J01131199).
Abstract: We present the first comprehensive video polyp segmentation (VPS) study in the deep learning era. Over the years, progress in VPS has been hindered by the lack of a large-scale dataset with fine-grained segmentation annotations. To address this issue, we first introduce a high-quality, frame-by-frame annotated VPS dataset, named SUN-SEG, which contains 158,690 colonoscopy video frames from the well-known SUN database. We provide additional annotations covering diverse types, i.e., attribute, object mask, boundary, scribble, and polygon. Second, we design a simple but efficient baseline, named PNS+, which consists of a global encoder, a local encoder, and normalized self-attention (NS) blocks. The global and local encoders receive an anchor frame and multiple successive frames to extract long-term and short-term spatio-temporal representations, which are then progressively refined by two NS blocks. Extensive experiments show that PNS+ achieves the best performance and real-time inference speed (170 fps), making it a promising solution for the VPS task. Third, we extensively evaluate 13 representative polyp/object segmentation models on our SUN-SEG dataset and provide attribute-based comparisons. Finally, we discuss several open issues and suggest possible research directions for the VPS community. Our project and dataset are publicly available at https://github.com/GewelsJI/VPS.
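The core operation of refining frame features jointly with self-attention can be sketched as follows. This is a simplified stand-in, not the paper's NS block: plain scaled dot-product self-attention over T frame feature vectors, followed by per-row feature normalisation; the frame count, feature size, and normalisation placement are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def normalized_self_attention(x):
    # x: (T, d) feature vectors, one per video frame.
    # Scaled dot-product self-attention lets every frame attend to all
    # others, then each refined vector is normalised to zero mean / unit
    # variance (a simplified take on the "normalized" aspect).
    d = x.shape[-1]
    attn = softmax(x @ x.T / np.sqrt(d))
    out = attn @ x
    return (out - out.mean(axis=-1, keepdims=True)) / (
        out.std(axis=-1, keepdims=True) + 1e-6)

# Five successive frames' features (T=5, d=8), refined jointly.
frames = np.random.default_rng(0).normal(size=(5, 8))
refined = normalized_self_attention(frames)
```

In PNS+ the attention operates on spatial-temporal feature maps from the global and local encoders and is applied in two successive blocks; the sketch only conveys the attend-then-normalise pattern.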