Abstract: Unlike existing fully-supervised approaches, we rethink colorectal polyp segmentation from an out-of-distribution perspective with a simple but effective self-supervised learning approach. We leverage the ability of masked autoencoders (self-supervised vision transformers trained on a reconstruction task) to learn in-distribution representations, here the distribution of healthy colon images. We then perform out-of-distribution reconstruction and inference, with feature-space standardisation to align the latent distribution of the diverse abnormal samples with the statistics of the healthy samples. We generate per-pixel anomaly scores for each image by calculating the difference between the input and reconstructed images, and use this signal for out-of-distribution (i.e., polyp) segmentation. Experimental results on six benchmarks show that our model has excellent segmentation performance and generalises across datasets. Our code is publicly available at https://github.com/GewelsJI/Polyp-OOD.
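The scoring pipeline described above can be sketched in a few lines. This is a minimal illustration, not the Polyp-OOD implementation: the `mae.encode`/`mae.decode` method names, the healthy-set statistics `mu_healthy`/`sigma_healthy`, and the threshold `tau` are all hypothetical stand-ins for whatever the released code actually uses.

```python
import torch

def standardise_features(z, mu_healthy, sigma_healthy, eps=1e-6):
    # Feature-space standardisation: replace the latent statistics of a
    # test sample with those estimated on healthy training images, so
    # abnormal inputs are decoded as if drawn from the healthy distribution.
    return (z - z.mean()) / (z.std() + eps) * sigma_healthy + mu_healthy

@torch.no_grad()
def per_pixel_anomaly_score(mae, image, mu_healthy, sigma_healthy):
    # Encode, standardise the latent, decode, then score each pixel by
    # the reconstruction error, averaged over colour channels.
    # `image` is a (B, C, H, W) tensor; `mae` is a hypothetical
    # masked-autoencoder wrapper exposing encode/decode.
    z = mae.encode(image)
    z = standardise_features(z, mu_healthy, sigma_healthy)
    recon = mae.decode(z)
    return (image - recon).abs().mean(dim=1)  # (B, H, W) anomaly map

# A pixel is labelled out-of-distribution (polyp) when its score
# exceeds a chosen threshold:
#   mask = per_pixel_anomaly_score(mae, img, mu_h, sigma_h) > tau
```

The intuition is that an autoencoder trained only on healthy colon images reconstructs healthy tissue well but reproduces polyps poorly, so the input-minus-reconstruction residual is large exactly where the anomaly lies.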
Abstract: The past decade has witnessed the impressive and steady development of single-modal AI technologies in several fields, thanks to the emergence of deep learning. Less studied, however, is multi-modal AI, commonly considered the next generation of AI, which exploits the complementary context concealed in different-modality inputs to improve performance. Humans naturally learn to form a global concept from multiple modalities (i.e., sight, hearing, touch, smell, and taste), even when some are incomplete or missing. Thus, in addition to the two popular modalities (vision and language), other types of data such as depth, infrared information, and events are also important for multi-modal learning in real-world scenes.