Funding: Supported by the National Key Research and Development Program of China under Grant No. 2019YFB2204104, the Beijing Natural Science Foundation of China under Grant No. L182059, the National Natural Science Foundation of China under Grant Nos. 61772523, 61620106003, and 61802406, Alibaba Group through the Alibaba Innovative Research Program, and the Joint Open Research Fund Program of State Key Laboratory of Hydroscience and Engineering and Tsinghua-Ningxia Yinchuan Joint Institute of Internet of Waters on Digital Water Governance.
Abstract: With the recent advances in computer graphics rendering and image editing technologies, computer-generated fake images, which in general do not reflect what happens in reality, can now easily deceive the human visual system. In this work, we propose a convolutional neural network (CNN)-based model that uses channel and pixel correlation to distinguish computer-generated (CG) images from natural images (NIs). The key component of the proposed CNN architecture is a self-coding module that takes color images as input and explicitly extracts the correlation between color channels. Unlike previous approaches that directly apply a CNN to this problem, we consider the generality of the network (or subnetwork): the newly introduced hybrid correlation module can be directly combined with existing CNN models to enhance the discrimination capacity of the original networks. Experimental results demonstrate that the proposed network outperforms state-of-the-art methods in terms of classification performance. We also show that the newly introduced hybrid correlation module can improve the classification accuracy of different CNN architectures.
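To make the notion of correlation between color channels concrete, the following is a minimal sketch that computes the Pearson correlation matrix of the R, G, and B channels of a small image. It is an illustrative assumption only, not the self-coding or hybrid correlation module described in the abstract, which is a learned CNN component; the function name `channel_correlation` and the nested-list image layout are hypothetical.

```python
import math


def channel_correlation(image):
    """Return the 3x3 Pearson correlation matrix of the R, G, B channels.

    `image` is a list of rows; each row is a list of (r, g, b) tuples.
    This is a hand-crafted statistic for illustration, not a learned module.
    """
    # Flatten each channel into one list of pixel values.
    channels = [[px[c] for row in image for px in row] for c in range(3)]
    n = len(channels[0])
    means = [sum(ch) / n for ch in channels]
    centered = [[v - m for v in ch] for ch, m in zip(channels, means)]
    norms = [math.sqrt(sum(v * v for v in ch)) for ch in centered]

    corr = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            if norms[i] == 0.0 or norms[j] == 0.0:
                continue  # a constant channel has no defined correlation; keep 0.0
            dot = sum(a * b for a, b in zip(centered[i], centered[j]))
            corr[i][j] = dot / (norms[i] * norms[j])
    return corr


# Toy 2x2 image in which G = 2 * R and B = -R for every pixel.
toy_image = [[(1, 2, -1), (2, 4, -2)], [(3, 6, -3), (4, 8, -4)]]
toy_corr = channel_correlation(toy_image)
```

In the toy image the green channel is a positive multiple of the red channel and the blue channel is its negation, so the red-green entry is close to 1 and the red-blue entry is close to -1.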
Funding: Supported by the National Key Research and Development Program of China under Grant No. 2021YFB1715900, the National Natural Science Foundation of China under Grant Nos. 12022117 and 61802406, the Beijing Natural Science Foundation under Grant No. Z190004, the Beijing Advanced Discipline Fund under Grant No. 115200S001, and Alibaba Group through the Alibaba Innovative Research Program.
Abstract: We propose a feature-fusion network for pose estimation directly from RGB images, without any depth information. First, we introduce a two-stream architecture consisting of a segmentation stream and a regression stream. The segmentation stream processes the spatial embedding features and obtains the corresponding image crop. These features are further coupled with the image crop in the fusion network. Second, we use an efficient perspective-n-point (E-PnP) algorithm in the regression stream to extract robust spatial features between 3D and 2D keypoints. Finally, we perform iterative refinement with an end-to-end mechanism to improve the estimation performance. We conduct experiments on two public datasets, YCB-Video and the challenging Occluded-LineMOD. The results show that our method outperforms state-of-the-art approaches in both speed and accuracy.
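Perspective-n-point methods relate 3D keypoints to their 2D image locations through the pinhole camera model: a candidate pose (rotation and translation) is judged by how closely the projected 3D points match the observed 2D points. The sketch below shows only that projection and the resulting reprojection error. It is not the E-PnP solver or the network described in the abstract, and the function names and the intrinsics `fx`, `fy`, `cx`, `cy` are illustrative assumptions.

```python
import math


def project(point, rotation, translation, fx, fy, cx, cy):
    """Project one 3D point into pixel coordinates with a pinhole camera."""
    # Camera-frame coordinates: X_c = R * X + t
    xc, yc, zc = (
        sum(rotation[i][k] * point[k] for k in range(3)) + translation[i]
        for i in range(3)
    )
    # Perspective division followed by the intrinsic parameters.
    return (fx * xc / zc + cx, fy * yc / zc + cy)


def mean_reprojection_error(points_3d, points_2d, rotation, translation,
                            fx, fy, cx, cy):
    """Average pixel distance between projected and observed keypoints."""
    total = 0.0
    for p3, (u, v) in zip(points_3d, points_2d):
        pu, pv = project(p3, rotation, translation, fx, fy, cx, cy)
        total += math.hypot(pu - u, pv - v)
    return total / len(points_3d)


# Hypothetical setup: identity rotation, object 5 units in front of the camera.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
shift = [0.0, 0.0, 5.0]
keypoints_3d = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
observed_2d = [project(p, identity, shift, 100.0, 100.0, 50.0, 50.0)
               for p in keypoints_3d]
error = mean_reprojection_error(keypoints_3d, observed_2d, identity, shift,
                                100.0, 100.0, 50.0, 50.0)
```

Because the observations here are generated from the same pose, the error is zero; a PnP solver searches for the pose that makes this error as small as possible on real detections.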