Abstract: The Computational Visual Media (CVM) conference series is intended to provide a major international forum for exchanging novel research ideas and significant computational methods that either underpin or apply visual media. The primary goal is to promote cross-disciplinary research that amalgamates aspects of computer graphics, computer vision, machine learning, image and video processing, visualization, and geometric computing. The main topics of interest to CVM include classification, composition, retrieval, synthesis, cognition, and understanding of visual media (e.g., images, video, 3D geometry).
Funding: Supported by the National Natural Science Foundation of China (Nos. 61572264 and 61620106008).
Abstract: Training a generic objectness measure to produce object proposals has recently become of significant interest. We observe that generic objects with well-defined closed boundaries can be detected by looking at the norm of gradients, after resizing their corresponding image windows to a small fixed size. Based on this observation, and for computational efficiency, we propose to resize the window to 8 × 8 and use the norm of the gradients as a simple 64D feature to describe it, for explicitly training a generic objectness measure. We further show how the binarized version of this feature, namely binarized normed gradients (BING), can be used for efficient objectness estimation, requiring only a few atomic operations (e.g., add and bitwise shift). To improve the localization quality of the proposals while maintaining efficiency, we propose a novel fast segmentation method and demonstrate its effectiveness in improving BING's localization performance when used in multi-thresholding straddling expansion (MTSE) post-processing. On the challenging PASCAL VOC2007 dataset, using 1000 proposals per image and an intersection-over-union threshold of 0.5, our proposal method achieves a 95.6% object detection rate and 78.6% mean average best overlap in less than 0.005 seconds per image.
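As an illustration of the feature described above, the following sketch computes a 64D normed-gradient descriptor for a single image window by resizing it to 8 × 8 and taking the (clipped) L1 norm of its gradients. It assumes OpenCV and NumPy and is not the authors' released BING code; the binarization step and the bitwise scoring are omitted.

```python
import cv2
import numpy as np

def normed_gradient_feature(window_bgr):
    """Illustrative 64D normed-gradient (NG) feature for one image window.

    Resizes the window to 8x8, computes horizontal/vertical gradients, and
    uses the clipped gradient magnitude as the descriptor, following the
    idea described in the abstract (not the authors' released code).
    """
    gray = cv2.cvtColor(window_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
    small = cv2.resize(gray, (8, 8), interpolation=cv2.INTER_LINEAR)
    gx = cv2.Sobel(small, cv2.CV_32F, 1, 0, ksize=1)   # simple finite differences
    gy = cv2.Sobel(small, cv2.CV_32F, 0, 1, ksize=1)
    mag = np.minimum(np.abs(gx) + np.abs(gy), 255.0)   # L1 norm of gradients, clipped
    return mag.flatten()                               # 64D feature vector
```

In the full method, this descriptor is further approximated by its leading binary bits (BING), so that scoring a window reduces to a handful of add and bitwise-shift operations.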
Funding: Supported by the National High-Tech R&D Program of China (Project No. 2012AA011903), the National Natural Science Foundation of China (Project No. 61373069), the Research Grant of Beijing Higher Institution Engineering Research Center, and the Tsinghua–Tencent Joint Laboratory for Internet Innovation Technology.
Abstract: In this paper we present a novel automatic background substitution approach for live video. The objective of background substitution is to extract the foreground from the input video and then combine it with a new background. We use a color line model to improve the Gaussian mixture model in the background cut method, obtaining a binary foreground segmentation result that is less sensitive to brightness differences. Based on the high-quality binary segmentation results, we can automatically create a reliable trimap for alpha matting to refine the segmentation boundary. To make the composition more realistic, an automatic foreground color adjustment step is added to make the foreground look consistent with the new background. Compared to previous approaches, our method produces higher quality binary segmentation results, and to the best of our knowledge, this is the first such automatic, integrated background substitution system that runs in real time, making it practical for everyday applications.
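To illustrate the trimap step mentioned above, the sketch below derives a trimap from a binary segmentation mask by marking a band around the boundary as unknown, which can then be passed to an alpha-matting solver. This is a generic construction, not the paper's specific procedure, and the band width is an assumed parameter.

```python
import cv2
import numpy as np

def trimap_from_mask(binary_mask, band=7):
    """Build a trimap (0 = background, 128 = unknown, 255 = foreground)
    from a binary segmentation mask by marking a band around the boundary
    as unknown. Illustrative only; the band width is an assumed parameter.
    """
    mask = (binary_mask > 0).astype(np.uint8) * 255
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (2 * band + 1, 2 * band + 1))
    sure_fg = cv2.erode(mask, kernel)     # pixels that stay foreground after erosion
    sure_bg = cv2.dilate(mask, kernel)    # pixels outside the dilated mask are background
    trimap = np.full_like(mask, 128)      # start with everything unknown
    trimap[sure_fg == 255] = 255
    trimap[sure_bg == 0] = 0
    return trimap
```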
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60703003 and 60641002.
Abstract: In this paper we propose an image magnification reconstruction method. In recent years many interpolation algorithms have been proposed for image magnification, but all of them suffer from defects to some degree, such as jaggies and blurring. To address these problems, we propose a post-processing step consisting of edge-aware level set diffusion and bilateral filtering. After the initial interpolation, the significant contours of the image are identified. Edge-aware level set diffusion is then applied along these contours to remove jaggies, followed by bilateral filtering at the same locations to reduce the blurring introduced by the initial interpolation and the level set diffusion. Together these steps produce sharp, jaggy-free contours while preserving image detail. Results show that the overall RMS error of our method barely increases while contour smoothness and sharpness are substantially improved.
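The following is a minimal sketch of the general pipeline described above: upscale, locate contours, and apply edge-preserving filtering only near those contours. The edge-aware level set diffusion stage is not reproduced here (only the bilateral-filter stage is shown), and all filter and edge-detection parameters are assumptions.

```python
import cv2
import numpy as np

def magnify_with_postprocess(img, scale=2, d=9, sigma_color=50, sigma_space=5):
    """Upscale an image and apply bilateral filtering along detected contours.

    Rough sketch of the post-processing idea in the abstract: the level set
    diffusion stage is omitted, and the parameters below are assumptions.
    """
    up = cv2.resize(img, None, fx=scale, fy=scale, interpolation=cv2.INTER_CUBIC)
    edges = cv2.Canny(cv2.cvtColor(up, cv2.COLOR_BGR2GRAY), 50, 150)
    edge_band = cv2.dilate(edges, np.ones((5, 5), np.uint8)) > 0  # pixels near contours
    filtered = cv2.bilateralFilter(up, d, sigma_color, sigma_space)
    out = up.copy()
    out[edge_band] = filtered[edge_band]   # only modify pixels close to contours
    return out
```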
Funding: Supported by EPSRC Research Grant No. EP/J02211X/1.
Abstract: The retrieval of non-rigid 3D shapes is an important task. A common technique is to simplify the problem to rigid shape retrieval by producing a bending-invariant canonical form for each shape in the dataset to be searched. Such techniques typically attempt to "unbend" a shape by applying multidimensional scaling (MDS) to the distances between points on the mesh, but this leads to unwanted local shape distortions. We instead perform the unbending on the skeleton of the mesh, and use this to drive the deformation of the mesh itself. This leads to a computational speed-up and reduced distortion of local shape detail. We compare our method against other canonical forms: our experiments show that our method achieves state-of-the-art retrieval accuracy on a recent canonical forms benchmark, and only a small drop in retrieval accuracy relative to the state of the art on a second recent benchmark, while being significantly faster.
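For reference, the sketch below shows the classical MDS embedding step that conventional canonical-form methods build on: given a matrix of pairwise (e.g., geodesic) distances between mesh vertices, it recovers coordinates whose Euclidean distances approximate them. This is the standard baseline, not the skeleton-driven variant proposed in the paper.

```python
import numpy as np

def classical_mds(D, dim=3):
    """Classical multidimensional scaling.

    D: (n, n) symmetric matrix of pairwise distances (e.g., geodesic
    distances between mesh vertices). Returns an (n, dim) embedding whose
    Euclidean distances approximate D.
    """
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered squared distances
    evals, evecs = np.linalg.eigh(B)
    idx = np.argsort(evals)[::-1][:dim]      # keep the largest eigenvalues
    L = np.sqrt(np.maximum(evals[idx], 0.0))
    return evecs[:, idx] * L                 # (n, dim) canonical coordinates
```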
Funding: Supported by the Tianjin NSF (Nos. 18JCYBJC41300 and 18ZXZNGX00110), the National Natural Science Foundation of China (No. 61972216), and the Open Project Program of the State Key Laboratory of Virtual Reality Technology and Systems, Beihang University (No. VRLAB2019B04).
Abstract: Humans have the ability to perceive kinetic depth effects, i.e., to perceive 3D shapes from 2D projections of rotating 3D objects. This process is based on a variety of visual cues such as lighting and shading effects. However, when such cues are weak or missing, perception can become faulty, as demonstrated by the famous silhouette illusion of the spinning dancer. Inspired by this, we establish objective and subjective evaluation models of rotated 3D objects, taking their projected 2D images as input. We investigate five cues: ambient luminance, shading, rotation speed, perspective, and color difference between the objects and the background. In the objective evaluation model, we first apply 3D reconstruction algorithms to obtain an objective reconstruction quality metric, and then use quadratic stepwise regression analysis to determine the weights of the depth cues that best represent the reconstruction quality. In the subjective evaluation model, we use a comprehensive user study to reveal correlations between reaction time and accuracy, rotation speed, and perspective. The two evaluation models are generally consistent, and are potentially of benefit to interdisciplinary research into visual perception and 3D reconstruction.
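As a rough illustration of the regression step described above, the sketch below fits a quadratic model mapping the five depth cues to a reconstruction-quality score using scikit-learn. The data here are random placeholders, and the stepwise selection of significant terms used in the paper is omitted; only the quadratic feature expansion and least-squares fit are shown.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Placeholder data: each row holds the five cue values for one stimulus
# (ambient luminance, shading, rotation speed, perspective, color difference);
# y holds the corresponding objective reconstruction-quality score.
X = np.random.rand(100, 5)
y = np.random.rand(100)

# Quadratic regression: expand the cues with squared and interaction terms,
# then fit a linear model to the expanded features.
quad = PolynomialFeatures(degree=2, include_bias=False)
X_quad = quad.fit_transform(X)
model = LinearRegression().fit(X_quad, y)

# Inspect the fitted weight of each quadratic term.
print(dict(zip(quad.get_feature_names_out(), model.coef_.round(3))))
```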
Funding: This work was supported by grants from the National Natural Science Foundation of China (Nos. 61872440, 62061136007, and 62102403), the Beijing Municipal Natural Science Foundation for Distinguished Young Scholars (No. JQ21013), the Youth Innovation Promotion Association CAS, the Royal Society Newton Advanced Fellowship (No. NAF\R2\192151), and the Open Project Program of the State Key Laboratory of Virtual Reality Technology and Systems, Beihang University (No. VRLAB2022C07).
Abstract: Image colorization is a classic and important topic in computer graphics, where the aim is to add color to a monochromatic input image to produce a colorful result. In this survey, we present the history of colorization research in chronological order and summarize popular algorithms in this field. Early work on colorization mostly focused on developing techniques to improve colorization quality. In the last few years, researchers have explored further possibilities, such as combining colorization with natural language processing (NLP), and have focused more on industrial applications. To give users better control over the colors, various forms of color control have been designed, such as providing reference images or color scribbles. We have created a taxonomy of colorization methods according to the input type, divided into grayscale, sketch-based, and hybrid. The pros and cons of each algorithm are discussed, and the algorithms are compared according to their main characteristics. Finally, we discuss how deep learning, and in particular Generative Adversarial Networks (GANs), has changed this field.
Abstract: Recently, there has been an upsurge of activity in image-based non-photorealistic rendering (NPR), and in particular portrait image stylisation, due to the advent of neural style transfer (NST). However, the state of performance evaluation in this field is poor, especially compared to the norms in the computer vision and machine learning communities. Unfortunately, the task of evaluating image stylisation is thus far not well defined, since it involves subjective, perceptual, and aesthetic aspects. To make progress towards a solution, this paper proposes a new structured, three-level benchmark dataset for the evaluation of stylised portrait images. Rigorous criteria were used for its construction, and its consistency was validated by user studies. Moreover, a new methodology has been developed for evaluating portrait stylisation algorithms, which makes use of the different benchmark levels as well as annotations provided by user studies regarding the characteristics of the faces. We perform evaluation of a wide variety of image stylisation methods (both portrait-specific and general purpose, and both traditional NPR approaches and NST) using the new benchmark dataset.