In recent years we have witnessed a large interest in surface deformation techniques. This interest can largely be attributed to the development of detail-preserving techniques. Space deformation techniques, on the other hand, have received less attention, but nevertheless they have many advantages over surface-based techniques. This paper explores the potential of these two approaches to deformation and discusses the opportunities that fusing the two may lead to.
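To make the distinction concrete, here is a toy, hypothetical space deformation (not a method from the paper): a 2D twist that acts on every point of the plane, and hence can be applied to any embedded shape representation unchanged, which is the key practical advantage over surface-based operators.

```python
import math

def twist_deform(points, strength=1.0):
    """A toy space deformation: rotate each point about the origin by an
    angle proportional to its distance from the origin. The map is defined
    on all of the plane, independent of any particular surface or mesh,
    so it can deform point clouds, meshes, or implicit surfaces alike."""
    out = []
    for x, y in points:
        r = math.hypot(x, y)
        a = strength * r  # twist angle grows with radius
        out.append((x * math.cos(a) - y * math.sin(a),
                    x * math.sin(a) + y * math.cos(a)))
    return out
```

Because the twist is a per-point rotation, it preserves each point's distance to the origin, a simple sanity check on the implementation.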
Traditional image resizing methods usually work in pixel space and use various saliency measures. The challenge is to adjust the image shape while trying to preserve important content. In this paper we perform image resizing in feature space, using the deep layers of a neural network that contain rich, important semantic information. We directly adjust the image feature maps, extracted from a pre-trained classification network, and reconstruct the resized image using neural-network-based optimization. This approach leverages the hierarchical encoding of the network and, in particular, the high-level discriminative power of its deeper layers, which can recognize semantic regions and objects, thereby allowing their aspect ratios to be maintained. Our use of reconstruction from deep features results in less noticeable artifacts than image-space resizing operators. We evaluate our method on benchmarks, compare it to alternative approaches, and demonstrate its strengths on challenging images.
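As a loose illustration of operating on feature maps rather than pixels, a width resize of a C x H x W feature map might look like the sketch below. This only shows the general idea; the paper's actual resizing operator and its network-based reconstruction step are not reproduced here, and nearest-neighbor column sampling is an arbitrary stand-in.

```python
import numpy as np

def resize_feature_map(fmap, new_w):
    """Nearest-neighbor width resizing of a C x H x W feature map.
    Each output column is taken from the nearest source column, so the
    per-channel activations are re-spaced rather than re-rendered in
    pixel space."""
    c, h, w = fmap.shape
    cols = np.clip((np.arange(new_w) * w / new_w).astype(int), 0, w - 1)
    return fmap[:, :, cols]
```

A real pipeline would apply such an adjustment to deep-layer maps and then optimize an image whose features match the adjusted maps.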
Humans regularly interact with their surrounding objects. Such interactions often result in strongly correlated motions between humans and the interacting objects. We thus ask: "Is it possible to infer object properties from skeletal motion alone, even without seeing the interacting object itself?" In this paper, we present a fine-grained action recognition method that learns to infer such latent object properties from human interaction motion alone. This inference allows us to disentangle the motion from the object property and to transfer object properties to a given motion. We collected a large number of videos and 3D skeletal motions of performing actors using an inertial motion capture device. We analyzed similar actions and learned subtle differences between them to reveal latent properties of the interacting objects. In particular, we learned to identify the interacting object by estimating its weight or its spillability. Our results clearly demonstrate that motions and interacting objects are highly correlated, and that related latent object properties can be inferred from 3D skeleton sequences alone, leading to new synthesis possibilities for motions involving human interaction. Our dataset is available at http://vcc.szu.edu.cn/research/2020/IT.html.
In this paper we introduce a video post-processing method that enhances the rhythm of a dance performance, in the sense that the dancing movements become better timed to the beat of the music. The dance performance observed in a video is analyzed and segmented into motion intervals delimited by motion beats. We present an image-space method to extract the motion beats of a video by detecting frames at which there is a significant change in direction or at which motion stops. The motion beats are then synchronized with the music beats such that as many beats as possible are matched, with as little time-warping distortion to the video as possible. We show two applications for this cross-media synchronization: one in which a given dance performance is enhanced to be better synchronized with its original music, and one in which a given dance video is automatically adapted to be synchronized with different music.
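The motion-beat criterion described above can be sketched, in a much simplified and hypothetical form, on per-frame mean motion vectors rather than full image-space analysis; the threshold values here are arbitrary assumptions.

```python
import numpy as np

def motion_beats(flow, stop_eps=0.1, angle_thresh=np.pi / 2):
    """Detect candidate motion beats from a T x 2 array of per-frame mean
    motion vectors: a beat is a frame where motion nearly stops, or where
    the motion direction changes sharply between consecutive frames."""
    speed = np.linalg.norm(flow, axis=1)
    beats = []
    for t in range(1, len(flow)):
        if speed[t] < stop_eps and speed[t - 1] >= stop_eps:
            beats.append(t)  # motion stop
            continue
        if speed[t] >= stop_eps and speed[t - 1] >= stop_eps:
            cosang = np.dot(flow[t], flow[t - 1]) / (speed[t] * speed[t - 1])
            if np.arccos(np.clip(cosang, -1.0, 1.0)) > angle_thresh:
                beats.append(t)  # sharp direction change
    return beats
```

The detected beat frames would then be time-warped toward the nearest music beats, which this sketch does not cover.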
A metric for natural image patches is an important tool for analyzing images. An efficient means of learning one is to train a deep network to map an image patch to a vector space in which the Euclidean distance reflects patch similarity. Previous attempts learned such an embedding in a supervised manner, requiring the availability of many annotated images. In this paper, we present an unsupervised embedding of natural image patches, avoiding the need for annotated images. The key idea is that the similarity of two patches can be learned from the prevalence of their spatial proximity in natural images. Clearly, under this simple principle, many spatially nearby pairs are outliers. However, as we show, these outliers do not harm the convergence of the metric learning. We show that our unsupervised embedding approach is more effective than a supervised one, or one that uses deep patch representations. Moreover, we show that it naturally lends itself to an efficient self-supervised domain adaptation technique for a target domain containing a common foreground object.
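The proximity principle can be sketched as a pair-sampling step. Everything below (patch size, the single right-neighbor offset, the binary labeling) is an illustrative assumption, and the embedding network trained on these pairs is omitted entirely.

```python
import numpy as np

def sample_pairs(image, patch=8, n=100, rng=None):
    """Sample training pairs under the proximity assumption: two patches
    cut from adjacent locations are labeled similar (1), while a patch
    paired with one from a random location is labeled dissimilar (0).
    The labels are noisy (some nearby pairs are outliers), but as argued
    above, such noise need not prevent the metric from converging."""
    if rng is None:
        rng = np.random.default_rng(0)
    h, w = image.shape[:2]
    pairs = []
    for _ in range(n):
        y = rng.integers(0, h - patch)
        x = rng.integers(0, w - 2 * patch)
        a = image[y:y + patch, x:x + patch]
        b = image[y:y + patch, x + patch:x + 2 * patch]  # right neighbor
        pairs.append((a, b, 1))
        y2 = rng.integers(0, h - patch)
        x2 = rng.integers(0, w - patch)
        c = image[y2:y2 + patch, x2:x2 + patch]  # random, likely far patch
        pairs.append((a, c, 0))
    return pairs
```

A contrastive or triplet loss over these pairs would then shape the embedding space.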
Visualizing high-dimensional data on a 2D canvas is generally challenging. It becomes significantly more difficult when multiple time-steps are to be presented, as the visual clutter quickly increases. Moreover, the challenge of perceiving the significant temporal evolution is even greater. In this paper, we present a method to plot temporal high-dimensional data in a static scatterplot; it uses the established PCA technique to project data from multiple time-steps. The key idea is to extend each individual displacement prior to applying PCA, so as to skew the projection process and to set a projection plane that balances the directions of temporal change and spatial variance. We present numerous examples and various visual cues to highlight the data trajectories, and demonstrate the effectiveness of the method for visualizing temporal data.
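A rough sketch of the displacement-extension idea, under the assumption that the data arrives as a T x N x D array (T time-steps, N points, D dimensions) and that PCA is computed via SVD; the extension factor is a guessed parameter, not a value from the paper.

```python
import numpy as np

def temporal_pca_project(X, scale=3.0):
    """Project T x N x D temporal data to 2D: exaggerate each point's
    displacement from its own temporal mean by `scale` before fitting
    PCA, so the chosen projection plane is skewed toward directions of
    temporal change as well as spatial variance. The original (unscaled)
    data is then projected onto that plane."""
    T, N, D = X.shape
    mean_per_point = X.mean(axis=0, keepdims=True)   # each point's temporal mean
    extended = mean_per_point + scale * (X - mean_per_point)
    flat = extended.reshape(T * N, D)
    flat = flat - flat.mean(axis=0)
    _, _, Vt = np.linalg.svd(flat, full_matrices=False)
    basis = Vt[:2]                                   # 2D projection plane
    flat_x = X.reshape(T * N, D)
    return (flat_x - flat_x.mean(axis=0)) @ basis.T
```

With `scale=1.0` this degenerates to ordinary PCA; larger values trade spatial variance for visibility of the temporal trajectories.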
We introduce ExquiMo, a collaborative modeling tool which enables novice users to work together to generate interesting, and even creative, 3D shapes. Inspired by Exquisite Corpse gameplay, our tool allocates distinct parts of a shape to multiple players, who model the assigned parts in sequence. Our approach is motivated by the understanding that effective surprise leads to creative outcomes. Hence, to maintain the surprise factor of the output, we conceal the previously modeled parts from the most recent player. Part designs from individual players are fused together to produce an often unexpected and novel end result. We investigate the effectiveness of collaboration on the output designs by conducting a sequence of user studies to validate the hypotheses formed from our research questions. The results of the user studies support our hypothesis that multi-user collaborative 3D modeling via ExquiMo tends to lead to more creative novice designs according to commonly used criteria for creativity: novelty and surprise.
In image-to-image translation, the goal is to learn a mapping from one image domain to another. In the case of supervised approaches, the mapping is learned from paired samples. However, collecting large sets of image pairs is often either prohibitively expensive or infeasible. As a result, in recent years more attention has been given to techniques that learn the mapping from unpaired sets. In our work, we show that injecting implicit pairs into unpaired sets strengthens the mapping between the two domains, improves the compatibility of their distributions, and boosts the performance of unsupervised techniques by up to 12% across several measurements. The competence of the implicit pairs is further displayed with the use of pseudo-pairs, i.e., paired samples which only approximate a real pair. We demonstrate the effect of the approximated implicit samples on image-to-image translation problems where such pseudo-pairs may be synthesized in one direction, but not in the other. We further show that pseudo-pairs are significantly more effective as implicit pairs in an unpaired setting than when used directly and explicitly in a paired setting.
Understanding semantic similarity among images is at the core of a wide range of computer graphics and computer vision applications. However, the visual context of images is often ambiguous, as images can be perceived with emphasis on different attributes. In this paper, we present a method for learning the semantic visual similarity among images, inferring their latent attributes and embedding them into multiple spaces, one per latent attribute. We cast the multi-embedding problem as an optimization that evaluates the embedded distances with respect to qualitative crowdsourced clusterings. The key idea of our approach is to collect and embed qualitative pairwise tuples that share the same attributes in clusters. To ensure attribute sharing among multiple similarity measures, image classification clusters are presented to, and solved by, users. The collected image clusters are then converted into groups of tuples, which are fed into our group optimization algorithm that jointly infers the attribute similarity and the multi-attribute embedding. Our multi-attribute embedding allows retrieving similar objects in different attribute spaces. Experimental results show that our approach outperforms state-of-the-art multi-embedding approaches on various datasets, and demonstrate the use of the multi-attribute embedding in an image retrieval application.
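One step described above, converting a user-provided clustering into relative-similarity tuples, can be sketched as follows; this is a hypothetical illustration, and the joint attribute/embedding optimization that consumes these tuples is not reproduced.

```python
from itertools import combinations

def clusters_to_tuples(clusters):
    """Convert one crowdsourced clustering (a list of lists of image ids)
    into relative-similarity triples (a, b, c), read as 'a is closer to b
    than to c': a and b share a cluster, c comes from a different cluster.
    Tuples derived from the same clustering share the same latent
    attribute, which is what the group optimization exploits."""
    tuples = []
    for i, cl in enumerate(clusters):
        others = [x for j, c in enumerate(clusters) if j != i for x in c]
        for a, b in combinations(cl, 2):
            for c in others:
                tuples.append((a, b, c))
    return tuples
```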
Funding: supported in part by the Shenzhen Innovation Program (JCYJ20180305125709986), the National Natural Science Foundation of China (61861130365, 61761146002), the GD Science and Technology Program (2020A0505100064, 2015A030312015), and the DEGP Key Project (2018KZDXM058).
Funding: the Israel Science Foundation (Grant Nos. 2366/16 and 2472/17).
Funding: this study was funded by the National Key Research & Development Plan of China (No. 2016YFB1001404) and the National Natural Science Foundation of China (No. 61602273).