Generally, there are two approaches to human pose estimation from monocular images: the learning-based approach and the model-based approach. The former can estimate poses rapidly but suffers from low estimation accuracy; the latter estimates poses accurately but at high computational cost. In this paper, we propose a method that integrates the learning-based and model-based approaches to improve estimation precision. In the learning-based stage, we use regression analysis to model the mapping from visual observations to human poses. In the model-based stage, a particle filter is applied to the results of the regression analysis. To address the curse of dimensionality, the eigenspace of each motion is learned using Principal Component Analysis (PCA). Finally, the proposed method was evaluated on the CMU Graphics Lab Motion Capture Database. The RMS error of human joint angles was 6.2 degrees with our method, an improvement of up to 0.9 degrees over the method without eigenspaces.
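The eigenspace idea in this abstract can be sketched with plain PCA: learn a low-dimensional subspace from training poses, then project poses into it so the particle filter can operate in few dimensions. A minimal sketch with numpy; all names, dimensions, and data below are illustrative assumptions, not taken from the paper.

```python
# Toy PCA eigenspace for pose vectors: fit a k-dimensional basis from
# training poses, then project/reconstruct a pose through it.
import numpy as np

rng = np.random.default_rng(0)
poses = rng.normal(size=(500, 30))  # 500 frames x 30 joint angles (toy data)

# PCA via SVD of the mean-centred data
mean = poses.mean(axis=0)
centred = poses - mean
_, _, vt = np.linalg.svd(centred, full_matrices=False)
k = 5                      # dimensionality of the motion eigenspace
basis = vt[:k]             # top-k principal axes, shape (5, 30)

# Project a pose into the eigenspace and reconstruct it
coeffs = (poses[0] - mean) @ basis.T        # 5-D eigenspace coefficients
reconstruction = mean + coeffs @ basis      # back to 30-D joint angles
print(coeffs.shape, reconstruction.shape)   # (5,) (30,)
```

A particle filter sampling in the 5-D coefficient space rather than the 30-D joint-angle space needs far fewer particles, which is the point of learning per-motion eigenspaces.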
This paper introduces a mixed-music analysis method using extended specmurt analysis. Conventional specmurt can analyze a multi-pitch music signal from only a single instrument; it cannot analyze a mixed music signal in which several different types of instruments play at the same time. To analyze such a signal, extended specmurt is proposed. We regard the observed spectrum extracted from the mixed music as the sum of the observed spectra of the individual instruments. Since the observed spectrum of a single instrument can be expressed as a convolution of the common harmonic structure and the fundamental frequency distribution, the mixed music has as many unknown fundamental frequency distributions as there are instruments. The relation among the observed spectrum, the common harmonic structures, and the fundamental frequency distributions is cast in matrix form in order to obtain the unknown distributions. The resulting equation is called extended specmurt, and the matrix of unknown components can be obtained using a pseudo-inverse matrix. Experimental results show the effectiveness of the proposed method.
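The core linear-algebra step described here can be sketched in a few lines: express each instrument's convolution with its harmonic structure as a Toeplitz matrix, stack the matrices into one system, and recover the unknown fundamental frequency distributions with a pseudo-inverse. The harmonic structures, sizes, and data below are toy assumptions, not the paper's.

```python
# Toy "extended specmurt" linear system: observed spectrum v is the sum of
# convolutions h_i * u_i; stack the convolution matrices and apply a
# pseudo-inverse to solve for the unknown distributions u_i jointly.
import numpy as np

def conv_matrix(h, n):
    """Lower-triangular Toeplitz matrix so that C @ u == np.convolve(h, u)[:n]."""
    c = np.zeros((n, n))
    for i in range(len(h)):
        c += np.diag(np.full(n - i, h[i]), -i)
    return c

n = 16
h1 = np.array([1.0, 0.5, 0.25])   # common harmonic structure, instrument 1 (toy)
h2 = np.array([1.0, 0.8])         # common harmonic structure, instrument 2 (toy)

rng = np.random.default_rng(1)
u_true = rng.random((2, n))                                # unknown F0 distributions
A = np.hstack([conv_matrix(h1, n), conv_matrix(h2, n)])    # stacked (n, 2n) system
v = A @ u_true.ravel()                                     # observed mixed spectrum

u_est = np.linalg.pinv(A) @ v            # minimum-norm solution of A u = v
residual = np.linalg.norm(A @ u_est - v)
print(round(residual, 6))                # ~0.0: reconstruction matches observation
```

Note that the stacked system is underdetermined (n equations, 2n unknowns), so the pseudo-inverse returns the minimum-norm solution that reproduces the observed spectrum exactly, not necessarily the generating distributions.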
This paper describes a method for reducing sudden noise using noise detection and classification together with noise power estimation. Sudden-noise detection and classification were addressed in our previous study; in this paper, GMM-based noise reduction is performed using the detection and classification results. Classification tells us which kind of noise we are dealing with, but its power remains unknown. We solve this problem by combining an estimate of the noise power with the noise reduction method. In our experiments, the proposed method achieved good performance in recognizing utterances overlapped by sudden noises.
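The paper's reduction method is GMM-based; as a much simpler stand-in, the toy spectral subtraction below illustrates the division of labor the abstract describes: classification fixes the noise's spectral shape, and only a scalar power (gain) must still be estimated before subtraction. Every quantity here is an illustrative assumption, not the paper's algorithm.

```python
# Toy noise reduction: estimate the unknown power of a classified noise
# shape from a detected noise-only frame, then subtract the scaled shape
# from the overlapped frame (magnitude-domain spectral subtraction).
import numpy as np

rng = np.random.default_rng(2)
shape = np.linspace(1.0, 0.2, 64)        # spectral shape of the classified noise (toy)
true_gain = 1.5                          # unknown noise power to be estimated

noise_frame = true_gain * shape + rng.normal(0, 0.01, 64)  # detected noise-only frame
clean = np.abs(rng.normal(2.0, 0.2, 64))                   # toy clean speech spectrum
noisy = clean + true_gain * shape                          # frame overlapped by noise

# Least-squares estimate of the power: project the noise-only frame on the shape
gain_est = float(shape @ noise_frame / (shape @ shape))

# Subtract the scaled shape, flooring at zero
enhanced = np.maximum(noisy - gain_est * shape, 0.0)
err_before = np.abs(noisy - clean).mean()
err_after = np.abs(enhanced - clean).mean()
print(err_after < err_before)  # True: subtraction moves the frame toward clean
```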
In recent years, Convolutional Neural Networks (CNNs) have enabled unprecedented progress on a wide range of computer vision tasks. However, training large CNNs is resource-intensive, requiring specialized Graphics Processing Units (GPUs) and highly optimized implementations to extract peak performance from the hardware. GPU memory is a major bottleneck of CNN training, limiting the size of both inputs and model architectures. In this paper, we propose to alleviate this memory bottleneck by leveraging an under-utilized resource of modern systems: the device-to-host bandwidth. Our method, termed CPU offloading, transfers hidden activations to the CPU as soon as they are computed, freeing GPU memory for upstream layer computations during the forward pass. These activations are then transferred back to the GPU as needed by the gradient computations of the backward pass. The key challenge is to efficiently overlap data transfers with computation so as to minimize the wall-time overhead induced by the additional transfers. On a typical workstation with an Nvidia Titan X GPU, our method compares favorably to gradient checkpointing: we reduce the memory consumption of training a VGG19 model by 35% with a minimal additional wall-time overhead of 21%. Further experiments detail the impact of the different optimization tricks we propose. Our method is orthogonal to other memory-reduction techniques such as quantization and sparsification, so they can easily be combined for further savings.
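The offload/fetch pattern can be sketched in plain Python: during the forward pass each layer's input activation is pushed to a host-side store (standing in for a GPU-to-CPU copy), so "device memory" holds only the current layer; the backward pass pulls activations back in reverse order. A real implementation would overlap these copies with compute on separate CUDA streams using pinned memory; this numpy toy ignores all of that and only shows the bookkeeping.

```python
# Schematic CPU offloading for a small ReLU MLP: host_store plays the role
# of CPU memory, and activations are fetched back exactly when the
# backward pass needs them.
import numpy as np

def forward(x, weights, host_store):
    for i, w in enumerate(weights):
        host_store[i] = x.copy()       # "offload" the layer input to host memory
        x = np.maximum(x @ w, 0.0)     # ReLU(x @ w); only x stays on "device"
    return x

def backward(grad_out, weights, host_store):
    grads = [None] * len(weights)
    for i in reversed(range(len(weights))):
        x = host_store.pop(i)                    # fetch activation back from "host"
        pre = x @ weights[i]
        grad_pre = grad_out * (pre > 0)          # ReLU gradient
        grads[i] = x.T @ grad_pre                # weight gradient
        grad_out = grad_pre @ weights[i].T       # propagate to the previous layer
    return grads

rng = np.random.default_rng(3)
weights = [rng.normal(size=(8, 8)) for _ in range(3)]
store = {}
y = forward(rng.normal(size=(4, 8)), weights, store)
grads = backward(np.ones_like(y), weights, store)
print(len(store), [g.shape for g in grads])  # 0 [(8, 8), (8, 8), (8, 8)]
```

Unlike gradient checkpointing, nothing is recomputed here: each activation is stored exactly once off-device and read back exactly once, trading transfer bandwidth for GPU memory.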