Funding: The National Natural Science Foundation of China (No. 61231002, 61273266, 51075068, 60872073, 60975017, 61003131); the Ph.D. Programs Foundation of the Ministry of Education of China (No. 20110092130004); the Science Foundation for Young Talents in the Educational Committee of Anhui Province (No. 2010SQRL018); the 211 Project of Anhui University (No. 2009QN027B)
Abstract: A machine-learning-based speech enhancement method is proposed to improve the intelligibility of whispered speech. A binary mask estimated by a two-class support vector machine (SVM) classifier is used to synthesize the enhanced whisper. A novel noise-robust feature called Gammatone feature cosine coefficients (GFCCs), extracted by an auditory periphery model, is derived and used for the binary mask estimation. The intelligibility performance of the proposed method is evaluated and compared with that of traditional speech enhancement methods. Objective and subjective evaluation results indicate that the proposed method can effectively improve the intelligibility of whispered speech contaminated by noise. The power subtraction algorithm and the log-MMSE algorithm do not improve intelligibility in low signal-to-noise ratio (SNR) environments, whereas the proposed method does improve the intelligibility of noisy whisper. Additionally, whispered speech enhanced by the proposed method is more intelligible than the corresponding unprocessed noisy whispered speech.
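To make the binary-mask idea concrete, here is a minimal sketch of how a 0/1 mask is applied to a time-frequency (T-F) representation. It is not the paper's method: the abstract estimates the mask with an SVM on GFCC features, while this sketch assumes the local SNR of each T-F unit is already known and simply thresholds it. The function names and the 0 dB default threshold are hypothetical.

```python
def threshold_mask(local_snr_db, threshold_db=0.0):
    """Label each T-F unit 1 (speech-dominant) or 0 (noise-dominant).

    Assumption: the local SNR per unit is given. In the abstract, an SVM
    classifier predicts these labels instead of a threshold.
    """
    return [[1 if snr > threshold_db else 0 for snr in row]
            for row in local_snr_db]


def apply_binary_mask(noisy_tf, mask):
    """Keep the T-F units labelled 1 and zero out the units labelled 0."""
    if len(noisy_tf) != len(mask):
        raise ValueError("mask and T-F matrix must have the same shape")
    return [[value * keep for value, keep in zip(row, mask_row)]
            for row, mask_row in zip(noisy_tf, mask)]
```

The masked T-F matrix would then be passed to a resynthesis stage to produce the enhanced waveform; that stage is outside the scope of this sketch.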
Abstract: This paper proposes a referral system to assist medical experts in the screening and referral of diabetic retinopathy. The system is built by applying several existing mathematical techniques in sequence: speeded up robust features (SURF), K-means clustering and visual dictionaries (VD). Three databases are mixed to test how the system performs when the sources are dissimilar. In the experiments, an area under the curve (AUC) of 0.9343 was attained. The results obtained from the system are promising.
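The SURF, K-means and visual dictionary chain described above is commonly realised as a bag-of-visual-words encoding. The sketch below shows only the encoding step, under the assumption that descriptors have already been extracted and that the vocabulary (cluster centroids) has already been learned; the function names are hypothetical, and the toy 2-D vectors stand in for real 64-dimensional SURF descriptors.

```python
def nearest_word(descriptor, vocabulary):
    """Return the index of the centroid closest to the descriptor
    (squared Euclidean distance)."""
    best_index, best_dist = 0, float("inf")
    for index, centroid in enumerate(vocabulary):
        dist = sum((a - b) ** 2 for a, b in zip(descriptor, centroid))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def bow_histogram(descriptors, vocabulary):
    """Encode one image as a normalised histogram of visual words."""
    counts = [0] * len(vocabulary)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, vocabulary)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [count / total for count in counts]
```

Each image then becomes a fixed-length vector regardless of how many keypoints it contains, which is what allows a standard classifier to be trained on top.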
Abstract: This article presents a robust, real-time scheme for mobile robot obstacle detection and navigation based on the SURF feature descriptor. In this scheme, the image information along the mobile robot's path is first captured by a binocular camera. Feature points are then extracted with SURF and matched across the binocular images as candidate obstacles. Finally, the position of the target is fixed from the parallax between the matched points, combined with the binocular vision calibration model. Theoretical derivation and experimental results show that this scheme detects and navigates to the interest points more accurately. It offers fast matching speed, high accuracy and low error, so it has practical value for real-time obstacle avoidance and navigation of mobile robots.
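The step that fixes a position from the parallax between matched points can be illustrated with the standard pinhole stereo relation Z = f * B / d. This sketch assumes a rectified, parallel-axis stereo pair, which the abstract does not state; the function name and parameters are hypothetical.

```python
def depth_from_disparity(focal_px, baseline_m, x_left_px, x_right_px):
    """Depth of a matched point for a rectified stereo pair.

    focal_px:    focal length in pixels (assumed equal for both cameras)
    baseline_m:  distance between the two camera centres, in metres
    x_left_px, x_right_px: column of the matched point in each image
    """
    disparity = x_left_px - x_right_px
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity
```

Because depth is inversely proportional to disparity, a one-pixel matching error matters far more for distant obstacles than for near ones, which is one reason matching accuracy is emphasised in such systems.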
Funding: Supported by the National Natural Science Foundation of China (60802043) and the National Basic Research Program of China (973 Program) (2010CB327900)
Abstract: Local invariant algorithms applied to downward-looking image registration usually compute the camera's pose relative to visual landmarks. Generally, image registration with these approaches must meet three requirements. First, the algorithm should be robust to illumination, which otherwise tends to influence it. Second, the algorithm should have low computational complexity. Third, the depth information of images needs to be estimated without other sensors. This paper investigates a well-known local invariant feature, the speeded up robust feature (SURF), and proposes a high-speed and robust image registration and localization algorithm based on it. Supported by feature tracking and pose estimation methods, the proposed algorithm can compute camera poses under different conditions of scale, viewpoint and rotation so as to precisely localize an object's position. The study then runs registration experiments with the scale invariant feature transform (SIFT), SURF and the proposed algorithm, and designs a method to evaluate their performance. Furthermore, the study performs an object retrieval test on remote sensing video. Because remote sensing frames exhibit large deformation, the registration algorithm incorporates the Kanade-Lucas-Tomasi (KLT) 3-D coplanar calibration feature tracker, which can localize targets of interest precisely and efficiently. The experimental results show that, over a period of time, the proposed method has a higher localization speed and a lower localization error rate than traditional visual simultaneous localization and mapping (vSLAM).
Abstract: Devices in recent years have generated a vast amount of digital video, and people have increasingly forged video to present it as evidence in court. Many forensic detection approaches have been presented, but they provide limited accuracy. This paper proposes a novel technique for detecting forgery in the image frames of videos using an enhanced Convolutional Neural Network (CNN). In the initial stage, the input video is taken from the dataset and converted into image frames. Next, pre-sampling is performed with the Adaptive Rood Pattern Search (ARPS) algorithm to discard useless frames. In the following stage, preprocessing is performed to enhance the image frames. Face detection is then carried out on the images with the Viola-Jones algorithm. Finally, the improved Crow Search Algorithm (ICSA) is used to select the extracted features, which are input to the Enhanced Convolutional Neural Network (ECNN) classifier to detect the forged image frames. In the experiments, the proposed system achieved 97.21% accuracy, which compares favorably with existing methods.
Funding: Project (50808025) supported by the National Natural Science Foundation of China; Project (20090162110057) supported by the Doctoral Fund of the Ministry of Education, China
Abstract: A novel method based on an interval temporal syntactic model was proposed to recognize human activities in video flow. The method is composed of two parts: feature extraction and activity recognition. A trajectory shape descriptor, speeded up robust features (SURF) and histograms of optical flow (HOF) were proposed to represent human activities; together they provide more exhaustive information on the shape, structure and motion of human activities. In the recognition process, a probabilistic latent semantic analysis (PLSA) model was first used to recognize sample activities. Then, an interval temporal syntactic model, which combines the syntactic model with interval algebra to model the temporal dependencies of activities explicitly, was introduced to recognize complex activities with temporal relationships. Experimental results on public databases show the effectiveness of the proposed method in comparison with other state-of-the-art methods for the recognition of complex activities.
Abstract: Robust and efficient vision systems are essential to support the different kinds of autonomous robotic behavior that depend on interacting with the surrounding environment without relying on any a priori knowledge. In space missions, above all those involving rovers that explore planetary surfaces, vision can play a key role in improving autonomous navigation: besides obstacle avoidance and hazard detection along the traverse, vision can provide accurate motion estimation in order to constantly monitor all paths executed by the rover. The present work concerns the development of an effective visual odometry system, focusing as much as possible on issues such as continuous operating mode, system speed and reliability.
Funding: Projects (41001260, 61173122, 61573380) supported by the National Natural Science Foundation of China; Project (11JJ5044) supported by the Hunan Provincial Natural Science Foundation of China
Abstract: It is illegal to spread and transmit pornographic images over the Internet, in either real or artificial format. Traditional methods are designed to identify real pornographic images and are less efficient in dealing with artificial images. Criminals have therefore turned to releasing artificial pornographic images in specific settings, e.g., social networks. To identify artificial pornographic images efficiently, a novel approach based on bag-of-visual-words is proposed in this work. In the bag-of-words (BoW) framework, the speeded-up robust feature (SURF) is first adopted for feature extraction, then a visual vocabulary is constructed through K-means clustering and images are represented by an improved BoW encoding method, and finally the visual words are fed into a learning machine for training and classification. Unlike the traditional BoW method, the proposed method sets a weight on each visual word according to the number of features that each cluster contains. Moreover, a non-binary encoding method and a cross-matching strategy are utilized to improve the discriminative power of the visual words. Experimental results indicate that the proposed method outperforms the traditional method.
Funding: Supported by the National Natural Science Foundation of China (No. 61673283)
Abstract: This paper describes a brain-inspired simultaneous localization and mapping (SLAM) system for a mobile robot that uses oriented features from accelerated segment test (FAST) and rotated binary robust independent elementary features (BRIEF), together known as ORB features, from an RGB (red, green, blue) sensor. The core SLAM system, dubbed RatSLAM, can construct a cognitive map using raw odometry and the visual scenes along the path traveled. Unlike the existing RatSLAM system, which uses only a simple vector to represent the features of a visual image, this paper employs an efficient and very fast descriptor method, ORB, to extract features from RGB images. Experiments show that these features are suitable for recognizing sequences of familiar visual scenes. Thus, when loop closure errors are detected, the descriptive features help to correct the pose estimate by driving loop closure and localization in a map correction algorithm. The efficiency and robustness of the method are also demonstrated by comparison with different visual processing algorithms.
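ORB descriptors are binary strings, so two of them are compared with the Hamming distance rather than the Euclidean distance. The sketch below shows a brute-force nearest-neighbour match between two descriptor sets. The abstract does not say how scenes are matched, so the matching strategy, the function names and the distance threshold are all assumptions; descriptors are represented as Python integers, with short 8-bit values standing in for the usual 256-bit ORB descriptors.

```python
def hamming(a, b):
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def match_descriptors(query, train, max_distance=64):
    """Brute-force match: for each query descriptor, find the closest train
    descriptor and keep the pair if it is within max_distance bits.

    Returns a list of (query_index, train_index, distance) tuples.
    max_distance is a hypothetical tuning parameter.
    """
    matches = []
    for qi, q in enumerate(query):
        best_ti, best_dist = None, None
        for ti, t in enumerate(train):
            dist = hamming(q, t)
            if best_dist is None or dist < best_dist:
                best_ti, best_dist = ti, dist
        if best_ti is not None and best_dist <= max_distance:
            matches.append((qi, best_ti, best_dist))
    return matches
```

A scene-recognition stage could, for instance, count the matches between the current frame and a stored view and treat a high count as evidence of a familiar scene.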
Abstract: Long-duration visual tracking of targets is quite challenging for computer vision, because the environment may be cluttered and distracting. Illumination variations and partial occlusions are two main difficulties in real-world visual tracking. Existing methods based on holistic appearance information cannot solve these problems effectively. This paper proposes a feature-based dynamic tracking approach that can track objects under partial occlusion and varying illumination. The method represents the tracked object by an invariant feature model. During tracking, a new pyramid matching algorithm is used to match the object template with the observations to determine the observation likelihood. This matching is computationally efficient and also embeds the spatial constraints among the features. Instead of relying on complicated optimization methods, the whole model is incorporated into a Bayesian filtering framework. Experiments on real-world sequences demonstrate that the method can track objects accurately and robustly even with illumination variations and partial occlusions.
Funding: The National High-Tech Research and Development Program of China (No. 2001AA114071)
Abstract: This work demonstrates the use of the nonlinear time-frequency distribution (NLTFD) of a discrete time energy operator (DTEO), based on amplitude modulation-frequency modulation demodulation techniques, as a feature in speech recognition. The duration-distribution-based hidden Markov model in a speaker-independent, large-vocabulary Mandarin speech recognition system was reconstructed from the feature vectors in the front-end detection stage. The goal was to improve the performance of the existing system by combining new features with the baseline feature vector. This paper also deals with errors associated with using a pre-emphasis filter in the front-end processing of the present scheme, which increases the noise energy at high frequencies above 4 kHz and in some cases degrades recognition accuracy. The experimental results show that eliminating the pre-emphasis filters from the pre-processing stage and using NLTFD with compensated DTEO combined with Mel-frequency cepstral coefficients gives a 21.95% reduction in the relative error rate compared to the conventional technique, with 25 candidates used in the test.
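A widely used discrete-time energy operator is the Teager-Kaiser form, psi[x(n)] = x(n)^2 - x(n-1) * x(n+1). The sketch below assumes this is the operator meant by DTEO in the abstract, which the abstract itself does not confirm; the function name is hypothetical, and the compensation and NLTFD steps are not reproduced.

```python
import math


def teager_energy(x):
    """Teager-Kaiser energy of a sampled signal.

    Returns one value per interior sample (the first and last samples have
    no two-sided neighbourhood), so the output has len(x) - 2 entries.
    """
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# For a pure tone A*cos(omega*n + phi) the operator is constant and equals
# A^2 * sin(omega)^2, so it tracks both amplitude and frequency.
amplitude, omega = 2.0, 0.3
tone = [amplitude * math.cos(omega * n + 0.5) for n in range(50)]
energy = teager_energy(tone)
```

This joint sensitivity to amplitude and frequency is what makes the operator a natural building block for AM-FM demodulation of speech.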