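Abstract: To address the slow running speed and low feature matching rate of the traditional SURF (speeded up robust features) algorithm when stitching high-resolution unmanned aerial vehicle (UAV) aerial images, a UAV image stitching algorithm based on IB-SURF (image block-SURF) is proposed. The overlapping region between images is computed with the aid of the UAV position and orientation system (POS). A mask is constructed so that feature points are detected only in the overlapping region, which reduces feature extraction time. Following the idea of image blocks (IB), the image is divided into a grid to prune and filter the feature points. Neighborhood-KNN (neighborhood-K nearest neighbors) is introduced for feature point matching to improve matching efficiency. Experimental results show that the IB-SURF algorithm runs faster and achieves a higher feature matching rate, with an average feature matching rate of 84.3% and a feature matching accuracy above 95.1%, providing a technical basis for high-quality image stitching.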
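The grid-based pruning step can be illustrated with a small sketch. The abstract does not state which point is kept in each cell, so the rule used here (keep the keypoint with the strongest detector response per cell) is an assumption, and the function name `prune_by_grid` and the sample keypoints are hypothetical.

```python
def prune_by_grid(keypoints, cell_size):
    """Keep one keypoint per grid cell.

    keypoints: list of (x, y, response) tuples.
    Assumption: the strongest response in each cell is the one retained;
    the abstract only says that a grid is used to prune the points.
    """
    best = {}
    for x, y, response in keypoints:
        cell = (int(x // cell_size), int(y // cell_size))
        if cell not in best or response > best[cell][2]:
            best[cell] = (x, y, response)
    return sorted(best.values())


# Hypothetical keypoints inside the overlapping region of two images.
keypoints = [(5, 5, 0.2), (8, 3, 0.9), (25, 5, 0.4), (27, 28, 0.7), (29, 22, 0.1)]
kept = prune_by_grid(keypoints, cell_size=20)
print(kept)
```

With a 20-pixel cell, the five sample points fall into three cells, so three points survive. Fewer, better-spread points reduce the cost of the later matching step.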
Funding: The National Natural Science Foundation of China (No. 61231002, 61273266, 51075068, 60872073, 60975017, 61003131); the Ph.D. Programs Foundation of the Ministry of Education of China (No. 20110092130004); the Science Foundation for Young Talents in the Educational Committee of Anhui Province (No. 2010SQRL018); the 211 Project of Anhui University (No. 2009QN027B).
Abstract: A machine-learning-based speech enhancement method is proposed to improve the intelligibility of whispered speech. A binary mask estimated by a two-class support vector machine (SVM) classifier is used to synthesize the enhanced whisper. A novel noise-robust feature, Gammatone feature cosine coefficients (GFCCs), is extracted with an auditory periphery model and used for the binary mask estimation. The intelligibility achieved by the proposed method is evaluated and compared with that of traditional speech enhancement methods. Objective and subjective evaluation results indicate that the proposed method effectively improves the intelligibility of noise-contaminated whispered speech. Unlike the power subtract algorithm and the log-MMSE algorithm, neither of which improves intelligibility in low signal-to-noise ratio (SNR) environments, the proposed method improves the intelligibility of noisy whisper. Additionally, whispered speech enhanced by the proposed method is more intelligible than the corresponding unprocessed noisy whispered speech.
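To make the role of the binary mask concrete, the sketch below builds a mask over time-frequency units and applies it to a noisy mixture. It is not the paper's SVM estimator: it uses the ideal binary mask, a common formulation in which a unit is kept when its local SNR exceeds a threshold, and which is assumed here only as an example of the kind of labels a two-class classifier could be trained to predict. All names and energy values are hypothetical.

```python
def ideal_binary_mask(speech_energy, noise_energy, threshold_db=0.0):
    """Return 1 where the local SNR exceeds threshold_db, else 0.

    Inputs are 2-D lists of per-unit energies (rows = frequency channels,
    columns = time frames). The comparison is done in the linear domain,
    which avoids taking the logarithm of zero.
    """
    factor = 10 ** (threshold_db / 10)
    return [
        [1 if s > n * factor else 0 for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech_energy, noise_energy)
    ]


def apply_mask(mixture, mask):
    """Zero out the time-frequency units that the mask rejects."""
    return [[v * m for v, m in zip(v_row, m_row)]
            for v_row, m_row in zip(mixture, mask)]


# Hypothetical energies for a 2-channel, 2-frame example.
speech = [[4.0, 1.0], [0.5, 9.0]]
noise = [[1.0, 2.0], [1.0, 3.0]]
mixture = [[5.0, 3.0], [1.5, 12.0]]

mask = ideal_binary_mask(speech, noise)
enhanced = apply_mask(mixture, mask)
print(mask, enhanced)
```

In a real system the clean speech and noise energies are unknown at test time, which is why the mask has to be estimated from features of the noisy signal.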
Funding: Supported by the National Natural Science Foundation of China (60373062, 60573045).
Abstract: This paper proposes a novel robust watermarking scheme for digital images using local invariant features and Independent Component Analysis (ICA). Most existing watermarking algorithms cannot resist geometric distortions that desynchronize the watermark location. To resist such distortions, the proposed method uses a local invariant feature of the image, the scale-invariant feature transform, which is invariant to translation and scaling. The watermark is inserted into circular patches generated by the scale-invariant keypoint extractor. Rotation invariance is achieved using the translation property of the polar-mapped circular patches. The method belongs to the blind watermarking category because detection relies on Independent Component Analysis and does not need the original image. Experimental results show that the method is robust against geometric distortion attacks as well as signal-processing attacks.
Funding: Supported by the Healthcare AI Convergence R&D Program through the National IT Industry Promotion Agency of Korea (NIPA), funded by the Ministry of Science and ICT (No. S0102-23-1007), and by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (NRF-2017R1A6A1A03015496).
Abstract: Emotion recognition based on facial expressions is one of the most critical elements of human-machine interfaces. Most conventional methods extract features from the entire facial image and then recognize specific emotions through a pre-trained model. In contrast, this paper proposes a novel feature vector extraction method that uses the Euclidean distances between landmarks whose positions change with facial expression, especially around the eyes, eyebrows, nose, and mouth. A new classifier based on an ensemble network is then applied to increase emotion recognition accuracy. The emotion recognition performance was compared with that of conventional algorithms on public databases. The results indicate that the proposed method achieves higher accuracy than traditional facial-expression-based emotion recognition methods. In particular, experiments with the FER2013 database show that the proposed method is robust to lighting conditions and backgrounds, with performance on average 25% higher than in previous studies. Consequently, the proposed method is expected to recognize facial expressions, especially fear and anger, and to help prevent severe accidents by detecting security-related or dangerous actions in advance.
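A distance-based feature vector of this kind can be sketched as follows. The abstract does not say which landmark pairs are used, so taking all pairwise distances is an assumption, and the function name and coordinates are hypothetical.

```python
from itertools import combinations
from math import dist


def landmark_distance_features(landmarks):
    """Return the Euclidean distance for every pair of (x, y) landmarks.

    Assumption: all pairs are used. A real system might restrict the pairs
    to landmarks around the eyes, eyebrows, nose, and mouth.
    """
    return [dist(p, q) for p, q in combinations(landmarks, 2)]


# Three hypothetical landmark positions, in pixels.
landmarks = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
features = landmark_distance_features(landmarks)
print(features)
```

For n landmarks this yields n(n-1)/2 distances, which can then be passed to a classifier as a fixed-length vector.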
Abstract: This paper proposes a referral system to assist medical experts in the screening and referral of diabetic retinopathy. The system was developed by applying several existing mathematical techniques in sequence: speeded up robust features (SURF), K-means clustering, and visual dictionaries (VD). Three databases were mixed to test how the system works when the sources are dissimilar. In the experiments, an area under the curve (AUC) of 0.9343 was attained. The results obtained from the system are promising.
Abstract: This article presents a robust, real-time scheme for mobile robot obstacle detection and navigation based on the SURF feature descriptor. First, images of the robot's path are captured by a binocular camera. Then, feature points are extracted from the binocular images and matched using SURF, and the matched points are treated as the obstacles to be detected. Finally, the position of the target is determined from the parallax between the matched points combined with the binocular vision calibration model. Theoretical derivation and experimental results show that the scheme detects and locates the interest points more accurately, with fast matching, high accuracy, and low error. It therefore has practical value for real-time obstacle avoidance and navigation of mobile robots.
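The step from parallax to position can be illustrated with the standard relation for a rectified stereo pair, Z = f * B / d, where f is the focal length in pixels, B the baseline, and d the disparity. The abstract does not give the paper's calibration model, so a rectified pinhole pair is assumed here, and the camera parameters below are hypothetical.

```python
def depth_from_disparity(x_left, x_right, focal_px, baseline_m):
    """Depth (in meters) of a matched point in a rectified stereo pair.

    x_left, x_right: horizontal pixel coordinates of the same point in
    the left and right images. Assumes an ideal rectified pinhole model.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity


# Hypothetical camera: 700 px focal length, 0.12 m baseline.
depth = depth_from_disparity(x_left=350, x_right=329, focal_px=700, baseline_m=0.12)
print(f"{depth:.2f} m")
```

Because depth is inversely proportional to disparity, a one-pixel matching error matters much more for distant obstacles than for near ones.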
Funding: Supported by the National Key Research and Development Program of China under Grant 2018YFF0301205 and in part by the National Natural Science Foundation of China under Grant NSFC 61925105 and Grant 61801260.
Abstract: In this paper, we build a remote-sensing satellite imagery priori-information data set and propose an approach to evaluate the robustness of remote-sensing image feature detectors. The resulting TH Priori-Information (TPI) data set, with 2297 remote-sensing images, serves as a standardized high-resolution data set for studies related to remote-sensing image features. The TPI contains 1) raw and calibrated remote-sensing images with high spatial and temporal resolutions (up to 2 m and 7 days, respectively), and 2) a built-in 3-D target area model that supports view position, view angle, lighting, shadowing, and other transformations. Based on TPI, we further present a quantitative approach, including the feature recurrence rate, the feature match score, and the weighted feature robustness score, to evaluate the robustness of remote-sensing image feature detectors. The approach gives general and objective assessments of the robustness of feature detectors under complex remote-sensing circumstances. Three remote-sensing image feature detectors, namely scale-invariant feature transform (SIFT), speeded up robust features (SURF), and priori information based robust features (PIRF), are evaluated using the proposed approach on the TPI data set. Experimental results show that the robustness of PIRF exceeds that of the others by over 6.2%.
Funding: Supported by the National Key Research and Development Program of China (2018AAA0103203), the National Natural Science Foundation of China (62073036, 62076031), and the Beijing Natural Science Foundation (4202071).
Abstract: In recent visual tracking research, correlation filter (CF) based trackers have become popular because of their high speed and considerable accuracy. Previous methods mainly extend the features and address the boundary effect in order to learn a better correlation filter. However, the related studies are insufficient. By exploring the potential of trackers in these two aspects, this paper proposes a novel adaptive padding correlation filter (APCF) with feature group fusion for robust visual tracking, based on the popular context-aware tracking framework. In the tracker, three feature groups are fused through the weighted sum of their normalized response maps, which alleviates the risk of drift caused by an extreme change in a single feature. Moreover, to adapt the padding used in filter training to different object shapes, the best padding is selected from a preset pool according to the tracking precision over the whole video, where the tracking precision is predicted by a model trained on the sequence features of the first several frames. The sequence features include three traditional features and eight newly constructed features. Extensive experiments demonstrate that the proposed tracker is superior to most state-of-the-art correlation filter based trackers and improves consistently on the basic trackers.
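The fusion of feature groups by a weighted sum of normalized response maps can be sketched as below. The abstract does not specify the normalization or how the weights are chosen, so min-max normalization and fixed weights are assumptions, and the maps are hypothetical 2 x 2 examples.

```python
def normalize(response):
    """Min-max normalize a 2-D response map (list of rows) to [0, 1]."""
    flat = [v for row in response for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # A flat map carries no localization information.
        return [[0.0 for _ in row] for row in response]
    return [[(v - lo) / span for v in row] for row in response]


def fuse(responses, weights):
    """Weighted sum of the normalized response maps."""
    normalized = [normalize(r) for r in responses]
    rows, cols = len(responses[0]), len(responses[0][0])
    return [
        [sum(w * n[i][j] for w, n in zip(weights, normalized)) for j in range(cols)]
        for i in range(rows)
    ]


def peak(response):
    """Return the (row, col) index of the maximum response."""
    cells = ((i, j) for i in range(len(response)) for j in range(len(response[0])))
    return max(cells, key=lambda ij: response[ij[0]][ij[1]])


# Hypothetical response maps from three feature groups.
maps = [
    [[0, 2], [4, 8]],
    [[1, 1], [3, 1]],
    [[10, 30], [20, 50]],
]
fused = fuse(maps, weights=[0.5, 0.2, 0.3])
print(peak(fused))
```

In this example the second map alone would place the target at (1, 0), but the fused map peaks at (1, 1), showing how the sum dampens an outlier from a single feature group.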
Funding: Natural Science Foundation of Jiangsu Province, Grant/Award Number: BK20170765; Natural Science Foundation of China, Grant/Award Number: 61703201; Science Foundation of Nanjing Institute of Technology, Grant/Award Numbers: ZKJ202002, ZKJ202003, and YKJ202019.
Abstract: Sparse representation methods based on dictionary construction and learning have attracted interest in the field of face recognition. To address two shortcomings of the sparse representation classification model, namely that the face feature dictionary is not 'clean' and the noise interference dictionary is not 'representative', a new method named robust sparse representation based on adaptive joint dictionary (RSR-AJD) is proposed. First, a fast low-rank subspace recovery algorithm based on the LogDet function (Fast LRSR-LogDet) is proposed to obtain, at low computational complexity, an accurate low-rank facial intrinsic dictionary that represents the similar structure of human faces. Then, the Iteratively Reweighted Robust Principal Component Analysis (IRRPCA) algorithm is used to obtain a more precise occlusion dictionary that depicts discontinuous interference attached to the face, such as occlusion by glasses or a scarf. Finally, the Fast LRSR-LogDet and IRRPCA algorithms are used to construct the adaptive joint dictionary, which includes the low-rank facial intrinsic dictionary, the occlusion dictionary, and the remaining intra-class variant dictionary, for robust sparse coding. Experiments on four popular databases (AR, Extended Yale B, LFW, and Pubfig) verify the robustness and effectiveness of the authors' method.