Funding: Supported by the National Natural Science Foundation of China (No. 62173021), the Joint Funds of the National Natural Science Foundation of China (No. U23A20638), the Beijing Natural Science Foundation (No. 4212039), and the Aviation Science Foundation of China (No. 2020Z073051002).
Abstract: To avoid interference from unexpected background noise and obtain a high-fidelity voice signal, acoustic sensors with high sensitivity, a flat frequency response, and a high signal-to-noise ratio (SNR) are urgently needed for voice recognition. Graphene oxide (GO) has received extensive attention due to its controllable thickness and high fracture strength. However, the low mechanical sensitivity (S_M) introduced by undesirable initial stress limits the performance of GO in voice recognition. To alleviate this issue, a GO diaphragm with annular corrugations is proposed. Using a reusable copper mold machined by a picosecond laser, the fabrication and transfer of the corrugated GO diaphragm are realized, yielding a Fabry–Perot (F–P) acoustic sensor. Benefiting from the structural advantage of the corrugated GO diaphragm, the F–P acoustic sensor exhibits high S_M (43.70 nm/Pa at 17 kHz), a flat frequency response (−3.2 to 3.7 dB within 300–3500 Hz), and a high SNR (76.66 dB at 1 kHz). Further acoustic measurements demonstrate additional merits, including an excellent frequency detection resolution (0.01 Hz) and high time stability (output relative variation of less than 6.7% over 90 min). Given these merits, the fabricated F–P acoustic sensor with a corrugated GO diaphragm can serve as a high-fidelity platform for acoustic detection and voice recognition. In conjunction with a deep residual learning framework, a high recognition accuracy of 98.4% is achieved by training and testing on data recorded by the fabricated F–P acoustic sensor.
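For readers who want to reproduce SNR figures of this kind, a minimal sketch of the usual RMS-based calculation is given below; the sampling rate, test tone, and noise level are invented for illustration and are not taken from the paper.

```python
import numpy as np

def snr_db(signal: np.ndarray, noise: np.ndarray) -> float:
    """SNR in decibels from a recorded tone and a noise-only recording.

    Compares RMS amplitudes on a 20*log10 scale, the usual convention
    behind figures such as "76.66 dB at 1 kHz".
    """
    rms_signal = np.sqrt(np.mean(np.square(signal, dtype=np.float64)))
    rms_noise = np.sqrt(np.mean(np.square(noise, dtype=np.float64)))
    return 20.0 * np.log10(rms_signal / rms_noise)

# Synthetic 1 kHz test tone plus weak noise (illustrative, not the paper's data).
fs = 48_000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1_000 * t)
noise = 1e-4 * np.random.randn(fs)
print(f"SNR = {snr_db(tone + noise, noise):.2f} dB")
```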
Abstract: Biometric recognition refers to the process of recognizing a person's identity using physiological or behavioral modalities such as face, voice, fingerprint, and gait. Such modalities are mostly used either separately, as in unimodal systems, or jointly, as in multimodal systems. Multimodal systems can usually enhance recognition performance over unimodal systems by integrating the biometric data of multiple modalities at different fusion levels. Despite this enhancement, in real-life applications some factors degrade multimodal systems' performance, such as occlusion, face pose, and noise in voice data. In this paper, we propose two algorithms that apply dynamic fusion at the feature level based on the data quality of multimodal biometrics. The proposed algorithms minimize the negative influence of confusing and low-quality features through either exclusion or weight reduction to achieve better recognition performance. Dynamic fusion was applied to face and voice biometrics: face features were extracted using principal component analysis (PCA) and Gabor filters separately, while voice features were extracted using Mel-frequency cepstral coefficients (MFCCs). Facial data quality assessment is based mainly on the presence of occlusion, whereas voice data quality assessment is based on the signal-to-noise ratio (SNR), reflecting the presence of noise. To evaluate the proposed algorithms, several experiments were conducted using two combinations of three databases: the AR database and the Extended Yale Face Database B for face images, and the VOiCES database for voice data. The results show that both proposed dynamic fusion algorithms improve identification and verification performance over not only the standard unimodal algorithms but also multimodal algorithms using standard fusion methods.
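The exclusion/weight-reduction idea can be sketched in a few lines: each modality's feature vector is scaled by a quality score before concatenation, and a modality whose quality falls below a threshold is dropped entirely. The threshold, feature dimensions, and quality values below are illustrative assumptions, not details from the paper.

```python
import numpy as np

def dynamic_feature_fusion(face_feats, voice_feats, face_quality, voice_quality,
                           exclude_below=0.2):
    """Quality-driven feature-level fusion (illustrative sketch).

    Each modality is weighted by its quality score in [0, 1]; a modality
    whose quality falls below `exclude_below` is excluded entirely.
    """
    parts = []
    for feats, q in ((face_feats, face_quality), (voice_feats, voice_quality)):
        if q >= exclude_below:  # exclusion rule for very poor data
            parts.append(q * np.asarray(feats, dtype=np.float64))  # weight reduction
    if not parts:
        raise ValueError("all modalities excluded: no usable biometric data")
    return np.concatenate(parts)

# Hypothetical case: occluded face (low quality) vs. clean, high-SNR voice.
fused = dynamic_feature_fusion(np.random.randn(100),  # e.g. PCA face features
                               np.random.randn(39),   # e.g. MFCC voice features
                               face_quality=0.15,     # occlusion detected
                               voice_quality=0.9)     # high SNR
print(fused.shape)  # (39,) -- the face features were excluded
```

Scaling before concatenation keeps the downstream classifier unchanged, which is one reason feature-level fusion is attractive for this kind of quality gating.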
Funding: This work was supported by the GRRC program of Gyeonggi Province [GRRC-Gachon2020(B04), Development of AI-based Healthcare Devices].
Abstract: Automatic speaker recognition (ASR) systems belong to the field of human-machine interaction, and scientists have been using feature extraction and feature matching methods to analyze and synthesize voice signals. One of the most commonly used feature extraction methods is Mel-frequency cepstral coefficients (MFCCs). Recent research shows that MFCCs process voice signals with high accuracy; they represent a sequence of voice-signal-specific features. This experimental analysis aims to distinguish Turkish speakers by extracting MFCCs from speech recordings. Since human perception of sound is not linear, after the filterbank step of the MFCC method we converted the log filterbanks into decibel (dB) spectrograms without applying the discrete cosine transform (DCT). A new dataset was created by converting each spectrogram into a 2-D array. Several learning algorithms were implemented with 10-fold cross-validation to identify the speaker. The highest accuracy, 90.2%, was achieved using a multi-layer perceptron (MLP) with the tanh activation function. The most important output of this study is the inclusion of the human voice as a new feature set.
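A minimal sketch of this pipeline, assuming librosa and scikit-learn (the abstract does not name its tools), is shown below; stopping at librosa.power_to_db reproduces the "log filterbank in dB, no DCT" step, while the file names, n_mels, frame count, and MLP size are placeholders.

```python
import numpy as np
import librosa
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

def db_spectrogram(path: str, sr: int = 16_000, n_mels: int = 40) -> np.ndarray:
    """Mel filterbank energies in dB; the DCT step of MFCC extraction is skipped."""
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel)  # shape: (n_mels, frames)

# Placeholder corpus: file names and labels stand in for the real recordings.
recordings = ["spk0_utt0.wav", "spk0_utt1.wav", "spk1_utt0.wav", "spk1_utt1.wav"]
speaker_labels = np.array([0, 0, 1, 1])

# Crop every spectrogram to a fixed number of frames and flatten into one row.
X = np.stack([db_spectrogram(f)[:, :100].ravel() for f in recordings])

clf = MLPClassifier(hidden_layer_sizes=(128,), activation="tanh", max_iter=500)
# 10-fold CV assumes the real corpus has at least 10 utterances per speaker.
scores = cross_val_score(clf, X, speaker_labels, cv=10)
print(f"mean accuracy: {scores.mean():.3f}")
```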
Funding: This research was funded by the Deputyship for Research and Innovation, Ministry of Education, Saudi Arabia, Grant Number IFP-2020-31.
Abstract: In this work, we developed and implemented a voice control algorithm to steer smart robotic wheelchairs (SRW) using a neural network technique that integrates a network-in-network (NIN) and long short-term memory (LSTM) structure with a built-in voice recognition algorithm. An Android smartphone application was designed and configured with the proposed method, and a Wi-Fi hotspot connected the software and hardware components of the system in offline mode. To operate and guide the SRW, the design employs five voice commands (yes, no, left, right, and stop) via a Raspberry Pi and DC motors. Ten native Arabic speakers trained and validated an English speech corpus to determine the method's overall effectiveness. The SRW was evaluated in both indoor and outdoor environments to determine its response time and performance. The results showed that the system classified the five voice commands with an accuracy of 98.2%. In the real-time test, the root-mean-square deviation (RMSD) of the indoor/outdoor maneuvering nodes was 2.2×10⁻⁵ for latitude and 2.4×10⁻⁵ for longitude.
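The RMSD between commanded and measured waypoint coordinates can be computed as below; the function is a standard root-mean-square deviation, and the coordinate values are invented for illustration, not the paper's test data.

```python
import numpy as np

def rmsd(actual: np.ndarray, reference: np.ndarray) -> float:
    """Root-mean-square deviation between two coordinate tracks."""
    d = np.asarray(actual, dtype=np.float64) - np.asarray(reference, dtype=np.float64)
    return float(np.sqrt(np.mean(d ** 2)))

# Illustrative waypoints in degrees, with errors on the order of 1e-5 deg.
ref_lat = np.array([24.70010, 24.70020, 24.70030])
act_lat = ref_lat + np.array([1e-5, -3e-5, 2e-5])
print(f"latitude RMSD: {rmsd(act_lat, ref_lat):.1e}")
```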
Funding: Supported by the Natural Science Foundation of Liaoning Province (Nos. 2019-ZD-0168 and 2020-KF-12-11), the Major Training Program of Criminal Investigation Police University of China (No. 3242019010), the Key Research and Development Projects of the Ministry of Science and Technology (No. 2017YFC0821005), and the Second Batch of New Engineering Research and Practice Projects (No. E-AQGABQ20202710).
Abstract: The problem of disguised voice recognition based on deep belief networks is studied. A hybrid feature extraction algorithm based on formants, Gammatone frequency cepstral coefficients (GFCCs), and their difference (delta) coefficients is proposed to extract more discriminative speaker features from the original voice data. Using the mixed features as model input, a disguised voice library is constructed, and a disguised voice recognition model based on a deep belief network is proposed. A dropout strategy is introduced to prevent overfitting, which effectively addresses the shortcomings of traditional Gaussian mixture models, such as insufficient modeling ability and low discrimination. Experimental results show that the proposed method better fits the feature distribution and significantly improves the classification effect and recognition rate.
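The difference (delta) coefficients can be illustrated with the standard delta computation over a cepstral feature matrix; the sketch below substitutes MFCCs for GFCCs purely because librosa ships no GFCC routine, so the choice of static feature is an assumption.

```python
import numpy as np
import librosa

def hybrid_dynamic_features(y: np.ndarray, sr: int) -> np.ndarray:
    """Stack static cepstral features with their 1st/2nd difference coefficients.

    MFCCs stand in for GFCCs here purely for illustration; the delta
    computation itself is identical for either feature type.
    """
    static = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    delta1 = librosa.feature.delta(static, order=1)  # first difference
    delta2 = librosa.feature.delta(static, order=2)  # second difference
    return np.vstack([static, delta1, delta2])       # shape: (39, frames)

# Synthetic one-second signal, just to exercise the pipeline.
sr = 16_000
y = np.sin(2 * np.pi * 220 * np.arange(sr) / sr).astype(np.float32)
print(hybrid_dynamic_features(y, sr).shape)
```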
Funding: Supported by the National Natural Science Foundation of China (61874007, 12074028, and 52102152), the Shandong Provincial Major Scientific and Technological Innovation Project (2019JZZY010209), the Key-Area Research and Development Program of Guangdong Province (2020B010172001), the Fundamental Research Funds for the Central Universities (buctrc201802, buctrc201830, and buctrc202127), and the Beijing Outstanding Young Scientist Program (BJJWZYJH01201910010024).
Funding: We acknowledge funding support from the National Natural Science Foundation of China (No. 61574163), the China Postdoctoral Science Foundation (No. 2015M571837), and the Foundation Research Project of Jiangsu Province (No. BK20150364).
Abstract: Flexible mechanosensors with high sensitivity and fast response may advance wearable and implantable healthcare devices, such as real-time heart rate, pulse, and respiration monitors. In this paper, we introduce a novel flexible electronic eardrum (EE) based on single-walled carbon nanotubes, polyethylene, and polydimethylsiloxane with micro-structured pyramid arrays. The EE device shows high sensitivity, a high signal-to-noise ratio (approximately 55 dB), and a fast response time (76.9 μs) in detecting and recording sound within a frequency range of 20–13,000 Hz. The mechanism of sound detection is investigated, and the sensitivity is shown to be determined by the micro-structure, thickness, and strain state. We also demonstrate that the device can distinguish human voices. This performance of the flexible electronic eardrum has implications for applications such as implantable acoustic bioelectronics and personal voice recognition.
Abstract: Image segmentation for 3D printing and 3D visualization has become an essential component of many fields of medical research, teaching, and clinical practice, and it requires sophisticated computerized quantification and visualization tools. With the development of artificial intelligence (AI), tumors and organs can now be detected quickly and accurately and contoured automatically from medical images. This paper introduces a platform-independent, multi-modality image registration, segmentation, and 3D visualization program named AIMIS3D (artificial intelligence-based medical image segmentation for 3D printing and naked-eye 3D visualization). The YOLOv3 algorithm, with proper training, was used to recognize the prostate in T2-weighted MRI images. Prostate cancer and bladder cancer were segmented from MRI images using U-net. CT images of osteosarcoma were loaded into the platform for segmentation of the lumbar spine, osteosarcoma, vessels, and local nerves for 3D printing. Breast displacement during each radiation therapy session was evaluated quantitatively by automatically identifying the position of the 3D-printed plastic breast bra. Brain vessels were segmented from multimodality MRI images using model-based transfer learning for 3D printing and naked-eye 3D visualization in the AIMIS3D platform.
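As one hedged illustration of the segmentation-to-printing step, the sketch below converts a binary mask volume into a printable STL surface via marching cubes; it assumes scikit-image and numpy-stl, which the paper does not specify, and uses a synthetic sphere in place of a real segmentation.

```python
import numpy as np
from skimage import measure
from stl import mesh  # from the numpy-stl package

def mask_to_stl(mask: np.ndarray, out_path: str, spacing=(1.0, 1.0, 1.0)) -> None:
    """Convert a binary segmentation volume (e.g., a U-net output) to an STL mesh."""
    verts, faces, _, _ = measure.marching_cubes(
        mask.astype(np.float32), level=0.5, spacing=spacing)
    surface = mesh.Mesh(np.zeros(faces.shape[0], dtype=mesh.Mesh.dtype))
    surface.vectors[:] = verts[faces]  # (n_faces, 3, 3) triangle vertices
    surface.save(out_path)

# Illustrative volume: a voxel sphere standing in for a segmented organ.
z, y, x = np.mgrid[-16:16, -16:16, -16:16]
sphere = x**2 + y**2 + z**2 < 12**2
mask_to_stl(sphere, "organ.stl", spacing=(1.0, 0.8, 0.8))
```

The spacing argument carries the voxel size (e.g., from the DICOM header) into physical units, which is what makes the exported mesh dimensionally correct for printing.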