Journal Articles
19 articles found
1. Using Hybrid Penalty and Gated Linear Units to Improve Wasserstein Generative Adversarial Networks for Single-Channel Speech Enhancement
Authors: Xiaojun Zhu, Heming Huang. Computer Modeling in Engineering & Sciences, SCIE EI, 2023, Issue 6, pp. 2155-2172 (18 pages)
Recently, speech enhancement methods based on Generative Adversarial Networks have achieved good performance on time-domain noisy signals. However, the training of Generative Adversarial Networks suffers from problems such as convergence difficulty and model collapse. In this work, an end-to-end speech enhancement model based on Wasserstein Generative Adversarial Networks is proposed, with several improvements made to obtain faster convergence and better generated speech quality. Specifically, in the generator coding part, each convolution layer adopts different convolution kernel sizes to obtain speech coding information at multiple scales; a gated linear unit is introduced to alleviate the vanishing gradient problem as network depth increases; the gradient penalty of the discriminator is replaced with spectral normalization to accelerate model convergence; and a hybrid penalty term composed of L1 regularization and a scale-invariant signal-to-distortion ratio is introduced into the generator loss function to improve the quality of the generated speech. Experimental results on both the TIMIT corpus and a Tibetan corpus show that the proposed model improves speech quality significantly and accelerates convergence.
Keywords: speech enhancement, generative adversarial networks, hybrid penalty, gated linear units, multi-scale convolution
Download PDF
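A minimal sketch of the hybrid generator penalty described in the abstract above, combining an L1 term with a negative scale-invariant SDR term; the weighting factors are illustrative assumptions, not the paper's values.

```python
import numpy as np

def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB between two waveforms."""
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference           # projection onto the clean reference
    noise = estimate - target            # residual distortion
    return 10.0 * np.log10((np.sum(target**2) + eps) / (np.sum(noise**2) + eps))

def hybrid_penalty(estimate, reference, lambda_l1=1.0, lambda_sisdr=0.1):
    """L1 term pulls the waveforms together; the negative SI-SDR term rewards fidelity."""
    l1 = np.mean(np.abs(estimate - reference))
    return lambda_l1 * l1 - lambda_sisdr * si_sdr(estimate, reference)
```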
2. Real-Time Speech Enhancement Based on Convolutional Recurrent Neural Network
Authors: S. Girirajan, A. Pandian. Intelligent Automation & Soft Computing, SCIE, 2023, Issue 2, pp. 1987-2001 (15 pages)
Speech enhancement is the task of taking a noisy speech input and producing an enhanced speech output. In recent years, the need for speech enhancement has increased due to challenges in applications such as hearing aids, Automatic Speech Recognition (ASR), and mobile speech communication systems. Most speech enhancement research has been carried out for English, Chinese, and other European languages; only a few works address speech enhancement in Indian regional languages. In this paper, we propose a two-fold architecture based on a convolutional recurrent neural network (CRN) to perform speech enhancement for Tamil speech signals, addressing real-time enhancement of a single channel (a single track of sound created by the speaker). In the first stage, a mask-based long short-term memory (LSTM) network is used for noise suppression together with a loss function, and in the second stage a Convolutional Encoder-Decoder (CED) is used for speech restoration. The proposed model is evaluated for various speakers and noise conditions, including babble noise, car noise, and white Gaussian noise. The proposed CRN model improves speech quality by 0.1 points compared with the LSTM baseline while requiring fewer training parameters. The performance of the proposed model remains strong even at low Signal-to-Noise Ratio (SNR).
Keywords: speech enhancement, convolutional encoder-decoder, long short-term memory, noise suppression, speech restoration
Download PDF
3. Speech Enhancement via Mask-Mapping Based Residual Dense Network
Authors: Lin Zhou, Xijin Chen, Chaoyan Wu, Qiuyue Zhong, Xu Cheng, Yibin Tang. Computers, Materials & Continua, SCIE EI, 2023, Issue 1, pp. 1259-1277 (19 pages)
Masking-based and spectrum mapping-based methods are the two main algorithms for speech enhancement with deep neural networks (DNNs). However, mapping-based methods use only the phase of the noisy speech, which limits the upper bound of enhancement performance, while masking-based methods must estimate the mask accurately, which remains the key problem. Combining the advantages of both approaches, this paper proposes MM-RDN, a speech enhancement algorithm based on masking-mapping (MM) and a residual dense network (RDN). Using the logarithmic power spectrogram (LPS) of consecutive frames, MM estimates the ideal ratio mask (IRM) matrix over consecutive frames. The RDN can make full use of the feature maps of all layers; by applying global residual learning to combine shallow and deep features, it obtains global dense features from the LPS and thereby improves the estimation accuracy of the IRM matrix. Simulations show that the proposed method achieves attractive speech enhancement performance in various acoustic environments. Specifically, in untrained acoustic tests with limited priors, e.g., unmatched signal-to-noise ratio (SNR) and unmatched noise category, MM-RDN still outperforms the existing convolutional recurrent network (CRN) method on perceptual evaluation of speech quality (PESQ) and other evaluation indexes, indicating that the proposed algorithm generalizes better to untrained conditions.
Keywords: mask-mapping-based method, residual dense block, speech enhancement
Download PDF
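A minimal sketch of the ideal ratio mask (IRM) target the abstract refers to, computed per time-frequency unit from aligned clean and noise STFTs, and of mask application with the noisy phase. The power-domain definition with a square root is a common convention assumed here, not necessarily the exact one used by MM-RDN.

```python
import numpy as np

def ideal_ratio_mask(clean_stft, noise_stft, eps=1e-8):
    """IRM in [0, 1] for each time-frequency unit."""
    s_pow = np.abs(clean_stft) ** 2
    n_pow = np.abs(noise_stft) ** 2
    return np.sqrt(s_pow / (s_pow + n_pow + eps))

def apply_mask(noisy_stft, mask):
    """Enhancement keeps the noisy phase and scales the magnitude by the mask."""
    return mask * np.abs(noisy_stft) * np.exp(1j * np.angle(noisy_stft))
```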
4. Adversarial Examples Protect Your Privacy on Speech Enhancement System
Authors: Mingyu Dong, Diqun Yan, Rangding Wang. Computer Systems Science & Engineering, SCIE EI, 2023, Issue 7, pp. 1-12 (12 pages)
Speech is easily leaked imperceptibly. When people use their phones, the personal voice assistant is constantly listening and waiting to be activated, and private content in speech may be maliciously extracted through automatic speech recognition (ASR) technology by applications on the device. To ensure the recognized speech content is accurate, speech enhancement technology is used to denoise the input speech. Speech enhancement has developed rapidly along with deep neural networks (DNNs), but adversarial examples can cause DNNs to fail. Considering that this vulnerability of DNNs can be used to protect privacy in speech, we propose an adversarial method to degrade speech enhancement systems and thereby prevent the malicious extraction of private information from speech. Experimental results show that, after enhancement, the generated adversarial examples either lose most of the content of the target speech or have it replaced with other target content. The word error rate (WER) between the enhanced original example and the enhanced adversarial example can reach 89.0%, while the WER of the targeted attack, between the enhanced adversarial example and the target example, is as low as 33.75%. The adversarial perturbation induces far more change than the perturbation itself: the ratio of the difference between the two enhanced examples to the adversarial perturbation can exceed 1.4430. The transferability between different speech enhancement models is also investigated; the low transferability of the method ensures that the content in the adversarial example is not damaged, so the useful information can still be extracted by a friendly ASR. This work helps prevent the malicious extraction of speech.
Keywords: adversarial example, speech enhancement, privacy protection, deep neural network
Download PDF
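The abstract does not state which attack algorithm is used; purely as an illustration under that caveat, the following sketch applies a single FGSM-style targeted step against a placeholder waveform-to-waveform enhancement network (`enhancer` is a hypothetical torch module, not the paper's model).

```python
import torch

def fgsm_perturb(enhancer: torch.nn.Module, noisy: torch.Tensor,
                 target: torch.Tensor, epsilon: float = 0.002) -> torch.Tensor:
    """Return an adversarial version of `noisy` that pushes the enhanced output
    toward `target` (targeted attack) while keeping the perturbation small."""
    noisy = noisy.clone().detach().requires_grad_(True)
    enhanced = enhancer(noisy)
    # Targeted attack: minimize the distance between the enhanced output and the target speech.
    loss = torch.nn.functional.mse_loss(enhanced, target)
    loss.backward()
    return (noisy - epsilon * noisy.grad.sign()).detach()
```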
5. Speech Enhancement Based on Approximate Message Passing (Cited: 1)
Authors: Chao Li, Ting Jiang, Sheng Wu. China Communications, SCIE CSCD, 2020, Issue 8, pp. 187-198 (12 pages)
To overcome the limitations of conventional speech enhancement methods, such as inaccurate voice activity detection (VAD) and noise estimation, a novel speech enhancement algorithm based on approximate message passing (AMP) is adopted. AMP exploits the difference in sparsity between speech and noise to remove or mute the noise from the corrupted speech, and the AMP algorithm is used to reconstruct the clean speech efficiently. More specifically, the prior probability distribution of the speech sparsity coefficients is characterized by a Gaussian model, and the hyper-parameters of the prior model are learned by the expectation-maximization (EM) algorithm. A k-nearest neighbor (k-NN) algorithm is used to learn the sparsity, exploiting the fact that speech coefficients in adjacent frames are correlated. Computational simulations validate the proposed algorithm, which achieves better speech enhancement performance than four baseline methods: Wiener filtering, subspace pursuit (SP), distributed sparsity adaptive matching pursuit (DSAMP), and expectation-maximization Gaussian-model approximate message passing (EM-GAMP), under different compression ratios and a wide range of signal-to-noise ratios (SNRs).
Keywords: speech enhancement, approximate message passing, Gaussian model, expectation maximization algorithm
Download PDF
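A generic AMP iteration with a soft-threshold denoiser, shown only to illustrate the structure of approximate message passing (matched filtering, thresholding, and the Onsager correction). The paper instead uses a Gaussian prior whose hyper-parameters are learned by EM, which is not reproduced here; the threshold rule below is an assumption.

```python
import numpy as np

def soft_threshold(v, theta):
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def amp_recover(A, y, n_iter=30, tau=1.5):
    """Recover a sparse vector x from measurements y = A @ x + noise."""
    m, n = A.shape
    x = np.zeros(n)
    z = y.copy()
    for _ in range(n_iter):
        theta = tau * np.linalg.norm(z) / np.sqrt(m)   # iteration-dependent threshold
        x = soft_threshold(x + A.T @ z, theta)         # matched filter + denoiser
        onsager = (np.count_nonzero(x) / m) * z        # correction keeps the residual near-Gaussian
        z = y - A @ x + onsager
    return x
```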
6. HMM-based noise estimator for speech enhancement
Authors: 许春冬, 夏日升, 应冬文, 李军锋, 颜永红. Journal of Beijing Institute of Technology, EI CAS, 2014, Issue 4, pp. 549-556 (8 pages)
A noise estimator is presented by modeling the log-power sequence with a hidden Markov model (HMM). The smoothing factor of this estimator is driven by the speech presence probability at each frequency band. The HMM has a speech state and a non-speech state, each consisting of a single Gaussian function; the mean of the non-speech state is the estimate of the noise logarithmic power. To make the estimator run online, the HMM parameters are updated with a first-order recursive process, so the noise signal is tracked as the HMM is sequentially updated. For reliability, some constraints are imposed on the HMM. The proposed algorithm is compared with conventional methods such as minimum statistics (MS) and improved minima controlled recursive averaging (IMCRA), and the experimental results confirm its promising performance.
Keywords: noise estimation, hidden Markov model, constraints, first-order recursive process, speech enhancement
Download PDF
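A sketch of the general mechanism the abstract describes: a first-order recursive noise-power update whose smoothing factor is driven by a per-band speech presence probability. The HMM posterior that supplies this probability in the paper is abstracted away and passed in as an argument; `alpha_min` is an illustrative constant.

```python
import numpy as np

def update_noise_psd(noise_psd, noisy_power, speech_presence_prob, alpha_min=0.85):
    """One frame of noise power tracking.

    noise_psd            : current noise power estimate per frequency band
    noisy_power          : |Y(k)|^2 of the current noisy frame
    speech_presence_prob : probability in [0, 1] that speech is present per band
    """
    # High speech probability -> smoothing factor near 1 -> freeze the noise estimate;
    # low probability -> factor near alpha_min -> adapt quickly to the new frame.
    alpha = alpha_min + (1.0 - alpha_min) * speech_presence_prob
    return alpha * noise_psd + (1.0 - alpha) * noisy_power
```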
7. Single-Channel Speech Enhancement Based on Improved Frame-Iterative Spectral Subtraction in the Modulation Domain
Authors: Chao Li, Ting Jiang, Sheng Wu. China Communications, SCIE CSCD, 2021, Issue 9, pp. 100-115 (16 pages)
To address the musical noise introduced by classical spectral subtraction, a short-time modulation domain (STM) spectral subtraction method has been successfully applied to single-channel speech enhancement. However, due to inaccurate voice activity detection (VAD), the residual musical noise and the enhancement performance still need improvement, especially in low signal-to-noise ratio (SNR) scenarios. To address this issue, an improved frame-iterative spectral subtraction in the STM domain (IMModSSub) is proposed. Specifically, exploiting inter-frame correlation, noise subtraction is applied directly to the noisy signal for each frame in the STM domain. The noisy signal is then classified into speech or silence frames based on a predefined threshold on the segmental SNR, and a corresponding mask function is built for the noisy speech after noise subtraction. Finally, exploiting the increased sparsity of speech in the modulation domain, the orthogonal matching pursuit (OMP) technique is applied to the speech frames to improve speech quality and intelligibility. The effectiveness of the proposed method is evaluated with three noise types: white noise, pink noise, and HF-channel noise. The results show that the proposed method outperforms several established baselines at low SNRs (-5 to +5 dB).
Keywords: short-time modulation domain, single-channel speech enhancement, modulation, improved frame-iterative spectral subtraction, low SNRs
Download PDF
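For reference, a sketch of classical magnitude spectral subtraction with an over-subtraction factor and a spectral floor, the acoustic-domain baseline whose musical noise motivates this paper; the proposed method applies subtraction in the short-time modulation domain instead, which is not shown, and the factors used here are illustrative.

```python
import numpy as np

def spectral_subtraction(noisy_stft, noise_mag, over_sub=2.0, floor=0.02):
    """noisy_stft: complex STFT (freq x frames); noise_mag: estimated noise magnitude per bin."""
    noisy_mag = np.abs(noisy_stft)
    phase = np.angle(noisy_stft)
    clean_mag = noisy_mag - over_sub * noise_mag[:, None]
    # Flooring prevents negative magnitudes, the main source of musical-noise artifacts.
    clean_mag = np.maximum(clean_mag, floor * noisy_mag)
    return clean_mag * np.exp(1j * phase)
```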
8. An Efficient Reference Free Adaptive Learning Process for Speech Enhancement Applications
Authors: Girika Jyoshna, Md. Zia Ur Rahman, L. Koteswararao. Computers, Materials & Continua, SCIE EI, 2022, Issue 2, pp. 3067-3080 (14 pages)
In conditions like hearing impairment, speech therapy and hearing aids play a major role in reducing the impairment, and removing noise from speech signals is a key task in both. During transmission, several noise components contaminate the actual speech components. This paper presents a new adaptive speech enhancement (ASE) method based on a modified version of singular spectrum analysis (MSSA). The MSSA generates a reference signal for the ASE, freeing it from the need for an external reference input: the reference is derived from the contaminated speech alone through three key steps, decomposition, grouping, and reconstruction. The generated reference is then used by variable-step-size adaptive learning algorithms. Two categories of adaptive learning algorithms are used: a step variable adaptive learning (SVAL) algorithm and a time variable step size adaptive learning (TVAL) algorithm. Further, a sign regressor function is applied to the adaptive learning algorithms to reduce their computational complexity. The performance of the proposed schemes is measured in terms of signal-to-noise ratio improvement (SNRI), excess mean square error (EMSE), and misadjustment (MSD); for cockpit noise these measures are found to be 29.2850, -27.6060, and 0.0758 dB, respectively, in experiments with the SVAL algorithm. Considering the reduced number of multiplications, the sign regressor version of the SVAL-based ASE method is found to be better than its counterparts.
Keywords: adaptive algorithm, speech enhancement, singular spectrum analysis, reference free noise canceller, variable step size
Download PDF
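A sketch of a sign-regressor LMS noise canceller, showing how the sign of the regressor simplifies the weight update that the abstract's complexity argument relies on. The variable-step-size rules (SVAL/TVAL) are replaced by a fixed step size for brevity, and the MSSA-generated reference is assumed to be given.

```python
import numpy as np

def sign_regressor_lms(primary, reference, n_taps=16, mu=0.005):
    """primary: noisy speech; reference: noise reference (e.g., produced by MSSA).
    Returns the enhanced signal (the error of the adaptive canceller)."""
    w = np.zeros(n_taps)
    enhanced = np.zeros_like(primary, dtype=float)
    for n in range(n_taps, len(primary)):
        x = reference[n - n_taps:n][::-1]   # current regressor vector
        e = primary[n] - np.dot(w, x)       # error = enhanced speech sample
        w += mu * e * np.sign(x)            # sign of the regressor replaces multiplications by x
        enhanced[n] = e
    return enhanced
```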
9. SPEECH ENHANCEMENT USING HARMONICS REGENERATION BASED ON MULTIBAND EXCITATION
Authors: Zhang Yanfang, Tang Kun, Cui Huijuan. Journal of Electronics (China), 2011, Issue 4, pp. 565-570 (6 pages)
This paper proposes an algorithm that uses harmonic regeneration as post-processing to improve the performance of speech enhancement based on traditional Short-Time Spectral Amplitude (STSA) estimation. The proposed algorithm aims to alleviate the distortion of the higher harmonics in speech enhanced by traditional STSA and consequently improve speech quality. We first detect the pitch, or fundamental frequency, of the STSA-enhanced speech and then divide the whole spectrum into multiple sub-bands centered on each harmonic. A series of specially designed windows centered on each harmonic is then applied to all sub-bands in order to redistribute the energy within them. Experimental results demonstrate that the method has both a theoretical and a practical basis.
Keywords: speech enhancement, short time spectral amplitude, harmonic regeneration, multiband excitation, pitch detection
Download PDF
10. Speech Enhancement via Residual Dense Generative Adversarial Network
Authors: Lin Zhou, Qiuyue Zhong, Tianyi Wang, Siyuan Lu, Hongmei Hu. Computer Systems Science & Engineering, SCIE EI, 2021, Issue 9, pp. 279-289 (11 pages)
Generative adversarial networks (GANs) have received increasing attention for end-to-end speech enhancement in recent years, and various GAN-based enhancement methods have been presented to improve the quality of reconstructed speech. However, the performance of these GAN-based methods is worse than that of masking-based methods. To tackle this problem, we propose a speech enhancement method with a residual dense generative adversarial network (RDGAN) that maps the log-power spectrum (LPS) of degraded speech to that of clean speech. In detail, a residual dense block (RDB) architecture is designed to better estimate the LPS of clean speech; it extracts rich local features of the LPS through densely connected convolution layers. Meanwhile, sequential RDB connections are incorporated at various scales of the LPS, which significantly increases the flexibility and robustness of feature learning in the time-frequency domain. Simulations show that the proposed method achieves attractive speech enhancement performance in various acoustic environments. Specifically, in untrained acoustic tests with limited priors, e.g., unmatched signal-to-noise ratio (SNR) and unmatched noise category, RDGAN still outperforms existing GAN-based methods and a masking-based method on PESQ and other evaluation indexes, indicating that our method generalizes better to untrained conditions.
Keywords: generative adversarial networks, neural networks, residual dense block, speech enhancement
Download PDF
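A generic residual dense block over a time-frequency feature map, sketching the densely connected convolutions, local feature fusion, and local residual connection mentioned in the abstract. Channel count, growth rate, kernel size, and layer count are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn

class ResidualDenseBlock(nn.Module):
    def __init__(self, channels=32, growth=16, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        in_ch = channels
        for _ in range(n_layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(in_ch, growth, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
            ))
            in_ch += growth                                       # dense connectivity: inputs accumulate
        self.fusion = nn.Conv2d(in_ch, channels, kernel_size=1)   # local feature fusion

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))
        return x + self.fusion(torch.cat(features, dim=1))        # local residual learning
```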
11. Speech enhancement based on modified a priori SNR estimation (Cited: 1)
Authors: Yu FANG, Gang LIU, Jun GUO. Frontiers of Electrical and Electronic Engineering in China, CSCD, 2011, Issue 4, pp. 542-546 (5 pages)
To solve the frame-delay problem and the resulting mismatch with the previous frame, Plapous et al. [IEEE Transactions on Audio, Speech, and Language Processing, 2006, 14(6): 2098-2108] introduced the two-step noise reduction (TSNR) technique to improve speech enhancement performance. However, TSNR produces spectral peaks of short duration and broken spectral outliers, which degrade the spectral characteristics of the speech. To solve this problem, a cepstral smoothing step is added to remove the spectral peaks introduced by TSNR. Theoretical analysis shows that the proposed approach can effectively smooth the spectral peaks while keeping the spectral outliers, so as to protect the speech characteristics. Experimental results also show that the proposed approach brings significant improvement over the decision-directed (DD) and TSNR approaches, especially in non-stationary noisy environments.
Keywords: speech enhancement, decision-directed (DD), two-step noise reduction (TSNR), signal-to-noise ratio (SNR) estimation
Full-text delivery
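A sketch of the decision-directed (DD) a priori SNR estimator and the second TSNR refinement step referenced in the abstract, with a Wiener gain per frequency bin; the smoothing constant is a typical value, and the cepstral smoothing step this paper adds is not shown.

```python
import numpy as np

def dd_tsnr_frame(noisy_mag, noise_psd, prev_clean_mag, beta=0.98, xi_min=1e-3):
    """noisy_mag: |Y| of the current frame; noise_psd: noise power per bin;
    prev_clean_mag: enhanced magnitude of the previous frame."""
    gamma = noisy_mag**2 / noise_psd                           # a posteriori SNR
    # Step 1: decision-directed a priori SNR estimate
    xi_dd = beta * prev_clean_mag**2 / noise_psd + (1 - beta) * np.maximum(gamma - 1, 0)
    xi_dd = np.maximum(xi_dd, xi_min)
    g_dd = xi_dd / (1 + xi_dd)                                 # Wiener gain from step 1
    # Step 2 (TSNR): re-estimate the a priori SNR with the current-frame gain,
    # which removes the one-frame delay of the DD estimator.
    xi_tsnr = np.maximum((g_dd * noisy_mag)**2 / noise_psd, xi_min)
    g_tsnr = xi_tsnr / (1 + xi_tsnr)
    return g_tsnr * noisy_mag                                  # enhanced magnitude
```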
12. Optimizing the Perceptual Quality of Time-Domain Speech Enhancement with Reinforcement Learning
Authors: Xiang Hao, Chenglin Xu, Lei Xie, Haizhou Li. Tsinghua Science and Technology, SCIE EI CAS CSCD, 2022, Issue 6, pp. 939-947 (9 pages)
In neural speech enhancement, a mismatch exists between the training objective, i.e., mean-square error (MSE), and the perceptual quality evaluation metrics, i.e., perceptual evaluation of speech quality and short-time objective intelligibility. We propose a novel reinforcement learning algorithm and network architecture that incorporate a non-differentiable perceptual quality evaluation metric into the objective function using a dynamic filter module. Unlike the traditional dynamic filter implementation that directly generates a convolution kernel, we use a filter generation agent to predict the probability density function of a multivariate Gaussian distribution, from which we sample the convolution kernel. Experimental results show that the proposed reinforcement learning method clearly improves perceptual quality over other supervised learning methods trained with the MSE objective.
Keywords: speech enhancement, neural networks, dynamic filter, reinforcement learning
Full-text delivery
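A sketch of the dynamic filter idea described in the abstract: an agent predicts the mean and (diagonal) variance of a Gaussian over convolution-kernel weights, a kernel is sampled and applied to the signal, and the sample's log-probability is returned for a policy-gradient update. Layer sizes and the kernel length are assumptions, and the perceptual-reward update itself is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FilterGenerationAgent(nn.Module):
    def __init__(self, feat_dim=64, kernel_size=11):
        super().__init__()
        self.kernel_size = kernel_size
        self.mean_head = nn.Linear(feat_dim, kernel_size)
        self.logvar_head = nn.Linear(feat_dim, kernel_size)

    def forward(self, features, signal):
        """features: (batch, feat_dim) conditioning vector; signal: (batch, 1, time)."""
        mean = self.mean_head(features)
        std = torch.exp(0.5 * self.logvar_head(features))
        kernel = mean + std * torch.randn_like(std)        # sample one kernel per utterance
        outputs = []
        for b in range(signal.size(0)):                    # per-sample dynamic convolution
            k = kernel[b].view(1, 1, self.kernel_size)
            outputs.append(F.conv1d(signal[b:b + 1], k, padding=self.kernel_size // 2))
        log_prob = torch.distributions.Normal(mean, std).log_prob(kernel).sum(dim=-1)
        return torch.cat(outputs, dim=0), log_prob         # log_prob feeds the RL update
```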
13. Speech enhancement with a GSC-like structure employing sparse coding
Authors: Li-chun YANG, Yun-tao QIAN. Journal of Zhejiang University-Science C (Computers and Electronics), SCIE EI, 2014, Issue 12, pp. 1154-1163 (10 pages)
Speech communication is often affected by various types of interfering signals. To improve the quality of the desired signal, the generalized sidelobe canceller (GSC), which uses a reference signal to estimate the interfering signal, has attracted researchers' attention. However, the interference suppression of the GSC is limited because a little of the desired signal leaks into the reference signal. To overcome this problem, we use sparse coding to suppress the residual desired signal while preserving the reference signal. Sparse coding with a learned dictionary is usually used to reconstruct the desired signal; since training samples of the desired signal are not observable in a real environment, however, the reconstructed desired signal may contain a lot of residual interference. In contrast, training samples of the interfering signal, collected while the desired signal is absent, can be obtained through voice activity detection (VAD). Since the reference signal of the interfering signal is coherent with the interferer dictionary, it can be well reconstructed by sparse coding, while the residual desired signal is removed. The performance of the GSC is thus improved because the estimate of the interfering signal based on the proposed reference signal is more accurate than before. Simulations and experiments in a real acoustic environment show that the proposed method is effective in suppressing interfering signals.
Keywords: generalized sidelobe canceller, speech enhancement, voice activity detection, dictionary learning, sparse coding
Full-text delivery
14. Mobile Communication Voice Enhancement Under Convolutional Neural Networks and the Internet of Things
Authors: Jiajia Yu. Intelligent Automation & Soft Computing, SCIE, 2023, Issue 7, pp. 777-797 (21 pages)
This study aims to reduce the interference of ambient noise in mobile communication, improve the accuracy and authenticity of information transmitted by sound, and guarantee the accuracy of voice information delivered by mobile communication. First, the principles and techniques of speech enhancement are analyzed, and a fast lateral recursive least squares (FLRLS) method is adopted to process the sound data. Then, a convolutional neural network (CNN)-based noise recognition algorithm (NR-CNN) and a speech enhancement model are proposed. Finally, experiments are designed to verify the performance of the proposed algorithm and model. The experimental results show that the noise classification accuracy of the NR-CNN algorithm is higher than 99.82%, and the recall rate and F1 value are also higher than 99.92. The proposed sound enhancement model can effectively enhance the original sound under noise interference. After the CNN is incorporated, the average perceptual quality evaluation score over all noisy sounds is improved by over 21% compared with the traditional noise reduction method. The proposed algorithm can adapt to a variety of voice environments and can simultaneously enhance and denoise many different types of voice signals, with better processing results than traditional sound enhancement models. In addition, the sound distortion index of the proposed speech enhancement model is lower than that of the control group, indicating that adding the CNN is less likely to cause signal distortion in various sound environments and shows superior robustness. In summary, the proposed CNN-based speech enhancement model shows significant enhancement effects, stable performance, and strong adaptability. This study provides a reference and basis for research applying neural networks to speech enhancement.
Keywords: convolutional neural networks, speech enhancement, noise recognition, deep learning, human-computer interaction, Internet of Things
Download PDF
15. Speech Intelligibility Enhancement Algorithm Based on Multi-Resolution Power-Normalized Cepstral Coefficients (MRPNCC) for Digital Hearing Aids
Authors: Xia Wang, Xing Deng, Hongming Shen, Guodong Zhang, Shibing Zhang. Computer Modeling in Engineering & Sciences, SCIE EI, 2021, Issue 2, pp. 693-710 (18 pages)
Speech intelligibility enhancement in noisy environments remains one of the major challenges for the hearing impaired in everyday life. Recently, machine-learning-based approaches to speech enhancement have shown great promise for improving speech intelligibility. Two key issues in these approaches are the acoustic features extracted from noisy signals and the classifiers used for supervised learning. This paper focuses on features: multi-resolution power-normalized cepstral coefficients (MRPNCC) are proposed as a new feature for enhancing speech intelligibility for the hearing impaired. The new feature is constructed by combining four cepstra at different time-frequency (T-F) resolutions in order to capture both local and contextual information. MRPNCC vectors and binary masking labels, computed from signals passed through a gammatone filterbank, are used to train a support vector machine (SVM) classifier, which identifies the binary masking values of the T-F units in the enhancement stage. The enhanced speech is synthesized using the estimated masking values and Wiener-filtered T-F units. Objective experimental results demonstrate that the proposed feature is superior to competing features in terms of HIT-FA, STOI, HASPI, and PESQ, and that the proposed algorithm improves speech intelligibility and also slightly improves speech quality. Subjective tests validate the effectiveness of the proposed algorithm for the hearing impaired.
Keywords: speech intelligibility enhancement, multi-resolution power-normalized cepstral coefficients, binary masking value, hearing impaired
Download PDF
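As background for the binary masking step in the abstract, a sketch of ideal binary mask labels computed from a local SNR criterion and of keeping only speech-dominant T-F units at synthesis time. The gammatone analysis/synthesis, the SVM classifier, and the Wiener-filtered units of the actual algorithm are not shown; the -5 dB criterion is a common assumption, not necessarily the paper's.

```python
import numpy as np

def ideal_binary_mask(clean_power, noise_power, local_criterion_db=-5.0):
    """1 where the local SNR exceeds the criterion, else 0 (per T-F unit)."""
    local_snr_db = 10.0 * np.log10((clean_power + 1e-12) / (noise_power + 1e-12))
    return (local_snr_db > local_criterion_db).astype(np.float32)

def apply_binary_mask(noisy_tf, mask):
    """Speech-dominant units are retained, noise-dominant units are zeroed."""
    return noisy_tf * mask
```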
16. Comparison of enhancement techniques based on neural networks for attenuated voice signal captured by flexible vibration sensors on throats
Authors: Shenghan Gao, Changyan Zheng, Yicong Zhao, Ziyue Wu, Jiao Li, Xian Huang. Nanotechnology and Precision Engineering, CAS CSCD, 2022, Issue 1, pp. 1-11 (11 pages)
Wearable flexible sensors attached to the neck have been developed to measure the vibration of the vocal cords during speech. However, high-frequency attenuation caused by the frequency response of the flexible sensors and absorption of high-frequency sound by the skin are obstacles to the practical application of these sensors to bone-conduction speech capture. In this paper, speech enhancement techniques for improving the intelligibility of the sensor signals are developed and compared. Four kinds of speech enhancement algorithms, based on a fully connected neural network (FCNN), a long short-term memory (LSTM), a bidirectional long short-term memory (BLSTM), and a convolutional-recurrent neural network (CRNN), are used to enhance the sensor signals, and their performance after deployment on four kinds of edge and cloud platforms is also investigated. Experimental results show that the BLSTM performs best in improving speech quality but is the worst with regard to hardware deployment: it improves short-time objective intelligibility (STOI) by 0.18 to nearly 0.80, which corresponds to a good intelligibility level, but it introduces latency and is a large model. The CRNN, which improves STOI to about 0.75, ranks second among the four networks and is the only model able to achieve real-time processing on all four hardware platforms, demonstrating its great potential for deployment on mobile platforms. To the best of our knowledge, this is one of the first attempts to systematically develop processing techniques for bone-conduction speech signals captured by flexible sensors. The results demonstrate the possibility of a wearable lightweight speech collection system based on flexible vibration sensors and real-time speech enhancement to compensate for high-frequency attenuation.
Keywords: flexible electronics, vibration sensor, neural network, speech enhancement, deep learning
Download PDF
17. Joint Noise Reduction and lp-Norm Minimization for Enhancing Time Delay Estimation in Colored Noise (Cited: 1)
Authors: Jingxian Tu, Youshen Xia. Journal of Computer and Communications, 2016, Issue 3, pp. 46-53 (8 pages)
Time delay estimation (TDE) is an important issue in signal processing. Conventional TDE algorithms are usually efficient only under white-noise conditions. In this paper, a joint noise reduction and lp-norm minimization method is presented to enhance TDE in colored noise. An improved subspace method for colored-noise reduction is performed first, and the time delay is then estimated using an lp-norm minimization method. Because the clean speech signal is well extracted from the noisy signal by the noise reduction step and the lp-norm minimization is robust, the TDE accuracy is enhanced. Experimental results confirm that the proposed joint estimation method obtains more accurate TDE than several conventional algorithms in colored noise, especially at low signal-to-noise ratio.
Keywords: time delay estimation, speech enhancement, noise reduction, subspace
Download PDF
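For context, a sketch of the conventional cross-correlation time delay estimator that methods like this one improve upon; the paper's subspace noise reduction and lp-norm minimization are not reproduced here.

```python
import numpy as np

def estimate_delay(x1, x2, max_lag):
    """Return the lag (in samples) of x2 relative to x1 that maximizes their cross-correlation."""
    lags = np.arange(-max_lag, max_lag + 1)
    corr = [np.dot(x1[max(0, -l):len(x1) - max(0, l)],
                   x2[max(0, l):len(x2) - max(0, -l)]) for l in lags]
    return int(lags[int(np.argmax(corr))])
```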
18. Research on separation and enhancement of speech micro-vibration from macro motion (Cited: 1)
Authors: 陈鸿凯, 王挺峰, 吴世松, 李远洋. Optoelectronics Letters, EI, 2020, Issue 6, pp. 462-466 (5 pages)
Based on the 1550 nm all-fiber pulsed laser Doppler vibrometer (LDV) system independently developed by our laboratory, empirical mode decomposition (EMD) and an optimally modified log-spectral amplitude estimator (OM-LSA) are combined to separate speech micro-vibration from the target's macro motion. This combined algorithm compensates for the weakness of the EMD algorithm in denoising and the inability of the OM-LSA algorithm to separate signals, achieving separation and simultaneous acquisition of the macro motion and speech micro-vibration of a target. The experimental results indicate that, using this combined algorithm, the LDV system can operate within 30 m and gains a 4.21 dB improvement in signal-to-noise ratio (SNR) relative to a traditional OM-LSA algorithm.
Keywords: Doppler, LDV, LSA, EMD
Full-text delivery
19. An enhanced relative spectral processing of speech (Cited: 2)
Authors: ZHEN Bin, WU Xihong, LIU Zhimin, CHI Huisheng (Center for Information Science, Peking University, Beijing 100871). Chinese Journal of Acoustics, 2002, Issue 1, pp. 86-96 (11 pages)
An enhanced relative spectral (E_RASTA) technique for speech and speaker recognition is proposed. The new method consists of classical RASTA filtering in the logarithmic spectral domain followed by an additional additive RASTA filtering in the same domain. In this manner, both channel distortion and additive noise are removed effectively. In speaker identification and speech recognition experiments on the TI46 database, E_RASTA performs equal to or better than J_RASTA in both tasks. E_RASTA does not need an estimate of the speech SNR to determine the optimal value of J in J_RASTA, nor any information about how the speech is degraded. The choice of the E_RASTA filter also indicates that the low temporal modulation components of speech can deteriorate the performance of both recognition tasks. Besides, speaker recognition needs a narrower temporal modulation frequency band than speech recognition.
Keywords: relative spectral processing, MFCC
Full-text delivery
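A sketch of conventional RASTA filtering of log-spectral trajectories with the standard band-pass transfer function, applied along time in each frequency channel (causal form, so the output is delayed by a few frames). The enhanced E_RASTA variant described in the abstract adds a further filtering stage in the same domain, which is not reproduced here.

```python
import numpy as np
from scipy.signal import lfilter

def rasta_filter(log_spectrum):
    """log_spectrum: array of shape (n_frames, n_channels).

    Classical RASTA band-pass filter on modulation trajectories:
    H(z) = 0.1 * (2 + z^-1 - z^-3 - 2*z^-4) / (1 - 0.98*z^-1)
    """
    numer = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])   # FIR part (modulation band-pass)
    denom = np.array([1.0, -0.98])                        # leaky-integrator pole
    return lfilter(numer, denom, log_spectrum, axis=0)    # filter along the time axis
```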