Journal Articles
9 articles found
Speech Separation Methodology for Hearing Aid
1
Authors: Joseph Sathiadhas Esra, Y. Sukhi 《Computer Systems Science & Engineering》 SCIE EI, 2023, No. 2, pp. 1659-1678 (20 pages)
In the design of hearing aids (HA), speech enhancement is performed in real time. Digital hearing aids should provide a high signal-to-noise ratio and gain improvement, and should eliminate feedback. In generic hearing aids, the performance across different frequencies varies and is non-uniform. Existing noise-cancellation and speech-separation methods reduce the voice magnitude in noisy environments, and existing noise-suppression methods also attenuate the desired signal, so uniform sub-band analysis performs poorly where hearing aids are concerned. In this paper, a speech separation method using the Non-negative Matrix Factorization (NMF) algorithm is proposed for wavelet decomposition. The proposed non-uniform filter bank was validated with parameters such as band power, signal-to-noise ratio (SNR), mean square error (MSE), signal-to-noise-and-distortion ratio (SINAD), spurious-free dynamic range (SFDR), error, and time. The speech recordings before and after separation were evaluated for quality using the objective speech quality measure of International Telecommunication Union-Telecommunication standard ITU-T P.862.
Keywords: speech separation, wavelet filter, independent component analysis (ICA), non-negative matrix factorization (NMF), Fejer-Korovkin (FK), signal-to-noise ratio (SNR)
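The NMF step named above factorizes a non-negative magnitude spectrogram V into a spectral basis W and activations H; grouping basis columns by source then yields the separated components. Below is a minimal sketch of the classic multiplicative-update rule for the Euclidean cost on a toy matrix — not the paper's wavelet-domain pipeline; the function name, sizes, and iteration count are illustrative assumptions.

```python
import numpy as np

def nmf(V, rank, n_iter=500, seed=0):
    """Factorize a non-negative matrix V ~ W @ H using
    multiplicative updates for the Euclidean cost (a standard NMF scheme)."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, rank)) + 1e-3
    H = rng.random((rank, T)) + 1e-3
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative by construction.
        H *= (W.T @ V) / (W.T @ W @ H + 1e-12)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-12)
    return W, H

# Toy "spectrogram": two spectral patterns active at different times.
basis = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
act = np.array([[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]])
V = basis @ act
W, H = nmf(V, rank=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(rel_err)
```

In a separation setting, each column of W would be assigned to one source and the partial reconstructions W_k @ H_k used to mask the mixture.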
Speech Separation Algorithm Using Gated Recurrent Network Based on Microphone Array
2
Authors: Xiaoyan Zhao, Lin Zhou, Yue Xie, Ying Tong, Jingang Shi 《Intelligent Automation & Soft Computing》 SCIE, 2023, No. 6, pp. 3087-3100 (14 pages)
Speech separation is an active research topic that plays an important role in numerous applications, such as speaker recognition, hearing prosthesis, and autonomous robots. Many algorithms have been put forward to improve separation performance. However, speech separation in reverberant, noisy environments is still a challenging task. To address this, a novel speech separation algorithm using a gated recurrent unit (GRU) network based on a microphone array is proposed in this paper. The main aim of the proposed algorithm is to improve separation performance and reduce computational cost. The proposed algorithm extracts the sub-band steered response power-phase transform (SRP-PHAT) weighted by a gammatone filter as the speech separation feature, owing to its discriminative and robust spatial position information. Since the GRU network has the advantage of processing time-series data with faster training speed and fewer training parameters, the GRU model is adopted to process the separation features of several sequential frames in the same sub-band to estimate the ideal ratio mask (IRM). The proposed algorithm decomposes the mixture signals into time-frequency (TF) units using a gammatone filter bank in the frequency domain, and the target speech is reconstructed in the frequency domain by masking the mixture signal according to the estimated IRM. The operations of decomposing the mixture signal and reconstructing the target signal are completed in the frequency domain, which reduces the total computational cost. Experimental results demonstrate that the proposed algorithm realizes omnidirectional speech separation in noisy and reverberant environments, provides good performance in terms of speech quality and intelligibility, and has the capacity to generalize to reverberation.
Keywords: microphone array, speech separation, gated recurrent unit network, gammatone sub-band, steered response power-phase transform, spatial spectrum
Binaural Speech Separation Algorithm Based on Long and Short Time Memory Networks (Cited by 1)
3
Authors: Lin Zhou, Siyuan Lu, Qiuyue Zhong, Ying Chen, Yibin Tang, Yan Zhou 《Computers, Materials & Continua》 SCIE EI, 2020, No. 6, pp. 1373-1386 (14 pages)
Speaker separation in complex acoustic environments is one of the challenging tasks in speech separation. In practice, speakers very often remain still or move slowly during normal communication. In this case, the spatial features among consecutive speech frames become highly correlated, which helps speaker separation by providing additional spatial information. To fully exploit this information, we design a separation system on a Recurrent Neural Network (RNN) with long short-term memory (LSTM) that effectively learns the temporal dynamics of spatial features. In detail, an LSTM-based speaker separation algorithm is proposed to extract the spatial features in each time-frequency (TF) unit and form the corresponding feature vector. Then, we treat speaker separation as a supervised learning problem, where a modified ideal ratio mask (IRM) is defined as the training target during LSTM learning. Simulations show that the proposed system achieves attractive separation performance in noisy and reverberant environments. Specifically, in untrained acoustic tests with limited priors, e.g., unmatched signal-to-noise ratio (SNR) and reverberation, the proposed LSTM-based algorithm can still outperform the existing DNN-based method in the measures of PESQ and STOI. This indicates that our method is more robust under untrained conditions.
Keywords: binaural speech separation, long short-term memory networks, feature vectors, ideal ratio mask
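The standard ideal ratio mask underlying the training target above is computed per TF unit from the speech and noise energies and then applied to the mixture spectrogram; the paper's "modified IRM" differs in details not given here, so this is a generic sketch with made-up toy magnitudes.

```python
import numpy as np

def ideal_ratio_mask(S, N, beta=0.5):
    """Standard IRM per time-frequency unit: (S^2 / (S^2 + N^2))^beta,
    from the speech (S) and noise (N) magnitude spectrograms."""
    return (S**2 / (S**2 + N**2 + 1e-12)) ** beta

# Toy magnitudes: speech dominates the first frequency bin, noise the second.
S = np.array([[3.0, 3.0],
              [0.1, 0.1]])
N = np.array([[0.1, 0.1],
              [3.0, 3.0]])
M = ideal_ratio_mask(S, N)
estimate = M * (S + N)   # mask the (additive-magnitude) mixture
print(M)
```

In a learned system, a network predicts M from input features, and the target signal is resynthesized from the masked mixture with the mixture phase.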
Microphone Array Speech Separation Algorithm Based on TC-ResNet
4
Authors: Lin Zhou, Yue Xu, Tianyi Wang, Kun Feng, Jingang Shi 《Computers, Materials & Continua》 SCIE EI, 2021, No. 11, pp. 2705-2716 (12 pages)
Traditional separation methods have limited ability to handle the speech separation problem in highly reverberant and low signal-to-noise ratio (SNR) environments, and thus achieve unsatisfactory results. In this study, a convolutional neural network with temporal convolution and a residual network (TC-ResNet) is proposed to realize speech separation in a complex acoustic environment. A simplified steered-response power phase transform, denoted GSRP-PHAT, is employed to reduce the computational cost. The extracted features are reshaped into a special tensor as the system input, and temporal convolution is applied, which not only enlarges the receptive field of the convolution layer but also significantly reduces the network's computational cost. Residual blocks are used to combine multi-resolution features and accelerate the training procedure. A modified ideal ratio mask is applied as the training target. Simulation results demonstrate that the proposed microphone array speech separation algorithm based on TC-ResNet achieves better performance in terms of distortion ratio, source-to-interference ratio, and short-time objective intelligibility in low-SNR and highly reverberant environments, particularly in untrained situations. This indicates that the proposed method generalizes to untrained conditions.
Keywords: residual networks, temporal convolution neural networks, speech separation
Improving Deep Attractor Network by BGRU and GMM for Speech Separation
5
Authors: Rawad Melhem, Assef Jafar, Riad Hamadeh 《Journal of Harbin Institute of Technology (New Series)》 CAS, 2021, No. 3, pp. 90-96 (7 pages)
Deep Attractor Network (DANet) is the state-of-the-art technique in the speech separation field. It uses Bidirectional Long Short-Term Memory (BLSTM), but the complexity of the DANet model is very high. In this paper, a simplified and powerful DANet model is proposed, using a bidirectional gated recurrent unit (BGRU) network instead of BLSTM. A Gaussian Mixture Model (GMM), rather than k-means, was applied in DANet as the clustering algorithm to reduce complexity and increase learning speed and accuracy. The metrics used in this paper are Signal-to-Distortion Ratio (SDR), Signal-to-Interference Ratio (SIR), Signal-to-Artifact Ratio (SAR), and the Perceptual Evaluation of Speech Quality (PESQ) score. Two-speaker mixture datasets from the TIMIT corpus were prepared to evaluate the proposed model; the system achieved 12.3 dB and 2.94 for the SDR and PESQ scores respectively, better than the original DANet model. Further improvements of 20.7% and 17.9% were obtained in the number of parameters and training time respectively. The model was also applied to mixed Arabic speech signals, and the results were better than those for English.
Keywords: attractor network, speech separation, gated recurrent units
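SDR, the headline metric above, compares the energy of the target component of an estimate with the energy of everything else. The paper presumably uses the full BSS Eval definition; the projection-based, scale-invariant simplification below is an assumption made for illustration, with synthetic signals.

```python
import numpy as np

def sdr(reference, estimate):
    """Scale-invariant SDR sketch: project the estimate onto the
    reference and compare target energy with residual energy (in dB)."""
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    alpha = np.dot(estimate, reference) / np.dot(reference, reference)
    target = alpha * reference          # component explained by the reference
    residual = estimate - target        # distortion: everything else
    return 10 * np.log10(np.dot(target, target) / (np.dot(residual, residual) + 1e-12))

rng = np.random.default_rng(0)
s = rng.standard_normal(16000)          # 1 s of "speech" at 16 kHz (toy)
clean_sdr = sdr(s, s)                   # perfect estimate: very high SDR
noisy_sdr = sdr(s, s + 0.1 * rng.standard_normal(16000))
print(clean_sdr, noisy_sdr)
```

BSS Eval additionally splits the residual into interference and artifact terms to produce SIR and SAR.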
Latent source-specific generative factor learning for monaural speech separation using weighted-factor autoencoder
6
Authors: Jing-jing CHEN, Qi-rong MAO, You-cai QIN, Shuang-qing QIAN, Zhi-shen ZHENG 《Frontiers of Information Technology & Electronic Engineering》 SCIE EI CSCD, 2020, No. 11, pp. 1639-1650 (12 pages)
Much recent progress in monaural speech separation (MSS) has been achieved through a series of deep learning architectures based on autoencoders, which use an encoder to condense the input signal into compressed features and then feed these features into a decoder to construct a specific audio source of interest. However, these approaches can neither learn generative factors of the original input for MSS nor construct each audio source in mixed speech. In this study, we propose a novel weighted-factor autoencoder (WFAE) model for MSS, which introduces a regularization loss in the objective function to isolate one source without containing the other sources. By incorporating a latent attention mechanism and a supervised source constructor in the separation layer, WFAE can learn source-specific generative factors and a set of discriminative features for each source, leading to MSS performance improvement. Experiments on benchmark datasets show that our approach outperforms existing methods. In terms of three important metrics, WFAE performs well on a relatively challenging MSS case, i.e., speaker-independent MSS.
Keywords: speech separation, generative factors, autoencoder, deep learning
Recent Progresses in Deep Learning Based Acoustic Models (Cited by 9)
7
Authors: Dong Yu, Jinyu Li 《IEEE/CAA Journal of Automatica Sinica》 SCIE EI CSCD, 2017, No. 3, pp. 396-409 (14 pages)
In this paper, we summarize recent progress made in deep learning based acoustic models, along with the motivation and insights behind the surveyed techniques. We first discuss models such as recurrent neural networks (RNNs) and convolutional neural networks (CNNs) that can effectively exploit variable-length contextual information, and their various combinations with other models. We then describe models that are optimized end-to-end, with emphasis on feature representations learned jointly with the rest of the system, the connectionist temporal classification (CTC) criterion, and the attention-based sequence-to-sequence translation model. We further illustrate robustness issues in speech recognition systems, and discuss acoustic model adaptation, speech enhancement and separation, and robust training strategies. We also cover modeling techniques that lead to more efficient decoding, and discuss possible future directions in acoustic model research.
Keywords: attention model, convolutional neural network (CNN), connectionist temporal classification (CTC), deep learning (DL), long short-term memory (LSTM), permutation invariant training, speech adaptation, speech processing, speech recognition, speech separation
Research on separation and enhancement of speech micro-vibration from macro motion (Cited by 2)
8
Authors: 陈鸿凯, 王挺峰, 吴世松, 李远洋 《Optoelectronics Letters》 EI, 2020, No. 6, pp. 462-466 (5 pages)
Based on the 1550 nm all-fiber pulsed laser Doppler vibrometer (LDV) system independently developed by our laboratory, empirical mode decomposition (EMD) and optimally modified log-spectral amplitude estimator (OM-LSA) algorithms are combined to separate speech micro-vibration from the target's macro motion. This combined algorithm compensates for the weakness of the EMD algorithm in denoising and the inability of the OM-LSA algorithm to separate signals, achieving separation and simultaneous acquisition of the macro motion and the speech micro-vibration of a target. The experimental results indicate that, using this combined algorithm, the LDV system can operate within 30 m and gains a 4.21 dB improvement in signal-to-noise ratio (SNR) relative to a traditional OM-LSA algorithm.
Keywords: Doppler, LDV, LSA, EMD
Core processing neuron-enabled circuit motifs for neuromorphic computing
9
Authors: Hanxi Li, Jiayang Hu, Anzhe Chen, Yishu Zhang, Chenhao Wang, Beiduo Wang, Yi Tong, Jiachao Zhou, Kian Ping Loh, Yang Xu, Tawfique Hasan, Bin Yu 《InfoMat》 SCIE CSCD, 2023, No. 11, pp. 78-88 (11 pages)
Based on brain-inspired computing frameworks, neuromorphic systems implement large-scale neural networks in hardware. Although rapid advances have been made in the development of artificial neurons and synapses in recent years, further research goes beyond these individual components and focuses on neuronal circuit motifs with specialized excitatory-inhibitory (E-I) connectivity patterns. In this study, we demonstrate a core processor that can be used to construct commonly used neuronal circuits. The neuron, featuring an ultracompact physical configuration, integrates a volatile threshold switch with a gate-modulated two-dimensional (2D) MoS_(2) field-effect channel to process complex E-I spatiotemporal spiking signals. Consequently, basic neuronal circuits are constructed for biorealistic neuromorphic computing. For practical applications, an algorithm-hardware co-design is implemented in a gate-controlled spiking neural network, with substantial performance improvement in human speech separation.
Keywords: artificial intelligence hardware, excitatory-inhibitory neurons, neuronal circuit motifs, speech separation, spiking neural networks