44 articles found
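Dialect Analysis Based on Feature Fusion and an Improved GMM-UBM
1
Authors: 徐芝灿, 刘本永. 《通信技术》, 2023, No. 4, pp. 419-424 (6 pages)
A single feature is insufficient to describe the differences between dialects, and the traditional Gaussian Mixture Model-Universal Background Model (GMM-UBM) suffers from aliasing during training. To address both problems, empirical mode decomposition is introduced into feature extraction and multiple features are fused into a high-dimensional feature. In addition, the GMM-UBM is trained multiple times and the most discriminative dialect models are selected, which improves the separation between dialect models. Comparative dialect identification experiments show that the method based on fused features and the improved GMM-UBM outperforms traditional methods.
Keywords: improved GMM-UBM; feature fusion; empirical mode decomposition; dialect identification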
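Automatic Language Identification Based on SDC Features and the GMM-UBM Model  Cited by: 14
2
Authors: 姜洪臣, 郑榕, 张树武, 徐波. 《中文信息学报》 (CSCD, PKU Core), 2007, No. 1, pp. 49-53 (5 pages)
This paper proposes an automatic language identification method based on shifted delta cepstra (SDC) features and the GMM-UBM model. SDC features are formed by concatenating the first-order delta spectra of many speech frames and, compared with traditional MFCC features, carry more temporal information. The UBM captures the feature distribution of all languages to be identified, and a model for each language can be obtained quickly through Bayesian adaptation. Compared with the traditional GMM method, both training and recognition are faster. The method was tested on 11 languages from the OGI telephone speech corpus; the best identification accuracies for 10-second, 30-second and 45-second utterances were 72.38%, 82.62% and 85.23%, respectively, and recognition ran at about 0.03 times real time.
Keywords: computer application; Chinese information processing; SDC features; GMM-UBM model; Bayesian adaptation; automatic language identification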
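The SDC construction described in this abstract (delta cepstra from several shifted frames concatenated into one vector) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `shifted_delta_cepstra`, the default values of `d`, `p` and `k`, and the edge handling (frames outside the utterance reuse the nearest real frame) are all assumptions made for the example.

```python
def shifted_delta_cepstra(cepstra, d=1, p=3, k=7):
    """Return one SDC vector per frame (hypothetical helper).

    cepstra: list of frames, each a list of N cepstral coefficients.
    d: half-width of the delta window, in frames.
    p: shift between consecutive delta blocks, in frames.
    k: number of delta blocks stacked per output vector.
    """
    if not cepstra:
        return []
    last = len(cepstra) - 1

    def frame(t):
        # Assumption: indices past either end reuse the nearest real frame.
        return cepstra[min(max(t, 0), last)]

    features = []
    for t in range(len(cepstra)):
        stacked = []
        for i in range(k):
            centre = t + i * p
            ahead, behind = frame(centre + d), frame(centre - d)
            # First-order delta around the shifted centre frame.
            stacked.extend(a - b for a, b in zip(ahead, behind))
        features.append(stacked)
    return features
```

With N cepstral coefficients per frame, each output vector holds N * k values, which is where the extra temporal context relative to a single MFCC frame comes from.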
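Application of GMM-UBM and SVM in Speaker Recognition  Cited by: 7
3
Authors: 李荟, 赵云敏. 《计算机系统应用》, 2018, No. 1, pp. 225-230 (6 pages)
To address the insufficient training data caused by short utterances in speaker recognition, GMM-UBM, which highlights speaker-specific characteristics, is chosen as the baseline system model, and an SVM is introduced to remedy the poor robustness of GMM-UBM. The choice of kernel function strongly affects SVM recognition performance. The polynomial kernel generalizes well but learns poorly, whereas the radial basis function kernel learns well but generalizes poorly, so the two single kernels are combined by linear weighting to give a combined kernel with the advantages of both. Simulation results show that the SVM with the combined kernel clearly outperforms both the GMM-UBM baseline without SVM and the other three single-kernel functions in recognition rate and equal error rate, and that it balances recognition accuracy and robustness across different signal-to-noise ratios.
Keywords: speaker recognition; GMM-UBM; SVM; combined kernel function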
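The linearly weighted combination of a polynomial kernel and a radial basis function kernel mentioned in this abstract can be written down in a few lines. The sketch below is illustrative only: the function names, the mixing parameter `weight`, and the default values of `degree`, `coef0` and `gamma` are assumptions, since the abstract does not give the paper's settings.

```python
import math


def poly_kernel(x, y, degree=2, coef0=1.0):
    # Polynomial kernel: (x . y + coef0) ** degree
    return (sum(a * b for a, b in zip(x, y)) + coef0) ** degree


def rbf_kernel(x, y, gamma=0.5):
    # Radial basis function kernel: exp(-gamma * ||x - y||^2)
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def combined_kernel(x, y, weight=0.5, degree=2, coef0=1.0, gamma=0.5):
    """Weighted sum of the two kernels; `weight` is the share given to the polynomial one."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return (weight * poly_kernel(x, y, degree, coef0)
            + (1.0 - weight) * rbf_kernel(x, y, gamma))
```

Because a non-negative weighted sum of two valid kernels is itself a valid kernel, the combined function can be passed to any SVM implementation that accepts a user-defined kernel.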
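Research on a Continuous Adaptation Algorithm Based on the GMM-UBM Speaker Model  Cited by: 2
4
Authors: 张正平, 张丽娜, 贺松. 《通信电源技术》, 2016, No. 2, pp. 81-83 (3 pages)
In practical text-independent speaker recognition, the speech available for training a speaker model is usually limited. In addition, changes in the speaker's own physiological condition and in the external recording environment can alter the acoustic features of the speaker's voice. The feature distribution that the speaker model represents therefore keeps changing, which lowers the recognition rate of the speaker recognition system. Building on speaker adaptation techniques, this paper proposes a continuous adaptation algorithm for speaker models that addresses the drop in recognition rate caused by changes in the speaker's acoustic features.
Keywords: speaker recognition; GMM-UBM; maximum a posteriori probability; continuous adaptation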
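Research on a Speaker Verification System Based on GMM-UBM
5
Authors: 霍春宝, 张彩娟, 赵红敏. 《辽宁工业大学学报(自然科学版)》, 2012, No. 2, pp. 98-101, 157 (4 pages)
In a GMM-based speaker verification system, training builds a model for each speaker's speech and then uses an algorithm to find a set of model parameters that maximizes the likelihood. Based on a study of the GMM, this paper proposes an improved fuzzy C-means (FCM) algorithm and applies it to model initialization. Because insufficient speech data lowers the recognition rate when a GMM is used for speaker verification, the Gaussian mixture model-universal background model (GMM-UBM), which covers the speaker's speech, is adopted as the recognition model. Algorithm comparison and experimental analysis show that the system with the improved algorithm clearly outperforms the traditional GMM-based speaker recognition system in recognition rate.
Keywords: speaker recognition; Gaussian mixture model; EM algorithm; GMM-UBM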
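Research on a Speaker Verification System Based on GMM-UBM
6
Authors: 霍春宝, 张彩娟, 赵红敏. 《辽宁工业大学学报(自然科学版)》, 2012, No. 3, pp. 149-151, 157 (4 pages)
In a GMM-based speaker verification system, training builds a model for each speaker's speech and then uses an algorithm to find a set of model parameters that maximizes the likelihood. Based on a study of the GMM, an improved fuzzy C-means (FCM) algorithm is proposed and applied to model initialization. Because insufficient speech data lowers the recognition rate when a GMM is used for speaker verification, the Gaussian mixture model-universal background model (GMM-UBM), which covers the speaker's speech, is adopted as the recognition model. Algorithm comparison and experimental analysis show that the system with the improved algorithm clearly outperforms the traditional GMM-based speaker recognition system in recognition rate.
Keywords: speaker recognition; Gaussian mixture model; EM algorithm; GMM-UBM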
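Multilingual Identification of Broadcast Speech Based on Sub-band GMM-UBM  Cited by: 2
7
Authors: 李思一, 戴蓓蒨, 王海祥. 《数据采集与处理》 (CSCD, PKU Core), 2007, No. 1, pp. 14-18 (5 pages)
This paper proposes a language identification method based on a probabilistic statistical model that is independent of linguistic content; it can identify dozens of languages without expert linguistic knowledge of each one. Because broadcast speech is heavily corrupted by noise, the GMM-UBM model is used as the language model, which improves the noise robustness of the system. Since the background noise in broadcast speech is not simply full-band additive white noise, a language identification system with a multi-subsystem structure based on sub-band GMM-UBM models is constructed, with a neural network at the back end for system-level fusion. Identification experiments on 37 languages and dialects demonstrate the effectiveness of the sub-band GMM-UBM method.
Keywords: language identification; language-content independence; broadcast speech; sub-band GMM-UBM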
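Study of Feature Parameters for Voiceprint Recognition Based on GMM-UBM  Cited by: 16
8
Authors: 周玥媛, 孔钦. 《计算机技术与发展》, 2020, No. 5, pp. 76-83 (8 pages)
The key to voiceprint recognition is extracting from the speech signal feature parameters that can characterize the speaker. Using a text-independent voiceprint recognition system implemented in Matlab on the GMM-UBM model, the mainstream static feature parameters MFCC, LPCC and LPC, as well as MFCC combined with dynamic parameters, are compared from the perspectives of both speaker verification and speaker identification. With different feature orders, different numbers of Gaussian mixtures, and training and test utterances of different durations, the analysis covers theoretical recognition performance, actual recognition performance, recognition time, and the proportion of recognition time. The results show that, under the GMM-UBM approach, MFCC gives the best recognition performance among the three static feature parameters in the vast majority of cases but also takes the longest to recognize, and that the recognition rate does not increase monotonically with the order of the feature parameters. Combining static parameters with dynamic parameters of a suitable order improves recognition, but increasing the order of the dynamic parameters does not necessarily improve system recognition performance.
Keywords: GMM-UBM; voiceprint recognition; feature parameter performance; speaker verification; speaker identification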
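A Uyghur Telephone Speech Monitoring System Based on GMM-UBM/SVM  Cited by: 2
9
Authors: 李晓阳, 伊.达瓦, 吾守尔.斯拉木, 勾坂芳典. 《计算机应用与软件》 (CSCD, PKU Core), 2012, No. 1, pp. 46-48, 77 (4 pages)
This paper discusses a telephone speech monitoring system based on GMM-UBM/SVM. The GMM is a common approach in speaker recognition systems, but its recognition accuracy suffers because monitored utterances are short and because telephone-Internet terminals and transmission lines introduce heavy background noise. The monitoring system is designed around the robustness of the GMM and the strong classification ability of the SVM on small amounts of static data, and its performance is examined on Uyghur speech. For comparison, recognition with quantization distance (VQ), weighted quantization distance (WVQ) and the baseline system is also discussed. With a training set of 50 target speakers, 20 seconds of speech per speaker and 10-second test utterances, the proposed method improves the recognition rate by 20.2% and 16.7% over the VQ and WVQ methods, respectively.
Keywords: telephone speech monitoring; speaker recognition; Uyghur; GMM-UBM; SVM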
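Research and Application of GMM-UBM Voiceprint Recognition  Cited by: 3
10
Authors: 沈阳丽, 赵启升. 《电脑编程技巧与维护》, 2017, No. 16, pp. 84-86 (3 pages)
With the development of information technology, means of identity recognition have diversified to include face, fingerprint and voiceprint; voiceprint recognition has attracted wide attention because of its low cost and good mobility. Building on a study of traditional voiceprint recognition techniques such as VQ and HMM, the GMM-UBM algorithm is proposed and applied to an attendance system. Experiments show that, compared with traditional algorithms, GMM-UBM has some advantages in recognition rate and concurrency.
Keywords: voiceprint recognition; GMM-UBM; Android system; attendance system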
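Research on Aircraft Engine Sound Recognition Based on GMM-UBM  Cited by: 2
11
Authors: 杨毫鸽, 孙成立. 《计算机科学与应用》, 2017, No. 8, pp. 781-787 (7 pages)
The Gaussian mixture model-universal background model (GMM-UBM) is the most commonly used model in speaker recognition and has performed well in many experiments. This work explores applying GMM-UBM to abnormal sound detection: aircraft engine sound signals are processed to extract Mel-frequency cepstral coefficient (MFCC) features, a UBM is trained, a GMM-UBM model is obtained by MAP adaptation, and that model is used to detect and recognize engine sounds. Experiments show that the method mitigates the drop in recognition rate caused by changing external interference.
Keywords: speaker recognition; GMM-UBM; MFCC; abnormal sound detection; MAP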
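The MAP adaptation step named in this abstract, which derives a target model from a trained UBM, can be sketched as below. This is a hedged illustration rather than the paper's procedure: it assumes diagonal covariances, adapts only the component means, and uses a relevance factor of 16, none of which the abstract specifies. The function names are hypothetical.

```python
import math


def log_gauss_diag(x, mean, var):
    # Log density of a diagonal-covariance Gaussian at x.
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def responsibilities(x, weights, means, variances):
    # Posterior probability of each UBM component given frame x.
    logs = [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    peak = max(logs)
    scaled = [math.exp(value - peak) for value in logs]
    total = sum(scaled)
    return [s / total for s in scaled]


def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Return adapted means; weights and variances stay those of the UBM (assumption)."""
    n_components, dim = len(weights), len(means[0])
    counts = [0.0] * n_components
    sums = [[0.0] * dim for _ in range(n_components)]
    for x in frames:
        for k, post in enumerate(responsibilities(x, weights, means, variances)):
            counts[k] += post
            for j in range(dim):
                sums[k][j] += post * x[j]
    adapted = []
    for k in range(n_components):
        # Components that saw more data move further from the UBM mean.
        alpha = counts[k] / (counts[k] + relevance)
        data_mean = [s / counts[k] for s in sums[k]] if counts[k] > 0.0 else means[k]
        adapted.append([alpha * e + (1.0 - alpha) * m
                        for e, m in zip(data_mean, means[k])])
    return adapted
```

Components that receive little adaptation data stay close to the UBM, which is why this scheme remains usable when only a small amount of target audio is available.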
A Cell Condition-Sensitive Frequency Segmentation Method Based on the Sub-Band Instantaneous Energy Spectrum of Aluminum Electrolysis Cell Voltage  Cited by: 1
12
Authors: Zhaohui Zeng, Weihua Gui, Xiaofang Chen, Yongfang Xie, Hongliang Zhang, Yubo Sun. 《Engineering》 (SCIE, EI), 2021, No. 9, pp. 1282-1292 (11 pages)
Cell voltage is a widely used signal that can be measured online from an industrial aluminum electrolysis cell. A variety of parameters for the analysis and control of industrial cells are calculated using the cell voltage. In this paper, the frequency segmentation of cell voltage is used as the basis for designing filters to obtain these parameters. Based on a qualitative analysis of the cell voltage, the sub-band instantaneous energy spectrum (SIEP) is first proposed and then used to quantitatively represent the characteristics of designated frequency bands of the cell voltage under various cell conditions. Ultimately, a cell condition-sensitive frequency segmentation method is given. The proposed method divides the effective frequency band into the [0, 0.001] Hz band of low-frequency signals and the [0.001, 0.050] Hz band of low-frequency noise, and subdivides the low-frequency noise into the [0.001, 0.010] Hz band of metal pad abnormal rolling and the [0.01, 0.05] Hz band of sub-low-frequency noise. Compared with the instantaneous energy spectrum based on empirical mode decomposition, the SIEP represents more finely how energy changes with time in any designated frequency band within the effective frequency band of the cell voltage. The proposed frequency segmentation method is more sensitive to cell condition changes and can obtain more detailed online cell condition information, thus providing a more reliable and accurate online basis for cell condition monitoring and control decisions.
Keywords: sub-band instantaneous energy spectrum; cell condition-sensitive frequency band; frequency segmentation; metal pad abnormal rolling; aluminum electrolysis
A fuzzy adaptive smoothing approach to robust endpoint detection based on MDL using sub-band speech
13
Authors: 王明政, 张文军, 李建华, 诸鸿文. 《Journal of Harbin Institute of Technology (New Series)》 (EI, CAS), 2005, No. 6, pp. 705-709 (5 pages)
To develop a more robust endpoint detection algorithm, this paper first proposes a fuzzy adaptive smoothing algorithm. The general idea underlying adaptive smoothing is to adapt the short-term sub-band mean of the amplitude to the local attributes of speech on the basis of discontinuity measures. The adaptive smoothing algorithm in this paper uses a scale-space framework based on the minimal description length (MDL). We recommend using a fuzzy multi-attribute decision-making approach to select the sub-bands in which the word boundary can be detected more reliably. The process and simulation of the fuzzy adaptive smoothing algorithm are given. The parameters are the mean amplitude of the audible frequency range (300-3700 Hz) and the sub-band mean of the amplitude (16-band filter bank). We selected the audible band energy because it is useful for detecting high-energy regions and for distinguishing speech from noise. In addition, the fuzzy adaptive smoothing algorithm is applied to sub-band speech in order to use the full range of frequency information.
Keywords: robustness; endpoint detection; sub-band; smoothing; MDL (minimal description length)
Sub-band ICA with selection criterion for BSS of dependent images
14
Authors: 陈建国, 王奉涛, 朱泓, 郭正刚, 张洪印. 《Journal of Harbin Institute of Technology (New Series)》 (EI, CAS), 2011, No. 4, pp. 113-118 (6 pages)
Because images are correlated, standard ICA performs unsatisfactorily in the blind source separation (BSS) of images. A new method, sub-band ICA with a selection criterion, is therefore proposed for this problem. First, the sub-bands of the new method are made up of wavelet packet (WP) coefficients. Second, the selection criterion of the new method is a combination of mutual information (MI), kurtosis and sparsity. A sub-band or group of sub-bands obtained with the new method is more suitable than the mixed images as input to the ICA algorithm. The new method has been applied successfully to the BSS of partially dependent and highly dependent images. Separation experiments show that the new method separates more accurately and robustly.
Keywords: sub-band decomposition; independent component analysis; wavelet packets; mutual information; kurtosis; sparsity
Feature Conditioning Based on DWT Sub-Bands Selection on Proposed Channels in BCI Speller
15
Authors: Bahram Perseh, Majid Kiamini, Sepideh Jabbari. 《Journal of Biomedical Science and Engineering》, 2017, No. 3, pp. 120-133 (14 pages)
In this paper, we present a novel and efficient scheme for detecting the P300 component of the event-related potential in the Brain Computer Interface (BCI) speller paradigm that needs significantly fewer EEG channels and uses a minimal subset of effective features. Removing unnecessary channels and reducing the feature dimension resulted in lower cost and shorter time and thus improved the BCI implementation. The idea was to employ a proper method to optimize the number of channels and feature vectors while keeping high classification accuracy. Optimal channel selection was based on both discriminative criteria and forward-backward investigation. In addition, we obtained a minimal subset of effective features by choosing the discriminant coefficients of the wavelet decomposition. Our algorithm was tested on dataset II of the BCI competition 2005. We achieved 92% accuracy using a simple LDA classifier, compared with the second-best result in BCI 2005, an accuracy of 90.5% using an SVM classifier that required more computation, and with the highest accuracy in BCI 2005, 96.5%, which used an SVM and many more channels requiring excessive calculations. We also applied our proposed scheme to Hoffmann’s dataset to evaluate the effectiveness of channel reduction and achieved acceptable results.
Keywords: brain computer interface; P300 component; optimal sub-bands; optimal channels; linear discriminant analysis
Video Compression Using a New Active Mesh Based Motion Compensation Algorithm in Wavelet Sub-Bands
16
Authors: Mohammad Hossein Bisjerdi, Alireza Behrad. 《Journal of Signal and Information Processing》, 2012, No. 3, pp. 368-376 (9 pages)
In this paper, a new mesh-based algorithm is applied for motion estimation and compensation in the wavelet domain. The first major contribution of this work is the introduction of a new active mesh-based method for motion estimation and compensation. The proposed algorithm is based on mesh energy minimization with novel sets of energy functions, whose features improve the accuracy of motion estimation and compensation. We employ the proposed motion estimation algorithm in two different ways for video compression. In the first approach, the algorithm is employed for motion estimation of consecutive frames. In the second approach, the algorithm is applied for motion estimation and compensation in the wavelet sub-bands. The experimental results reveal that incorporating active mesh-based motion-compensated temporal filtering into wavelet sub-bands significantly improves the rate-distortion performance of video compression. We also use a new wavelet coder, based on the retained energy criterion, to code the 3D volume of coefficients. This coder gives the maximum retained energy in all sub-bands. The proposed algorithm was tested on several video sequences, and the results showed that using the proposed active mesh method for motion compensation and implementing it in sub-bands yields a significant improvement in PSNR performance.
Keywords: motion estimation and compensation; video compression; active mesh method; wavelet sub-bands
A method to compress vibration signals using wavelet packet transformation combined with sub-band vector quantization
17
Authors: 翁浩, Gao Jinji, Jiang Zhinong. 《High Technology Letters》 (EI, CAS), 2013, No. 4, pp. 443-448 (6 pages)
A novel compression method for mechanical vibration signals is proposed that combines sub-band vector quantization (SVQ) with wavelet packet transformation (WPT) and discrete cosine transformation (DCT). First, the vibration signal is decomposed into sub-bands by WPT. Then DCT and adaptive bit allocation are carried out per sub-band, and SVQ is performed in each sub-band. After DCT, only the leading components need to be coded, and their number is determined by the bits allocated to that sub-band. Tests on an actual signal show that the algorithm effectively improves the signal-to-noise ratio (SNR) of the reconstructed signal, especially in low-rate transmission.
Keywords: vibration signal compression; wavelet packet transformation (WPT); discrete cosine transformation (DCT); sub-band vector quantization (SVQ)
Information Hiding Method Based on Block DWT Sub-Band Feature Encoding
18
Authors: Qiudong SUN, Wenxin MA, Wenying YAN, Hong DAI. 《Journal of Software Engineering and Applications》, 2009, No. 5, pp. 383-387 (5 pages)
To hide long text information and support covert communication, a binary watermark sequence is first obtained from a text file and encoded with a redundant encoding method. Two neighboring blocks at a time are then selected from the Hilbert scanning sequence of carrier image blocks and transformed by a 1-level discrete wavelet transformation (DWT). Next, the double-block-based JNDs (just noticeable differences) are calculated with a visual model. According to the code of each pair of watermark bits, the average values of two corresponding detail sub-bands are modified using one of the JNDs to hide the information in the carrier image. The experimental results show that the hidden information is invisible to the human eye and that the algorithm is robust to some common image processing operations. The algorithm is therefore effective and practical.
Keywords: sub-band feature encoding; redundant encoding; visual model; discrete wavelet transformation; information hiding
Skin Lesion Classification System Using Shearlets
19
Authors: S. Mohan Kumar, T. Kumanan. 《Computer Systems Science & Engineering》 (SCIE, EI), 2023, No. 1, pp. 833-844 (12 pages)
The main cause of skin cancer is the ultraviolet radiation of the sun. It spreads quickly to other body parts. Thus, early diagnosis is required to decrease the mortality rate due to skin cancer. In this study, an automatic system for Skin Lesion Classification (SLC) using Non-Subsampled Shearlet Transform (NSST) based energy features and a Support Vector Machine (SVM) classifier is proposed. First, the NSST is used to decompose the input skin lesion images with different numbers of directions: 2, 4, 8 and 16. Energy features are extracted from the NSST sub-bands and stored in the feature database for training. The SVM classifier is used to classify the skin lesion images. The dermoscopic skin images are obtained from the PH² database, which comprises 200 dermoscopic color images with melanocytic lesions. The performance of the SLC system is evaluated using the confusion matrix and Receiver Operating Characteristic (ROC) curves. The SLC system achieves 96% classification accuracy using the NSST energy features obtained from the 3rd level with 8 directions.
Keywords: skin lesion classification; non-subsampled shearlet transform; sub-band coefficients; energy feature; support vector machine
Speech Separation Algorithm Using Gated Recurrent Network Based on Microphone Array
20
Authors: Xiaoyan Zhao, Lin Zhou, Yue Xie, Ying Tong, Jingang Shi. 《Intelligent Automation & Soft Computing》 (SCIE), 2023, No. 6, pp. 3087-3100 (14 pages)
Speech separation is an active research topic that plays an important role in numerous applications, such as speaker recognition, hearing prostheses, and autonomous robots. Many algorithms have been put forward to improve separation performance. However, speech separation in reverberant, noisy environments is still a challenging task. To address this, a novel speech separation algorithm using a gate recurrent unit (GRU) network based on a microphone array is proposed in this paper. The main aim of the proposed algorithm is to improve the separation performance and reduce the computational cost. The proposed algorithm extracts the sub-band steered response power-phase transform (SRP-PHAT) weighted by a gammatone filter as the speech separation feature because of its discriminative and robust spatial position information. Since the GRU network can process time series data with faster training and fewer training parameters, the GRU model is adopted to process the separation features of several sequential frames in the same sub-band to estimate the ideal ratio mask (IRM). The proposed algorithm decomposes the mixture signals into time-frequency (TF) units using a gammatone filter bank in the frequency domain, and the target speech is reconstructed in the frequency domain by masking the mixture signal according to the estimated IRM. Decomposing the mixture signal and reconstructing the target signal are both completed in the frequency domain, which reduces the total computational cost. Experimental results demonstrate that the proposed algorithm achieves omnidirectional speech separation in noisy and reverberant environments, provides good performance in terms of speech quality and intelligibility, and generalizes to reverberation.
Keywords: microphone array; speech separation; gate recurrent unit network; gammatone sub-band steered response power-phase transform; spatial spectrum