Journal Articles
235 articles found
1. Improved Speech Emotion Recognition Focusing on High-Level Data Representations and Swift Feature Extraction Calculation
Authors: Akmalbek Abdusalomov, Alpamis Kutlimuratov, Rashid Nasimov, Taeg Keun Whangbo. Computers, Materials & Continua (SCIE, EI), 2023, No. 12, pp. 2915-2933.
The performance of a speech emotion recognition (SER) system is heavily influenced by the efficacy of its feature extraction techniques. The study was designed to advance the field of SER by optimizing feature extraction techniques, specifically through the incorporation of high-resolution Mel-spectrograms and the expedited calculation of Mel Frequency Cepstral Coefficients (MFCC). This initiative aimed to refine the system's accuracy by identifying and mitigating the shortcomings commonly found in current approaches. Ultimately, the primary objective was to elevate both the intricacy and effectiveness of our SER model, with a focus on augmenting its proficiency in the accurate identification of emotions in spoken language. The research employed a dual-strategy approach for feature extraction. Firstly, a rapid computation technique for MFCC was implemented and integrated with a Bi-LSTM layer to optimize the encoding of MFCC features. Secondly, a pretrained ResNet model was utilized in conjunction with feature stats pooling and dense layers for the effective encoding of Mel-spectrogram attributes. These two sets of features underwent separate processing before being combined in a Convolutional Neural Network (CNN) outfitted with a dense layer, with the aim of enhancing their representational richness. The model was rigorously evaluated using two prominent databases: CMU-MOSEI and RAVDESS. Notable findings include an accuracy rate of 93.2% on the CMU-MOSEI database and 95.3% on the RAVDESS database. Such exceptional performance underscores the efficacy of this innovative approach, which not only meets but also exceeds the accuracy benchmarks established by traditional models in the field of speech emotion recognition.
Keywords: feature extraction, MFCC, ResNet, speech emotion recognition
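The entry above builds on MFCC computation and Mel-spectrograms. For orientation, a minimal NumPy sketch of the textbook MFCC pipeline (framing, power spectrum, triangular mel filter bank, log compression, DCT) is given below; the sample rate, frame size, hop, and filter counts are illustrative defaults, not values from the paper, and this is not the paper's accelerated algorithm:

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_ceps=13):
    # Frame the signal and apply a Hamming window
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frames.append(signal[start:start + n_fft] * np.hamming(n_fft))
    frames = np.array(frames)
    # Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filter bank: filter centers equally spaced on the mel scale
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):
            fbank[m - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fbank[m - 1, k] = (r - k) / max(r - c, 1)
    # Log mel energies, then DCT to decorrelate -> cepstral coefficients
    mel_energy = np.log(power @ fbank.T + 1e-10)
    return dct(mel_energy, type=2, axis=1, norm='ortho')[:, :n_ceps]

# Example: one second of a 440 Hz tone yields one 13-dim vector per frame
sr = 16000
t = np.arange(sr) / sr
feats = mfcc(np.sin(2 * np.pi * 440 * t), sr=sr)
```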
2. Wake-Up-Word Feature Extraction on FPGA
Authors: Veton Z. Kepuska, Mohamed M. Eljhani, Brian H. Hight. World Journal of Engineering and Technology, 2014, No. 1, pp. 1-12.
The Wake-Up-Word Speech Recognition task (WUW-SR) is computationally very demanding, particularly the feature extraction stage, whose output is decoded with corresponding Hidden Markov Models (HMMs) in the back-end stage of the WUW-SR. The state-of-the-art WUW-SR system is based on three different sets of features: Mel-Frequency Cepstral Coefficients (MFCC), Linear Predictive Coding Coefficients (LPC), and Enhanced Mel-Frequency Cepstral Coefficients (ENH_MFCC). In "Front-End of Wake-Up-Word Speech Recognition System Design on FPGA" [1], we presented an experimental FPGA design and implementation of a novel architecture of a real-time spectrogram extraction processor that generates MFCC, LPC, and ENH_MFCC spectrograms simultaneously. In this paper, the details of converting the three sets of spectrograms, 1) Mel-Frequency Cepstral Coefficients (MFCC), 2) Linear Predictive Coding Coefficients (LPC), and 3) Enhanced Mel-Frequency Cepstral Coefficients (ENH_MFCC), to their equivalent features are presented. In the WUW-SR system, the recognizer's front-end is located at the terminal, which is typically connected over a data network to remote back-end recognition (e.g., a server). The WUW-SR is shown in Figure 1. The three sets of speech features are extracted at the front-end. These extracted features are then compressed and transmitted to the server via a dedicated channel, where subsequently they are decoded.
Keywords: speech recognition system, feature extraction, Mel-Frequency Cepstral Coefficients, Linear Predictive Coding Coefficients, Enhanced Mel-Frequency Cepstral Coefficients, Hidden Markov Models, Field-Programmable Gate Arrays
3. Principal Component Feature for ANN-Based Speech Recognition
Authors: Gu Mingliang, Wang Taijun, Shi Xiaoxing, He Zhenya. Journal of Southeast University (English Edition) (EI, CAS), 1998, No. 2, pp. 13-18.
Using function approximation technology and the principal component analysis method, this paper presents a principal component feature to solve the time alignment problem and to simplify the structure of the neural network. Its extraction simulates the processing of speech information in the human auditory system. The experimental results show that the principal-component-feature-based recognition system outperforms the standard CDHMM and GMDS methods in many aspects.
Keywords: principal component analysis, feature extraction, speech recognition
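The principal component feature in this entry rests on principal component analysis. A generic PCA projection of frame-level feature vectors via SVD (illustrative only; the paper's auditory-motivated extraction is more involved) can be sketched as:

```python
import numpy as np

def pca_features(frames, k=8):
    """Project frame-level feature vectors onto their top-k principal axes."""
    mu = frames.mean(axis=0)
    centered = frames - mu
    # SVD of the centered data; rows of vt are the principal directions,
    # ordered by decreasing singular value (i.e., decreasing variance).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T  # (n_frames, k) principal component features

# Example: reduce 100 frames of 20-dim features to 8 components
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
Y = pca_features(X, k=8)
```

The projected columns are zero-mean and ordered by decreasing variance, which is what makes truncation to the first k components a sensible dimensionality reduction.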
4. Robust Speech Recognition System Using Conventional and Hybrid Features of MFCC, LPCC, PLP, RASTA-PLP and Hidden Markov Model Classifier in Noisy Conditions (Cited: 7)
Authors: Veton Z. Kepuska, Hussien A. Elharati. Journal of Computer and Communications, 2015, No. 6, pp. 1-9.
In recent years, the accuracy of speech recognition (SR) has been one of the most active areas of research. Although SR systems work reasonably well in quiet conditions, they still suffer severe performance degradation in noisy conditions or over distorted channels. It is necessary to search for more robust feature extraction methods to gain better performance in adverse conditions. This paper investigates the performance of conventional and new hybrid speech feature extraction algorithms, Mel Frequency Cepstrum Coefficient (MFCC), Linear Prediction Coding Coefficient (LPCC), perceptual linear production (PLP), and RASTA-PLP, in noisy conditions using a multivariate Hidden Markov Model (HMM) classifier. The behavior of the proposed system is evaluated using the TIDIGIT human voice dataset corpora, recorded from 208 different adult speakers, in both the training and testing process. The theoretical basis for the speech processing and classifier procedures is presented, and the recognition results are reported in terms of word recognition rate.
Keywords: speech recognition, noisy conditions, feature extraction, Mel-Frequency Cepstral Coefficients, Linear Predictive Coding Coefficients, perceptual linear production, RASTA-PLP, isolated speech, Hidden Markov Model
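Among the features compared above, LPCC is derived from linear prediction coefficients. A standard autocorrelation-method LPC computation via the Levinson-Durbin recursion (a textbook sketch, not this paper's specific front end) looks like:

```python
import numpy as np

def lpc(frame, order=10):
    """Linear prediction coefficients via the Levinson-Durbin recursion."""
    # Autocorrelation at lags 0..order
    r = np.correlate(frame, frame, mode='full')[len(frame) - 1:len(frame) + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]  # prediction error energy
    for i in range(1, order + 1):
        # Reflection coefficient from the current prediction error
        acc = r[i]
        for j in range(1, i):
            acc += a[j] * r[i - j]
        k = -acc / err
        new_a = a.copy()
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err

# Sanity check: recover the coefficients of a known AR(2) process
rng = np.random.default_rng(1)
e = rng.normal(size=20000)
x = np.zeros(20000)
for n in range(2, 20000):
    x[n] = 0.5 * x[n - 1] - 0.3 * x[n - 2] + e[n]
a, err = lpc(x, order=2)  # a ≈ [1, -0.5, 0.3]
```

LPCC features are then obtained by a cepstral recursion on these coefficients; that step is omitted here.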
5. Feature Optimization of Speech Emotion Recognition
Authors: Chunxia Yu, Ling Xie, Weiping Hu. Journal of Biomedical Science and Engineering, 2016, No. 10, pp. 37-43.
Speech emotion is divided into four categories, Fear, Happy, Neutral and Surprise, in this paper. Traditional features and their statistics are generally applied to recognize speech emotion. In order to quantify each feature's contribution to emotion recognition, a method based on the Back Propagation (BP) neural network is adopted. Then we can obtain the optimal subset of the features. What's more, two new characteristics of speech emotion, the MFCC feature extracted from the fundamental frequency curve (MFCCF0) and amplitude perturbation parameters extracted from the short-time average magnitude curve (APSAM), are added to the selected features. With the Gaussian Mixture Model (GMM), we get the highest average recognition rate of the four emotions, 82.25%, and a recognition rate for Neutral of 90%.
Keywords: speech emotion recognition, feature selection, feature extraction, BP neural network, GMM
6. Multilayer Neural Network Based Speech Emotion Recognition for Smart Assistance (Cited: 2)
Authors: Sandeep Kumar, Mohd Anul Haq, Arpit Jain, C. Andy Jason, Nageswara Rao Moparthi, Nitin Mittal, Zamil S. Alzamil. Computers, Materials & Continua (SCIE, EI), 2023, No. 1, pp. 1523-1540.
Day by day, biometric-based systems play a vital role in our daily lives. This paper proposes an intelligent assistant intended to identify emotions via voice messages. A biometric system has been developed to detect human emotions based on voice recognition and to control a few electronic peripherals for alert actions. This proposed smart assistant aims to provide support to people through buzzer and light-emitting diode (LED) alert signals, and it also keeps track of places like households, hospitals, remote areas, etc. The proposed approach is able to detect seven emotions: worry, surprise, neutral, sadness, happiness, hate and love. The key element for the implementation of speech emotion recognition is voice processing, and once the emotion is recognized, the machine interface automatically triggers the buzzer and LED actions. The proposed system is trained and tested on various benchmark datasets, i.e., the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), the Acoustic-Phonetic Continuous Speech Corpus (TIMIT) and the Emotional Speech Database (Emo-DB), and evaluated on various parameters, i.e., accuracy, error rate, and time. Compared with existing technologies, the proposed algorithm gave a better error rate and less time: error rate and time are decreased by 19.79% and 5.13 s for the RAVDESS dataset, 15.77% and 0.01 s for the Emo-DB dataset, and 14.88% and 3.62 s for the TIMIT database. The proposed model shows better accuracy of 81.02% for the RAVDESS dataset, 84.23% for the TIMIT dataset and 85.12% for the Emo-DB dataset compared to the Gaussian Mixture Model (GMM) and Support Vector Machine (SVM) models.
Keywords: speech emotion recognition, classifier implementation, feature extraction and selection, smart assistance
7. Automatic depression recognition by intelligent speech signal processing: A systematic survey
Authors: Pingping Wu, Ruihao Wang, Han Lin, Fanlong Zhang, Juan Tu, Miao Sun. CAAI Transactions on Intelligence Technology (SCIE, EI), 2023, No. 3, pp. 701-711.
Depression has become one of the most common mental illnesses in the world. For better prediction and diagnosis, methods of automatic depression recognition based on speech signals are constantly proposed and updated, with a transition from early traditional methods based on hand-crafted features to the application of deep learning architectures. This paper systematically and precisely outlines the most prominent and up-to-date research on automatic depression recognition by intelligent speech signal processing so far. Furthermore, methods for acoustic feature extraction, algorithms for classification and regression, as well as end-to-end deep models are investigated and analysed. Finally, general trends are summarised and key unresolved issues are identified to be considered in future studies of automatic speech depression recognition.
Keywords: acoustic signal processing, deep learning, feature extraction, speech depression recognition
8. Discriminative tonal feature extraction method in Mandarin speech recognition (Cited: 1)
Authors: Huang Hao, Zhu Jie. The Journal of China Universities of Posts and Telecommunications (EI, CSCD), 2007, No. 4, pp. 126-130.
To utilize the supra-segmental nature of Mandarin tones, this article proposes a feature extraction method for hidden Markov model (HMM) based tone modeling. The method uses linear transforms to project F0 (fundamental frequency) features of neighboring syllables as compensations, and adds them to the original F0 features of the current syllable. The transforms are discriminatively trained using an objective function termed "minimum tone error", which is a smooth approximation of tone recognition accuracy. Experiments show that the new tonal features achieve a 3.82% tone recognition rate improvement compared with the baseline, a maximum-likelihood-trained HMM on the normal F0 features. Further experiments show that discriminative HMM training on the new features is 8.78% better than the baseline.
Keywords: discriminative training, tone recognition, feature extraction, Mandarin speech recognition
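The project-and-add compensation described in this abstract can be sketched in a few lines. The transforms below are placeholder identity-scaled matrices purely for illustration; in the paper they are trained discriminatively under the minimum tone error objective:

```python
import numpy as np

def tonal_features(f0, w_left, w_right):
    """Augment each syllable's F0 feature vector with projected neighbor vectors.

    f0: (n_syllables, d) F0 feature matrix.
    w_left / w_right: (d, d) linear transforms applied to the previous and
    next syllable's features before adding them as compensations.
    """
    left = np.vstack([f0[:1], f0[:-1]])    # previous syllable (edge padded)
    right = np.vstack([f0[1:], f0[-1:]])   # next syllable (edge padded)
    return f0 + left @ w_left + right @ w_right

# Example with 6 syllables, 4-dim F0 features, small identity-like transforms
d = 4
rng = np.random.default_rng(2)
F = rng.normal(size=(6, d))
out = tonal_features(F, 0.1 * np.eye(d), 0.1 * np.eye(d))
```

With zero transforms the features reduce to the original F0 vectors, so the compensation is a strict generalization of the baseline features.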
9. Few-Shot Speech Synthesis Based on Meta-Learning Adaptation
Authors: Wu Zhihao, Chi Ziqiu, Xiao Ting, Wang Zhe. Journal of Computer Applications (CSCD, Peking University Core), 2024, No. 5, pp. 1629-1635.
Few-shot text-to-speech (TTS) requires synthesizing speech similar to the original speaker from only a small number of samples. Existing few-shot speech synthesis faces the following problems: how to adapt to a new speaker quickly, and how to improve the similarity between the generated speech and the speaker while preserving speech quality. When adapting to a new speaker, existing models rarely consider how model features evolve across different adaptation stages, so the generated speech cannot rapidly gain speaker similarity while maintaining quality. To address these problems, a method that uses meta-learning to guide model adaptation to new speakers is proposed. A meta-feature module guides the adaptation process, improving speaker similarity while preserving the quality of the generated speech, and a step encoder distinguishes different adaptation stages to increase the speed of adapting to new speakers. Existing fast-adaptation methods were compared on the Libri-TTS and VCTK datasets using subjective and objective metrics under different numbers of adaptation steps. The experimental results show that the proposed method achieves Dynamic Time Warping Mel-Cepstral Distortion (DTW-MCD) of 7.4502 and 6.5243 respectively, outperforms other meta-learning methods in the similarity of synthesized speech, and adapts to new speakers faster.
Keywords: few-shot generation, speech synthesis, meta-learning, speaker adaptation, feature extraction
10. Speech Emotion Recognition Method Based on F-DFCC Fused Features (Cited: 1)
Authors: He Zhaoxia, Zhu Rongtao, Luo Hui. Modern Electronics Technique (Peking University Core), 2024, No. 6, pp. 131-136.
Combining neural networks, parallel multi-feature vectors and attention mechanisms helps improve the performance of speech emotion recognition. Building on previously extracted DFCC parameters, I-DFCC and Mid-DFCC feature parameters are extracted, and the Fisher ratio is used to select feature parameters to form F-DFCC. The F-DFCC feature parameters are then compared with and fused with LPCC and MFCC feature parameters, and fed into an ECAPA-TDNN model containing a bidirectional LSTM network and an attention mechanism. Finally, the effectiveness of the F-DFCC fused feature parameters is verified on the CASIA and RAVDESS datasets. The experimental results show that, compared with the single F-DFCC feature parameter, the accuracy (WA), recall (UA) and F1-score of the F-DFCC fused features improve by 0.0351, 0.0311 and 0.0313 respectively on the CASIA dataset, and by 0.0245, 0.0358 and 0.0332 on the RAVDESS dataset. On both datasets, the recognition accuracy of the surprised emotion is the highest, at 0.94, and the recognition rates of the F-DFCC fused feature parameters on the six- and eight-emotion tasks improve over the other feature parameters.
Keywords: speech emotion recognition, DFCC, F-DFCC, fused features, feature extraction, Fisher ratio, ECAPA-TDNN
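Entry 10 selects feature parameters by Fisher ratio. A common per-dimension form of this score, between-class mean separation over within-class spread, can be sketched as follows (a generic two-class version for illustration; the paper's exact formulation may differ):

```python
import numpy as np

def fisher_ratio(x_pos, x_neg):
    """Per-dimension Fisher ratio: between-class separation over within-class spread."""
    m1, m2 = x_pos.mean(axis=0), x_neg.mean(axis=0)
    v1, v2 = x_pos.var(axis=0), x_neg.var(axis=0)
    return (m1 - m2) ** 2 / (v1 + v2 + 1e-12)

# Example: dimension 0 separates the classes, dimensions 1-2 do not
rng = np.random.default_rng(3)
a = rng.normal(loc=0.0, size=(200, 3))
b = rng.normal(loc=[3.0, 0.0, 0.0], size=(200, 3))
scores = fisher_ratio(a, b)
top = np.argsort(scores)[::-1]  # dimensions ranked by discriminative power
```

Keeping only the highest-ranked dimensions is the selection step that produces a compact feature set like F-DFCC.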
11. Three-Path Pathological Speech Recognition Based on LMD-Improved Feature Extraction
Authors: Zhang Nan, Chen Yuanyuan, Chen Xinyu, Hou Yitao. Electronic Measurement Technology (Peking University Core), 2024, No. 12, pp. 140-147.
To address the low recognition rate of pathological speech caused by the unclear and inaccurate pronunciation of patients with dysarthria, a Gammatone filter bank spectrogram feature extraction algorithm improved by Local Mean Decomposition (LMD) is proposed for three-path pathological speech recognition. First, the algorithm uses LMD to decompose the speech signal, applies a short-time Fourier transform to each decomposed component and synthesizes the frequencies, then extracts the filter bank features and their first- and second-order difference features to form LMD-GFbank spectrogram features that capture effective local characteristics of pathological speech. Second, to compensate for effective feature information missed by the network model during training, a three-path pathological speech recognition model is proposed. Finally, the model is trained and tested with the speech feature information. The experimental results show that the LMD-GFbank spectrogram features achieve a recognition rate of 93.36% on the three-path model, outperforming the traditional MFCC, GFCC and Fbank features and verifying that the proposed algorithm and recognition model improve pathological speech recognition accuracy.
Keywords: dysarthria, local mean decomposition, pathological speech recognition, feature extraction
12. Research and Application of Anti-Eavesdropping Technology Based on Time-Frequency Features
Authors: Li Jianxun, Wang Kai. Electronic Measurement Technology (Peking University Core), 2024, No. 7, pp. 1-8.
This study investigates an anti-eavesdropping technique designed around time-frequency features, focusing on how dynamically modifying timing and frequency enhances the jamming of human speech within specific frequency ranges. Existing speech jamming techniques are examined and compared with standard noise injection methods. The research methodology combines theoretical analysis and experimental validation: a working prototype was tested to evaluate the effectiveness of jamming signals based on time-frequency feature extraction against speech recognition systems. The experimental results show that when the signal-to-noise ratio is below 0 dB, the text recognition error rate of the proposed method exceeds 60%; at an SNR of 0 dB, the text recognition error rate of the proposed algorithm is on average more than 20% higher than that of current jamming algorithms. In addition, when the jamming system and the recording device are kept at the same distance, the proposed algorithm produces an SNR at the recording device nearly 2 dB lower than current algorithms, demonstrating its high energy efficiency. These results are significant for improving communication security and protecting privacy, especially in communication environments requiring high confidentiality.
Keywords: privacy protection, anti-eavesdropping, ultrasonic jamming, time-frequency feature extraction, speech jamming
13. Lipreading Based on Multiple Visual Attention
Authors: Xie Yincen, Xue Feng, Cao Mingwei. Pattern Recognition and Artificial Intelligence (EI, CSCD, Peking University Core), 2024, No. 1, pp. 73-84.
Lipreading is a technique for translating silent video of a single speaker's lip movements into text. Because lip movements have small amplitudes, the feature discrimination and generalization abilities of existing lipreading methods are poor. To address this problem, the purification of lipreading visual features is studied along three dimensions, time, space and channel, and a lipreading method based on multiple visual attention (Lipreading Based on Multiple Visual Attention Network, LipMVA) is proposed. Channel attention first adaptively calibrates channel-level features to mitigate the interference of meaningless channels; then two spatio-temporal attention modules of different granularity suppress the influence of unimportant pixels or frames. Experiments on the CMLR and GRID datasets show that LipMVA reduces the recognition error rate, verifying the effectiveness of the method.
Keywords: lipreading, visual speech recognition, attention mechanism, deep neural network, feature extraction
14. Speech Enhancement Method Based on Interleaved Amplitude-Phase Hybrid Features
Authors: Qing Chaojin, Fu Xiaowei, Tang Shuhai. Computer Engineering and Design (Peking University Core), 2024, No. 2, pp. 587-593.
To make full use of the phase feature information of noisy speech signals and its correlation with amplitude information, a single-channel speech enhancement method with interleaved amplitude-phase hybrid features is proposed. The log power spectrum and phase features of the noisy signal are extracted and interleaved in sequence; a complex mask is computed, and its real and imaginary parts are likewise interleaved to keep the input features symmetric. On this basis, a deep encoder-decoder network (amplitude phase deep encoder decoder network, APDEDN) is constructed to enhance speech quality. The experimental results show that, compared with single-feature methods, the proposed method improves both the perceptual evaluation of speech quality score and the short-time objective intelligibility.
Keywords: speech enhancement, feature interleaving, feature extraction, hybrid features, complex mask, encoder-decoder, deep learning
15. Acoustic Feature Optimization Method for Speech Recognition
Authors: Yang Bo. Audio Engineering, 2024, No. 4, pp. 51-53.
To address the limitations of acoustic feature extraction in speech recognition, an acoustic feature optimization method based on adaptive feature weighting is proposed. First, the role of acoustic feature extraction in speech recognition is analyzed, and the basic principles and shortcomings of the traditional Mel-scale Frequency Cepstral Coefficients (MFCC) method are introduced. Second, an adaptive feature weighting method is proposed, which optimizes MFCC features by computing adaptive weights. Finally, experimental analysis shows that the optimized method performs well and is practical in speech recognition tasks.
Keywords: speech recognition, feature extraction, acoustic features
16. Research and Application of Few-Shot Emotion-Controllable Speech Synthesis
Authors: Zhang Mengjiao, Yang Han, Ma Jun. Communications Technology, 2024, No. 9, pp. 897-904.
Against the background of rapidly developing deep synthesis technology, existing speech synthesis techniques require collecting large amounts of data in a professional recording studio to synthesize a specific person's voice, and the emotion of the synthesized speech is limited to the recorded data. Building on the VITS2 method, a new emotion-controllable speech synthesis model is proposed, which adds a pretrained speaker feature extraction module, an emotion feature extraction module, a bidirectional-flow network loss computation module and mixed training techniques, achieving emotion-controllable speech synthesis with few samples. Experiments on the AISHELL3 dataset show that the proposed model achieves higher synthesis naturalness and similarity in the few-shot setting. Experiments on the EDS dataset show that the proposed model achieves higher emotion similarity with few samples and, compared with the baseline method, a lower synthesized character error rate on objective metrics, further verifying the effectiveness of the proposed method.
Keywords: speech synthesis, few-shot, emotion controllability, bidirectional flow network, emotion feature extraction
17. Comparison of Feature Methods in Speech Recognition Feature Extraction
Authors: Guo Mingqi. Computer Application Digest, 2024, No. 2, pp. 96-99.
The advent of artificial intelligence has brought new vitality to speech recognition, and with the rapid development of related knowledge and techniques, neural networks have driven innovation in the field. This paper extracts LPCC, MFCC and PLP features, all common in speech recognition, from the same speech segment, and visualizes the features to show their strengths and weaknesses intuitively. LPCC features are sensitive to changes in the spectral envelope; MFCC features capture the short-time spectrum of the speech signal well and are fairly robust to interference and volume changes, but their high-frequency details are not clear enough; PLP features are robust, resist interference and volume changes well, and represent detailed information in the high-frequency range more accurately.
Keywords: speech recognition, feature extraction, LPCC, MFCC, PLP
18. Speech Recognition-Based Automated Visual Acuity Testing with Adaptive Mel Filter Bank (Cited: 2)
Authors: Shibli Nisar, Muhammad Asghar Khan, Fahad Algarni, Abdul Wakeel, M. Irfan Uddin, Insaf Ullah. Computers, Materials & Continua (SCIE, EI), 2022, No. 2, pp. 2991-3004.
One of the most commonly reported disabilities is vision loss, which can be diagnosed by an ophthalmologist in order to determine the visual system of a patient. This procedure, however, usually requires an appointment with an ophthalmologist, which is both a time-consuming and expensive process. Other issues that can arise include a lack of appropriate equipment and trained practitioners, especially in rural areas. Centered on a cognitively motivated attribute extraction and speech recognition approach, this paper proposes a novel idea that immediately determines eyesight deficiency. The proposed system uses an adaptive filter bank with weighted mel frequency cepstral coefficients for feature extraction. The adaptive filter bank implementation is inspired by the principle of spectrum sensing in cognitive radio, which is aware of its environment and adapts to statistical variations in the input stimuli by learning from the environment. Comparative performance evaluation demonstrates the potential of our automated visual acuity test method to achieve comparable results to the clinical ground truth established by the expert ophthalmologist's tests. The overall accuracy achieved by the proposed model, compared with the expert ophthalmologist's test, is 91.875%. The proposed method potentially offers a second opinion to ophthalmologists, and serves as a cost-effective pre-screening test to predict eyesight loss at an early stage.
Keywords: eyesight test, speech recognition, HMM, SVM, feature extraction
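The abstract describes an adaptive filter bank that, by analogy with spectrum sensing, adapts to statistical variation in the input. The exact adaptation rule is not given in the abstract; the sketch below shows one deliberately simple, purely illustrative reading, weighting each mel band by its observed average energy, and should not be taken as the paper's method:

```python
import numpy as np

def adaptive_band_weights(mel_energies, floor=1e-3):
    """Weight each mel band by its average normalized energy across frames.

    A rough, assumed stand-in for cognitive-radio-style adaptation: bands that
    carry more of the observed signal energy are emphasized.
    mel_energies: (n_frames, n_mels) non-negative band energies.
    """
    band_energy = mel_energies.mean(axis=0)          # (n_mels,)
    w = band_energy / (band_energy.sum() + floor)    # normalize weights
    return mel_energies * w                          # emphasize active bands

# Example: a band with 10x the energy of the others receives the largest weight
mel = np.ones((10, 4))
mel[:, 2] = 10.0
out = adaptive_band_weights(mel)
```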
19. Arabic Speech Recognition System Based on MFCC and HMMs (Cited: 2)
Authors: Hussien A. Elharati, Mohamed Alshaari, Veton Z. Kepuska. Journal of Computer and Communications, 2020, No. 3, pp. 28-34.
Speech recognition allows a machine to turn a speech signal into text through an identification and understanding process. Extracting the features, predicting the maximum likelihood, and generating models of the input speech signal are considered the most important steps in configuring an Automatic Speech Recognition System (ASR). In this paper, an automatic Arabic speech recognition system was established using MATLAB, and 24 Arabic Consonant-Vowel Consonant-Vowel Consonant-Vowel (CVCVCV) words were recorded from 19 native Arabic speakers, each speaker uttering the same word 3 times (1368 words in total). In order to test the system, 39 features were extracted by partitioning the speech signal into frames of about 0.25 s shifted by 0.10 s. In the back-end, statistical models were generated by separating the features into between 4 and 10 states, each state having 8 Gaussian distributions. The data has a 48 kHz sample rate and 32-bit depth, saved separately in wave file format. The system was trained on a phonetically rich and balanced Arabic word list (10 speakers × 3 repetitions × 24 words, 720 words in total) and tested using another word list (24 words × 9 speakers × 3 repetitions, 648 words in total). Using similar words from different speakers, the system obtained a very good word recognition accuracy of 92.92% and a Word Error Rate (WER) of 7.08%.
Keywords: speech recognition, feature extraction, maximum likelihood, Gaussian distribution, consonant-vowel
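The abstract specifies frames of about 0.25 s shifted by 0.10 s at a 48 kHz sample rate. The framing step alone can be sketched as follows (the 39-feature computation itself is omitted):

```python
import numpy as np

def frame_signal(signal, sr=48000, frame_s=0.25, shift_s=0.10):
    """Partition a signal into overlapping frames, per the abstract's framing."""
    flen, hop = int(frame_s * sr), int(shift_s * sr)   # 12000 and 4800 samples
    n = 1 + max(0, (len(signal) - flen) // hop)        # number of full frames
    return np.stack([signal[i * hop:i * hop + flen] for i in range(n)])

# Example: 2 s of samples at 48 kHz -> 18 overlapping frames of 12000 samples
x = np.arange(48000 * 2, dtype=float)
frames = frame_signal(x)
```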
20. CS-BiLSTM Deep Echo Cancellation Algorithm Incorporating an Attention Mechanism (Cited: 2)
Authors: Xu Chundong, Wang Ruxia, Xu Jinwu, Ling Xianpeng, Huang Qiaoyue. Modern Electronics Technique, 2023, No. 5, pp. 55-59.
In full-duplex communication systems, acoustic echo degrades the user experience. To address the unsatisfactory echo cancellation of adaptive filtering algorithms in double-talk scenarios and the difficulty of removing nonlinear acoustic echo, a CS-BiLSTM deep acoustic echo cancellation algorithm combining an attention mechanism with a BiLSTM network is proposed. A BiLSTM network is first built to extract the temporal features of speech; channel and spatial attention mechanisms are then introduced to extract spatial feature information of the echo signal; and a new loss function fusing root mean square error and mean absolute error is proposed to improve the robustness of the model. The improved CS-BiLSTM network model recovers clear speech signals and has better echo cancellation performance. Simulation results show that, under nonlinear echo and double-talk conditions, the proposed CS-BiLSTM algorithm clearly outperforms several reference algorithms in perceptual evaluation of speech quality and cancels echo more effectively; in addition, the algorithm has a simple structure and fewer model parameters.
Keywords: echo cancellation, duplex communication, attention mechanism, feature extraction, speech signal recovery, loss function optimization, echo system model, comparative experiments
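Entry 20 proposes a loss that fuses root mean square error and mean absolute error. The abstract does not give the exact combination, so the sketch below uses an assumed convex mixing weight `alpha` purely for illustration:

```python
import numpy as np

def fused_loss(pred, target, alpha=0.5):
    """Illustrative fusion of RMSE and MAE terms; alpha is an assumed weight."""
    rmse = np.sqrt(np.mean((pred - target) ** 2))  # penalizes large errors
    mae = np.mean(np.abs(pred - target))           # robust to outliers
    return alpha * rmse + (1 - alpha) * mae

# Example: zero loss for a perfect prediction, 1.0 for a unit offset
t = np.zeros(5)
perfect = fused_loss(t, t)
offset = fused_loss(np.ones(5), t)
```

Mixing the two terms trades the outlier sensitivity of RMSE against the uniform gradient of MAE, which is the robustness argument the abstract makes.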