17 articles found
Challenges and Limitations in Speech Recognition Technology: A Critical Review of Speech Signal Processing Algorithms, Tools and Systems
1
Authors: Sneha Basak, Himanshi Agrawal, Shreya Jena, Shilpa Gite, Mrinal Bachute, Biswajeet Pradhan, Mazen Assiri. Computer Modeling in Engineering & Sciences (SCIE, EI), 2023, No. 5, pp. 1053-1089 (37 pages)
Speech recognition systems have become a unique human-computer interaction (HCI) family. Speech is one of the most naturally developed human abilities; speech signal processing opens up a transparent and hands-free computing experience. This paper aims to present a retrospective yet modern approach to the world of speech recognition systems. The development of automatic speech recognition (ASR) has seen quite a few milestones and breakthrough technologies, which are highlighted in this paper. A step-by-step rundown of the fundamental stages in developing speech recognition systems is presented, along with a brief discussion of various modern-day developments and applications in this domain. This review aims to provide a starting point for those entering the vast field of speech signal processing. Since speech recognition has vast potential in industries such as telecommunication, emotion recognition, and healthcare, this review should help researchers who aim to explore more applications that society can quickly adopt in the coming years.
Keywords: speech recognition; automatic speech recognition (ASR); mel-frequency cepstral coefficients (MFCC); hidden Markov model (HMM); artificial neural network (ANN)
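Dynamic Speech Feature Extraction Algorithm Based on Cosine Similarity (基于余弦相似度的动态语音特征提取算法) Cited by: 9
2
Authors: 艾佳琪, 左毅, 刘君霞, 贺培超, 李铁山, 陈俊龙. 《计算机应用研究》 (CSCD, PKU Core), 2020, No. S02, pp. 147-149 (3 pages)
To further study speech feature extraction methods, this paper analyzes speech features based on inverse discrete cosine transform cepstrum coefficients (IDCT CC). Using the cosine similarity between frequency-domain speech signals, the IDCT CC are grouped by hierarchical clustering to obtain a 14-dimensional frequency-domain speech feature vector, called the C-vector. In the experiments, a speaker recognition model based on the Gaussian mixture model (GMM) is built to examine the recognition accuracy and time of the C-vector, which is compared with the classic mel-frequency cepstral coefficients (MFCC) and the histogram of DCT cepstrum coefficients (HDCC). The experimental results show that the proposed C-vector achieves recognition accuracy 7% and 5% higher than MFCC and HDCC, respectively. Moreover, the C-vector shows better recognition ability on multi-speaker speech sets.
Keywords: speaker recognition; speech features; mel-frequency cepstral coefficients (MFCC); inverse discrete cosine transform cepstrum coefficients (IDCT CC); cosine similarity; hierarchical clustering analysis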
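For readers unfamiliar with the measure named in this abstract, the following is a minimal sketch of cosine similarity between two feature vectors. The function name and the sample vectors are illustrative assumptions; this is not the paper's implementation and it does not reproduce its hierarchical clustering step.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors.

    Hypothetical helper for illustration only.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # The measure is undefined when either vector has zero length.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Made-up vectors: the second is a scaled copy of the first,
    # so the similarity is (up to rounding) 1.
    print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

Vectors pointing in the same direction score close to 1 regardless of magnitude, which is why the measure suits comparing spectral shapes rather than signal levels.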
Wake-Up-Word Feature Extraction on FPGA
3
Authors: Veton Z. Kepuska, Mohamed M. Eljhani, Brian H. Hight. World Journal of Engineering and Technology, 2014, No. 1, pp. 1-12 (12 pages)
The Wake-Up-Word Speech Recognition (WUW-SR) task is computationally very demanding, particularly the feature extraction stage, whose features are decoded with the corresponding Hidden Markov Models (HMMs) in the back-end stage of the WUW-SR. The state-of-the-art WUW-SR system is based on three different sets of features: Mel-Frequency Cepstral Coefficients (MFCC), Linear Predictive Coding Coefficients (LPC), and Enhanced Mel-Frequency Cepstral Coefficients (ENH_MFCC). In "Front-End of Wake-Up-Word Speech Recognition System Design on FPGA" [1], we presented an experimental FPGA design and implementation of a novel architecture of a real-time spectrogram extraction processor that generates MFCC, LPC, and ENH_MFCC spectrograms simultaneously. In this paper, the details of converting the three sets of spectrograms, (1) MFCC, (2) LPC, and (3) ENH_MFCC, to their equivalent features are presented. In the WUW-SR system, the recognizer's front-end is located at the terminal, which is typically connected over a data network to a remote back-end recognizer (e.g., a server). The WUW-SR is shown in Figure 1. The three sets of speech features are extracted at the front-end. These extracted features are then compressed and transmitted to the server via a dedicated channel, where they are subsequently decoded.
Keywords: speech recognition system; feature extraction; mel-frequency cepstral coefficients; linear predictive coding coefficients; enhanced mel-frequency cepstral coefficients; hidden Markov models; field-programmable gate arrays
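Cough Sound Recognition Method Using Mel Cepstral Parameters (采用Mel倒谱参数的咳嗽声识别方法) Cited by: 2
4
Authors: 尹永, 莫鸿强. 《信息技术》, 2012, No. 10, pp. 85-91 (7 pages)
When diagnosing a patient with chronic cough, an assessment of cough intensity and frequency provides valuable information, so improving the cough recognition rate is important for diagnosis. Starting from the Mel cepstral parameters widely used in speech recognition, this paper looks for differences between cough and speech in those parameters. Based on the principle of the Mel cepstral parameters, the distribution of the number of extrema of the Mel-scale filter log energies, an intermediate result of the computation, is extracted as the feature for cough recognition. In experiments on recordings made in a hospital ward, the cough recognition rate exceeds 90%, and non-cough signals such as speech are effectively rejected: the results show that more than 90% of speech signals are excluded. With the recording equipment, environment, and other parameters unchanged, the same threshold can be used to recognize coughs across samples from different patients. The method is simple, requires little computation, and is suitable for fast recognition.
Keywords: Mel cepstral parameters (mel-frequency cepstrum coefficient, MFCC); Mel-scale filter log energy; cough recognition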
Environmental Sound Classification Using Deep Learning Cited by: 7
5
Authors: SHANTHAKUMAR S, SHAKILA S, SUNETH Pathirana, JAYALATH Ekanayake. Instrumentation, 2020, No. 3, pp. 15-22 (8 pages)
Hearing-impaired individuals may be unable to identify environmental sounds because of the noise around them. However, very little research has been conducted in this domain. Hence, the aim of this study is to categorize sounds generated in the environment so that hearing-impaired individuals can distinguish the sound categories. To that end, we first define nine sound classes that typically exist in the environment: air conditioner, car horn, children playing, dog bark, drilling, engine idling, jackhammer, siren, and street music. Then we record 100 sound samples from each category and extract features of each sound category using Mel-Frequency Cepstral Coefficients (MFCC). The training dataset is developed using this set of features together with the class variable, the sound category. Sound classification is a complex task, and hence we use two deep learning techniques, Multi-Layer Perceptron (MLP) and Convolutional Neural Network (CNN), to train classification models. The models are tested using a separate test set, and their performance is evaluated using precision, recall, and F1-score. The results show that the CNN model outperforms the MLP. However, the MLP also provided decent accuracy in classifying unknown environmental sounds.
Keywords: mel-frequency cepstral coefficients (MFCC); multi-layer perceptron (MLP); convolutional neural network (CNN)
Robust Speech Recognition System Using Conventional and Hybrid Features of MFCC, LPCC, PLP, RASTA-PLP and Hidden Markov Model Classifier in Noisy Conditions Cited by: 7
6
Authors: Veton Z. Kepuska, Hussien A. Elharati. Journal of Computer and Communications, 2015, No. 6, pp. 1-9 (9 pages)
In recent years, the accuracy of speech recognition (SR) has been one of the most active areas of research. Although SR systems work reasonably well in quiet conditions, they still suffer severe performance degradation in noisy conditions or over distorted channels. It is necessary to search for more robust feature extraction methods to gain better performance in adverse conditions. This paper investigates the performance of conventional and new hybrid speech feature extraction algorithms, namely Mel Frequency Cepstrum Coefficient (MFCC), Linear Prediction Coding Coefficient (LPCC), perceptual linear prediction (PLP), and RASTA-PLP, in noisy conditions using a multivariate Hidden Markov Model (HMM) classifier. The behavior of the proposed system is evaluated using the TIDIGIT human voice corpus, recorded from 208 different adult speakers, in both the training and testing processes. The theoretical basis for the speech processing and classifier procedures is presented, and the recognition results are reported in terms of word recognition rate.
Keywords: speech recognition; noisy conditions; feature extraction; mel-frequency cepstral coefficients; linear predictive coding coefficients; perceptual linear prediction; RASTA-PLP; isolated speech; hidden Markov model
Application of Hidden Markov Models in Speech Command Recognition Cited by: 1
7
Authors: Shing-Tai Pan, Zong-Hong Huang, Sheng-Syun Yuan, Xu-Yu Li, Yu-De Su, Jia-Hua Li. Journal of Mechanics Engineering and Automation, 2020, No. 2, pp. 41-45 (5 pages)
In this study, vector quantization and hidden Markov models (HMMs) were used to achieve speech command recognition. Pre-emphasis, a Hamming window, and Mel-frequency cepstral coefficients were first adopted to obtain feature values. Subsequently, vector quantization and HMMs were employed to achieve speech command recognition. Each recorded speech command was three Chinese characters long. Five phrases, pronounced by a mix of different human voices, were recorded and used to test the models. The recorded phrases were then used for speech command recognition to demonstrate whether the experimental results were satisfactory.
Keywords: HMMs; mel-frequency cepstral coefficients; speech command recognition; vector quantization
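The first two front-end steps named in this abstract, pre-emphasis and Hamming windowing, can be sketched as below. This is a generic textbook illustration, not the authors' code: the function names, the coefficient `alpha=0.97`, and the frame length and hop are assumptions, and the MFCC, vector quantization, and HMM stages are not shown.

```python
import math


def pre_emphasis(signal, alpha=0.97):
    """Apply y[n] = x[n] - alpha * x[n-1]; alpha=0.97 is an assumed, commonly used value."""
    if not signal:
        return []
    return [signal[0]] + [
        signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))
    ]


def hamming_window(length):
    """Return w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1)) for n = 0..N-1."""
    if length == 1:
        return [1.0]
    return [
        0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
        for n in range(length)
    ]


def windowed_frames(signal, frame_len, hop):
    """Split the signal into overlapping frames and taper each with a Hamming window."""
    window = hamming_window(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames


if __name__ == "__main__":
    # Toy signal; real speech would be sampled audio.
    samples = [math.sin(0.3 * n) for n in range(64)]
    frames = windowed_frames(pre_emphasis(samples), frame_len=16, hop=8)
    print(len(frames), "frames of", len(frames[0]), "samples")
```

Each windowed frame would then go to a spectral analysis step; the taper reduces the discontinuities at frame edges that would otherwise leak across frequency bins.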
Application of formant instantaneous characteristics to speech recognition and speaker identification
8
Authors: 侯丽敏, 胡晓宁, 谢娟敏. Journal of Shanghai University (English Edition) (CAS), 2011, No. 2, pp. 123-127 (5 pages)
This paper proposes a new phase feature derived from the formant instantaneous characteristics for speech recognition (SR) and speaker identification (SI) systems. Using the Hilbert transform (HT), the formant characteristics can be represented by the instantaneous frequency (IF) and instantaneous bandwidth, namely the formant instantaneous characteristics (FIC). To explore the importance of FIC in both SR and SI, this paper proposes different features derived from FIC for SR and SI systems. When these new features are combined with conventional parameters, a higher identification rate is achieved than with Mel-frequency cepstral coefficient (MFCC) parameters alone. The experimental results show that the new features are effective characteristic parameters and can complement conventional parameters for SR and SI.
Keywords: instantaneous frequency (IF); Hilbert transform (HT); speech recognition; speaker identification; mel-frequency cepstral coefficients (MFCC)
A Novel System for Recognizing Recording Devices from Recorded Speech Signals
9
Authors: Yongqiang Bao, Qi Shao, Xuxu Zhang, Jiahui Jiang, Yue Xie, Tingting Liu, Weiye Xu. Computers, Materials & Continua (SCIE, EI), 2020, No. 12, pp. 2557-2570 (14 pages)
The field of digital audio forensics aims to detect threats and fraud in audio signals. Contemporary audio forensic techniques use digital signal processing to verify the authenticity of recorded speech, recognize speakers, and recognize recording devices. User-generated audio recordings from mobile phones are very helpful in a number of forensic applications. This article proposes a novel method for recognizing recording devices based on recorded audio signals. First, a database of the features of various recording devices was constructed using 32 recording devices (20 mobile phones of different brands and 12 kinds of recording pens) in various environments. Second, the audio features of each recording device, such as the Mel-frequency cepstral coefficients (MFCC), were extracted from the audio signals and used as model inputs. Finally, support vector machines (SVM) with a fractional Gaussian kernel were used to recognize the recording devices from their audio features. Experiments demonstrated that the proposed method achieved 93.4% accuracy in recognizing recording devices.
Keywords: recording device recognition; mel-frequency cepstral coefficients; support vector machines
A Comparison of Classifiers in Performing Speaker Accent Recognition Using MFCCs
10
Authors: Zichen Ma, Ernest Fokoué. Open Journal of Statistics, 2014, No. 4, pp. 258-266 (9 pages)
An algorithm involving Mel-Frequency Cepstral Coefficients (MFCCs) is provided to perform signal feature extraction for the task of speaker accent recognition. Different classifiers are then compared based on the MFCC feature. For each signal, the mean vector of the MFCC matrix is used as the input vector for pattern recognition. A sample of 330 signals, containing 165 US voices and 165 non-US voices, is analyzed. In the comparison, k-nearest neighbors yields the highest average test accuracy, after using a cross-validation of size 500, and requires the least computation time.
Keywords: speaker accent recognition; mel-frequency cepstral coefficients (MFCCs); discriminant analysis; support vector machines (SVMs); k-nearest neighbors
Identification of Noisy Utterance Speech Signal using GA-Based Optimized 2D-MFCC Method and a Bispectrum Analysis
11
Authors: Benyamin Kusumoputro, Agus Buono, Li Na. Journal of Software Engineering and Applications, 2012, No. 12, pp. 193-199 (7 pages)
One-dimensional Mel-Frequency Cepstrum Coefficients (1D-MFCC) in conjunction with a power spectrum analysis method are commonly used for feature extraction in speaker identification systems. However, because this one-dimensional feature extraction subsystem shows a low recognition rate when identifying an utterance speech signal under harsh noise conditions, we have developed a speaker identification system based on two-dimensional bispectrum data, which is theoretically more robust to the addition of Gaussian noise. Because the processing sequence of the 1D-MFCC method cannot be used directly to process the two-dimensional bispectrum data, in this paper we propose a 2D-MFCC method as an extension of the 1D-MFCC method, together with optimization of the 2D filter design using Genetic Algorithms. Using the 2D-MFCC method with bispectrum analysis as the feature extraction technique, we then use a Hidden Markov Model as the pattern classifier. We experimentally evaluate the developed methods for identifying an utterance speech signal buried in various levels of noise. The experimental results show that the 2D-MFCC method without GA optimization has a recognition rate comparable to that of the 1D-MFCC method for utterance signals without added noise. However, when the utterance signal is buried in Gaussian noise, the developed 2D-MFCC method shows higher recognition capability, especially when the 2D-MFCC optimized by Genetic Algorithms is used.
Keywords: 2D mel-frequency cepstrum coefficients; bispectrum; hidden Markov model; genetic algorithms
Environmental Sound Recognition Using Double-Level Energy Detection
12
Authors: Xiaoxia Zhang, Ying Li. Journal of Signal and Information Processing, 2013, No. 3, pp. 19-24 (6 pages)
The performance of classic Mel-frequency cepstral coefficients (MFCC) is unsatisfactory in noisy environments with different natural sound sources. In this paper, a classification approach for ecological environmental sounds using double-level energy detection (DED) is presented. DED is used to detect the presence of sound signals under noisy conditions. MFCC features are then extracted from the frames in which DED detected a sound signal. Experimental results show that the proposed technique has better noise immunity than classic MFCC and also outperforms both time-domain energy detection (TED) and frequency-domain energy detection (FED).
Keywords: ecological environmental sounds; double-level energy detection; time-domain energy detection; frequency-domain energy detection; mel-frequency cepstral coefficients
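A Study of Gender Differences in Static MFCC Features (静态MFCC特征的性别差异性研究)
13
Authors: 杨继臣, 吴裕玲, 苏杰华. 《仲恺农业工程学院学报》 (CAS), 2011, No. 4, pp. 54-56, 59 (4 pages)
Gender differences in static Mel-frequency cepstral coefficient (MFCC) features were studied in terms of the peak differences, means, and variances of the probability density functions of the male and female static MFCC features. The results show that, in terms of peaks, MFCC1, MFCC2, MFCC6, MFCC9, and MFCC12 differ the most; in terms of means, the male MFCC feature components are larger than the female ones; and in terms of variances, most male MFCC feature components are smaller than the female ones.
Keywords: MFCC (mel-frequency cepstral coefficients) features; gender differences; peak differences; mean; variance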
Improved MFCC-Based Feature for Robust Speaker Identification Cited by: 6
14
Authors: 吴尊敬, 曹志刚. Tsinghua Science and Technology (SCIE, EI, CAS), 2005, No. 2, pp. 158-161 (4 pages)
The Mel-frequency cepstral coefficient (MFCC) is the most widely used feature in speech and speaker recognition. However, MFCC is very sensitive to noise interference, which tends to drastically degrade the performance of recognition systems because of the mismatches between training and testing. In this paper, the logarithmic transformation in the standard MFCC analysis is replaced by a combined function to reduce the noise sensitivity. The proposed feature extraction process is also combined with speech enhancement methods, such as spectral subtraction and median filtering, to further suppress the noise. Experiments show that the proposed robust MFCC-based feature significantly reduces the recognition error rate over a wide signal-to-noise ratio range.
Keywords: mel-frequency cepstral coefficient (MFCC); robust speaker identification; feature extraction
Short-term feeding behaviour sound classification method for sheep using LSTM networks
15
Authors: Guanghui Duan, Shengfu Zhang, Mingzhou Lu, Cedric Okinda, Mingxia Shen, Tomas Norton. International Journal of Agricultural and Biological Engineering (SCIE, EI, CAS), 2021, No. 2, pp. 43-54 (12 pages)
A deep learning approach using long short-term memory (LSTM) networks was implemented in this study to classify the sounds of short-term feeding behaviour of sheep, including biting, chewing, bolus regurgitation, and rumination chewing. The original acoustic signal was split into sound episodes using an endpoint detection method based on thresholds of short-term energy and average zero-crossing rate. A discrete wavelet transform (DWT), Mel-frequency cepstral analysis, and principal component analysis (PCA) were integrated to extract dimensionally reduced DWT-based Mel-frequency cepstral coefficients (denoted by PW_MFCC) for each sound episode. Then, LSTM networks were employed to train classifiers for sound episode classification. The performances of the LSTM classifiers with the original Mel-frequency cepstral coefficients (MFCC), DWT-based MFCC (denoted by W_MFCC), and PW_MFCC as the input features were compared. The comparison demonstrated that introducing DWT improved classifier performance effectively, and PCA reduced the computational overhead without degrading classifier performance. The overall accuracy and comprehensive F1-score of the PW_MFCC-based LSTM classifier were 94.97% and 97.41%, respectively. The classifier established in this study provides a foundation for an automatic system for identifying sick sheep with abnormal feeding and rumination behaviour patterns.
Keywords: sheep behaviour; short-term feeding behaviour; acoustic analysis; mel-frequency cepstral coefficients; long short-term memory networks
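The two quantities that drive the endpoint detection mentioned in this abstract, short-term energy and zero-crossing rate, can be sketched per frame as follows. This is a generic illustration: the function names are hypothetical, and the rule that a frame is active when either quantity exceeds its threshold is an assumption, since the abstract does not state how the paper combines the two thresholds.

```python
def short_term_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(s * s for s in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (zero counts as non-negative)."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for prev, cur in zip(frame, frame[1:]) if (prev >= 0) != (cur >= 0)
    )
    return crossings / (len(frame) - 1)


def active_frames(frames, energy_threshold, zcr_threshold):
    """Flag each frame as active or not.

    Assumed rule: active when energy OR zero-crossing rate exceeds its threshold.
    The thresholds themselves are placeholders to be tuned on real data.
    """
    return [
        short_term_energy(f) > energy_threshold
        or zero_crossing_rate(f) > zcr_threshold
        for f in frames
    ]


if __name__ == "__main__":
    # Made-up frames: one silent, one loud and oscillating.
    frames = [[0.0, 0.0, 0.0, 0.0], [0.8, -0.7, 0.9, -0.6]]
    print(active_frames(frames, energy_threshold=0.5, zcr_threshold=0.9))
```

Runs of consecutive active frames would then be merged into sound episodes; how short gaps are bridged is another design choice left open here.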
Adaptive Compensation Algorithm in Open Vocabulary Mandarin Speaker-Independent Speech Recognition
16
Authors: Fadhil H. T. Al-dulaimy, 王作英, 田野. Tsinghua Science and Technology (SCIE, EI, CAS), 2002, No. 5, pp. 521-526 (6 pages)
In speech recognition systems, the physiological characteristics of the speech production model cause the voiced sections of the speech signal to have an attenuation of approximately 20 dB per decade. Many speech recognition algorithms have been developed to solve this problem by filtering the input signal with a single-zero high-pass filter. Unfortunately, this technique increases the noise energy at high frequencies above 4 kHz, which in some cases degrades the recognition accuracy. This paper solves the problem using a pre-emphasis filter in the front end of the recognizer. The aim is to develop a modified parameterization approach that takes into account the whole energy zone in the spectrum to improve the performance of the existing baseline recognition system in the acoustic phase. The results show that a large-vocabulary speaker-independent continuous speech recognition system using this approach has a greatly improved recognition rate.
Keywords: mel-frequency cepstrum coefficients; speech recognition; duration distribution based hidden Markov model
English Speech Recognition System on Chip
17
Authors: 刘鸿, 钱彦旻, 刘加. Tsinghua Science and Technology (SCIE, EI, CAS), 2011, No. 1, pp. 95-99 (5 pages)
An English speech recognition system was implemented on a chip, called a speech system-on-chip (SoC). The SoC included an application-specific integrated circuit with a vector accelerator to improve performance. The sub-word model, based on a continuous density hidden Markov model recognition algorithm, ran on a very cheap speech chip. The algorithm was a two-stage fixed-width beam-search baseline system with a variable beam-width pruning strategy and a frame-synchronous word-level pruning strategy to significantly reduce the recognition time. Tests show that this method reduces the recognition time nearly 6-fold and the memory size nearly 2-fold compared to the original system, with less than 1% accuracy degradation for a 600-word recognition task and a recognition accuracy rate of about 98%.
Keywords: non-specific human voice-consciousness; system-on-chip; mel-frequency cepstral coefficients (MFCC)