Journal articles
11 articles found
A Comparison of Classifiers in Performing Speaker Accent Recognition Using MFCCs
1
Authors: Zichen Ma, Ernest Fokoué. Open Journal of Statistics, 2014, issue 4, pp. 258-266 (9 pages).
An algorithm involving Mel-frequency cepstral coefficients (MFCCs) is provided to perform signal feature extraction for the task of speaker accent recognition. Different classifiers are then compared on the MFCC features. For each signal, the mean vector of the MFCC matrix is used as the input vector for pattern recognition. A sample of 330 signals, containing 165 US voices and 165 non-US voices, is analyzed. In this comparison, k-nearest neighbors yields the highest average test accuracy, using cross-validation of size 500, and requires the least computation time.
Keywords: speaker accent recognition; mel-frequency cepstral coefficients (MFCCs); discriminant analysis; support vector machines (SVMs); k-nearest neighbors
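The classification pipeline described in this abstract (average each signal's MFCC matrix over frames, then classify the resulting mean vector) can be sketched in standard-library Python. This is a minimal illustration under assumptions, not the authors' code: the MFCC matrices are taken as already computed (one row per frame), the distance is Euclidean, k=3 is arbitrary, and the two-dimensional toy vectors stand in for real MFCC means.

```python
import math
from collections import Counter

def mean_vector(mfcc_matrix):
    """Average an MFCC matrix (one row per frame) over frames, giving one vector per signal."""
    n_frames = len(mfcc_matrix)
    n_coeffs = len(mfcc_matrix[0])
    return [sum(frame[j] for frame in mfcc_matrix) / n_frames for j in range(n_coeffs)]

def knn_predict(train_vectors, train_labels, query, k=3):
    """Label the query by majority vote among its k nearest training vectors (Euclidean distance)."""
    distances = sorted(
        (math.dist(vec, query), label) for vec, label in zip(train_vectors, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Toy usage: two well-separated accent groups (hypothetical values).
train = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]]
labels = ["US", "US", "non-US", "non-US"]
print(knn_predict(train, labels, [0.05, 0.1]))
```

kNN needs no training step beyond storing the vectors, which is one plausible reason it is also cheap in computation time for a sample of this size.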
Challenges and Limitations in Speech Recognition Technology: A Critical Review of Speech Signal Processing Algorithms, Tools and Systems
2
Authors: Sneha Basak, Himanshi Agrawal, Shreya Jena, Shilpa Gite, Mrinal Bachute, Biswajeet Pradhan, Mazen Assiri. Computer Modeling in Engineering & Sciences [SCIE, EI], 2023, issue 5, pp. 1053-1089 (37 pages).
Speech recognition systems have become a unique human-computer interaction (HCI) family. Speech is one of the most naturally developed human abilities; speech signal processing opens up a transparent and hands-free computation experience. This paper aims to present a retrospective yet modern approach to the world of speech recognition systems. The development of ASR (automatic speech recognition) has seen quite a few milestones and breakthrough technologies, which are highlighted in this paper. A step-by-step rundown of the fundamental stages in developing speech recognition systems is presented, along with a brief discussion of various modern-day developments and applications in this domain. This review aims to summarize the field and provide a starting point for those new to the vast field of speech signal processing. Since speech recognition has vast potential in industries such as telecommunications, emotion recognition, and healthcare, this review should help researchers who aim to explore more applications that society can quickly adopt in the coming years.
Keywords: speech recognition; automatic speech recognition (ASR); mel-frequency cepstral coefficients (MFCC); hidden Markov model (HMM); artificial neural network (ANN)
Improving Automatic Speech Segmentation with Feature Parameters Based on Wavelet Subband Decomposition (cited by: 2)
3
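Authors: 秦欢, 柴佩琪, 陈锴. 《计算机应用》 (Journal of Computer Applications) [CSCD, PKU Core], 2005, issue 6, pp. 1345-1346 (2 pages).
A feature extraction method based on wavelet subband decomposition is adopted. Depending on which of two decorrelation methods is used, DCT or DWT, the resulting speech feature parameters are Subband Based Cepstral (SBC) parameters or Wavelet Packet Parameters (WPP), respectively. Segmentation experiments show that the feature parameters based on wavelet subband decomposition achieve better segmentation results than MFCC.
Keywords: hidden Markov model; automatic speech segmentation; Mel-frequency cepstral coefficients; wavelet subband decomposition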
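A sketch of how subband log energies from a wavelet packet decomposition could be decorrelated in the two ways the abstract names: a DCT (SBC-style) or a DWT (WPP-style). The Haar filter, the uniform full packet tree of depth 3, and the unnormalized DCT-II are assumptions for illustration only; the abstract does not specify the wavelet, the subband tree (SBC/WPP front ends often use a non-uniform, Mel-like tree), or the normalization.

```python
import math

def haar_step(x):
    """One orthonormal Haar analysis step on an even-length list: (lowpass, highpass)."""
    s = 1.0 / math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return low, high

def wavelet_packet_subbands(frame, depth):
    """Split every node at each level (full packet tree), giving 2**depth subbands.

    len(frame) must be divisible by 2**depth. Subbands come out in the tree's
    natural order, which is not strictly ordered by frequency.
    """
    nodes = [list(frame)]
    for _ in range(depth):
        next_nodes = []
        for node in nodes:
            low, high = haar_step(node)
            next_nodes.extend([low, high])
        nodes = next_nodes
    return nodes

def log_subband_energies(frame, depth, eps=1e-12):
    """Log energy of each subband (eps avoids log(0) on silent bands)."""
    return [math.log(sum(c * c for c in band) + eps)
            for band in wavelet_packet_subbands(frame, depth)]

def dct_ii(v):
    """Unnormalized DCT-II."""
    n = len(v)
    return [sum(v[k] * math.cos(math.pi * m * (k + 0.5) / n) for k in range(n))
            for m in range(n)]

def haar_dwt(v):
    """Multi-level Haar DWT of a length-2**d list: [approximation, coarsest detail, ..., finest details]."""
    approx, details = list(v), []
    while len(approx) > 1:
        approx, high = haar_step(approx)
        details = high + details
    return approx + details

def sbc_like(frame, depth=3):
    """SBC-style features: DCT decorrelation of the log subband energies."""
    return dct_ii(log_subband_energies(frame, depth))

def wpp_like(frame, depth=3):
    """WPP-style features: DWT decorrelation of the log subband energies."""
    return haar_dwt(log_subband_energies(frame, depth))
```

Because the Haar transform is orthonormal, the subband energies sum to the frame energy, which is a quick sanity check on the decomposition.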
A Dynamic Speech Feature Extraction Algorithm Based on Cosine Similarity (cited by: 10)
4
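Authors: 艾佳琪, 左毅, 刘君霞, 贺培超, 李铁山, 陈俊龙. 《计算机应用研究》 (Application Research of Computers) [CSCD, PKU Core], 2020, supplement S02, pp. 147-149 (3 pages).
To further study speech feature extraction methods, speech features based on inverse discrete cosine transform cepstral coefficients (IDCT CC) were analyzed. Using the cosine similarity between frequency-domain speech signals, the IDCT CC were hierarchically clustered to obtain a 14-dimensional frequency-domain speech feature vector, called the C-vector. In the experiments, a speaker recognition model based on a Gaussian mixture model (GMM) was built to evaluate the recognition accuracy and time of the C-vector, and comparisons were made against classical frequency-domain cepstral coefficients such as Mel-frequency cepstral coefficients (MFCC) and histogram of DCT cepstrum coefficients (HDCC). The results show that the proposed C-vector achieves recognition accuracy 7% and 5% higher than MFCC and HDCC, respectively. Moreover, the C-vector shows even better recognition ability on multi-speaker speech sets.
Keywords: speaker recognition; speech features; Mel-frequency cepstral coefficients (MFCC); inverse discrete cosine transform cepstral coefficients (IDCT CC); cosine similarity; hierarchical cluster analysis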
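Agglomerative (hierarchical) clustering driven by cosine similarity can be sketched as follows. The abstract does not say which vectors are clustered, which linkage is used, or how the C-vector is formed from the clusters, so average linkage and the stopping rule (merge until `n_clusters` groups remain) are assumptions; the function only returns the clusters and does not attempt to reproduce the C-vector itself.

```python
import math
from itertools import combinations

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def agglomerative_cosine(vectors, n_clusters):
    """Average-linkage agglomerative clustering using cosine similarity.

    Starts with one cluster per vector and repeatedly merges the pair of clusters
    with the highest mean pairwise cosine similarity until n_clusters remain.
    Returns a list of clusters, each a sorted list of vector indices.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[i] for i in range(len(vectors))]
    sim = {(i, j): cosine_similarity(vectors[i], vectors[j])
           for i, j in combinations(range(len(vectors)), 2)}

    def linkage(c1, c2):
        pairs = [sim[(min(i, j), max(i, j))] for i in c1 for j in c2]
        return sum(pairs) / len(pairs)

    while len(clusters) > n_clusters:
        a, b = max(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)] + [merged]
    return clusters

# Toy usage: two directions in 2-D (hypothetical data).
vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(agglomerative_cosine(vecs, 2))
```

Cosine similarity ignores vector magnitude, so spectra that differ only in overall level end up in the same cluster; that is the usual motivation for choosing it over Euclidean distance.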
A Cough Recognition Method Using Mel Cepstral Parameters (cited by: 2)
5
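Authors: 尹永, 莫鸿强. 《信息技术》 (Information Technology), 2012, issue 10, pp. 85-91 (7 pages).
When diagnosing a patient with chronic cough, assessing the intensity and frequency of the cough provides valuable information, so improving the cough recognition rate is important for diagnosis. Starting from Mel cepstral parameters, which are widely used in speech recognition, this work looks for the differences between cough and speech in terms of these parameters. Based on how Mel cepstral parameters are computed, the distribution of the number of extrema in the Mel-scale filter log energies (an intermediate result of that computation) is extracted as the cough recognition feature. Experiments on recordings made in a hospital ward give a cough recognition rate above 90%, while non-cough signals such as speech are effectively rejected: more than 90% of speech signals are excluded. With the recording equipment, environment, and other parameters held fixed, the same threshold can be used to recognize coughs across different patients. The method is simple and computationally light, which makes fast recognition possible.
Keywords: Mel cepstral parameters (mel-frequency cepstrum coefficient, MFCC); Mel-scale filter log energy; cough recognition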
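The cough feature is built from an intermediate MFCC quantity: the log energies of the Mel-scale filter bank in each frame. A minimal sketch of counting their extrema and summarizing the counts over a segment follows. It assumes the per-frame log filter-bank energies are already available and counts only strict interior maxima and minima; the abstract does not say how plateaus are treated or which threshold rule is applied to the distribution, so no decision rule is shown.

```python
from collections import Counter

def count_extrema(log_energies):
    """Number of strict interior local maxima and minima along the filter-bank axis.

    Plateaus (equal neighbours) are not counted.
    """
    count = 0
    for i in range(1, len(log_energies) - 1):
        prev, cur, nxt = log_energies[i - 1], log_energies[i], log_energies[i + 1]
        if (cur > prev and cur > nxt) or (cur < prev and cur < nxt):
            count += 1
    return count

def extrema_count_distribution(frames_log_energies):
    """Histogram of per-frame extrema counts over a segment: {count: number of frames}."""
    return Counter(count_extrema(frame) for frame in frames_log_energies)

# Toy usage with three short frames (hypothetical values).
frames = [[1.0, 3.0, 2.0, 4.0, 1.0], [1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0]]
print(extrema_count_distribution(frames))
```

Counting extrema needs only comparisons on values already produced during MFCC computation, which is consistent with the abstract's claim of a small computational load.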
Environmental Sound Classification Using Deep Learning (cited by: 7)
6
Authors: SHANTHAKUMAR S, SHAKILA S, SUNETH Pathirana, JAYALATH Ekanayake. Instrumentation, 2020, issue 3, pp. 15-22 (8 pages).
Hearing-impaired individuals may be unable to identify environmental sounds because of the noise around them. However, very little research has been conducted in this domain. Hence, the aim of this study is to categorize sounds generated in the environment so that hearing-impaired individuals can distinguish the sound categories. To that end, we first define nine sound classes typically found in the environment: air conditioner, car horn, children playing, dog bark, drilling, engine idling, jackhammer, siren, and street music. We then record 100 sound samples from each category and extract features from each sample using Mel-frequency cepstral coefficients (MFCC). The training dataset is built from this set of features together with the class variable, the sound category. Because sound classification is a complex task, we use two deep learning techniques, a multilayer perceptron (MLP) and a convolutional neural network (CNN), to train classification models. The models are tested on a separate test set, and their performance is evaluated using precision, recall, and F1-score. The results show that the CNN model outperforms the MLP, although the MLP also achieves decent accuracy in classifying unknown environmental sounds.
Keywords: mel-frequency cepstral coefficients (MFCC); multilayer perceptron (MLP); convolutional neural network (CNN)
Application of formant instantaneous characteristics to speech recognition and speaker identification
7
Authors: 侯丽敏, 胡晓宁, 谢娟敏. Journal of Shanghai University (English Edition) [CAS], 2011, issue 2, pp. 123-127 (5 pages).
This paper proposes a new phase feature derived from formant instantaneous characteristics for speech recognition (SR) and speaker identification (SI) systems. Using the Hilbert transform (HT), formant characteristics can be represented by the instantaneous frequency (IF) and instantaneous bandwidth, together called formant instantaneous characteristics (FIC). To explore the importance of FIC in both SR and SI, this paper proposes different FIC-derived features for SR and SI systems. When these new features are combined with conventional parameters, a higher identification rate is achieved than with Mel-frequency cepstral coefficient (MFCC) parameters alone. The experimental results show that the new features are effective characteristic parameters and can serve as a complement to conventional parameters for SR and SI.
Keywords: instantaneous frequency (IF); Hilbert transform (HT); speech recognition; speaker identification; mel-frequency cepstral coefficients (MFCC)
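Instantaneous frequency via the Hilbert transform can be sketched with the standard analytic-signal construction: take the DFT, zero the negative frequencies, double the positive ones, invert, and differentiate the unwrapped phase. This is a textbook illustration, not the paper's FIC extraction: it assumes the input is already a narrowband (for example, formant-filtered) signal, uses a naive O(N^2) DFT to stay dependency-free, and does not compute the instantaneous bandwidth.

```python
import cmath
import math

def analytic_signal(x):
    """Analytic signal x + j*H{x} via the frequency-domain Hilbert transform (naive DFT)."""
    n = len(x)
    spectrum = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(n)]
    # Keep DC (and Nyquist for even n), double positive frequencies, zero negative ones.
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        positive = range(1, n // 2)
    else:
        positive = range(1, (n + 1) // 2)
    for k in positive:
        weights[k] = 2.0
    weighted = [s * w for s, w in zip(spectrum, weights)]
    return [sum(weighted[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def instantaneous_frequency(x, sample_rate):
    """Instantaneous frequency (Hz) from successive differences of the analytic phase."""
    phases = [cmath.phase(v) for v in analytic_signal(x)]
    freqs = []
    for a, b in zip(phases, phases[1:]):
        step = (b - a + math.pi) % (2 * math.pi) - math.pi  # unwrap into [-pi, pi)
        freqs.append(step * sample_rate / (2 * math.pi))
    return freqs

# Toy usage: a 1 kHz cosine sampled at 8 kHz (exactly 8 cycles in 64 samples).
fs, f0, n = 8000, 1000.0, 64
x = [math.cos(2 * math.pi * f0 * t / fs) for t in range(n)]
print(round(instantaneous_frequency(x, fs)[10], 3))
```

For real speech a band-pass filter around each formant would normally precede this step, since the instantaneous frequency of a broadband signal is hard to interpret.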
A Study of Gender Differences in Static MFCC Features
8
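Authors: 杨继臣, 吴裕玲, 苏杰华. 《仲恺农业工程学院学报》 (Journal of Zhongkai University of Agriculture and Engineering) [CAS], 2011, issue 4, pp. 54-56, 59 (4 pages).
Gender differences in static Mel-frequency cepstral coefficient (MFCC) features are studied in terms of the peak differences, means, and variances of the probability density functions of male and female static MFCC features. The results show that, in terms of peaks, MFCC1, MFCC2, MFCC6, MFCC9, and MFCC12 differ the most; in terms of means, male MFCC components are larger than female MFCC components; and in terms of variances, most male MFCC components are smaller than the corresponding female components.
Keywords: MFCC (mel-frequency cepstral coefficients) features; gender differences; peak differences; mean; variance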
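The three statistics compared in this study (PDF peak, mean, variance) can be computed per static MFCC coefficient and then differenced between the male and female groups. In this sketch the PDF is approximated with a 20-bin histogram and the "peak" is taken as the maximum estimated density; both choices are assumptions, since the abstract does not state how the densities were estimated.

```python
import statistics

def coefficient_stats(frames, n_bins=20):
    """Per-coefficient (mean, population variance, histogram PDF peak height)."""
    results = []
    for j in range(len(frames[0])):
        values = [frame[j] for frame in frames]
        lo, hi = min(values), max(values)
        width = (hi - lo) / n_bins or 1.0  # avoid zero-width bins when all values are equal
        counts = [0] * n_bins
        for v in values:
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
        peak_density = max(counts) / (len(values) * width)
        results.append((statistics.fmean(values), statistics.pvariance(values), peak_density))
    return results

def gender_differences(male_frames, female_frames, n_bins=20):
    """Male-minus-female differences of (mean, variance, peak density) per MFCC coefficient."""
    male = coefficient_stats(male_frames, n_bins)
    female = coefficient_stats(female_frames, n_bins)
    return [tuple(m - f for m, f in zip(ms, fs)) for ms, fs in zip(male, female)]

# Toy usage: two frames per group, two coefficients per frame (hypothetical values).
male = [[1.0, 0.0], [3.0, 0.0]]
female = [[0.0, 1.0], [0.0, -1.0]]
print(gender_differences(male, female))
```

A kernel density estimate would give smoother peaks than a histogram; the histogram is used here only to keep the sketch dependency-free.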
Improved MFCC-Based Feature for Robust Speaker Identification (cited by: 7)
9
Authors: 吴尊敬, 曹志刚. Tsinghua Science and Technology [SCIE, EI, CAS], 2005, issue 2, pp. 158-161 (4 pages).
The Mel-frequency cepstral coefficient (MFCC) is the most widely used feature in speech and speaker recognition. However, MFCC is very sensitive to noise interference, which tends to drastically degrade the performance of recognition systems because of mismatches between training and testing conditions. In this paper, the logarithmic transformation in the standard MFCC analysis is replaced by a combined function to reduce noise sensitivity. The proposed feature extraction process is also combined with speech enhancement methods, such as spectral subtraction and median filtering, to further suppress the noise. Experiments show that the proposed robust MFCC-based feature significantly reduces the recognition error rate over a wide signal-to-noise ratio range.
Keywords: mel-frequency cepstral coefficient (MFCC); robust speaker identification; feature extraction
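Because the proposed feature differs from standard MFCC in the compression step applied to the Mel filter-bank energies, a sketch of the standard pipeline with that step exposed as a parameter may help. The abstract does not give the "combined function", so `compress` is a hypothetical hook whose default, `math.log`, reproduces standard MFCC. The filter count (26), number of cepstra (13), Mel formula 2595·log10(1 + f/700), and energy floor are common conventions assumed here; spectral subtraction or median filtering would act on the power spectrum before this function and is not shown.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters (one weight list per filter) over FFT bins 0..n_fft//2."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2)
    # n_filters + 2 equally spaced Mel points, expressed as fractional FFT bin positions.
    edges = [mel_to_hz(top * i / (n_filters + 1)) * n_fft / sample_rate
             for i in range(n_filters + 2)]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = edges[j - 1], edges[j], edges[j + 1]
        weights = []
        for k in range(n_bins):
            if left < k <= center:
                weights.append((k - left) / (center - left))
            elif center < k < right:
                weights.append((right - k) / (right - center))
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank

def mfcc_from_power_spectrum(power_spectrum, sample_rate, n_filters=26, n_ceps=13,
                             compress=math.log, floor=1e-10):
    """Cepstral coefficients of one frame; `compress` replaces the usual log (hypothetical hook)."""
    n_fft = 2 * (len(power_spectrum) - 1)
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    energies = [max(sum(w * p for w, p in zip(weights, power_spectrum)), floor)
                for weights in bank]
    compressed = [compress(e) for e in energies]
    # DCT-II decorrelates the compressed filter-bank outputs; keep the first n_ceps terms.
    return [sum(c * math.cos(math.pi * q * (j + 0.5) / n_filters)
                for j, c in enumerate(compressed))
            for q in range(n_ceps)]

# Usage: standard MFCC of a flat 512-point power spectrum at 16 kHz.
flat = [1.0] * 257
print(len(mfcc_from_power_spectrum(flat, 16000)))
```

The log compresses large energies strongly but amplifies small ones, so low-energy filter outputs dominated by noise can swing widely; that is the sensitivity a replacement compression function would target.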
Audiovisual speech recognition based on a deep convolutional neural network
10
Authors: Shashidhar Rudregowda, Sudarshan Patilkulkarni, Vinayakumar Ravi, Gururaj H.L., Moez Krichen. Data Science and Management, 2024, issue 1, pp. 25-34 (10 pages).
Audiovisual speech recognition is an emerging research topic. Lipreading is the recognition of what someone is saying using visual information, primarily lip movements. In this study, we created a custom dataset for Indian English linguistics and categorized it into three main categories: (1) audio recognition, (2) visual feature extraction, and (3) combined audio and visual recognition. Audio features were extracted using mel-frequency cepstral coefficients, and classification was performed using a one-dimensional convolutional neural network. Visual features were extracted using Dlib, and visual speech was then classified using a long short-term memory (LSTM) recurrent neural network. Finally, integration was performed using a deep convolutional network. The audio speech of Indian English was recognized with accuracies of 93.67% and 91.53%, respectively, using testing data after 200 epochs. For visual speech recognition on the Indian English dataset, the training accuracy was 77.48% and the test accuracy was 76.19% after 60 epochs. After integration, the training and testing accuracies of audiovisual speech recognition on the Indian English dataset were 94.67% and 91.75%, respectively.
Keywords: audiovisual speech recognition; custom dataset; 1D convolutional neural network (CNN); deep CNN (DCNN); long short-term memory (LSTM); lipreading; Dlib; mel-frequency cepstral coefficient (MFCC)
English Speech Recognition System on Chip
11
Authors: 刘鸿, 钱彦旻, 刘加. Tsinghua Science and Technology [SCIE, EI, CAS], 2011, issue 1, pp. 95-99 (5 pages).
An English speech recognition system was implemented on a chip, called a speech system-on-chip (SoC). The SoC included an application-specific integrated circuit with a vector accelerator to improve performance. A recognition algorithm using sub-word models based on continuous density hidden Markov models ran on a very low-cost speech chip. The algorithm was a two-stage fixed-width beam-search baseline system with a variable beam-width pruning strategy and a frame-synchronous word-level pruning strategy that significantly reduce recognition time. Tests show that this method reduces recognition time nearly sixfold and memory size nearly twofold compared with the original system, with less than 1% accuracy degradation on a 600-word recognition task and a recognition accuracy of about 98%.
Keywords: speaker-independent speech recognition; system-on-chip; mel-frequency cepstral coefficients (MFCC)
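Beam pruning in a frame-synchronous decoder can be sketched as follows. The abstract does not describe the exact variable beam-width rule, so this is one common form, assumed for illustration: drop hypotheses scoring more than `beam` below the best, then cap the survivors at `max_active`, which effectively narrows the beam on crowded frames. Word-level pruning and the two-stage search are not shown.

```python
import heapq

def prune_hypotheses(scores, beam, max_active):
    """Frame-synchronous beam pruning of active hypotheses.

    scores maps a hypothesis id to its accumulated log-likelihood. Hypotheses
    scoring more than `beam` below the best are dropped; if more than
    `max_active` survive, only the best `max_active` are kept, which narrows
    the beam for this frame. Returns (surviving scores, effective beam width).
    """
    if not scores:
        return {}, beam
    best = max(scores.values())
    survivors = {h: s for h, s in scores.items() if s >= best - beam}
    effective_beam = beam
    if len(survivors) > max_active:
        top = heapq.nlargest(max_active, survivors.items(), key=lambda item: item[1])
        survivors = dict(top)
        effective_beam = best - min(survivors.values())
    return survivors, effective_beam

# Usage with hypothetical scores for one frame.
frame_scores = {"a": -10.0, "b": -12.0, "c": -25.0, "d": -11.0}
print(prune_hypotheses(frame_scores, beam=5.0, max_active=2))
```

Fewer active hypotheses per frame means fewer state-likelihood evaluations and less memory for back-pointers, which is where savings in recognition time and memory of the kind reported would come from.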