10 articles found
Speech Emotion Recognition Using GW-MFCC Model Space Parameters (Cited by: 1)
1
Authors: 沈燕, 肖仲喆, 李冰洁, 周孝进, 周强, 陶智. 《计算机工程与应用》, CSCD, 北大核心 (Peking University Core Journal), 2015, No. 10, pp. 219-222, 226 (5 pages)
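To address the problem that a single speech feature cannot fully express the emotion in speech, line spectral frequency (LSF) parameters, which have good quantization and interpolation properties, are fused with MFCC parameters, which reflect the auditory characteristics of the human ear, to form a new feature: line-spectrum-weighted MFCC (WMFCC). A Gaussian mixture model is then used to build a model space for this parameter, yielding GW-MFCC model space parameters that capture higher-dimensional detail and further improve emotion recognition performance. In validation on the Berlin emotional speech corpus, the new parameters improve the recognition rate by 5.7% and 6.9% over conventional MFCC and LSF, respectively. The experimental results indicate that the proposed WMFCC and GW-MFCC parameters effectively represent emotional information in speech and improve the speech emotion recognition rate.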
Keywords: speech emotion recognition; line spectral pair frequency (LSF); Mel-frequency cepstral coefficients (MFCC); Gaussian mixture model; model space
A Comparison of Classifiers in Performing Speaker Accent Recognition Using MFCCs
2
Authors: Zichen Ma, Ernest Fokoué. 《Open Journal of Statistics》, 2014, No. 4, pp. 258-266 (9 pages)
An algorithm involving Mel-Frequency Cepstral Coefficients (MFCCs) is provided to perform signal feature extraction for the task of speaker accent recognition. Different classifiers are then compared based on the MFCC feature. For each signal, the mean vector of the MFCC matrix is used as the input vector for pattern recognition. A sample of 330 signals, containing 165 US voices and 165 non-US voices, is analyzed. In the comparison, k-nearest neighbors yields the highest average test accuracy over a cross-validation of size 500 and requires the least computation time.
Keywords: speaker accent recognition; Mel-frequency cepstral coefficients (MFCCs); discriminant analysis; support vector machines (SVMs); k-nearest neighbors
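The pipeline described in this abstract (average each MFCC matrix over its frames, then classify the mean vector with k-nearest neighbors) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy MFCC values, the function names `mean_vector` and `knn_predict`, the Euclidean distance, and `k=3` are all assumptions made for the example.

```python
import math
from collections import Counter


def mean_vector(mfcc_matrix):
    """Average an MFCC matrix (frames x coefficients) over its frames."""
    n_frames = len(mfcc_matrix)
    return [sum(column) / n_frames for column in zip(*mfcc_matrix)]


def knn_predict(train_vectors, train_labels, query, k=3):
    """Label a query vector by majority vote among its k nearest neighbors."""
    ranked = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance (assumed)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: each signal is 2 frames x 2 coefficients.
train_signals = [
    [[1.0, 2.0], [1.2, 2.2]],
    [[0.9, 1.9], [1.1, 2.1]],
    [[5.0, 6.0], [5.2, 6.2]],
    [[4.9, 5.9], [5.1, 6.1]],
]
train_labels = ["US", "US", "non-US", "non-US"]
train_vectors = [mean_vector(matrix) for matrix in train_signals]

query = mean_vector([[5.1, 6.0], [5.0, 6.1]])
predicted = knn_predict(train_vectors, train_labels, query, k=3)
```

In practice the MFCC matrices would come from a feature extractor applied to real recordings, and `k` would be chosen by cross-validation, as the abstract suggests.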
Audiovisual speech recognition based on a deep convolutional neural network
3
Authors: Shashidhar Rudregowda, Sudarshan Patilkulkarni, Vinayakumar Ravi, Gururaj H.L., Moez Krichen. 《Data Science and Management》, 2024, No. 1, pp. 25-34 (10 pages)
Audiovisual speech recognition is an emerging research topic. Lipreading is the recognition of what someone is saying using visual information, primarily lip movements. In this study, we created a custom dataset for Indian English linguistics and categorized it into three main categories: (1) audio recognition, (2) visual feature extraction, and (3) combined audio and visual recognition. Audio features were extracted using the mel-frequency cepstral coefficient, and classification was performed using a one-dimensional convolutional neural network. Visual feature extraction uses Dlib, and visual speech is then classified using a long short-term memory type of recurrent neural network. Finally, integration was performed using a deep convolutional network. The audio speech of Indian English was successfully recognized with accuracies of 93.67% and 91.53%, respectively, using testing data from 200 epochs. The training accuracy for visual speech recognition using the Indian English dataset was 77.48% and the test accuracy was 76.19% using 60 epochs. After integration, the accuracies of audiovisual speech recognition using the Indian English dataset for training and testing were 94.67% and 91.75%, respectively.
Keywords: audiovisual speech recognition; custom dataset; 1D convolutional neural network (CNN); deep CNN (DCNN); long short-term memory (LSTM); lipreading; Dlib; Mel-frequency cepstral coefficient (MFCC)
Dynamic Speech Feature Extraction Algorithm Based on Cosine Similarity (Cited by: 11)
4
Authors: 艾佳琪, 左毅, 刘君霞, 贺培超, 李铁山, 陈俊龙. 《计算机应用研究》, CSCD, 北大核心 (Peking University Core Journal), 2020, No. S02, pp. 147-149 (3 pages)
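To further study speech feature extraction methods, this work analyzes speech features based on inverse discrete cosine transform cepstral coefficients (IDCT CC). Using the cosine similarity between frequency-domain speech signals, the IDCT CC are grouped by hierarchical clustering into a 14-dimensional frequency-domain speech feature vector, called the C-vector. In the experiments, a speaker recognition model based on a Gaussian mixture model (GMM) is built to examine the recognition accuracy and time of the C-vector, and it is compared with the classical Mel-frequency cepstral coefficients (MFCC) and the histogram of DCT cepstrum coefficients (HDCC). The experimental results show that the proposed C-vector exceeds MFCC and HDCC in recognition accuracy by 7% and 5%, respectively. Moreover, the C-vector shows stronger recognition ability on multi-speaker speech sets.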
Keywords: speaker recognition; speech features; Mel-frequency cepstral coefficients (MFCC); inverse discrete cosine transform cepstrum coefficient (IDCT CC); cosine similarity; hierarchical cluster analysis
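The two ingredients named in this abstract, cosine similarity and hierarchical clustering, can be illustrated with a small sketch. It is not the paper's C-vector algorithm: the average-linkage rule, the greedy merge loop, the function names, and the toy vectors are assumptions chosen for illustration, and all vectors are assumed to be nonzero.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def agglomerate(vectors, n_clusters):
    """Greedy average-linkage clustering on cosine similarity.

    Returns clusters as sorted lists of indices into `vectors`.
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        sims = [cosine_similarity(vectors[i], vectors[j]) for i in c1 for j in c2]
        return sum(sims) / len(sims)

    while len(clusters) > n_clusters:
        # Merge the pair of clusters with the highest average similarity.
        a, b = max(
            ((p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))),
            key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]),
        )
        clusters[a].extend(clusters[b])
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical toy vectors: two point roughly along x, two roughly along y.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
groups = agglomerate(vectors, n_clusters=2)
```

With these toy vectors the two x-aligned vectors end up in one group and the two y-aligned vectors in the other; how the paper maps clusters of IDCT CC to its 14 dimensions is not specified in the abstract.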
Environmental Sound Classification Using Deep Learning (Cited by: 7)
5
Authors: SHANTHAKUMAR S, SHAKILA S, SUNETH Pathirana, JAYALATH Ekanayake. 《Instrumentation》, 2020, No. 3, pp. 15-22 (8 pages)
Hearing-impaired individuals may be unable to identify environmental sounds because of the noise around them. However, very little research has been conducted in this domain. Hence, the aim of this study is to categorize sounds generated in the environment so that hearing-impaired individuals can distinguish the sound categories. To that end, we first define nine sound classes that typically exist in the environment: air conditioner, car horn, children playing, dog bark, drilling, engine idling, jackhammer, siren, and street music. We then record 100 sound samples from each category and extract features of each sound category using Mel-Frequency Cepstral Coefficients (MFCC). The training dataset is developed using this set of features together with the class variable, sound category. Sound classification is a complex task, and hence we use two deep learning techniques, Multi-Layer Perceptron (MLP) and Convolutional Neural Network (CNN), to train classification models. The models are tested using a separate test set, and their performance is evaluated using precision, recall, and F1-score. The results show that the CNN model outperforms the MLP. However, the MLP also provided decent accuracy in classifying unknown environmental sounds.
Keywords: Mel-frequency cepstral coefficients (MFCC); multi-layer perceptron (MLP); convolutional neural network (CNN)
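The evaluation metrics named in this abstract (precision, recall, and F1-score) can be computed per class as in the sketch below. The label lists are hypothetical and only reuse three of the nine class names; the function name `per_class_scores` and the convention of returning 0.0 when a denominator is zero are assumptions for this example.

```python
def per_class_scores(y_true, y_pred):
    """Return {class: (precision, recall, f1)} for multi-class labels."""
    scores = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == cls and p == cls for t, p in pairs)  # true positives
        fp = sum(t != cls and p == cls for t, p in pairs)  # false positives
        fn = sum(t == cls and p != cls for t, p in pairs)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        scores[cls] = (precision, recall, f1)
    return scores


# Hypothetical test-set labels and model predictions.
y_true = ["siren", "siren", "dog bark", "dog bark", "drilling"]
y_pred = ["siren", "dog bark", "dog bark", "dog bark", "drilling"]
scores = per_class_scores(y_true, y_pred)
```

Here one siren is misclassified as a dog bark, which lowers recall for "siren" and precision for "dog bark" while leaving "drilling" unaffected.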
Application of formant instantaneous characteristics to speech recognition and speaker identification
6
Authors: 侯丽敏, 胡晓宁, 谢娟敏. 《Journal of Shanghai University (English Edition)》, CAS, 2011, No. 2, pp. 123-127 (5 pages)
This paper proposes a new phase feature derived from the formant instantaneous characteristics for speech recognition (SR) and speaker identification (SI) systems. Using the Hilbert transform (HT), the formant characteristics can be represented by instantaneous frequency (IF) and instantaneous bandwidth, namely formant instantaneous characteristics (FIC). To explore the importance of FIC in both SR and SI, this paper proposes different features derived from FIC for SR and SI systems. When these new features are combined with conventional parameters, a higher identification rate can be achieved than with Mel-frequency cepstral coefficients (MFCC) parameters alone. The experimental results show that the new features are effective characteristic parameters and can complement conventional parameters for SR and SI.
Keywords: instantaneous frequency (IF); Hilbert transform (HT); speech recognition; speaker identification; Mel-frequency cepstral coefficients (MFCC)
Challenges and Limitations in Speech Recognition Technology:A Critical Review of Speech Signal Processing Algorithms,Tools and Systems
7
Authors: Sneha Basak, Himanshi Agrawal, Shreya Jena, Shilpa Gite, Mrinal Bachute, Biswajeet Pradhan, Mazen Assiri. 《Computer Modeling in Engineering & Sciences》, SCIE, EI, 2023, No. 5, pp. 1053-1089 (37 pages)
Speech recognition systems have become a unique human-computer interaction (HCI) family. Speech is one of the most naturally developed human abilities; speech signal processing opens up a transparent and hands-free computation experience. This paper aims to present a retrospective yet modern approach to the world of speech recognition systems. The development journey of ASR (Automatic Speech Recognition) has seen quite a few milestones and breakthrough technologies, which are highlighted in this paper. A step-by-step rundown of the fundamental stages in developing speech recognition systems is presented, along with a brief discussion of various modern-day developments and applications in this domain. This review paper aims to summarize the field and provide a starting point for those new to the vast field of speech signal processing. Since speech recognition has vast potential in various industries such as telecommunication, emotion recognition, and healthcare, this review should be helpful to researchers who aim to explore more applications that society can quickly adopt in future years of evolution.
Keywords: speech recognition; automatic speech recognition (ASR); Mel-frequency cepstral coefficients (MFCC); hidden Markov model (HMM); artificial neural network (ANN)
A Study of Gender Differences in Static MFCC Features
8
Authors: 杨继臣, 吴裕玲, 苏杰华. 《仲恺农业工程学院学报》, CAS, 2011, No. 4, pp. 54-56, 59 (4 pages)
The gender differences of static Mel-frequency cepstral coefficient (MFCC) features are studied in terms of the peak differences, means, and variances of the probability density functions of male and female static MFCC features. The results show that, in terms of peaks, MFCC1, MFCC2, MFCC6, MFCC9, and MFCC12 differ the most; in terms of means, male MFCC feature components are larger than female MFCC feature components; and in terms of variances, most male MFCC feature components are smaller than female MFCC feature components.
Keywords: MFCC (Mel-frequency cepstral coefficients) features; gender difference; peak difference; mean; variance
Improved MFCC-Based Feature for Robust Speaker Identification (Cited by: 7)
9
Authors: 吴尊敬, 曹志刚. 《Tsinghua Science and Technology》, SCIE, EI, CAS, 2005, No. 2, pp. 158-161 (4 pages)
The Mel-frequency cepstral coefficient (MFCC) is the most widely used feature in speech and speaker recognition. However, MFCC is very sensitive to noise interference, which tends to drastically degrade the performance of recognition systems because of the mismatches between training and testing. In this paper, the logarithmic transformation in the standard MFCC analysis is replaced by a combined function to reduce the noise sensitivity. The proposed feature extraction process is also combined with speech enhancement methods, such as spectral subtraction and median filtering, to further suppress the noise. Experiments show that the proposed robust MFCC-based feature significantly reduces the recognition error rate over a wide signal-to-noise ratio range.
Keywords: Mel-frequency cepstral coefficient (MFCC); robust speaker identification; feature extraction
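The two enhancement steps this abstract mentions, spectral subtraction and median filtering, can be sketched in their textbook forms as below. This does not reproduce the paper's pipeline or its combined function, which the abstract does not define. The function names, the zero floor, the window width of 3, and the edge-replication padding are assumptions for the example.

```python
from statistics import median


def spectral_subtract(magnitudes, noise_estimate, floor=0.0):
    """Subtract a noise magnitude estimate bin by bin, clamping at `floor`."""
    return [max(m - n, floor) for m, n in zip(magnitudes, noise_estimate)]


def median_filter(values, width=3):
    """Sliding-window median with an odd width; edges are padded by replication."""
    if not values:
        return []
    half = width // 2
    padded = [values[0]] * half + list(values) + [values[-1]] * half
    return [median(padded[i:i + width]) for i in range(len(values))]


# Hypothetical magnitude spectrum of one frame and a flat noise estimate.
cleaned = spectral_subtract([3.0, 1.0, 0.5], [1.0, 1.0, 1.0])

# Hypothetical feature trajectory with a single impulsive outlier.
smoothed = median_filter([2.0, 2.0, 8.0, 3.0, 3.0], width=3)
```

Clamping at a floor avoids negative magnitudes after subtraction, and the median window removes the isolated spike at the third position without averaging it into its neighbors.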
English Speech Recognition System on Chip
10
Authors: 刘鸿, 钱彦旻, 刘加. 《Tsinghua Science and Technology》, SCIE, EI, CAS, 2011, No. 1, pp. 95-99 (5 pages)
An English speech recognition system was implemented on a chip, called a speech system-on-chip (SoC). The SoC included an application-specific integrated circuit with a vector accelerator to improve performance. The sub-word model, based on a continuous-density hidden Markov model recognition algorithm, ran on a very cheap speech chip. The algorithm was a two-stage fixed-width beam-search baseline system with a variable beam-width pruning strategy and a frame-synchronous word-level pruning strategy to significantly reduce the recognition time. Tests show that this method reduces the recognition time nearly 6-fold and the memory size nearly 2-fold compared to the original system, with less than 1% accuracy degradation for a 600-word recognition task and a recognition accuracy rate of about 98%.
Keywords: non-specific human voice-consciousness; system-on-chip; Mel-frequency cepstral coefficients (MFCC)
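The general idea behind beam-width pruning in a frame-synchronous search can be sketched as follows. This is a generic illustration only: the abstract does not say how the beam width is varied or how word-level pruning is applied, so the function `prune_beam`, the log-score dictionary, and the two beam widths are hypothetical.

```python
def prune_beam(hypotheses, beam_width):
    """Keep hypotheses whose log-score is within `beam_width` of the best one."""
    if not hypotheses:
        return {}
    best = max(hypotheses.values())
    return {h: s for h, s in hypotheses.items() if s >= best - beam_width}


# Hypothetical log-scores of active word hypotheses at one frame.
frame_scores = {"yes": -1.0, "yet": -2.5, "guess": -7.0}

wide = prune_beam(frame_scores, beam_width=5.0)    # keeps "yes" and "yet"
narrow = prune_beam(frame_scores, beam_width=1.0)  # keeps only "yes"
```

A narrower beam leaves fewer hypotheses to extend at the next frame, which is the mechanism by which pruning trades a small accuracy risk for lower recognition time and memory.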