Journal Articles
153 articles found
1. Audio-Text Multimodal Speech Recognition via Dual-Tower Architecture for Mandarin Air Traffic Control Communications
Authors: Shuting Ge, Jin Ren, Yihua Shi, Yujun Zhang, Shunzhi Yang, Jinfeng Yang. Computers, Materials & Continua (SCIE, EI), 2024, No. 3, pp. 3215-3245 (31 pages).
In air traffic control communications (ATCC), misunderstandings between pilots and controllers could result in fatal aviation accidents. Fortunately, advanced automatic speech recognition technology has emerged as a promising means of preventing miscommunications and enhancing aviation safety. However, most existing speech recognition methods merely incorporate external language models on the decoder side, leading to insufficient semantic alignment between speech and text modalities during the encoding phase. Furthermore, it is challenging to model acoustic context dependencies over long distances because speech sequences are longer than text, especially for the extended ATCC data. To address these issues, we propose a speech-text multimodal dual-tower architecture for speech recognition. It employs cross-modal interactions to achieve close semantic alignment during the encoding stage and strengthens its capabilities in modeling auditory long-distance context dependencies. In addition, a two-stage training strategy is devised to derive semantics-aware acoustic representations effectively. The first stage focuses on pre-training the speech-text multimodal encoding module to enhance inter-modal semantic alignment and aural long-distance context dependencies. The second stage fine-tunes the entire network to bridge the input modality variation gap between the training and inference phases and boost generalization performance. Extensive experiments demonstrate the effectiveness of the proposed speech-text multimodal speech recognition method on the ATCC and AISHELL-1 datasets. It reduces the character error rate to 6.54% and 8.73%, respectively, and exhibits substantial performance gains of 28.76% and 23.82% compared with the best baseline model. The case studies indicate that the obtained semantics-aware acoustic representations aid in accurately recognizing terms with similar pronunciations but distinctive semantics. The research provides a novel modeling paradigm for semantics-aware speech recognition in air traffic control communications, which could contribute to the advancement of intelligent and efficient aviation safety management.
Keywords: speech-text multimodal; automatic speech recognition; semantic alignment; air traffic control communications; dual-tower architecture
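A minimal sketch of the dual-tower idea described above, assuming PyTorch; the module names, layer choices, and dimensions are illustrative placeholders, not the authors' implementation.

```python
# Dual-tower encoder sketch: a speech tower and a text tower exchange
# information through cross-attention, so the two modalities are semantically
# aligned during encoding (illustrative only, not the paper's code).
import torch
import torch.nn as nn

class DualTowerEncoder(nn.Module):
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.speech_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.text_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        # Cross-modal interaction: speech queries attend to text keys/values.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, speech_feats, text_embeds):
        s = self.speech_layer(speech_feats)      # intra-modal speech context
        t = self.text_layer(text_embeds)         # intra-modal text context
        aligned, _ = self.cross_attn(query=s, key=t, value=t)
        return s + aligned                       # semantics-aware acoustic representation

# Toy usage: batch of 2, 100 speech frames, 20 text tokens, 256-dim features.
enc = DualTowerEncoder()
out = enc(torch.randn(2, 100, 256), torch.randn(2, 20, 256))
print(out.shape)  # torch.Size([2, 100, 256])
```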
2. Multi-Objective Equilibrium Optimizer for Feature Selection in High-Dimensional English Speech Emotion Recognition
Authors: Liya Yue, Pei Hu, Shu-Chuan Chu, Jeng-Shyang Pan. Computers, Materials & Continua (SCIE, EI), 2024, No. 2, pp. 1957-1975 (19 pages).
Speech emotion recognition (SER) uses acoustic analysis to find features for emotion recognition and examines variations in voice that are caused by emotions. The number of features acquired with acoustic analysis is extremely high, so we introduce a hybrid filter-wrapper feature selection algorithm based on an improved equilibrium optimizer for constructing an emotion recognition system. The proposed algorithm implements multi-objective emotion recognition with the minimum number of selected features and maximum accuracy. First, we use information gain and the Fisher score to sort the features extracted from the signals. Then, we employ a multi-objective ranking method to evaluate these features and assign different importance to them. Features with high rankings have a large probability of being selected. Finally, we propose a repair strategy to address the problem of duplicate solutions in multi-objective feature selection, which can improve the diversity of solutions and avoid falling into local traps. Using random forest and K-nearest neighbor classifiers, four English speech emotion datasets are employed to test the proposed algorithm (MBEO) as well as other multi-objective emotion identification techniques. The results illustrate that it performs well in inverted generational distance, hypervolume, Pareto solutions, and execution time, and that MBEO is appropriate for high-dimensional English SER.
Keywords: speech emotion recognition; filter-wrapper; high-dimensional; feature selection; equilibrium optimizer; multi-objective
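The filter stage above ranks features by information gain and Fisher score. A sketch of the Fisher-score part, assuming NumPy; the formula shown is the standard between-class over within-class variance ratio, which may differ in detail from the paper's exact variant.

```python
# Filter-stage sketch: rank features by Fisher score (between-class variance
# divided by within-class variance). The paper also combines this with
# information gain, omitted here for brevity.
import numpy as np

def fisher_score(X, y):
    """X: (n_samples, n_features), y: integer class labels."""
    overall_mean = X.mean(axis=0)
    num, den = 0.0, 0.0
    for c in np.unique(y):
        Xc = X[y == c]
        n_c = len(Xc)
        num = num + n_c * (Xc.mean(axis=0) - overall_mean) ** 2
        den = den + n_c * Xc.var(axis=0)
    return num / (den + 1e-12)  # small epsilon avoids division by zero

# Toy usage: 6 samples, 4 features, 2 emotion classes.
X = np.random.randn(6, 4)
y = np.array([0, 0, 0, 1, 1, 1])
ranking = np.argsort(fisher_score(X, y))[::-1]  # best features first
print(ranking)
```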
3. Exploring Sequential Feature Selection in Deep Bi-LSTM Models for Speech Emotion Recognition
Authors: Fatma Harby, Mansor Alohali, Adel Thaljaoui, Amira Samy Talaat. Computers, Materials & Continua (SCIE, EI), 2024, No. 2, pp. 2689-2719 (31 pages).
Machine learning (ML) algorithms play a pivotal role in speech emotion recognition (SER), although they encounter the formidable obstacle of accurately discerning a speaker's emotional state. The examination of speakers' emotional states holds significant importance in a range of real-time applications, including but not limited to virtual reality, human-robot interaction, emergency centers, and human behavior assessment. Accurately identifying emotions in the SER process relies on extracting relevant information from audio inputs. Previous studies on SER have predominantly utilized short-time characteristics such as mel-frequency cepstral coefficients (MFCCs) due to their ability to capture the periodic nature of audio signals effectively. Although these traits may improve the ability to perceive and interpret emotional depictions appropriately, MFCCs have some limitations. This study therefore aims to tackle this issue by systematically picking multiple audio cues, enhancing the classifier model's efficacy in accurately discerning human emotions. The utilized dataset is taken from the EMO-DB database. Input speech is preprocessed using a 2D convolutional neural network (CNN) that applies convolutional operations to spectrograms, as they afford a visual representation of the way the audio signal's frequency content changes over time. The next step is spectrogram data normalization, which is crucial for neural network (NN) training as it aids in faster convergence. Then, five auditory features, MFCCs, chroma, mel-spectrogram, contrast, and Tonnetz, are extracted from the spectrogram sequentially. The aim of feature selection is to retain only dominant features and exclude irrelevant ones. In this paper, the sequential forward selection (SFS) and sequential backward selection (SBS) techniques were employed to select among the multiple audio cues. Finally, the feature sets composed from the hybrid feature extraction methods are fed into a deep bidirectional long short-term memory (Bi-LSTM) network to discern emotions. Since a deep Bi-LSTM can hierarchically learn complex features and increases model capacity through more robust temporal modeling, it is more effective than a shallow Bi-LSTM in capturing the intricate tones of emotional content in speech signals. The effectiveness and resilience of the proposed SER model were evaluated by experiments comparing it to state-of-the-art SER techniques. The results indicated that the model achieved accuracy rates of 90.92%, 93%, and 92% on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), the Berlin Database of Emotional Speech (EMO-DB), and the Interactive Emotional Dyadic Motion Capture (IEMOCAP) datasets, respectively. These findings signify a prominent enhancement in the ability to identify emotional depictions in speech, showcasing the potential of the proposed model in advancing the SER field.
Keywords: artificial intelligence application; multi features; sequential selection; speech emotion recognition; deep Bi-LSTM
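A sketch of extracting the five named auditory features, assuming the librosa library (the abstract does not name a toolkit); frame-level features are averaged into one fixed-length vector per utterance, ready for SFS/SBS selection.

```python
# Extract the five auditory features named in the abstract with librosa.
# Each call below is a real librosa function; per-frame features are
# averaged over time into a single utterance-level vector.
import numpy as np
import librosa

y, sr = librosa.load(librosa.ex('trumpet'))  # any mono audio works here

mfcc     = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
chroma   = librosa.feature.chroma_stft(y=y, sr=sr)
mel      = librosa.feature.melspectrogram(y=y, sr=sr)
contrast = librosa.feature.spectral_contrast(y=y, sr=sr)
tonnetz  = librosa.feature.tonnetz(y=librosa.effects.harmonic(y), sr=sr)

# One fixed-length vector per utterance for the feature-selection stage.
feature_vector = np.concatenate([f.mean(axis=1) for f in
                                 (mfcc, chroma, mel, contrast, tonnetz)])
print(feature_vector.shape)
```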
4. Audiovisual speech recognition based on a deep convolutional neural network
Authors: Shashidhar Rudregowda, Sudarshan Patilkulkarni, Vinayakumar Ravi, Gururaj H.L., Moez Krichen. Data Science and Management, 2024, No. 1, pp. 25-34 (10 pages).
Audiovisual speech recognition is an emerging research topic. Lipreading is the recognition of what someone is saying using visual information, primarily lip movements. In this study, we created a custom dataset for Indian English linguistics and organized the work into three main parts: (1) audio recognition, (2) visual feature extraction, and (3) combined audio and visual recognition. Audio features were extracted using mel-frequency cepstral coefficients, and classification was performed using a one-dimensional convolutional neural network. Visual features were extracted using Dlib, and visual speech was classified using a long short-term memory (LSTM) recurrent neural network. Finally, integration was performed using a deep convolutional network. The audio speech of Indian English was successfully recognized with training and testing accuracies of 93.67% and 91.53%, respectively, after 200 epochs. The training accuracy for visual speech recognition on the Indian English dataset was 77.48% and the test accuracy was 76.19% using 60 epochs. After integration, the training and testing accuracies of audiovisual speech recognition on the Indian English dataset were 94.67% and 91.75%, respectively.
Keywords: audiovisual speech recognition; custom dataset; 1D convolutional neural network (CNN); deep CNN (DCNN); long short-term memory (LSTM); lipreading; Dlib; mel-frequency cepstral coefficient (MFCC)
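A minimal fusion sketch of the two branches described above, assuming PyTorch; the layer sizes, the 68-point Dlib landmark input, and the late-fusion design are illustrative assumptions, not the paper's exact network.

```python
# Late-fusion sketch: 1D CNN over MFCC frames (audio) plus LSTM over
# per-frame lip-landmark vectors (visual), concatenated for classification.
import torch
import torch.nn as nn

class AVFusion(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        # Audio branch: 1D convolutions over 13 MFCC coefficients per frame.
        self.audio = nn.Sequential(
            nn.Conv1d(13, 32, kernel_size=5), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten())
        # Visual branch: LSTM over lip landmarks (Dlib's 68 points -> 136 values).
        self.visual = nn.LSTM(136, 64, batch_first=True)
        self.classifier = nn.Linear(32 + 64, n_classes)

    def forward(self, mfcc, lips):
        a = self.audio(mfcc)               # (B, 32)
        _, (h, _) = self.visual(lips)      # h: (1, B, 64), last hidden state
        return self.classifier(torch.cat([a, h[-1]], dim=1))

model = AVFusion()
logits = model(torch.randn(2, 13, 120), torch.randn(2, 75, 136))
print(logits.shape)  # torch.Size([2, 10])
```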
5. A novel speech emotion recognition algorithm based on combination of emotion data field and ant colony search strategy (Cited by: 3)
Authors: 查诚, 陶华伟, 张昕然, 周琳, 赵力, 杨平. Journal of Southeast University (English Edition) (EI, CAS), 2016, No. 2, pp. 158-163 (6 pages).
In order to effectively conduct emotion recognition from spontaneous, non-prototypical and unsegmented speech, so as to create a more natural human-machine interaction, a novel speech emotion recognition algorithm based on the combination of the emotional data field (EDF) and the ant colony search (ACS) strategy, called the EDF-ACS algorithm, is proposed. More specifically, the interrelationship among the turn-based acoustic feature vectors of different labels is established by using the potential function in the EDF. To perform spontaneous speech emotion recognition, the artificial ant colony is used to model the turn-based acoustic feature vectors. Then, the canonical ACS strategy is used to investigate the movement direction of each artificial ant in the EDF, which is regarded as the emotional label of the corresponding turn-based acoustic feature vector. The proposed EDF-ACS algorithm is evaluated on the continuous audio/visual emotion challenge (AVEC) 2012 dataset, which contains spontaneous, non-prototypical and unsegmented speech emotion data. The experimental results show that the proposed EDF-ACS algorithm outperforms the existing state-of-the-art algorithm in turn-based speech emotion recognition.
Keywords: speech emotion recognition; emotional data field; ant colony search; human-machine interaction
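The EDF is built on a potential function over feature vectors. A sketch assuming the standard Gaussian data-field potential, phi(x) = sum_i m_i * exp(-(||x - x_i|| / sigma)^2); the paper's exact potential may differ in detail.

```python
# Sketch of a data-field potential in the standard Gaussian form. This is an
# assumption: the paper's EDF potential function may use a different kernel.
import numpy as np

def potential(x, points, masses, sigma=1.0):
    """Potential at x induced by feature vectors `points` with weights `masses`."""
    d = np.linalg.norm(points - x, axis=1)
    return np.sum(masses * np.exp(-(d / sigma) ** 2))

# Toy usage: potential of a query feature vector in a field of 5 labeled turns.
rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 8))   # turn-based acoustic feature vectors
print(potential(rng.normal(size=8), feats, masses=np.ones(5)))
```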
6. Auditory attention model based on Chirplet for cross-corpus speech emotion recognition (Cited by: 1)
Authors: 张昕然, 宋鹏, 查诚, 陶华伟, 赵力. Journal of Southeast University (English Edition) (EI, CAS), 2016, No. 4, pp. 402-407 (6 pages).
To solve the problem of mismatching features in an experimental database, which is a key issue in the field of cross-corpus speech emotion recognition, an auditory attention model based on Chirplet is proposed for feature extraction. First, in order to extract the spectral features, the auditory attention model is employed for variational emotion feature detection. Then, the selective attention mechanism model is proposed to extract the salient gist features, which show their relation to the expected performance in cross-corpus testing. Furthermore, the Chirplet time-frequency atoms are introduced to the model. By forming a complete atom database, the Chirplet can improve the spectrum feature extraction, including the amount of information captured. Samples from multiple databases have the characteristics of multiple components. Thereby, the Chirplet expands the scale of the feature vector in the time-frequency domain. Experimental results show that, compared to the traditional feature model, the proposed feature extraction approach with a prototypical classifier achieves significant improvement in cross-corpus speech emotion recognition. In addition, the proposed method has better robustness to inconsistent sources of the training set and the testing set.
Keywords: speech emotion recognition; selective attention mechanism; spectrogram feature; cross-corpus
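A sketch of a single Gaussian chirplet atom, the textbook form of the time-frequency atoms mentioned above (Gaussian-windowed sinusoid with a linear frequency sweep); the paper's atom-dictionary parameters are not reproduced here.

```python
# Generate one Gaussian chirplet atom: envelope * exp(j * phase), where the
# instantaneous frequency sweeps linearly at chirp rate c (textbook definition).
import numpy as np

def chirplet(t, t0=0.0, f0=440.0, c=200.0, sigma=0.01):
    """Complex chirplet centered at time t0, start frequency f0 (Hz),
    chirp rate c (Hz/s), Gaussian width sigma (s)."""
    tau = t - t0
    envelope = np.exp(-0.5 * (tau / sigma) ** 2)
    phase = 2 * np.pi * (f0 * tau + 0.5 * c * tau ** 2)
    return envelope * np.exp(1j * phase)

# Toy usage: one atom sampled at 16 kHz over 50 ms.
t = np.arange(0, 0.05, 1 / 16000)
atom = chirplet(t, t0=0.025)
print(atom.shape, abs(atom).max())
```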
7. Speech emotion recognition via discriminant-cascading dimensionality reduction (Cited by: 1)
Authors: 王如刚, 徐新洲, 黄程韦, 吴尘, 张昕然, 赵力. Journal of Southeast University (English Edition) (EI, CAS), 2016, No. 2, pp. 151-157 (7 pages).
In order to accurately identify speech emotion information, the discriminant-cascading effect in dimensionality reduction for speech emotion recognition is investigated. Based on the existing locality preserving projections and graph embedding framework, a novel discriminant-cascading dimensionality reduction method is proposed, named discriminant-cascading locality preserving projections (DCLPP). The proposed method specifically utilizes supervised embedding graphs and keeps the original space for the inner products of samples to maintain enough information for speech emotion recognition. Then, kernel DCLPP (KDCLPP) is also proposed to extend the mapping form. Validated by experiments on the EMO-DB and eNTERFACE'05 corpora, the proposed method clearly outperforms existing common dimensionality reduction methods, such as principal component analysis (PCA), linear discriminant analysis (LDA), locality preserving projections (LPP), local discriminant embedding (LDE), graph-based Fisher analysis (GbFA), and so on, with different categories of classifiers.
Keywords: speech emotion recognition; discriminant-cascading; locality preserving projections; discriminant analysis; dimensionality reduction
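DCLPP extends locality preserving projections. A sketch of plain LPP, the baseline, assuming SciPy and scikit-learn: build a kNN graph, then solve the generalized eigenproblem X L X^T a = lambda X D X^T a and keep the eigenvectors with the smallest eigenvalues.

```python
# Plain LPP sketch (the baseline that DCLPP extends): neighboring samples in
# the kNN graph stay close after projection.
import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import kneighbors_graph

def lpp(X, n_components=2, k=5):
    W = kneighbors_graph(X, k, mode='connectivity').toarray()
    W = np.maximum(W, W.T)                        # symmetrize the kNN graph
    D = np.diag(W.sum(axis=1))
    L = D - W                                     # graph Laplacian
    A = X.T @ L @ X
    B = X.T @ D @ X + 1e-6 * np.eye(X.shape[1])   # regularize for stability
    vals, vecs = eigh(A, B)                       # eigenvalues in ascending order
    return vecs[:, :n_components]                 # projection matrix (d, n_components)

X = np.random.randn(40, 10)    # 40 samples, 10-dim emotion features
P = lpp(X)
print((X @ P).shape)           # (40, 2) embedded features
```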
8. Recognition of Speech Based on HMM/MLP Hybrid Network
Authors: 黄心晔, 马小辉, 李想, 富煜清, 陆佶人. Journal of Southeast University (English Edition) (EI, CAS), 2000, No. 2, pp. 26-30 (5 pages).
This paper presents a new HMM/MLP hybrid network for speech recognition. By taking advantage of the discriminative training of the MLP, the unreasonable model-correctness assumption of maximum-likelihood (ML) training in the basic HMM can be overcome, and its discriminative ability and recognition performance can be improved. Experimental results demonstrate that the discriminative ability and recognition performance of HMM/MLP are clearly better than those of the normal HMM.
Keywords: HMM/MLP hybrid network; discriminative training; speech recognition
9. Speech Enhancement Method for LPC Autoregressive Model and System Implementation of Command Word Recognition Used in Noisy Environment
Authors: 王承发, 吕成国, 孙立新, 李俊庆. Transactions of Nanjing University of Aeronautics and Astronautics (EI), 1998, No. 2, pp. 113-120 (8 pages).
At present, almost all systems and products for speech recognition work in quiet environments, and their performance degrades, or they even fail to work, when operated in highly noisy environments. In this paper, after analyzing the features of speech and noise, a speech enhancement method based on an LPC autoregressive model for command word recognition in noisy environments is proposed, and an experimental system is realized. In different background noise environments, we conducted experiments on SNR, basic accuracy, noise resistance, and system environment adaptability with different microphones. The experimental results show that the system has good recognition performance in highly noisy environments. The system can resist many kinds of noise and broadly meet the needs of application areas such as the military, traffic, marketplaces, and factories.
Keywords: speech recognition; noisy environment; Wiener filter
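A sketch of spectral-domain Wiener filtering with the classical gain rule H = SNR / (1 + SNR), assuming SciPy; the paper's LPC-based estimator is not reproduced in the abstract, so the noise estimate below (leading non-speech frames) is an assumption.

```python
# Wiener-filter enhancement sketch: estimate the noise PSD from the first few
# frames, compute a per-bin a priori SNR, and apply the Wiener gain.
import numpy as np
from scipy.signal import stft, istft

def wiener_enhance(noisy, fs, noise_frames=10):
    f, t, Y = stft(noisy, fs=fs, nperseg=256)
    noise_psd = np.mean(np.abs(Y[:, :noise_frames]) ** 2, axis=1, keepdims=True)
    snr = np.maximum(np.abs(Y) ** 2 / (noise_psd + 1e-12) - 1.0, 0.0)
    H = snr / (1.0 + snr)                  # Wiener gain per time-frequency bin
    _, clean = istft(H * Y, fs=fs, nperseg=256)
    return clean

fs = 8000
x = np.sin(2 * np.pi * 440 * np.arange(fs) / fs) + 0.3 * np.random.randn(fs)
print(wiener_enhance(x, fs).shape)
```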
10. Principal Component Feature for ANN-Based Speech Recognition
Authors: 顾明亮, 王太君, 史笑兴, 何振亚. Journal of Southeast University (English Edition) (EI, CAS), 1998, No. 2, pp. 13-18 (6 pages).
Using function approximation technology and the principal component analysis method, this paper presents a principal component feature to solve the time alignment problem and to simplify the structure of the neural network. Its extraction simulates the processing of speech information in the human auditory system. The experimental results show that the principal component feature based recognition system outperforms the standard CDHMM and GMDS methods in many aspects.
Keywords: principal component analysis; feature extraction; speech recognition
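A minimal sketch of deriving a principal-component feature from frame-level speech features, assuming scikit-learn; the paper's own extraction pipeline differs in detail.

```python
# Project frame-level features onto their leading principal components and
# pool them into one compact utterance-level vector.
import numpy as np
from sklearn.decomposition import PCA

frames = np.random.randn(200, 39)   # e.g., 200 frames of 39-dim features
pca = PCA(n_components=8).fit(frames)
pc_feature = pca.transform(frames).mean(axis=0)   # utterance-level PC feature
print(pc_feature.shape, pca.explained_variance_ratio_.sum())
```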
11. Novel feature fusion method for speech emotion recognition based on multiple kernel learning
Authors: 金赟, 宋鹏, 郑文明, 赵力. Journal of Southeast University (English Edition) (EI, CAS), 2013, No. 2, pp. 129-133 (5 pages).
In order to improve the performance of speech emotion recognition, a novel feature fusion method is proposed. Based on the global features, the local information of different kinds of features is utilized, and both the global and the local features are combined together. Moreover, the multiple kernel learning method is adopted. The global features and each kind of local feature are respectively associated with a kernel, and all these kernels are added together with different weights to obtain a mixed kernel for nonlinear mapping. In the reproducing kernel Hilbert space, different kinds of emotional features can be easily classified. In the experiments, the popular Berlin dataset is used, and the optimal parameters of the global and the local kernels are determined by cross-validation. After computing with multiple kernel learning, the weights of all the kernels are obtained, which shows that the formant and intensity features play a key role in speech emotion recognition. The classification results show that the recognition rate is 78.74% using the global kernel alone and 81.10% using the proposed method, which demonstrates its effectiveness.
Keywords: speech emotion recognition; multiple kernel learning; feature fusion; support vector machine
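A sketch of the mixed-kernel construction: one RBF kernel per feature group, combined with weights into a precomputed kernel for an SVM, assuming scikit-learn. The fixed weights here stand in for the ones multiple kernel learning would optimize.

```python
# Mixed-kernel SVM sketch: K = w1*K_global + w2*K_formant, fed to an SVM with
# a precomputed kernel (a supported scikit-learn mode).
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_global, X_formant = rng.normal(size=(60, 20)), rng.normal(size=(60, 6))
y = rng.integers(0, 2, size=60)

w = [0.6, 0.4]  # kernel weights; in the paper these are learned by MKL
K = w[0] * rbf_kernel(X_global) + w[1] * rbf_kernel(X_formant)

clf = SVC(kernel='precomputed').fit(K, y)
print(clf.score(K, y))  # training accuracy on the toy data
```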
12. Comparative Study on VQ-Based Efficient Mandarin Speech Recognition Method
Authors: 谢湘, 赵军辉, 匡镜明. Journal of Beijing Institute of Technology (EI, CAS), 2002, No. 3, pp. 266-270 (5 pages).
A VQ-based efficient speech recognition method is introduced, and the key parameters of this method are comparatively studied. This method is especially designed for Mandarin speaker-dependent small-vocabulary word recognition. It has lower complexity and resource consumption but a higher accurate recognition rate (ARR) compared with the traditional HMM or NN approach. A large-scale test on the task of recognizing 11 Mandarin digits shows that the WER (word error rate) can reach 3.86%. This method is suitable for embedding in PDAs (personal digital assistants), mobile phones, and similar devices to perform voice control such as digit dialing, name dialing, calculating, and voice commanding.
Keywords: speech recognition; vector quantization (VQ); speaker dependent; digits recognition
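A sketch of VQ-based isolated-word recognition, assuming scikit-learn's k-means for codebook training: one codebook per word, and classification by minimum total quantization distortion over the test frames.

```python
# VQ recognition sketch: train a k-means codebook per word from its training
# frames; classify an utterance by the codebook with the least distortion.
import numpy as np
from sklearn.cluster import KMeans

def train_codebooks(word_to_frames, codebook_size=16):
    return {w: KMeans(n_clusters=codebook_size, n_init=4).fit(f)
            for w, f in word_to_frames.items()}

def recognize(frames, codebooks):
    def distortion(cb):
        d = np.linalg.norm(frames[:, None, :] - cb.cluster_centers_[None], axis=2)
        return d.min(axis=1).sum()   # nearest-codeword error summed over frames
    return min(codebooks, key=lambda w: distortion(codebooks[w]))

rng = np.random.default_rng(1)
train = {'yi': rng.normal(0, 1, (200, 12)), 'er': rng.normal(3, 1, (200, 12))}
cbs = train_codebooks(train)
print(recognize(rng.normal(3, 1, (50, 12)), cbs))  # -> 'er'
```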
13. Discriminative tone model training and optimal integration for Mandarin speech recognition
Authors: 黄浩, 朱杰. Journal of Southeast University (English Edition) (EI, CAS), 2007, No. 2, pp. 174-178 (5 pages).
Two discriminative methods for solving tone problems in Mandarin speech recognition are presented. First, discriminative training of the HMM (hidden Markov model) based tone models is proposed. Then an integration technique for the tone models in a large-vocabulary continuous speech recognition system is presented. Discriminative model weight training based on the minimum phone error criterion is adopted, aiming at optimal integration of the tone models. The extended Baum-Welch algorithm is applied to find the model-dependent weights that scale the acoustic scores and tone scores. Experimental results show that tone recognition rates and continuous speech recognition accuracy can be improved by the discriminatively trained tone model. Performance of a large-vocabulary continuous Mandarin speech recognition system can be further enhanced by the discriminatively trained weight combinations due to a better interpolation of the given models.
Keywords: discriminative training; minimum phone error; tone modeling; Mandarin speech recognition
14. Adaptive bands filter bank optimized by genetic algorithm for robust speech recognition system (Cited by: 5)
Authors: 黄丽霞, G. Evangelista, 张雪英. Journal of Central South University (SCIE, EI, CAS), 2011, No. 5, pp. 1595-1601 (7 pages).
Perceptual auditory filter banks such as the Bark-scale filter bank are widely used as front-end processing in speech recognition systems. However, the problem of designing optimized filter banks that provide higher accuracy in recognition tasks is still open. Based on spectral analysis in feature extraction, an adaptive bands filter bank (ABFB) is presented. The design adopts flexible bandwidths and center frequencies for the frequency responses of the filters and utilizes a genetic algorithm (GA) to optimize the design parameters. The optimization process is realized by combining the front-end filter bank with the back-end recognition network in the performance evaluation loop. The deployment of ABFB together with the zero-crossing peak amplitude (ZCPA) feature as a front-end process for a radial basis function (RBF) system shows significant improvement in robustness compared with the Bark-scale filter bank. In ABFB, several sub-bands are still concentrated toward the lower frequencies, but their exact locations are determined by performance rather than perceptual criteria. For ease of optimization, only symmetrical bands are considered here, which still provide satisfactory results.
Keywords: perceptual filter banks; Bark scale; speaker-independent speech recognition systems; zero-crossing peak amplitude; genetic algorithm
15. Improved hidden Markov model for speech recognition and POS tagging (Cited by: 4)
Authors: 袁里驰. Journal of Central South University (SCIE, EI, CAS), 2012, No. 2, pp. 511-516 (6 pages).
In order to overcome defects of the classical hidden Markov model (HMM), the Markov family model (MFM), a new statistical model, is proposed. The Markov family model is applied to speech recognition and natural language processing. Speaker-independent continuous speech recognition experiments and part-of-speech tagging experiments show that the Markov family model has higher performance than the hidden Markov model. The precision is enhanced from 94.642% to 96.214% in the part-of-speech tagging experiments, and the error rate is reduced by 11.9% in the speech recognition experiments with respect to the HMM baseline system.
Keywords: hidden Markov model; Markov family model; speech recognition; part-of-speech tagging
16. An Innovative Approach Utilizing Binary-View Transformer for Speech Recognition Task (Cited by: 3)
Authors: Muhammad Babar Kamal, Arfat Ahmad Khan, Faizan Ahmed Khan, Malik Muhammad Ali Shahid, Chitapong Wechtaisong, Muhammad Daud Kamal, Muhammad Junaid Ali, Peerapong Uthansakul. Computers, Materials & Continua (SCIE, EI), 2022, No. 9, pp. 5547-5562 (16 pages).
Deep learning advancements have greatly improved the performance of speech recognition systems, and most recent systems are based on the recurrent neural network (RNN). Overall, the RNN works fine with small sequences but suffers from the gradient vanishing problem on long sequences. Transformer networks have neutralized this issue and have shown state-of-the-art results on sequential and speech-related data. Generally, in speech recognition, the input audio is converted into an image using a mel-spectrogram to illustrate frequencies and intensities. The image is classified by the machine learning mechanism to generate a classification transcript. However, the frequency content in the image has low resolution, causing inaccurate predictions. This paper presents a novel end-to-end binary-view transformer-based architecture for speech recognition to cope with the frequency resolution problem. Firstly, the input audio signal is transformed into a 2D image using a mel-spectrogram. Secondly, modified universal transformers utilize multi-head attention to derive contextual information and different speech-related features. Moreover, a feedforward neural network is deployed for classification. The proposed system generated robust results on Google's speech command dataset with an accuracy of 95.16% and minimal loss. The binary-view transformer reduces the eventuality of over-fitting by deploying a multi-view mechanism to diversify the input data, and multi-head attention captures multiple contexts from the data's feature map.
Keywords: convolutional neural network; multi-head attention; multi-view; RNN; self-attention; speech recognition; transformer
17. Speech Emotion Recognition Using Modified Quadratic Discrimination Function (Cited by: 9)
Authors: Zhao Yan, Zhao Li, Zou Cairong, Yu Yinhua. Journal of Electronics (China), 2008, No. 6, pp. 840-844 (5 pages).
The Quadratic Discrimination Function (QDF) is commonly used in speech emotion recognition and proceeds on the premise that the input data follows a normal distribution. In this paper, we propose a transformation to normalize the emotional features and then derive a Modified QDF (MQDF) for speech emotion recognition. Features based on prosody and voice quality are extracted, and a Principal Component Analysis Neural Network (PCANN) is used to reduce the dimension of the feature vectors. The results show that voice quality features are an effective supplement for recognition, and the method in this paper could improve the recognition ratio effectively.
Keywords: speech emotion recognition; Principal Component Analysis Neural Network (PCANN); Modified Quadratic Discrimination Function (MQDF)
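A sketch of the plain quadratic discrimination function that MQDF modifies, assuming Gaussian class models estimated with NumPy: per class c, g_c(x) = -(x - mu_c)^T Sigma_c^{-1} (x - mu_c) - ln|Sigma_c|, and the highest-scoring class wins.

```python
# Plain QDF sketch: fit a Gaussian per class, classify by the quadratic
# discriminant score (the baseline before the paper's modification).
import numpy as np

def fit_qdf(X, y):
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        cov = np.cov(Xc, rowvar=False) + 1e-3 * np.eye(X.shape[1])  # regularized
        params[c] = (Xc.mean(axis=0), np.linalg.inv(cov), np.linalg.slogdet(cov)[1])
    return params

def qdf_predict(x, params):
    def score(p):
        mu, inv_cov, logdet = p
        d = x - mu
        return -d @ inv_cov @ d - logdet
    return max(params, key=lambda c: score(params[c]))

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(2, 1, (50, 4))])
y = np.array([0] * 50 + [1] * 50)
print(qdf_predict(rng.normal(2, 1, 4), fit_qdf(X, y)))  # -> 1
```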
18. Integrated search technique for parameter determination of SVM for speech recognition (Cited by: 2)
Authors: Teena Mittal, R.K. Sharma. Journal of Central South University (SCIE, EI, CAS, CSCD), 2016, No. 6, pp. 1390-1398 (9 pages).
The support vector machine (SVM) has good application prospects for speech recognition problems; still, optimum parameter selection is a vital issue for it. To improve the learning ability of the SVM, a method for searching the optimal parameters based on the integration of predator-prey optimization (PPO) and the Hooke-Jeeves method is proposed. In the PPO technique, the population consists of prey and predator particles. The prey particles search for the optimum solution, and the predator always attacks the global best prey particle. The solution obtained by PPO is further improved by applying the Hooke-Jeeves method. The proposed method is applied to recognize isolated words in a Hindi speech database and also to recognize words in the benchmark database TI-20 in clean and noisy environments. A recognition rate of 81.5% for the Hindi database and 92.2% for the TI-20 database has been achieved using the proposed technique.
Keywords: support vector machine (SVM); predator-prey optimization; speech recognition; mel-frequency cepstral coefficients; wavelet packets; Hooke-Jeeves method
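A compact sketch of the Hooke-Jeeves-style local refinement, as it might polish SVM hyperparameters (e.g., log C and log gamma). This simplified variant uses exploratory moves and step shrinking only (the full method also makes pattern moves), and the toy objective stands in for cross-validated recognition error.

```python
# Simplified Hooke-Jeeves-style pattern search: probe each coordinate in both
# directions; accept the best improvement, otherwise shrink the step size.
import numpy as np

def hooke_jeeves(objective, x0, step=0.5, shrink=0.5, tol=1e-3):
    x, fx = np.asarray(x0, float), objective(x0)
    while step > tol:
        best_x, best_f = x.copy(), fx
        for i in range(len(x)):          # exploratory moves along each axis
            for d in (step, -step):
                trial = x.copy(); trial[i] += d
                ft = objective(trial)
                if ft < best_f:
                    best_x, best_f = trial, ft
        if best_f < fx:
            x, fx = best_x, best_f       # accept the improved point
        else:
            step *= shrink               # no improvement: refine the step
    return x, fx

# Toy objective with minimum at (1, -2), standing in for CV error over
# (log C, log gamma) of the SVM.
obj = lambda p: (p[0] - 1) ** 2 + (p[1] + 2) ** 2
print(hooke_jeeves(obj, [0.0, 0.0]))
```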
19. Transformer-like model with linear attention for speech emotion recognition (Cited by: 3)
Authors: Du Jing, Tang Manting, Zhao Li. Journal of Southeast University (English Edition) (EI, CAS), 2021, No. 2, pp. 164-170 (7 pages).
Because of the excellent performance of the Transformer in sequence learning tasks such as natural language processing, an improved Transformer-like model is proposed that is suitable for speech emotion recognition tasks. To alleviate the prohibitive time consumption and memory footprint caused by the softmax inside the multi-head attention unit in the Transformer, a new linear self-attention algorithm is proposed. The original exponential function is replaced by a Taylor series expansion formula. On the basis of the associative property of matrix products, the time and space complexity of the softmax operation with respect to the input's length is reduced from O(N²) to O(N), where N is the sequence length. Experimental results on the emotional corpora of two languages show that the proposed linear attention algorithm can achieve performance similar to the original scaled dot-product attention, while the training time and memory cost are reduced by half. Furthermore, the improved model obtains more robust performance on speech emotion recognition compared with the original Transformer.
Keywords: Transformer; attention mechanism; speech emotion recognition; fast softmax
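A sketch of the linearization trick: with a first-order Taylor expansion exp(q·k) ≈ 1 + q·k, the product (QK^T)V regroups as Q(K^T V), so the N×N attention matrix is never materialized. The expansion order and normalization below are assumptions; the paper's formula may keep more terms.

```python
# Linear attention via first-order Taylor expansion of the softmax kernel.
# Numerator: (1 + Q K^T) V = sum(V) + Q (K^T V); denominator: N + Q sum(K).
# Both are O(N d^2) instead of O(N^2 d). Note: with arbitrary inputs the
# 1 + q.k scores are not guaranteed positive; practical linear-attention
# variants use positive feature maps to ensure valid weights.
import numpy as np

def linear_attention(Q, K, V):
    """Q, K, V: (N, d). Returns (N, d) without forming the N x N matrix."""
    num = V.sum(axis=0, keepdims=True) + Q @ (K.T @ V)   # (N, d)
    den = K.shape[0] + Q @ K.sum(axis=0)                 # (N,) row sums
    return num / den[:, None]

rng = np.random.default_rng(3)
Q, K, V = (rng.normal(size=(100, 16)) for _ in range(3))
out = linear_attention(Q, K, V)
print(out.shape)  # (100, 16) -- no 100 x 100 attention matrix was built
```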
20. Speech Recognition-Based Automated Visual Acuity Testing with Adaptive Mel Filter Bank (Cited by: 2)
Authors: Shibli Nisar, Muhammad Asghar Khan, Fahad Algarni, Abdul Wakeel, M. Irfan Uddin, Insaf Ullah. Computers, Materials & Continua (SCIE, EI), 2022, No. 2, pp. 2991-3004 (14 pages).
One of the most commonly reported disabilities is vision loss, which can be diagnosed by an ophthalmologist in order to determine the visual system of a patient. This procedure, however, usually requires an appointment with an ophthalmologist, which is both time-consuming and expensive. Other issues can also arise, including a lack of appropriate equipment and trained practitioners, especially in rural areas. Centered on a cognitively motivated attribute extraction and speech recognition approach, this paper proposes a novel idea that immediately detects eyesight deficiency. The proposed system uses an adaptive filter bank with weighted mel-frequency cepstral coefficients for feature extraction. The adaptive filter bank implementation is inspired by the principle of spectrum sensing in cognitive radio, which is aware of its environment and adapts to statistical variations in the input stimuli by learning from the environment. Comparative performance evaluation demonstrates the potential of our automated visual acuity test method to achieve results comparable to the clinical ground truth established by the expert ophthalmologist's tests. The overall accuracy achieved by the proposed model when compared with the expert ophthalmologist test is 91.875%. The proposed method potentially offers a second opinion to ophthalmologists and serves as a cost-effective pre-screening test to predict eyesight loss at an early stage.
Keywords: eyesight test; speech recognition; HMM; SVM; feature extraction
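A sketch of weighted mel-frequency cepstral coefficients, assuming librosa and SciPy; how the paper adapts the per-band weights from the signal is not reproduced here, so the weights below are purely illustrative.

```python
# Weighted MFCC sketch: build a mel filter bank with librosa, scale the
# per-band energies by weights before the log/DCT steps. The adaptive
# weighting scheme of the paper is replaced by fixed toy weights.
import numpy as np
import librosa
import scipy.fftpack

def weighted_mfcc(y, sr, n_mels=26, n_mfcc=13, band_weights=None):
    S = np.abs(librosa.stft(y, n_fft=512)) ** 2              # power spectrogram
    mel_fb = librosa.filters.mel(sr=sr, n_fft=512, n_mels=n_mels)
    mel_energy = mel_fb @ S                                   # (n_mels, frames)
    if band_weights is not None:                              # band emphasis
        mel_energy = band_weights[:, None] * mel_energy
    log_mel = np.log(mel_energy + 1e-10)
    return scipy.fftpack.dct(log_mel, axis=0, norm='ortho')[:n_mfcc]

y, sr = librosa.load(librosa.ex('trumpet'), duration=1.0)
w = np.linspace(1.5, 0.5, 26)   # toy weights emphasizing lower bands
print(weighted_mfcc(y, sr, band_weights=w).shape)  # (13, frames)
```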