期刊文献+
共找到20,593篇文章
< 1 2 250 >
每页显示 20 50 100
Audio-Text Multimodal Speech Recognition via Dual-Tower Architecture for Mandarin Air Traffic Control Communications
1
作者 Shuting Ge Jin Ren +3 位作者 Yihua Shi Yujun Zhang Shunzhi Yang Jinfeng Yang 《Computers, Materials & Continua》 SCIE EI 2024年第3期3215-3245,共31页
In air traffic control communications (ATCC), misunderstandings between pilots and controllers could result in fatal aviation accidents. Fortunately, advanced automatic speech recognition technology has emerged as a p... In air traffic control communications (ATCC), misunderstandings between pilots and controllers could result in fatal aviation accidents. Fortunately, advanced automatic speech recognition technology has emerged as a promising means of preventing miscommunications and enhancing aviation safety. However, most existing speech recognition methods merely incorporate external language models on the decoder side, leading to insufficient semantic alignment between speech and text modalities during the encoding phase. Furthermore, it is challenging to model acoustic context dependencies over long distances due to the longer speech sequences than text, especially for the extended ATCC data. To address these issues, we propose a speech-text multimodal dual-tower architecture for speech recognition. It employs cross-modal interactions to achieve close semantic alignment during the encoding stage and strengthen its capabilities in modeling auditory long-distance context dependencies. In addition, a two-stage training strategy is elaborately devised to derive semantics-aware acoustic representations effectively. The first stage focuses on pre-training the speech-text multimodal encoding module to enhance inter-modal semantic alignment and aural long-distance context dependencies. The second stage fine-tunes the entire network to bridge the input modality variation gap between the training and inference phases and boost generalization performance. Extensive experiments demonstrate the effectiveness of the proposed speech-text multimodal speech recognition method on the ATCC and AISHELL-1 datasets. It reduces the character error rate to 6.54% and 8.73%, respectively, and exhibits substantial performance gains of 28.76% and 23.82% compared with the best baseline model. The case studies indicate that the obtained semantics-aware acoustic representations aid in accurately recognizing terms with similar pronunciations but distinctive semantics. The research provides a novel modeling paradigm for semantics-aware speech recognition in air traffic control communications, which could contribute to the advancement of intelligent and efficient aviation safety management. 展开更多
关键词 speech-text multimodal automatic speech recognition semantic alignment air traffic control communications dual-tower architecture
下载PDF
Comparing Fine-Tuning, Zero and Few-Shot Strategies with Large Language Models in Hate Speech Detection in English
2
作者 Ronghao Pan JoséAntonio García-Díaz Rafael Valencia-García 《Computer Modeling in Engineering & Sciences》 SCIE EI 2024年第9期2849-2868,共20页
Large Language Models(LLMs)are increasingly demonstrating their ability to understand natural language and solve complex tasks,especially through text generation.One of the relevant capabilities is contextual learning... Large Language Models(LLMs)are increasingly demonstrating their ability to understand natural language and solve complex tasks,especially through text generation.One of the relevant capabilities is contextual learning,which involves the ability to receive instructions in natural language or task demonstrations to generate expected outputs for test instances without the need for additional training or gradient updates.In recent years,the popularity of social networking has provided a medium through which some users can engage in offensive and harmful online behavior.In this study,we investigate the ability of different LLMs,ranging from zero-shot and few-shot learning to fine-tuning.Our experiments show that LLMs can identify sexist and hateful online texts using zero-shot and few-shot approaches through information retrieval.Furthermore,it is found that the encoder-decoder model called Zephyr achieves the best results with the fine-tuning approach,scoring 86.811%on the Explainable Detection of Online Sexism(EDOS)test-set and 57.453%on the Multilingual Detection of Hate Speech Against Immigrants and Women in Twitter(HatEval)test-set.Finally,it is confirmed that the evaluated models perform well in hate text detection,as they beat the best result in the HatEval task leaderboard.The error analysis shows that contextual learning had difficulty distinguishing between types of hate speech and figurative language.However,the fine-tuned approach tends to produce many false positives. 展开更多
关键词 Hate speech detection zero-shot few-shot fine-tuning natural language processing
下载PDF
Multi-Objective Equilibrium Optimizer for Feature Selection in High-Dimensional English Speech Emotion Recognition
3
作者 Liya Yue Pei Hu +1 位作者 Shu-Chuan Chu Jeng-Shyang Pan 《Computers, Materials & Continua》 SCIE EI 2024年第2期1957-1975,共19页
Speech emotion recognition(SER)uses acoustic analysis to find features for emotion recognition and examines variations in voice that are caused by emotions.The number of features acquired with acoustic analysis is ext... Speech emotion recognition(SER)uses acoustic analysis to find features for emotion recognition and examines variations in voice that are caused by emotions.The number of features acquired with acoustic analysis is extremely high,so we introduce a hybrid filter-wrapper feature selection algorithm based on an improved equilibrium optimizer for constructing an emotion recognition system.The proposed algorithm implements multi-objective emotion recognition with the minimum number of selected features and maximum accuracy.First,we use the information gain and Fisher Score to sort the features extracted from signals.Then,we employ a multi-objective ranking method to evaluate these features and assign different importance to them.Features with high rankings have a large probability of being selected.Finally,we propose a repair strategy to address the problem of duplicate solutions in multi-objective feature selection,which can improve the diversity of solutions and avoid falling into local traps.Using random forest and K-nearest neighbor classifiers,four English speech emotion datasets are employed to test the proposed algorithm(MBEO)as well as other multi-objective emotion identification techniques.The results illustrate that it performs well in inverted generational distance,hypervolume,Pareto solutions,and execution time,and MBEO is appropriate for high-dimensional English SER. 展开更多
关键词 speech emotion recognition filter-wrapper HIGH-DIMENSIONAL feature selection equilibrium optimizer MULTI-OBJECTIVE
下载PDF
Exploring Sequential Feature Selection in Deep Bi-LSTM Models for Speech Emotion Recognition
4
作者 Fatma Harby Mansor Alohali +1 位作者 Adel Thaljaoui Amira Samy Talaat 《Computers, Materials & Continua》 SCIE EI 2024年第2期2689-2719,共31页
Machine Learning(ML)algorithms play a pivotal role in Speech Emotion Recognition(SER),although they encounter a formidable obstacle in accurately discerning a speaker’s emotional state.The examination of the emotiona... Machine Learning(ML)algorithms play a pivotal role in Speech Emotion Recognition(SER),although they encounter a formidable obstacle in accurately discerning a speaker’s emotional state.The examination of the emotional states of speakers holds significant importance in a range of real-time applications,including but not limited to virtual reality,human-robot interaction,emergency centers,and human behavior assessment.Accurately identifying emotions in the SER process relies on extracting relevant information from audio inputs.Previous studies on SER have predominantly utilized short-time characteristics such as Mel Frequency Cepstral Coefficients(MFCCs)due to their ability to capture the periodic nature of audio signals effectively.Although these traits may improve their ability to perceive and interpret emotional depictions appropriately,MFCCS has some limitations.So this study aims to tackle the aforementioned issue by systematically picking multiple audio cues,enhancing the classifier model’s efficacy in accurately discerning human emotions.The utilized dataset is taken from the EMO-DB database,preprocessing input speech is done using a 2D Convolution Neural Network(CNN)involves applying convolutional operations to spectrograms as they afford a visual representation of the way the audio signal frequency content changes over time.The next step is the spectrogram data normalization which is crucial for Neural Network(NN)training as it aids in faster convergence.Then the five auditory features MFCCs,Chroma,Mel-Spectrogram,Contrast,and Tonnetz are extracted from the spectrogram sequentially.The attitude of feature selection is to retain only dominant features by excluding the irrelevant ones.In this paper,the Sequential Forward Selection(SFS)and Sequential Backward Selection(SBS)techniques were employed for multiple audio cues features selection.Finally,the feature sets composed from the hybrid feature extraction methods are fed into the deep Bidirectional Long Short Term Memory(Bi-LSTM)network to discern emotions.Since the deep Bi-LSTM can hierarchically learn complex features and increases model capacity by achieving more robust temporal modeling,it is more effective than a shallow Bi-LSTM in capturing the intricate tones of emotional content existent in speech signals.The effectiveness and resilience of the proposed SER model were evaluated by experiments,comparing it to state-of-the-art SER techniques.The results indicated that the model achieved accuracy rates of 90.92%,93%,and 92%over the Ryerson Audio-Visual Database of Emotional Speech and Song(RAVDESS),Berlin Database of Emotional Speech(EMO-DB),and The Interactive Emotional Dyadic Motion Capture(IEMOCAP)datasets,respectively.These findings signify a prominent enhancement in the ability to emotional depictions identification in speech,showcasing the potential of the proposed model in advancing the SER field. 展开更多
关键词 Artificial intelligence application multi features sequential selection speech emotion recognition deep Bi-LSTM
下载PDF
An Adaptive Hate Speech Detection Approach Using Neutrosophic Neural Networks for Social Media Forensics
5
作者 Yasmine M.Ibrahim Reem Essameldin Saad M.Darwish 《Computers, Materials & Continua》 SCIE EI 2024年第4期243-262,共20页
Detecting hate speech automatically in social media forensics has emerged as a highly challenging task due tothe complex nature of language used in such platforms. Currently, several methods exist for classifying hate... Detecting hate speech automatically in social media forensics has emerged as a highly challenging task due tothe complex nature of language used in such platforms. Currently, several methods exist for classifying hatespeech, but they still suffer from ambiguity when differentiating between hateful and offensive content and theyalso lack accuracy. The work suggested in this paper uses a combination of the Whale Optimization Algorithm(WOA) and Particle Swarm Optimization (PSO) to adjust the weights of two Multi-Layer Perceptron (MLPs)for neutrosophic sets classification. During the training process of the MLP, the WOA is employed to exploreand determine the optimal set of weights. The PSO algorithm adjusts the weights to optimize the performanceof the MLP as fine-tuning. Additionally, in this approach, two separate MLP models are employed. One MLPis dedicated to predicting degrees of truth membership, while the other MLP focuses on predicting degrees offalse membership. The difference between these memberships quantifies uncertainty, indicating the degree ofindeterminacy in predictions. The experimental results indicate the superior performance of our model comparedto previous work when evaluated on the Davidson dataset. 展开更多
关键词 Hate speech detection whale optimization neutrosophic sets social media forensics
下载PDF
Analysis on the Translation Methods of the Reported Speech in German Academic Papers-Taking the Translation of“Die Internationalisierung der deutschen Hochschulen”as an Example
6
作者 WANG Rui CHEN Qi 《Journal of Literature and Art Studies》 2024年第9期802-807,共6页
Reporting is essential in language use,including the re-expression of other people’s or self’s words,opinions,psychological activities,etc.Grasping the translation methods of reported speech in German academic paper... Reporting is essential in language use,including the re-expression of other people’s or self’s words,opinions,psychological activities,etc.Grasping the translation methods of reported speech in German academic papers is very important to improve the accuracy of academic paper translation.This study takes the translation of“Internationalization of German Universities”(Die Internationalisierung der deutschen Hochschulen),an academic paper of higher education,as an example to explore the translation methods of reported speech in German academic papers.It is found that the use of word order conversion,part of speech conversion and split translation methods can make the translation more accurate and fluent.This paper helps to grasp the rules and characteristics of the translation of reported speech in German academic papers,and also provides a reference for improving the quality of German-Chinese translation. 展开更多
关键词 academic paper reported speech TRANSLATION
下载PDF
Chaotic Elephant Herd Optimization with Machine Learning for Arabic Hate Speech Detection
7
作者 Badriyya B.Al-onazi Jaber S.Alzahrani +5 位作者 Najm Alotaibi Hussain Alshahrani Mohamed Ahmed Elfaki Radwa Marzouk Heba Mohsen Abdelwahed Motwakel 《Intelligent Automation & Soft Computing》 2024年第3期567-583,共17页
In recent years,the usage of social networking sites has considerably increased in the Arab world.It has empowered individuals to express their opinions,especially in politics.Furthermore,various organizations that op... In recent years,the usage of social networking sites has considerably increased in the Arab world.It has empowered individuals to express their opinions,especially in politics.Furthermore,various organizations that operate in the Arab countries have embraced social media in their day-to-day business activities at different scales.This is attributed to business owners’understanding of social media’s importance for business development.However,the Arabic morphology is too complicated to understand due to the availability of nearly 10,000 roots and more than 900 patterns that act as the basis for verbs and nouns.Hate speech over online social networking sites turns out to be a worldwide issue that reduces the cohesion of civil societies.In this background,the current study develops a Chaotic Elephant Herd Optimization with Machine Learning for Hate Speech Detection(CEHOML-HSD)model in the context of the Arabic language.The presented CEHOML-HSD model majorly concentrates on identifying and categorising the Arabic text into hate speech and normal.To attain this,the CEHOML-HSD model follows different sub-processes as discussed herewith.At the initial stage,the CEHOML-HSD model undergoes data pre-processing with the help of the TF-IDF vectorizer.Secondly,the Support Vector Machine(SVM)model is utilized to detect and classify the hate speech texts made in the Arabic language.Lastly,the CEHO approach is employed for fine-tuning the parameters involved in SVM.This CEHO approach is developed by combining the chaotic functions with the classical EHO algorithm.The design of the CEHO algorithm for parameter tuning shows the novelty of the work.A widespread experimental analysis was executed to validate the enhanced performance of the proposed CEHOML-HSD approach.The comparative study outcomes established the supremacy of the proposed CEHOML-HSD model over other approaches. 展开更多
关键词 Arabic language machine learning elephant herd optimization TF-IDF vectorizer hate speech detection
下载PDF
A Review of Indirect Speech Acts in Speech Act Theory
8
作者 张焕芹 《海外英语》 2018年第11期234-235,共2页
The theory of indirect speech acts proposed by John Searle is a problematic issue in speech act theory. The theory is subject to various criticisms.This essay reviews various arguments and the significant problems wit... The theory of indirect speech acts proposed by John Searle is a problematic issue in speech act theory. The theory is subject to various criticisms.This essay reviews various arguments and the significant problems with reference to the indirect speech acts. The review includes some important concepts of speech act theory which are related to indirect speech acts; inference theory and idiom theory which underline indirect speech acts and their major problems in accounting for indirect speech acts. 展开更多
关键词 speech act indirect speech speech act theory
下载PDF
Research on the Application of Second Language Acquisition Theory in College English Speech Teaching
9
作者 Hui Zhang 《Journal of Contemporary Educational Research》 2024年第3期173-178,共6页
The teaching of English speeches in universities aims to enhance oral communication ability,improve English communication skills,and expand English knowledge,occupying a core position in English teaching in universiti... The teaching of English speeches in universities aims to enhance oral communication ability,improve English communication skills,and expand English knowledge,occupying a core position in English teaching in universities.This article takes the theory of second language acquisition as the background,analyzes the important role and value of this theory in English speech teaching in universities,and explores how to apply the theory of second language acquisition in English speech teaching in universities.It aims to strengthen the cultivation of English skilled talents and provide a brief reference for improving English speech teaching in universities. 展开更多
关键词 Second language acquisition theory Teaching English speeches in universities Practical strategies
下载PDF
Speech enhancement based on leakage constraints DF-GSC 被引量:1
10
作者 邹采荣 陈国明 赵力 《Journal of Southeast University(English Edition)》 EI CAS 2007年第4期507-511,共5页
In order to improve the performance of general sidelobe canceller (GSC) based speech enhancement, a leakage constraints decision feedback generalized sidelobe canceller(LCDF-GSC) algorithm is proposed. The method ... In order to improve the performance of general sidelobe canceller (GSC) based speech enhancement, a leakage constraints decision feedback generalized sidelobe canceller(LCDF-GSC) algorithm is proposed. The method adopts DF-GSC against signal mismatch, and introduces a leakage factor in the cost function to deal with the speech leakage problem which is caused by the part of the speech signal in the noise reference signal. Simulation results show that although the signal-to-noise ratio (SNR) of the speech signal through LCDF-GSC is slightly less than that of DF-GSC, the IS measurements show that the distortion of the former is less than that of the latter. MOS (mean opinion score) scores also indicate that the LCDF-GSC algorithm is better than DF- GSC and the Weiner filter algorithm, 展开更多
关键词 speech enhancement general sidelobe canceller (GSC) speech leakage
下载PDF
Audiovisual speech recognition based on a deep convolutional neural network
11
作者 Shashidhar Rudregowda Sudarshan Patilkulkarni +2 位作者 Vinayakumar Ravi Gururaj H.L. Moez Krichen 《Data Science and Management》 2024年第1期25-34,共10页
Audiovisual speech recognition is an emerging research topic.Lipreading is the recognition of what someone is saying using visual information,primarily lip movements.In this study,we created a custom dataset for India... Audiovisual speech recognition is an emerging research topic.Lipreading is the recognition of what someone is saying using visual information,primarily lip movements.In this study,we created a custom dataset for Indian English linguistics and categorized it into three main categories:(1)audio recognition,(2)visual feature extraction,and(3)combined audio and visual recognition.Audio features were extracted using the mel-frequency cepstral coefficient,and classification was performed using a one-dimension convolutional neural network.Visual feature extraction uses Dlib and then classifies visual speech using a long short-term memory type of recurrent neural networks.Finally,integration was performed using a deep convolutional network.The audio speech of Indian English was successfully recognized with accuracies of 93.67%and 91.53%,respectively,using testing data from 200 epochs.The training accuracy for visual speech recognition using the Indian English dataset was 77.48%and the test accuracy was 76.19%using 60 epochs.After integration,the accuracies of audiovisual speech recognition using the Indian English dataset for training and testing were 94.67%and 91.75%,respectively. 展开更多
关键词 Audiovisual speech recognition Custom dataset 1D Convolution neural network(CNN) Deep CNN(DCNN) Long short-term memory(LSTM) LIPREADING Dlib Mel-frequency cepstral coefficient(MFCC)
下载PDF
Speech emotion recognition using semi-supervised discriminant analysis
12
作者 徐新洲 黄程韦 +2 位作者 金赟 吴尘 赵力 《Journal of Southeast University(English Edition)》 EI CAS 2014年第1期7-12,共6页
Semi-supervised discriminant analysis SDA which uses a combination of multiple embedding graphs and kernel SDA KSDA are adopted in supervised speech emotion recognition.When the emotional factors of speech signal samp... Semi-supervised discriminant analysis SDA which uses a combination of multiple embedding graphs and kernel SDA KSDA are adopted in supervised speech emotion recognition.When the emotional factors of speech signal samples are preprocessed different categories of features including pitch zero-cross rate energy durance formant and Mel frequency cepstrum coefficient MFCC as well as their statistical parameters are extracted from the utterances of samples.In the dimensionality reduction stage before the feature vectors are sent into classifiers parameter-optimized SDA and KSDA are performed to reduce dimensionality.Experiments on the Berlin speech emotion database show that SDA for supervised speech emotion recognition outperforms some other state-of-the-art dimensionality reduction methods based on spectral graph learning such as linear discriminant analysis LDA locality preserving projections LPP marginal Fisher analysis MFA etc. when multi-class support vector machine SVM classifiers are used.Additionally KSDA can achieve better recognition performance based on kernelized data mapping compared with the above methods including SDA. 展开更多
关键词 speech emotion RECOGNITION speech emotion feature semi-supervised discriminant analysis dimensionality reduction
下载PDF
AN ANALYSIS OF ACOUSTIC CHARACTERISTICS OFCLEFT PALATE SPEECH WITH COMPUTERIZED SPEECH SIGNAL PROCESSING SYSTEM 被引量:1
13
作者 李锦峰 刘建华 《Journal of Pharmaceutical Analysis》 CAS 1996年第2期162-165,共4页
The acoustic characteristics or the chinese vowels of 24 children with cleft palate and 10 normal control children were analyzed by computerized speech signal processing system (CSSPS),and the speech articulation was ... The acoustic characteristics or the chinese vowels of 24 children with cleft palate and 10 normal control children were analyzed by computerized speech signal processing system (CSSPS),and the speech articulation was judged with Glossary of clert palate speech(GCPS).The listening judgement showed that the speech articulation was significantly different between the two groups(P<0.01).The objective quantitative measurement suggested that the formant pattern(FP)of vowels in children with cleft palate was different from that of normal control children except vowel[a](P< 0.05).The acoustic vowelgraph or the Chinese vowels which demonstrated directly the relationship of vocal space and speech perception was stated with the first formant frequence(F1)and the second formant frequence(F2).The authors conclude that the values or F1 and F2 point out the upward and backward tongue movement to close the clert, which reflects the vocal characteristics of trausmission of clert palate speech. 展开更多
关键词 cleft palate speech the Chinese vowels the formant pattern the speech articulation computerized speech singnal processing system
下载PDF
Speech Recognition via CTC-CNN Model
14
作者 Wen-Tsai Sung Hao-WeiKang Sung-Jung Hsiao 《Computers, Materials & Continua》 SCIE EI 2023年第9期3833-3858,共26页
In the speech recognition system,the acoustic model is an important underlying model,and its accuracy directly affects the performance of the entire system.This paper introduces the construction and training process o... In the speech recognition system,the acoustic model is an important underlying model,and its accuracy directly affects the performance of the entire system.This paper introduces the construction and training process of the acoustic model in detail and studies the Connectionist temporal classification(CTC)algorithm,which plays an important role in the end-to-end framework,established a convolutional neural network(CNN)combined with an acoustic model of Connectionist temporal classification to improve the accuracy of speech recognition.This study uses a sound sensor,ReSpeakerMic Array v2.0.1,to convert the collected speech signals into text or corresponding speech signals to improve communication and reduce noise and hardware interference.The baseline acousticmodel in this study faces challenges such as long training time,high error rate,and a certain degree of overfitting.The model is trained through continuous design and improvement of the relevant parameters of the acousticmodel,and finally the performance is selected according to the evaluation index.Excellentmodel,which reduces the error rate to about 18%,thus improving the accuracy rate.Finally,comparative verificationwas carried out from the selection of acoustic feature parameters,the selection of modeling units,and the speaker’s speech rate,which further verified the excellent performance of the CTCCNN_5+BN+Residual model structure.In terms of experiments,to train and verify the CTC-CNN baseline acoustic model,this study uses THCHS-30 and ST-CMDS speech data sets as training data sets,and after 54 epochs of training,the word error rate of the acoustic model training set is 31%,the word error rate of the test set is stable at about 43%.This experiment also considers the surrounding environmental noise.Under the noise level of 80∼90 dB,the accuracy rate is 88.18%,which is the worst performance among all levels.In contrast,at 40–60 dB,the accuracy was as high as 97.33%due to less noise pollution. 展开更多
关键词 Artificial intelligence speech recognition speech to text convolutional neural network automatic speech recognition
下载PDF
Emotional Vietnamese Speech Synthesis Using Style-Transfer Learning
15
作者 Thanh X.Le An T.Le Quang H.Nguyen 《Computer Systems Science & Engineering》 SCIE EI 2023年第2期1263-1278,共16页
In recent years,speech synthesis systems have allowed for the produc-tion of very high-quality voices.Therefore,research in this domain is now turning to the problem of integrating emotions into speech.However,the met... In recent years,speech synthesis systems have allowed for the produc-tion of very high-quality voices.Therefore,research in this domain is now turning to the problem of integrating emotions into speech.However,the method of con-structing a speech synthesizer for each emotion has some limitations.First,this method often requires an emotional-speech data set with many sentences.Such data sets are very time-intensive and labor-intensive to complete.Second,training each of these models requires computers with large computational capabilities and a lot of effort and time for model tuning.In addition,each model for each emotion failed to take advantage of data sets of other emotions.In this paper,we propose a new method to synthesize emotional speech in which the latent expressions of emotions are learned from a small data set of professional actors through a Flow-tron model.In addition,we provide a new method to build a speech corpus that is scalable and whose quality is easy to control.Next,to produce a high-quality speech synthesis model,we used this data set to train the Tacotron 2 model.We used it as a pre-trained model to train the Flowtron model.We applied this method to synthesize Vietnamese speech with sadness and happiness.Mean opi-nion score(MOS)assessment results show that MOS is 3.61 for sadness and 3.95 for happiness.In conclusion,the proposed method proves to be more effec-tive for a high degree of automation and fast emotional sentence generation,using a small emotional-speech data set. 展开更多
关键词 Emotional speech synthesis flowtron speech synthesis style transfer vietnamese speech
下载PDF
Support vector machines for emotion recognition in Chinese speech 被引量:8
16
作者 王治平 赵力 邹采荣 《Journal of Southeast University(English Edition)》 EI CAS 2003年第4期307-310,共4页
Support vector machines (SVMs) are utilized for emotion recognition in Chinese speech in this paper. Both binary class discrimination and the multi class discrimination are discussed. It proves that the emotional fe... Support vector machines (SVMs) are utilized for emotion recognition in Chinese speech in this paper. Both binary class discrimination and the multi class discrimination are discussed. It proves that the emotional features construct a nonlinear problem in the input space, and SVMs based on nonlinear mapping can solve it more effectively than other linear methods. Multi class classification based on SVMs with a soft decision function is constructed to classify the four emotion situations. Compared with principal component analysis (PCA) method and modified PCA method, SVMs perform the best result in multi class discrimination by using nonlinear kernel mapping. 展开更多
关键词 speech signal emotion recognition support vector machines
下载PDF
A novel speech emotion recognition algorithm based on combination of emotion data field and ant colony search strategy 被引量:3
17
作者 查诚 陶华伟 +3 位作者 张昕然 周琳 赵力 杨平 《Journal of Southeast University(English Edition)》 EI CAS 2016年第2期158-163,共6页
In order to effectively conduct emotion recognition from spontaneous, non-prototypical and unsegmented speech so as to create a more natural human-machine interaction; a novel speech emotion recognition algorithm base... In order to effectively conduct emotion recognition from spontaneous, non-prototypical and unsegmented speech so as to create a more natural human-machine interaction; a novel speech emotion recognition algorithm based on the combination of the emotional data field (EDF) and the ant colony search (ACS) strategy, called the EDF-ACS algorithm, is proposed. More specifically, the inter- relationship among the turn-based acoustic feature vectors of different labels are established by using the potential function in the EDF. To perform the spontaneous speech emotion recognition, the artificial colony is used to mimic the turn- based acoustic feature vectors. Then, the canonical ACS strategy is used to investigate the movement direction of each artificial ant in the EDF, which is regarded as the emotional label of the corresponding turn-based acoustic feature vector. The proposed EDF-ACS algorithm is evaluated on the continueous audio)'visual emotion challenge (AVEC) 2012 dataset, which contains the spontaneous, non-prototypical and unsegmented speech emotion data. The experimental results show that the proposed EDF-ACS algorithm outperforms the existing state-of-the-art algorithm in turn-based speech emotion recognition. 展开更多
关键词 speech emotion recognition emotional data field ant colony search human-machine interaction
下载PDF
基于Speech SDK的语音应用程序实现 被引量:11
18
作者 高敬惠 姜子敬 胡金铭 《广西科学院学报》 2005年第3期169-172,共4页
利用MicrosoftSpeechSDK的APIforText-to-Speech和APIforSpeechRecognition,采用VisualBa-sic6.0语言,建立文本语音转换应用程序和实现语音识别程序,简单地实现了语音识别的功能,识别出来的内容即可保存为文件,也可作为命令使用,让计算... 利用MicrosoftSpeechSDK的APIforText-to-Speech和APIforSpeechRecognition,采用VisualBa-sic6.0语言,建立文本语音转换应用程序和实现语音识别程序,简单地实现了语音识别的功能,识别出来的内容即可保存为文件,也可作为命令使用,让计算机执行某项操作。 展开更多
关键词 应用程序 文本语音转换 语音识别 MICROSOFT speech SDK
下载PDF
基于Speech SDK的机器人语音交互系统设计 被引量:8
19
作者 陈景帅 周风余 《北京联合大学学报》 CAS 2010年第1期25-29,共5页
介绍了一种基于Microsoft Speech SDK5.1的机器人语音交互系统,利用Speech SDK5.1提供的应用程序编程接口SAPI进行语音识别,对识别结果在逻辑程序中处理,使用Inter-phonic5.0语音合成技术替代TTS技术来合成语音,实现了AHRR-I接待机器人... 介绍了一种基于Microsoft Speech SDK5.1的机器人语音交互系统,利用Speech SDK5.1提供的应用程序编程接口SAPI进行语音识别,对识别结果在逻辑程序中处理,使用Inter-phonic5.0语音合成技术替代TTS技术来合成语音,实现了AHRR-I接待机器人的语音对话和语音控制。 展开更多
关键词 接待机器人 speech SDK 语音识别 语音控制 SAPI
下载PDF
Speech-ABR安静及噪声环境下音位的对比研究 被引量:6
20
作者 王倩 王燕 刘志成 《中华耳科学杂志》 CSCD 北大核心 2016年第5期634-638,共5页
目的对比speech-ABR在安静及噪声环境下单音节声母、韵母及声调的变化,研究噪声对单音节音位的影响。方法招募正常听力受试者40例(男20例,女20例),母语为汉语普通话。Speech-ABR刺激声为260ms时程的合成言语声/mi/,声调为三声,刺激强度... 目的对比speech-ABR在安静及噪声环境下单音节声母、韵母及声调的变化,研究噪声对单音节音位的影响。方法招募正常听力受试者40例(男20例,女20例),母语为汉语普通话。Speech-ABR刺激声为260ms时程的合成言语声/mi/,声调为三声,刺激强度为70d B SPL,记录右耳安静状态下及噪声状态下(信噪比SNR=-10d B)speech-ABR的反应波形。对比起始反应波形(onset response,OR)、过渡反应波形(consonant-to-vowel transition)及频率跟随反应波形(frequency following response,FFR)的潜伏期的变化。并对比安静及噪声状态下声调追踪(pitch tracking)相关系数r的变化。使用SPSS18.0软件进行数据统计分析,数据采用配对t检验分析两组的差异,P<0.05时为差异有统计学意义。结果260ms时程/mi/诱发的言语听性脑干反应波形特征,主要由潜伏期为10ms内的起始反应、潜伏期为80-220ms内的频率跟随反应及最后的终止反应组成,以及潜伏期在10-80ms内的辅音-元音过渡反应。其中起始反应部分为辅音部分所诱发;过渡反应部分为辅-元音的过渡信息诱发;由/mi/中的元音部分所诱发的频率跟随反应部分共由15个波形组成。经配对t检验分析,在安静及噪声环境下进行对比,起始反应峰值(辅音部分)平均潜伏期延长0.85±0.17ms(P=0.000)。过度反应峰值平均潜伏期延长0.75±0.15ms((P=0.000)。频率跟随反应峰值平均潜伏期延长0.38±0.10ms(P=0.000),结果均具有统计学意义。安静环境下声调追踪反应相关系数r均值为0.84±0.08,噪声环境下相关系数r均值为0.74±0.12,两者对比结果具有统计学意义((P=0.000)。结论在噪声环境下,测试音的辅音、元音对应波形潜伏期均发生变化,声调追踪系数会有所下降,提示三种音位均会受到噪声的影响。与以往主观的言语识别率测试方式及诱发电位测试相比,speech-ABR是一种客观方式评估言语声受到噪声干扰情况的测试方法。 展开更多
关键词 speech-ABR 言语噪声 单音节
下载PDF
上一页 1 2 250 下一页 到第
使用帮助 返回顶部