期刊文献+
共找到20,711篇文章
< 1 2 250 >
每页显示 20 50 100
Using vector Taylor series with noise clustering for speech recognition in non-stationary noisy environments
1
作者 赵贤宇 Ou Zhijian Wang Zuoying 《High Technology Letters》 EI CAS 2006年第1期18-23,共6页
The performance of automatic speech recognizer degrades seriously when there are mismatches between the training and testing conditions. Vector Taylor Series (VTS) approach has been used to compensate mismatches cau... The performance of automatic speech recognizer degrades seriously when there are mismatches between the training and testing conditions. Vector Taylor Series (VTS) approach has been used to compensate mismatches caused by additive noise and convolutive channel distortion in the cepstral domain, in this paper, the conventional VTS is extended by incorporating noise clustering into its EM iteration procedure, improving its compensation effectiveness under non-stationary noisy environments. Recognition experiments under babble and exhibition noisy environments demonstrate that the new algorithm achieves 35% average error rate reduction compared with the conventional VTS. 展开更多
关键词 speech recognition ROBUSTNESS model adaptation CLUSTERING
下载PDF
Comparing Fine-Tuning, Zero and Few-Shot Strategies with Large Language Models in Hate Speech Detection in English
2
作者 Ronghao Pan JoséAntonio García-Díaz Rafael Valencia-García 《Computer Modeling in Engineering & Sciences》 SCIE EI 2024年第9期2849-2868,共20页
Large Language Models(LLMs)are increasingly demonstrating their ability to understand natural language and solve complex tasks,especially through text generation.One of the relevant capabilities is contextual learning... Large Language Models(LLMs)are increasingly demonstrating their ability to understand natural language and solve complex tasks,especially through text generation.One of the relevant capabilities is contextual learning,which involves the ability to receive instructions in natural language or task demonstrations to generate expected outputs for test instances without the need for additional training or gradient updates.In recent years,the popularity of social networking has provided a medium through which some users can engage in offensive and harmful online behavior.In this study,we investigate the ability of different LLMs,ranging from zero-shot and few-shot learning to fine-tuning.Our experiments show that LLMs can identify sexist and hateful online texts using zero-shot and few-shot approaches through information retrieval.Furthermore,it is found that the encoder-decoder model called Zephyr achieves the best results with the fine-tuning approach,scoring 86.811%on the Explainable Detection of Online Sexism(EDOS)test-set and 57.453%on the Multilingual Detection of Hate Speech Against Immigrants and Women in Twitter(HatEval)test-set.Finally,it is confirmed that the evaluated models perform well in hate text detection,as they beat the best result in the HatEval task leaderboard.The error analysis shows that contextual learning had difficulty distinguishing between types of hate speech and figurative language.However,the fine-tuned approach tends to produce many false positives. 展开更多
关键词 Hate speech detection zero-shot few-shot fine-tuning natural language processing
下载PDF
Multi-Objective Equilibrium Optimizer for Feature Selection in High-Dimensional English Speech Emotion Recognition
3
作者 Liya Yue Pei Hu +1 位作者 Shu-Chuan Chu Jeng-Shyang Pan 《Computers, Materials & Continua》 SCIE EI 2024年第2期1957-1975,共19页
Speech emotion recognition(SER)uses acoustic analysis to find features for emotion recognition and examines variations in voice that are caused by emotions.The number of features acquired with acoustic analysis is ext... Speech emotion recognition(SER)uses acoustic analysis to find features for emotion recognition and examines variations in voice that are caused by emotions.The number of features acquired with acoustic analysis is extremely high,so we introduce a hybrid filter-wrapper feature selection algorithm based on an improved equilibrium optimizer for constructing an emotion recognition system.The proposed algorithm implements multi-objective emotion recognition with the minimum number of selected features and maximum accuracy.First,we use the information gain and Fisher Score to sort the features extracted from signals.Then,we employ a multi-objective ranking method to evaluate these features and assign different importance to them.Features with high rankings have a large probability of being selected.Finally,we propose a repair strategy to address the problem of duplicate solutions in multi-objective feature selection,which can improve the diversity of solutions and avoid falling into local traps.Using random forest and K-nearest neighbor classifiers,four English speech emotion datasets are employed to test the proposed algorithm(MBEO)as well as other multi-objective emotion identification techniques.The results illustrate that it performs well in inverted generational distance,hypervolume,Pareto solutions,and execution time,and MBEO is appropriate for high-dimensional English SER. 展开更多
关键词 speech emotion recognition filter-wrapper HIGH-DIMENSIONAL feature selection equilibrium optimizer MULTI-OBJECTIVE
下载PDF
An Adaptive Hate Speech Detection Approach Using Neutrosophic Neural Networks for Social Media Forensics
4
作者 Yasmine M.Ibrahim Reem Essameldin Saad M.Darwish 《Computers, Materials & Continua》 SCIE EI 2024年第4期243-262,共20页
Detecting hate speech automatically in social media forensics has emerged as a highly challenging task due tothe complex nature of language used in such platforms. Currently, several methods exist for classifying hate... Detecting hate speech automatically in social media forensics has emerged as a highly challenging task due tothe complex nature of language used in such platforms. Currently, several methods exist for classifying hatespeech, but they still suffer from ambiguity when differentiating between hateful and offensive content and theyalso lack accuracy. The work suggested in this paper uses a combination of the Whale Optimization Algorithm(WOA) and Particle Swarm Optimization (PSO) to adjust the weights of two Multi-Layer Perceptron (MLPs)for neutrosophic sets classification. During the training process of the MLP, the WOA is employed to exploreand determine the optimal set of weights. The PSO algorithm adjusts the weights to optimize the performanceof the MLP as fine-tuning. Additionally, in this approach, two separate MLP models are employed. One MLPis dedicated to predicting degrees of truth membership, while the other MLP focuses on predicting degrees offalse membership. The difference between these memberships quantifies uncertainty, indicating the degree ofindeterminacy in predictions. The experimental results indicate the superior performance of our model comparedto previous work when evaluated on the Davidson dataset. 展开更多
关键词 Hate speech detection whale optimization neutrosophic sets social media forensics
下载PDF
Exploring Sequential Feature Selection in Deep Bi-LSTM Models for Speech Emotion Recognition
5
作者 Fatma Harby Mansor Alohali +1 位作者 Adel Thaljaoui Amira Samy Talaat 《Computers, Materials & Continua》 SCIE EI 2024年第2期2689-2719,共31页
Machine Learning(ML)algorithms play a pivotal role in Speech Emotion Recognition(SER),although they encounter a formidable obstacle in accurately discerning a speaker’s emotional state.The examination of the emotiona... Machine Learning(ML)algorithms play a pivotal role in Speech Emotion Recognition(SER),although they encounter a formidable obstacle in accurately discerning a speaker’s emotional state.The examination of the emotional states of speakers holds significant importance in a range of real-time applications,including but not limited to virtual reality,human-robot interaction,emergency centers,and human behavior assessment.Accurately identifying emotions in the SER process relies on extracting relevant information from audio inputs.Previous studies on SER have predominantly utilized short-time characteristics such as Mel Frequency Cepstral Coefficients(MFCCs)due to their ability to capture the periodic nature of audio signals effectively.Although these traits may improve their ability to perceive and interpret emotional depictions appropriately,MFCCS has some limitations.So this study aims to tackle the aforementioned issue by systematically picking multiple audio cues,enhancing the classifier model’s efficacy in accurately discerning human emotions.The utilized dataset is taken from the EMO-DB database,preprocessing input speech is done using a 2D Convolution Neural Network(CNN)involves applying convolutional operations to spectrograms as they afford a visual representation of the way the audio signal frequency content changes over time.The next step is the spectrogram data normalization which is crucial for Neural Network(NN)training as it aids in faster convergence.Then the five auditory features MFCCs,Chroma,Mel-Spectrogram,Contrast,and Tonnetz are extracted from the spectrogram sequentially.The attitude of feature selection is to retain only dominant features by excluding the irrelevant ones.In this paper,the Sequential Forward Selection(SFS)and Sequential Backward Selection(SBS)techniques were employed for multiple audio cues features selection.Finally,the feature sets composed from the hybrid feature extraction methods are fed into the deep Bidirectional Long Short Term Memory(Bi-LSTM)network to discern emotions.Since the deep Bi-LSTM can hierarchically learn complex features and increases model capacity by achieving more robust temporal modeling,it is more effective than a shallow Bi-LSTM in capturing the intricate tones of emotional content existent in speech signals.The effectiveness and resilience of the proposed SER model were evaluated by experiments,comparing it to state-of-the-art SER techniques.The results indicated that the model achieved accuracy rates of 90.92%,93%,and 92%over the Ryerson Audio-Visual Database of Emotional Speech and Song(RAVDESS),Berlin Database of Emotional Speech(EMO-DB),and The Interactive Emotional Dyadic Motion Capture(IEMOCAP)datasets,respectively.These findings signify a prominent enhancement in the ability to emotional depictions identification in speech,showcasing the potential of the proposed model in advancing the SER field. 展开更多
关键词 Artificial intelligence application multi features sequential selection speech emotion recognition deep Bi-LSTM
下载PDF
Audio-Text Multimodal Speech Recognition via Dual-Tower Architecture for Mandarin Air Traffic Control Communications
6
作者 Shuting Ge Jin Ren +3 位作者 Yihua Shi Yujun Zhang Shunzhi Yang Jinfeng Yang 《Computers, Materials & Continua》 SCIE EI 2024年第3期3215-3245,共31页
In air traffic control communications (ATCC), misunderstandings between pilots and controllers could result in fatal aviation accidents. Fortunately, advanced automatic speech recognition technology has emerged as a p... In air traffic control communications (ATCC), misunderstandings between pilots and controllers could result in fatal aviation accidents. Fortunately, advanced automatic speech recognition technology has emerged as a promising means of preventing miscommunications and enhancing aviation safety. However, most existing speech recognition methods merely incorporate external language models on the decoder side, leading to insufficient semantic alignment between speech and text modalities during the encoding phase. Furthermore, it is challenging to model acoustic context dependencies over long distances due to the longer speech sequences than text, especially for the extended ATCC data. To address these issues, we propose a speech-text multimodal dual-tower architecture for speech recognition. It employs cross-modal interactions to achieve close semantic alignment during the encoding stage and strengthen its capabilities in modeling auditory long-distance context dependencies. In addition, a two-stage training strategy is elaborately devised to derive semantics-aware acoustic representations effectively. The first stage focuses on pre-training the speech-text multimodal encoding module to enhance inter-modal semantic alignment and aural long-distance context dependencies. The second stage fine-tunes the entire network to bridge the input modality variation gap between the training and inference phases and boost generalization performance. Extensive experiments demonstrate the effectiveness of the proposed speech-text multimodal speech recognition method on the ATCC and AISHELL-1 datasets. It reduces the character error rate to 6.54% and 8.73%, respectively, and exhibits substantial performance gains of 28.76% and 23.82% compared with the best baseline model. The case studies indicate that the obtained semantics-aware acoustic representations aid in accurately recognizing terms with similar pronunciations but distinctive semantics. The research provides a novel modeling paradigm for semantics-aware speech recognition in air traffic control communications, which could contribute to the advancement of intelligent and efficient aviation safety management. 展开更多
关键词 speech-text multimodal automatic speech recognition semantic alignment air traffic control communications dual-tower architecture
下载PDF
Analysis on the Translation Methods of the Reported Speech in German Academic Papers-Taking the Translation of“Die Internationalisierung der deutschen Hochschulen”as an Example
7
作者 WANG Rui CHEN Qi 《Journal of Literature and Art Studies》 2024年第9期802-807,共6页
Reporting is essential in language use,including the re-expression of other people’s or self’s words,opinions,psychological activities,etc.Grasping the translation methods of reported speech in German academic paper... Reporting is essential in language use,including the re-expression of other people’s or self’s words,opinions,psychological activities,etc.Grasping the translation methods of reported speech in German academic papers is very important to improve the accuracy of academic paper translation.This study takes the translation of“Internationalization of German Universities”(Die Internationalisierung der deutschen Hochschulen),an academic paper of higher education,as an example to explore the translation methods of reported speech in German academic papers.It is found that the use of word order conversion,part of speech conversion and split translation methods can make the translation more accurate and fluent.This paper helps to grasp the rules and characteristics of the translation of reported speech in German academic papers,and also provides a reference for improving the quality of German-Chinese translation. 展开更多
关键词 academic paper reported speech TRANSLATION
下载PDF
Chaotic Elephant Herd Optimization with Machine Learning for Arabic Hate Speech Detection
8
作者 Badriyya B.Al-onazi Jaber S.Alzahrani +5 位作者 Najm Alotaibi Hussain Alshahrani Mohamed Ahmed Elfaki Radwa Marzouk Heba Mohsen Abdelwahed Motwakel 《Intelligent Automation & Soft Computing》 2024年第3期567-583,共17页
In recent years,the usage of social networking sites has considerably increased in the Arab world.It has empowered individuals to express their opinions,especially in politics.Furthermore,various organizations that op... In recent years,the usage of social networking sites has considerably increased in the Arab world.It has empowered individuals to express their opinions,especially in politics.Furthermore,various organizations that operate in the Arab countries have embraced social media in their day-to-day business activities at different scales.This is attributed to business owners’understanding of social media’s importance for business development.However,the Arabic morphology is too complicated to understand due to the availability of nearly 10,000 roots and more than 900 patterns that act as the basis for verbs and nouns.Hate speech over online social networking sites turns out to be a worldwide issue that reduces the cohesion of civil societies.In this background,the current study develops a Chaotic Elephant Herd Optimization with Machine Learning for Hate Speech Detection(CEHOML-HSD)model in the context of the Arabic language.The presented CEHOML-HSD model majorly concentrates on identifying and categorising the Arabic text into hate speech and normal.To attain this,the CEHOML-HSD model follows different sub-processes as discussed herewith.At the initial stage,the CEHOML-HSD model undergoes data pre-processing with the help of the TF-IDF vectorizer.Secondly,the Support Vector Machine(SVM)model is utilized to detect and classify the hate speech texts made in the Arabic language.Lastly,the CEHO approach is employed for fine-tuning the parameters involved in SVM.This CEHO approach is developed by combining the chaotic functions with the classical EHO algorithm.The design of the CEHO algorithm for parameter tuning shows the novelty of the work.A widespread experimental analysis was executed to validate the enhanced performance of the proposed CEHOML-HSD approach.The comparative study outcomes established the supremacy of the proposed CEHOML-HSD model over other approaches. 展开更多
关键词 Arabic language machine learning elephant herd optimization TF-IDF vectorizer hate speech detection
下载PDF
Research on the Application of Second Language Acquisition Theory in College English Speech Teaching
9
作者 Hui Zhang 《Journal of Contemporary Educational Research》 2024年第3期173-178,共6页
The teaching of English speeches in universities aims to enhance oral communication ability,improve English communication skills,and expand English knowledge,occupying a core position in English teaching in universiti... The teaching of English speeches in universities aims to enhance oral communication ability,improve English communication skills,and expand English knowledge,occupying a core position in English teaching in universities.This article takes the theory of second language acquisition as the background,analyzes the important role and value of this theory in English speech teaching in universities,and explores how to apply the theory of second language acquisition in English speech teaching in universities.It aims to strengthen the cultivation of English skilled talents and provide a brief reference for improving English speech teaching in universities. 展开更多
关键词 Second language acquisition theory Teaching English speeches in universities Practical strategies
下载PDF
Audiovisual speech recognition based on a deep convolutional neural network
10
作者 Shashidhar Rudregowda Sudarshan Patilkulkarni +2 位作者 Vinayakumar Ravi Gururaj H.L. Moez Krichen 《Data Science and Management》 2024年第1期25-34,共10页
Audiovisual speech recognition is an emerging research topic.Lipreading is the recognition of what someone is saying using visual information,primarily lip movements.In this study,we created a custom dataset for India... Audiovisual speech recognition is an emerging research topic.Lipreading is the recognition of what someone is saying using visual information,primarily lip movements.In this study,we created a custom dataset for Indian English linguistics and categorized it into three main categories:(1)audio recognition,(2)visual feature extraction,and(3)combined audio and visual recognition.Audio features were extracted using the mel-frequency cepstral coefficient,and classification was performed using a one-dimension convolutional neural network.Visual feature extraction uses Dlib and then classifies visual speech using a long short-term memory type of recurrent neural networks.Finally,integration was performed using a deep convolutional network.The audio speech of Indian English was successfully recognized with accuracies of 93.67%and 91.53%,respectively,using testing data from 200 epochs.The training accuracy for visual speech recognition using the Indian English dataset was 77.48%and the test accuracy was 76.19%using 60 epochs.After integration,the accuracies of audiovisual speech recognition using the Indian English dataset for training and testing were 94.67%and 91.75%,respectively. 展开更多
关键词 Audiovisual speech recognition Custom dataset 1D Convolution neural network(CNN) Deep CNN(DCNN) Long short-term memory(LSTM) LIPREADING Dlib Mel-frequency cepstral coefficient(MFCC)
下载PDF
基于Speech SDK的语音应用程序实现 被引量:11
11
作者 高敬惠 姜子敬 胡金铭 《广西科学院学报》 2005年第3期169-172,共4页
利用MicrosoftSpeechSDK的APIforText-to-Speech和APIforSpeechRecognition,采用VisualBa-sic6.0语言,建立文本语音转换应用程序和实现语音识别程序,简单地实现了语音识别的功能,识别出来的内容即可保存为文件,也可作为命令使用,让计算... 利用MicrosoftSpeechSDK的APIforText-to-Speech和APIforSpeechRecognition,采用VisualBa-sic6.0语言,建立文本语音转换应用程序和实现语音识别程序,简单地实现了语音识别的功能,识别出来的内容即可保存为文件,也可作为命令使用,让计算机执行某项操作。 展开更多
关键词 应用程序 文本语音转换 语音识别 MICROSOFT speech SDK
下载PDF
基于Speech SDK的机器人语音交互系统设计 被引量:8
12
作者 陈景帅 周风余 《北京联合大学学报》 CAS 2010年第1期25-29,共5页
介绍了一种基于Microsoft Speech SDK5.1的机器人语音交互系统,利用Speech SDK5.1提供的应用程序编程接口SAPI进行语音识别,对识别结果在逻辑程序中处理,使用Inter-phonic5.0语音合成技术替代TTS技术来合成语音,实现了AHRR-I接待机器人... 介绍了一种基于Microsoft Speech SDK5.1的机器人语音交互系统,利用Speech SDK5.1提供的应用程序编程接口SAPI进行语音识别,对识别结果在逻辑程序中处理,使用Inter-phonic5.0语音合成技术替代TTS技术来合成语音,实现了AHRR-I接待机器人的语音对话和语音控制。 展开更多
关键词 接待机器人 speech SDK 语音识别 语音控制 SAPI
下载PDF
TVAR Time-frequency Analysis for Non-stationary Vibration Signals of Spacecraft 被引量:7
13
作者 杨海 程伟 朱虹 《Chinese Journal of Aeronautics》 SCIE EI CAS CSCD 2008年第5期423-432,共10页
Predicting the time-varying auto-spectral density of a spacecraft in high-altitude orbits requires an accurate model for the non-stationary random vibration signals with densely spaced modal frequency. The traditional... Predicting the time-varying auto-spectral density of a spacecraft in high-altitude orbits requires an accurate model for the non-stationary random vibration signals with densely spaced modal frequency. The traditional time-varying algorithm limits prediction accuracy, thus affecting a number of operational decisions. To solve this problem, a time-varying auto regressive (TVAR) model based on the process neural network (PNN) and the empirical mode decomposition (EMD) is proposed. The time-varying system is tracked on-line by establishing a time-varying parameter model, and then the relevant parameter spectrum is obtained. Firstly, the EMD method is utilized to decompose the signal into several intrinsic mode functions (IMFs). Then for each IMF, the PNN is established and the time-varying auto-spectral density is obtained. Finally, the time-frequency distribution of the signals can be reconstructed by linear superposition. The simulation and the analytical results from an example demonstrate that this approach possesses simplicity, effectiveness, and feasibility, as well as higher frequency resolution. 展开更多
关键词 non-stationary random vibration time-frequency distribution process neural network empirical mode decomposition
下载PDF
Support vector machines for emotion recognition in Chinese speech 被引量:8
14
作者 王治平 赵力 邹采荣 《Journal of Southeast University(English Edition)》 EI CAS 2003年第4期307-310,共4页
Support vector machines (SVMs) are utilized for emotion recognition in Chinese speech in this paper. Both binary class discrimination and the multi class discrimination are discussed. It proves that the emotional fe... Support vector machines (SVMs) are utilized for emotion recognition in Chinese speech in this paper. Both binary class discrimination and the multi class discrimination are discussed. It proves that the emotional features construct a nonlinear problem in the input space, and SVMs based on nonlinear mapping can solve it more effectively than other linear methods. Multi class classification based on SVMs with a soft decision function is constructed to classify the four emotion situations. Compared with principal component analysis (PCA) method and modified PCA method, SVMs perform the best result in multi class discrimination by using nonlinear kernel mapping. 展开更多
关键词 speech signal emotion recognition support vector machines
下载PDF
A novel speech emotion recognition algorithm based on combination of emotion data field and ant colony search strategy 被引量:3
15
作者 查诚 陶华伟 +3 位作者 张昕然 周琳 赵力 杨平 《Journal of Southeast University(English Edition)》 EI CAS 2016年第2期158-163,共6页
In order to effectively conduct emotion recognition from spontaneous, non-prototypical and unsegmented speech so as to create a more natural human-machine interaction; a novel speech emotion recognition algorithm base... In order to effectively conduct emotion recognition from spontaneous, non-prototypical and unsegmented speech so as to create a more natural human-machine interaction; a novel speech emotion recognition algorithm based on the combination of the emotional data field (EDF) and the ant colony search (ACS) strategy, called the EDF-ACS algorithm, is proposed. More specifically, the inter- relationship among the turn-based acoustic feature vectors of different labels are established by using the potential function in the EDF. To perform the spontaneous speech emotion recognition, the artificial colony is used to mimic the turn- based acoustic feature vectors. Then, the canonical ACS strategy is used to investigate the movement direction of each artificial ant in the EDF, which is regarded as the emotional label of the corresponding turn-based acoustic feature vector. The proposed EDF-ACS algorithm is evaluated on the continueous audio)'visual emotion challenge (AVEC) 2012 dataset, which contains the spontaneous, non-prototypical and unsegmented speech emotion data. The experimental results show that the proposed EDF-ACS algorithm outperforms the existing state-of-the-art algorithm in turn-based speech emotion recognition. 展开更多
关键词 speech emotion recognition emotional data field ant colony search human-machine interaction
下载PDF
Speech enhancement based on leakage constraints DF-GSC 被引量:1
16
作者 邹采荣 陈国明 赵力 《Journal of Southeast University(English Edition)》 EI CAS 2007年第4期507-511,共5页
In order to improve the performance of general sidelobe canceller (GSC) based speech enhancement, a leakage constraints decision feedback generalized sidelobe canceller(LCDF-GSC) algorithm is proposed. The method ... In order to improve the performance of general sidelobe canceller (GSC) based speech enhancement, a leakage constraints decision feedback generalized sidelobe canceller(LCDF-GSC) algorithm is proposed. The method adopts DF-GSC against signal mismatch, and introduces a leakage factor in the cost function to deal with the speech leakage problem which is caused by the part of the speech signal in the noise reference signal. Simulation results show that although the signal-to-noise ratio (SNR) of the speech signal through LCDF-GSC is slightly less than that of DF-GSC, the IS measurements show that the distortion of the former is less than that of the latter. MOS (mean opinion score) scores also indicate that the LCDF-GSC algorithm is better than DF- GSC and the Weiner filter algorithm, 展开更多
关键词 speech enhancement general sidelobe canceller (GSC) speech leakage
下载PDF
语音分析软件Speech Analyzer和Praat在上海市区方言鼻化韵单一化演变研究中的应用 被引量:6
17
作者 顾钦 《计算机应用与软件》 CSCD 北大核心 2006年第12期81-82,108,共3页
目前我国方言语音演变研究中音位的确定主要以传统意义上的研究方法为多,音位归纳主要凭个人经验。若记音人本身记音能力有限,可能造成记录结果与方言实际读音存在偏差。因此,如果在传统记音方法的基础上,辅以语音分析软件,将使方言语... 目前我国方言语音演变研究中音位的确定主要以传统意义上的研究方法为多,音位归纳主要凭个人经验。若记音人本身记音能力有限,可能造成记录结果与方言实际读音存在偏差。因此,如果在传统记音方法的基础上,辅以语音分析软件,将使方言语音研究更为精密、客观。以Speech Analyzer和Praat语音分析软件为例,結合上海市区方言鼻化韵单一化,即鼻化韵中前a后ɑ的对立完全消失,发成中A这一语音演变现象,以Speech Analyzer确定鼻化元音位置,以Praat做共振峰分析,为这一语音演变现象的确定提供证据。 展开更多
关键词 语音分析软件 speech ANALYZER Praat上海市区方言 鼻化韵单一化
下载PDF
Speech-ABR安静及噪声环境下音位的对比研究 被引量:6
18
作者 王倩 王燕 刘志成 《中华耳科学杂志》 CSCD 北大核心 2016年第5期634-638,共5页
目的对比speech-ABR在安静及噪声环境下单音节声母、韵母及声调的变化,研究噪声对单音节音位的影响。方法招募正常听力受试者40例(男20例,女20例),母语为汉语普通话。Speech-ABR刺激声为260ms时程的合成言语声/mi/,声调为三声,刺激强度... 目的对比speech-ABR在安静及噪声环境下单音节声母、韵母及声调的变化,研究噪声对单音节音位的影响。方法招募正常听力受试者40例(男20例,女20例),母语为汉语普通话。Speech-ABR刺激声为260ms时程的合成言语声/mi/,声调为三声,刺激强度为70d B SPL,记录右耳安静状态下及噪声状态下(信噪比SNR=-10d B)speech-ABR的反应波形。对比起始反应波形(onset response,OR)、过渡反应波形(consonant-to-vowel transition)及频率跟随反应波形(frequency following response,FFR)的潜伏期的变化。并对比安静及噪声状态下声调追踪(pitch tracking)相关系数r的变化。使用SPSS18.0软件进行数据统计分析,数据采用配对t检验分析两组的差异,P<0.05时为差异有统计学意义。结果260ms时程/mi/诱发的言语听性脑干反应波形特征,主要由潜伏期为10ms内的起始反应、潜伏期为80-220ms内的频率跟随反应及最后的终止反应组成,以及潜伏期在10-80ms内的辅音-元音过渡反应。其中起始反应部分为辅音部分所诱发;过渡反应部分为辅-元音的过渡信息诱发;由/mi/中的元音部分所诱发的频率跟随反应部分共由15个波形组成。经配对t检验分析,在安静及噪声环境下进行对比,起始反应峰值(辅音部分)平均潜伏期延长0.85±0.17ms(P=0.000)。过度反应峰值平均潜伏期延长0.75±0.15ms((P=0.000)。频率跟随反应峰值平均潜伏期延长0.38±0.10ms(P=0.000),结果均具有统计学意义。安静环境下声调追踪反应相关系数r均值为0.84±0.08,噪声环境下相关系数r均值为0.74±0.12,两者对比结果具有统计学意义((P=0.000)。结论在噪声环境下,测试音的辅音、元音对应波形潜伏期均发生变化,声调追踪系数会有所下降,提示三种音位均会受到噪声的影响。与以往主观的言语识别率测试方式及诱发电位测试相比,speech-ABR是一种客观方式评估言语声受到噪声干扰情况的测试方法。 展开更多
关键词 speech-ABR 言语噪声 单音节
下载PDF
Auditory attention model based on Chirplet for cross-corpus speech emotion recognition 被引量:1
19
作者 张昕然 宋鹏 +2 位作者 查诚 陶华伟 赵力 《Journal of Southeast University(English Edition)》 EI CAS 2016年第4期402-407,共6页
To solve the problem of mismatching features in an experimental database, which is a key technique in the field of cross-corpus speech emotion recognition, an auditory attention model based on Chirplet is proposed for... To solve the problem of mismatching features in an experimental database, which is a key technique in the field of cross-corpus speech emotion recognition, an auditory attention model based on Chirplet is proposed for feature extraction.First, in order to extract the spectra features, the auditory attention model is employed for variational emotion features detection. Then, the selective attention mechanism model is proposed to extract the salient gist features which showtheir relation to the expected performance in cross-corpus testing.Furthermore, the Chirplet time-frequency atoms are introduced to the model. By forming a complete atom database, the Chirplet can improve the spectrum feature extraction including the amount of information. Samples from multiple databases have the characteristics of multiple components. Hereby, the Chirplet expands the scale of the feature vector in the timefrequency domain. Experimental results show that, compared to the traditional feature model, the proposed feature extraction approach with the prototypical classifier has significant improvement in cross-corpus speech recognition. In addition, the proposed method has better robustness to the inconsistent sources of the training set and the testing set. 展开更多
关键词 speech emotion recognition selective attention mechanism spectrogram feature cross-corpus
下载PDF
Speech emotion recognition via discriminant-cascading dimensionality reduction 被引量:1
20
作者 王如刚 徐新洲 +3 位作者 黄程韦 吴尘 张昕然 赵力 《Journal of Southeast University(English Edition)》 EI CAS 2016年第2期151-157,共7页
In order to accurately identify speech emotion information, the discriminant-cascading effect in dimensionality reduction of speech emotion recognition is investigated. Based on the existing locality preserving projec... In order to accurately identify speech emotion information, the discriminant-cascading effect in dimensionality reduction of speech emotion recognition is investigated. Based on the existing locality preserving projections and graph embedding framework, a novel discriminant-cascading dimensionality reduction method is proposed, which is named discriminant-cascading locality preserving projections (DCLPP). The proposed method specifically utilizes supervised embedding graphs and it keeps the original space for the inner products of samples to maintain enough information for speech emotion recognition. Then, the kernel DCLPP (KDCLPP) is also proposed to extend the mapping form. Validated by the experiments on the corpus of EMO-DB and eNTERFACE'05, the proposed method can clearly outperform the existing common dimensionality reduction methods, such as principal component analysis (PCA), linear discriminant analysis (LDA), locality preserving projections (LPP), local discriminant embedding (LDE), graph-based Fisher analysis (GbFA) and so on, with different categories of classifiers. 展开更多
关键词 speech emotion recognition discriminant-cascading locality preserving projections DISCRIMINANTANALYSIS dimensionality reduction
下载PDF
上一页 1 2 250 下一页 到第
使用帮助 返回顶部