Reporting is essential in language use, including the re-expression of other people's or one's own words, opinions, psychological activities, etc. Grasping the translation methods of reported speech in German academic papers is very important for improving the accuracy of academic paper translation. This study takes the translation of “Internationalization of German Universities” (Die Internationalisierung der deutschen Hochschulen), an academic paper on higher education, as an example to explore the translation methods of reported speech in German academic papers. It is found that the use of word-order conversion, part-of-speech conversion, and split translation can make the translation more accurate and fluent. This paper helps to grasp the rules and characteristics of the translation of reported speech in German academic papers and also provides a reference for improving the quality of German-Chinese translation.
The teaching of English speeches in universities aims to enhance oral communication ability, improve English communication skills, and expand English knowledge, and it occupies a core position in university English teaching. Taking the theory of second language acquisition as its background, this article analyzes the important role and value of this theory in English speech teaching in universities and explores how to apply it in such teaching. It aims to strengthen the cultivation of skilled English talents and to provide a brief reference for improving English speech teaching in universities.
Patients with age-related hearing loss face hearing difficulties in daily life. The causes of age-related hearing loss are complex and include changes in peripheral hearing, central processing, and cognitive-related abilities. Furthermore, the ways in which aging relates to hearing loss via changes in auditory processing ability are still unclear. In this cross-sectional study, we evaluated 27 older adults (over 60 years old) with age-related hearing loss, 21 older adults (over 60 years old) with normal hearing, and 30 younger subjects (18-30 years old) with normal hearing. We used the outcome of the upper-threshold test, including the time-compressed threshold and the speech recognition threshold in noisy conditions, as a behavioral indicator of auditory processing ability. We also used electroencephalography to identify presbycusis-related abnormalities in the brain while the participants were in a spontaneous resting state. The time-compressed threshold and speech recognition threshold data indicated significant differences among the groups. In patients with age-related hearing loss, information masking (babble noise) had a greater effect than energy masking (speech-shaped noise) on processing difficulties. In terms of resting-state electroencephalography signals, we observed enhanced frontal lobe (Brodmann's area, BA11) activation in the older adults with normal hearing compared with the younger participants with normal hearing, and greater activation in the parietal (BA7) and occipital (BA19) lobes in the individuals with age-related hearing loss compared with the younger adults. Our functional connection analysis suggested that, compared with younger people, the older adults with normal hearing exhibited enhanced connections among networks, including the default mode network, sensorimotor network, cingulo-opercular network, occipital network, and frontoparietal network. These results suggest that both normal aging and the development of age-related hearing loss have a negative effect on advanced auditory processing capabilities and that hearing loss accelerates the decline in speech comprehension, especially in speech competition situations. Older adults with normal hearing may have increased compensatory attentional resource recruitment represented by the top-down active listening mechanism, while those with age-related hearing loss exhibit decompensation of network connections involving multisensory integration.
Automatic Speech Emotion Recognition (SER) is used to recognize emotion from speech automatically. Speech emotion recognition works well in a laboratory environment, but real-time emotion recognition is influenced by variations in the gender, age, and cultural and acoustical background of the speaker. The acoustical resemblance between emotional expressions further increases the complexity of recognition. Many recent research works concentrate on addressing these effects individually. Instead of addressing every influencing attribute individually, we design a system that reduces the effect arising from any such factor. We propose a two-level hierarchical classifier named Interpreter of Responses (IR). The first level of IR has been realized using Support Vector Machine (SVM) and Gaussian Mixture Model (GMM) classifiers. In the second level of IR, a discriminative SVM classifier has been trained and tested with the meta information of the first-level classifiers along with the input acoustical feature vector used in the primary classifiers. To train the system with a corpus of versatile nature, an integrated emotion corpus has been composed using emotion samples of five speech corpora, namely EMO-DB, IITKGP-SESC, the SAVEE corpus, the Spanish emotion corpus, and CMU's Woogle corpus. The hierarchical classifier has been trained and tested using MFCC and Low-Level Descriptors (LLD). The empirical analysis shows that the proposed classifier outperforms the traditional classifiers. The proposed ensemble design is very generic and can be adapted even when the number and nature of features change. The first-level classifiers, GMM or SVM, may be replaced with any other learning algorithm.
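A minimal sketch of such a two-level stacking arrangement is shown below, assuming a pre-computed acoustic feature matrix X and emotion labels y; the corpus composition, feature extraction, and hyperparameters of the paper are not reproduced, and the placeholder data exists only so the snippet runs.

```python
# Hypothetical sketch of a two-level hierarchical emotion classifier:
# level 1 = SVM posteriors + per-class GMM log-likelihoods,
# level 2 = an SVM trained on [original features, level-1 meta information].
import numpy as np
from sklearn.svm import SVC
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split

def fit_level_one(X, y, n_classes):
    svm = SVC(probability=True).fit(X, y)                     # discriminative branch
    gmms = [GaussianMixture(n_components=4, covariance_type="diag",
                            random_state=0).fit(X[y == c]) for c in range(n_classes)]
    return svm, gmms

def meta_features(X, svm, gmms):
    probs = svm.predict_proba(X)                               # SVM posteriors
    loglik = np.column_stack([g.score_samples(X) for g in gmms])  # GMM log-likelihoods
    return np.hstack([X, probs, loglik])                       # raw features + meta info

# X: (n_samples, n_features) acoustic features (e.g., MFCC statistics); y: emotion ids.
X, y = np.random.randn(600, 39), np.random.randint(0, 5, 600)  # placeholder data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

svm1, gmms = fit_level_one(X_tr, y_tr, n_classes=5)
svm2 = SVC().fit(meta_features(X_tr, svm1, gmms), y_tr)        # level-2 interpreter
print("accuracy:", svm2.score(meta_features(X_te, svm1, gmms), y_te))
```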
Purpose: Our study aims to compare speech understanding in noise and spectral-temporal resolution skills with regard to the degree of hearing loss, age, hearing aid use experience, and gender of hearing aid users. Methods: Our study included sixty-eight hearing aid users aged between 40 and 70 years, with bilateral mild and moderate symmetrical sensorineural hearing loss. The random gap detection test, Turkish matrix test, and spectral-temporally modulated ripple test were administered to the participants with bilateral hearing aids. The test results were compared statistically according to different variables and the correlations were examined. Results: No statistically significant differences were observed in speech-in-noise recognition or spectral-temporal resolution between older and younger adult hearing aid users (p>0.05). No statistically significant difference was found among test outcomes for the different degrees of hearing loss (p>0.05). Higher temporal-resolution performance was obtained in male participants and in participants with more hearing aid use experience (p<0.05). Significant correlations were obtained between the results of the speech-in-noise recognition, temporal resolution, and spectral resolution tests performed with hearing aids (p<0.05). Conclusion: Our findings emphasize the importance of regular hearing aid use and show that some auditory skills can be improved with hearing aids. The correlations observed among the speech-in-noise recognition, temporal resolution, and spectral resolution tests reveal that these skills should be evaluated as a whole to maximize the patient's communication abilities.
Speech recognition is a hot topic in the field of artificial intelligence. Generally, speech recognition models can only run on large servers or dedicated chips. This paper presents a keyword speech recognition system based on a neural network and a conventional STM32 chip. To address the limited Flash and ROM resources on the STM32 MCU, the deployment of the speech recognition model is optimized to meet the requirements of keyword recognition. First, the audio information obtained through sensors is subjected to MFCC (Mel Frequency Cepstral Coefficient) feature extraction, and the extracted MFCC features are fed into a CNN (Convolutional Neural Network) for deep feature extraction. The features are then passed to a fully connected layer, and finally the speech keyword is classified and predicted. When the model is deployed to the STM32F429, it achieves an accuracy of 90.58%, a decrease of less than 1% compared with the 91.49% accuracy obtained on a computer, demonstrating good performance.
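The MFCC-plus-small-CNN pipeline described above can be prototyped on a workstation before any MCU deployment. Below is a minimal, hypothetical sketch (not the authors' code) using librosa for MFCC extraction and PyTorch for a compact CNN; the layer sizes, the 1-second/16 kHz input format, the ten-keyword output, and the placeholder waveform are illustrative assumptions.

```python
# Hypothetical desktop prototype of the MFCC -> CNN -> fully-connected keyword classifier.
import numpy as np
import librosa
import torch
import torch.nn as nn

def mfcc_image(y, sr=16000, n_mfcc=13):
    """Turn a ~1-second waveform into a (1, n_mfcc, frames) MFCC 'image'."""
    y = np.pad(y, (0, max(0, sr - len(y))))[:sr]             # fix length to 1 s
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return torch.tensor(mfcc, dtype=torch.float32).unsqueeze(0)

class KeywordCNN(nn.Module):
    def __init__(self, n_keywords=10):
        super().__init__()
        self.conv = nn.Sequential(                            # small enough to fit an MCU later
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.fc = nn.LazyLinear(n_keywords)                   # fully connected classifier head

    def forward(self, x):
        return self.fc(self.conv(x).flatten(start_dim=1))

model = KeywordCNN()
wave = np.random.randn(16000).astype(np.float32)              # placeholder for sensor audio
logits = model(mfcc_image(wave).unsqueeze(0))                  # batch of one MFCC image
print(logits.softmax(dim=-1))                                  # per-keyword probabilities
```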
Speech recognition systems have become a unique human-computer interaction (HCI) family. Speech is one of the most naturally developed human abilities, and speech signal processing opens up a transparent and hands-free computation experience. This paper presents a retrospective yet modern approach to the world of speech recognition systems. The development journey of ASR (Automatic Speech Recognition) has seen quite a few milestones and breakthrough technologies, which are highlighted in this paper. A step-by-step rundown of the fundamental stages in developing speech recognition systems is presented, along with a brief discussion of various modern-day developments and applications in this domain. This review aims to summarize the field and provide a starting point for those entering the vast field of speech signal processing. Since speech recognition has vast potential in industries such as telecommunication, emotion recognition, and healthcare, this review should be helpful to researchers who aim to explore more applications that society can quickly adopt in the coming years.
Speech disorders are a common type of childhood disease. Through experimental intervention, this study aims to improve the vocabulary comprehension levels and language ability of children with speech disorders using the language cognition and emotional speech community method. We also conduct a statistical analysis of the interventional effect. Among children with speech disorders in Dongguan City, 224 were selected and grouped according to their receptive language ability and IQ. The 112 children in the experimental group (EG) received speech therapy with the language cognition and emotional speech community method, while the 112 children in the control group (CG) received only conventional treatment. After six months of experimental intervention, the Peabody Picture Vocabulary Test-Revised (PPVT-R) was used to test the language ability of the two groups. Overall, we employed a quantitative approach to obtain numerical values, examine the identified variables, and test hypotheses. Furthermore, we used descriptive statistics to explore the research questions related to the study and to describe the overall distribution of the demographic variables. The statistical t-test was used to analyze the data. The data show that after intervention through language cognition and emotional speech community therapy, the PPVT-R score of the EG was significantly higher than that of the CG. Therefore, we conclude that there is a significant difference in language ability between the EG and CG after the therapy. Although both groups improved, the post-therapy language level of the EG is significantly higher than that of the CG. The total effective rate in the EG is higher than in the CG, and the difference is statistically significant (p<0.05). Therefore, we conclude that the language cognition and emotional speech community method is effective as an interventional treatment for children's speech disorders and is more effective than traditional treatment methods.
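The two-group comparison described here (post-therapy PPVT-R scores of the EG versus the CG) is the kind of analysis typically run as an independent-samples t-test. A minimal illustrative sketch with SciPy follows; the score arrays are made up purely so the example runs and do not reflect the study's data.

```python
# Illustrative independent-samples t-test for comparing post-therapy scores
# of an experimental group (EG) and a control group (CG); data are synthetic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
eg_scores = rng.normal(loc=85, scale=10, size=112)   # hypothetical EG PPVT-R scores
cg_scores = rng.normal(loc=78, scale=10, size=112)   # hypothetical CG PPVT-R scores

t_stat, p_value = stats.ttest_ind(eg_scores, cg_scores, equal_var=False)  # Welch's t-test
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Difference between EG and CG is statistically significant at alpha = 0.05.")
```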
Applied linguistics is one of the fields in the linguistics domain and deals with the practical applications of language studies such as speech processing, language teaching, translation, and speech therapy. The ever-growing Online Social Networks (OSNs) face a critical issue, namely hate speech. Among OSN-oriented security problems, the use of offensive language is the most important threat, and it is prevalent across the Internet. Depending on the group targeted, offensive language varies in terms of adult content, hate speech, racism, cyberbullying, abuse, trolling, and profanity. Among these, hate speech is the most intimidating form of offensive language, in which targeted groups or individuals are intimidated with the intent of creating harm, social chaos, or violence. Machine Learning (ML) techniques have recently been applied to recognize hate speech-related content. This article introduces a Grasshopper Optimization with an Attentive Recurrent Network for Offensive Speech Detection (GOARN-OSD) model for social media. The GOARN-OSD technique integrates the concepts of deep learning (DL) and metaheuristic algorithms for detecting hate speech. In the presented GOARN-OSD technique, the primary stage involves data pre-processing and word embedding. Then, the study utilizes the Attentive Recurrent Network (ARN) model for hate speech recognition and classification. Finally, the Grasshopper Optimization Algorithm (GOA) is exploited as a hyperparameter optimizer to boost the performance of the hate speech recognition process. To demonstrate the promising performance of the proposed GOARN-OSD method, a widespread experimental analysis was conducted. The comparison study outcomes demonstrate the superior performance of the proposed GOARN-OSD model over other state-of-the-art approaches.
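A rough idea of the "attentive recurrent network" stage (word embeddings, a recurrent encoder, and a simple additive attention pooling before the classification layer) can be sketched as follows. This is an illustrative PyTorch layout, not the paper's architecture; vocabulary size, hidden sizes, and the dummy batch are assumptions.

```python
# Illustrative attentive recurrent classifier: embeddings -> GRU -> attention pooling -> logits.
import torch
import torch.nn as nn

class AttentiveRNN(nn.Module):
    def __init__(self, vocab_size=30000, emb_dim=128, hidden=128, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.rnn = nn.GRU(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)       # additive attention score per token
        self.out = nn.Linear(2 * hidden, n_classes)

    def forward(self, token_ids):
        h, _ = self.rnn(self.emb(token_ids))       # (batch, seq, 2*hidden)
        weights = torch.softmax(self.attn(h), dim=1)
        context = (weights * h).sum(dim=1)          # attention-weighted sentence vector
        return self.out(context)

model = AttentiveRNN()
batch = torch.randint(1, 30000, (4, 40))            # 4 dummy token-id sequences
print(model(batch).shape)                            # torch.Size([4, 2])
```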
Recently, artificial-intelligence-based automatic customer response systems have been widely used instead of customer service representatives. It is therefore important for automatic customer service to promptly recognize emotions in a customer's voice so as to provide the appropriate service accordingly. We analyzed the performance of emotion recognition (ER) accuracy as a function of simulation time using the proposed chunk-based speech ER (CSER) model. The proposed CSER model divides voice signals into 3-s-long chunks to efficiently recognize the emotions characteristically inherent in the customer's voice. We evaluated the ER performance on voice signal chunks by individually applying four RNN techniques to the proposed CSER model: long short-term memory (LSTM), bidirectional LSTM, gated recurrent units (GRU), and bidirectional GRU. This allowed us to assess ER accuracy and time efficiency. The results reveal that GRU shows the best time efficiency in recognizing emotions from speech signals in terms of accuracy as a function of simulation time.
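The chunking step itself is straightforward. A hedged sketch of splitting a waveform into fixed 3-second segments is given below; the 16 kHz sample rate, zero-padding of the final partial chunk, and the placeholder signal are assumptions rather than the paper's exact preprocessing. Each chunk would then be converted to frame-level features and passed through an RNN such as a GRU, with per-chunk predictions aggregated per call.

```python
# Illustrative 3-second chunking of a speech waveform before per-chunk emotion recognition.
# A real call recording would be loaded first, e.g. y, sr = librosa.load(path, sr=16000).
import numpy as np

def split_into_chunks(y, sr=16000, chunk_seconds=3.0):
    chunk_len = int(chunk_seconds * sr)
    chunks = []
    for start in range(0, len(y), chunk_len):
        chunk = y[start:start + chunk_len]
        if len(chunk) < chunk_len:                         # zero-pad the final partial chunk
            chunk = np.pad(chunk, (0, chunk_len - len(chunk)))
        chunks.append(chunk)
    return np.stack(chunks)                                # shape: (n_chunks, chunk_len)

y = np.random.randn(16000 * 10).astype(np.float32)         # placeholder 10-second signal
chunks = split_into_chunks(y)
print(chunks.shape)                                         # (4, 48000): four 3-s chunks
```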
Classification of speech signals is a vital part of speech signal processing systems. With the advent of speech coding and synthesis, the classification of speech signals has become more accurate and faster. Conventional methods are considered inaccurate due to the uncertainty and diversity of speech signals in real speech signal classification. In this paper, we perform efficient speech signal classification using a series of neural network classifiers with reinforcement learning (RL) operations. Prior to classification, the study extracts the essential features from the speech signal using cepstral analysis. The features are extracted by converting the speech waveform into a parametric representation to obtain a relatively minimized data rate. To improve the precision of classification, Generative Adversarial Networks are used to classify the speech signal after features are extracted using the cepstral coefficients. The classifiers are initially trained with these features, and the best classifier is chosen to perform classification on new datasets. The validation of the testing sets is evaluated using RL, which provides feedback to the classifiers. Finally, at the user interface, the signals are played back by decoding the signal retrieved from the classifier based on the input query. The results are evaluated in terms of accuracy, recall, precision, F-measure, and error rate, with the generative adversarial network attaining a higher accuracy rate than the other methods: Multi-Layer Perceptron, Recurrent Neural Networks, Deep Belief Networks, and Convolutional Neural Networks.
The hidden danger of the automatic speaker verification (ASV) system is various spoofed speeches. These threats can be classified into two categories, namely logical access (LA) and physical access (PA). To improve the identification capability of spoofed speech detection, this paper focuses on feature design. Firstly, following the idea of modifying constant-Q-based features, this work considers adding the variance or mean to the constant-Q-based cepstral domain to obtain good performance. Secondly, linear frequency cepstral coefficients (LFCCs) perform comparably with constant-Q-based features. Finally, we propose linear frequency variance-based cepstral coefficients (LVCCs) and linear frequency mean-based cepstral coefficients (LMCCs) for the identification of speech spoofing. LVCCs and LMCCs can be obtained by adding the frame variance or the mean to the log magnitude spectrum based on LFCC features. The proposed novel features were evaluated on the ASVspoof 2019 dataset. The experimental results show that, compared with known hand-crafted features, LVCCs and LMCCs are more effective in resisting spoofed speech attacks.
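A rough, hypothetical reading of how such variance- or mean-augmented cepstral features might be assembled is sketched below. It deliberately simplifies the pipeline (a proper linear-frequency filterbank is omitted, and frame, DCT, and padding parameters are assumptions), so it should be read as an illustration of the idea rather than the paper's exact LVCC/LMCC definition.

```python
# Hypothetical sketch: LFCC-style cepstra with the per-frame variance (LVCC-like) or
# per-frame mean (LMCC-like) of the log magnitude spectrum appended as an extra coefficient.
import numpy as np
import librosa
from scipy.fftpack import dct

def lfcc_with_stat(y, sr=16000, n_fft=512, hop=256, n_ceps=20, stat="var"):
    mag = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop))     # (freq bins, frames)
    log_mag = np.log(mag + 1e-8)
    ceps = dct(log_mag, type=2, axis=0, norm="ortho")[:n_ceps]     # linear-frequency cepstra
    frame_stat = log_mag.var(axis=0) if stat == "var" else log_mag.mean(axis=0)
    return np.vstack([ceps, frame_stat[None, :]])                  # (n_ceps + 1, frames)

y = np.random.randn(16000 * 2).astype(np.float32)                  # placeholder 2-second signal
lvcc_like = lfcc_with_stat(y, stat="var")
lmcc_like = lfcc_with_stat(y, stat="mean")
print(lvcc_like.shape, lmcc_like.shape)
```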
Every public speaker prepares his or her public speech meticulously. Witty remarks emerge in an endless stream and demonstrate the rhetorical beauty of English to a great extent. Almost every speaker employs parallelism in his or her public speeches. The present paper is intended to study the concept, the classification, and the significance of parallelism in English.
In recent years, the CET-6 (College English Test, Band 6) has become an important standard for measuring the English language ability of Chinese college students, and listening is a challenging task for many students. The application of indirect speech act theory to CET-6 listening has seldom been studied, so it is worthwhile to examine it. Based on the two observation points of indirect speech act theory in CET-6 listening, this study analyzes the specific embodiment of conventional and non-conventional indirect speech acts in listening and discusses effective teaching methods for CET-6 listening.
The research examines President Xi's 2021 New Year speech, with research questions centering on its abundant interpersonal meanings. Through qualitative content analysis, the research finds that it is typical for the Chinese president to frequently use judgment and appreciation resources when reviewing the past year. Even in the face of the pandemic and natural disasters, the overall emotions of the speech remain positive, which corresponds to the forward-looking character of a New Year speech. The significance of the study is broad, and future research can investigate, through a comparative lens, how COVID-19 affects the ideologies conveyed in political leaders' speeches, and can apply Appraisal Theory critically, systematically, and comprehensively to produce understandings that help dismantle the stereotypes and discrimination hidden in reports about COVID-19.
Address terms are an important resource for conveying relationships. In daily interactions, the use of address terms is unavoidable. Therefore, the choice of address terms is particularly critical for communicating smoothly with people. Previous scholars have studied address terms from a wide range of perspectives, but there is not much research on Chinese address terms based on speech act theory, which was put forward by John Langshaw Austin. By observing communicative conversations, several findings have emerged from the analysis; they concern professional titles, affectionate address terms, and common address terms. These findings illustrate the significance of address terms, which are the premise for people to maximize their illocutionary acts when communicating. A correct understanding and grasp of the use of address terms is conducive to making communicative dialogue unfold more smoothly.
Large Language Models (LLMs) are increasingly demonstrating their ability to understand natural language and solve complex tasks, especially through text generation. One of the relevant capabilities is in-context learning, which involves the ability to receive instructions in natural language or task demonstrations and generate expected outputs for test instances without additional training or gradient updates. In recent years, the popularity of social networking has provided a medium through which some users engage in offensive and harmful online behavior. In this study, we investigate the ability of different LLMs under approaches ranging from zero-shot and few-shot learning to fine-tuning. Our experiments show that LLMs can identify sexist and hateful online texts using zero-shot and few-shot approaches supported by information retrieval. Furthermore, the Zephyr model achieves the best results with the fine-tuning approach, scoring 86.811% on the Explainable Detection of Online Sexism (EDOS) test set and 57.453% on the Multilingual Detection of Hate Speech Against Immigrants and Women in Twitter (HatEval) test set. Finally, it is confirmed that the evaluated models perform well in hate text detection, as they beat the best result on the HatEval task leaderboard. The error analysis shows that in-context learning had difficulty distinguishing between types of hate speech and figurative language, while the fine-tuned approach tends to produce many false positives.
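Zero-shot screening of this kind can be illustrated with a short prompt-and-classify loop. The sketch below uses the Hugging Face transformers text-generation pipeline; the model identifier, prompt wording, and label set are assumptions for the example and are not the study's exact setup or prompts.

```python
# Illustrative zero-shot hate/sexism screening with an instruction-tuned LLM.
from transformers import pipeline

# Assumed model id; any instruction-tuned causal LM could be substituted.
generator = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta")

def zero_shot_label(text: str) -> str:
    prompt = (
        "You are a content moderation assistant. Answer with exactly one word, "
        "'hateful' or 'not_hateful', for the following message.\n"
        f"Message: {text}\nAnswer:"
    )
    out = generator(prompt, max_new_tokens=5, do_sample=False)
    # The pipeline returns the prompt plus continuation; keep only the continuation.
    return out[0]["generated_text"][len(prompt):].strip()

print(zero_shot_label("Example message to screen."))
```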
Speech emotion recognition (SER) uses acoustic analysis to find features for emotion recognition and examines variations in voice that are caused by emotions. The number of features acquired with acoustic analysis is extremely high, so we introduce a hybrid filter-wrapper feature selection algorithm based on an improved equilibrium optimizer for constructing an emotion recognition system. The proposed algorithm implements multi-objective emotion recognition with the minimum number of selected features and maximum accuracy. First, we use the information gain and Fisher Score to sort the features extracted from signals. Then, we employ a multi-objective ranking method to evaluate these features and assign different importance to them. Features with high rankings have a large probability of being selected. Finally, we propose a repair strategy to address the problem of duplicate solutions in multi-objective feature selection, which can improve the diversity of solutions and avoid falling into local traps. Using random forest and K-nearest neighbor classifiers, four English speech emotion datasets are employed to test the proposed algorithm (MBEO) as well as other multi-objective emotion identification techniques. The results illustrate that it performs well in inverted generational distance, hypervolume, Pareto solutions, and execution time, and MBEO is appropriate for high-dimensional English SER.
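The filter stage described above (ranking features by information gain and Fisher Score before any wrapper search) can be sketched as follows. The placeholder data, the rank-sum combination of the two criteria, and the number of retained candidates are assumptions; the paper's multi-objective ranking and repair strategy are not reproduced.

```python
# Illustrative filter stage of a filter-wrapper pipeline: rank features by
# information gain (mutual information) and by Fisher score, then combine ranks.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def fisher_score(X, y):
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        num += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2   # between-class scatter
        den += len(Xc) * Xc.var(axis=0)                          # within-class scatter
    return num / (den + 1e-12)

X = np.random.randn(500, 200)                  # 200 acoustic features (placeholder)
y = np.random.randint(0, 4, 500)               # 4 emotion classes (placeholder)

ig = mutual_info_classif(X, y, random_state=0)
fs = fisher_score(X, y)
combined_rank = np.argsort(ig)[::-1].argsort() + np.argsort(fs)[::-1].argsort()
top_features = np.argsort(combined_rank)[:40]   # candidates passed to the wrapper stage
print(top_features[:10])
```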
In air traffic control communications (ATCC), misunderstandings between pilots and controllers could result in fatal aviation accidents. Fortunately, advanced automatic speech recognition technology has emerged as a promising means of preventing miscommunications and enhancing aviation safety. However, most existing speech recognition methods merely incorporate external language models on the decoder side, leading to insufficient semantic alignment between speech and text modalities during the encoding phase. Furthermore, it is challenging to model acoustic context dependencies over long distances because speech sequences are longer than text sequences, especially for the extended ATCC data. To address these issues, we propose a speech-text multimodal dual-tower architecture for speech recognition. It employs cross-modal interactions to achieve close semantic alignment during the encoding stage and strengthen its capabilities in modeling auditory long-distance context dependencies. In addition, a two-stage training strategy is elaborately devised to derive semantics-aware acoustic representations effectively. The first stage focuses on pre-training the speech-text multimodal encoding module to enhance inter-modal semantic alignment and aural long-distance context dependencies. The second stage fine-tunes the entire network to bridge the input modality variation gap between the training and inference phases and boost generalization performance. Extensive experiments demonstrate the effectiveness of the proposed speech-text multimodal speech recognition method on the ATCC and AISHELL-1 datasets. It reduces the character error rate to 6.54% and 8.73%, respectively, and exhibits substantial performance gains of 28.76% and 23.82% compared with the best baseline model. The case studies indicate that the obtained semantics-aware acoustic representations aid in accurately recognizing terms with similar pronunciations but distinctive semantics. The research provides a novel modeling paradigm for semantics-aware speech recognition in air traffic control communications, which could contribute to the advancement of intelligent and efficient aviation safety management.
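To make the dual-tower idea concrete, here is a toy PyTorch layout of two encoders (acoustic and text) joined by a cross-modal attention step in which speech frames attend to text tokens. Dimensions, layer counts, and the fusion scheme are illustrative assumptions, not the paper's architecture or training strategy.

```python
# Toy sketch of a speech-text dual-tower encoder with a cross-modal attention interaction.
import torch
import torch.nn as nn

class DualTower(nn.Module):
    def __init__(self, d_model=256, vocab=5000, n_feats=80):
        super().__init__()
        self.speech_proj = nn.Linear(n_feats, d_model)         # acoustic tower input
        self.speech_enc = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
        self.text_emb = nn.Embedding(vocab, d_model)            # text tower input
        self.text_enc = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)

    def forward(self, feats, token_ids):
        s = self.speech_enc(self.speech_proj(feats))            # (B, T_speech, d)
        t = self.text_enc(self.text_emb(token_ids))             # (B, T_text, d)
        fused, _ = self.cross_attn(query=s, key=t, value=t)     # speech attends to text
        return fused

model = DualTower()
feats = torch.randn(2, 100, 80)                                  # e.g., 100 filterbank frames
tokens = torch.randint(0, 5000, (2, 20))                         # dummy token ids
print(model(feats, tokens).shape)                                # torch.Size([2, 100, 256])
```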
Machine Learning (ML) algorithms play a pivotal role in Speech Emotion Recognition (SER), although they encounter a formidable obstacle in accurately discerning a speaker's emotional state. The examination of the emotional states of speakers holds significant importance in a range of real-time applications, including but not limited to virtual reality, human-robot interaction, emergency centers, and human behavior assessment. Accurately identifying emotions in the SER process relies on extracting relevant information from audio inputs. Previous studies on SER have predominantly utilized short-time characteristics such as Mel Frequency Cepstral Coefficients (MFCCs) due to their ability to capture the periodic nature of audio signals effectively. Although these traits may improve the ability to perceive and interpret emotional depictions appropriately, MFCCs have some limitations. This study therefore aims to tackle this issue by systematically selecting multiple audio cues, enhancing the classifier model's efficacy in accurately discerning human emotions. The utilized dataset is taken from the EMO-DB database. Preprocessing of the input speech uses a 2D Convolutional Neural Network (CNN), which applies convolutional operations to spectrograms, as they afford a visual representation of how the frequency content of the audio signal changes over time. The next step is spectrogram data normalization, which is crucial for Neural Network (NN) training as it aids faster convergence. Then five auditory features, MFCCs, Chroma, Mel-Spectrogram, Contrast, and Tonnetz, are extracted sequentially. The aim of feature selection is to retain only dominant features by excluding irrelevant ones; in this paper, the Sequential Forward Selection (SFS) and Sequential Backward Selection (SBS) techniques were employed to select among the multiple audio cues. Finally, the feature sets composed from the hybrid feature extraction methods are fed into a deep Bidirectional Long Short-Term Memory (Bi-LSTM) network to discern emotions. Since a deep Bi-LSTM can hierarchically learn complex features and increases model capacity by achieving more robust temporal modeling, it is more effective than a shallow Bi-LSTM at capturing the intricate tones of emotional content present in speech signals. The effectiveness and resilience of the proposed SER model were evaluated by experiments comparing it to state-of-the-art SER techniques. The results indicated that the model achieved accuracy rates of 90.92%, 93%, and 92% on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), the Berlin Database of Emotional Speech (EMO-DB), and The Interactive Emotional Dyadic Motion Capture (IEMOCAP) datasets, respectively. These findings signify a prominent enhancement in the ability to identify emotional depictions in speech, showcasing the potential of the proposed model in advancing the SER field.
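The five auditory features named above map directly onto standard librosa feature extractors. Below is a hedged sketch of assembling them into a single per-utterance vector; the mean-pooling over frames, the parameter choices, and the placeholder signal are assumptions, and the paper's normalization and SFS/SBS selection steps are not shown.

```python
# Illustrative extraction of MFCC, Chroma, Mel-spectrogram, Spectral Contrast,
# and Tonnetz features with librosa, mean-pooled over time into one vector.
import numpy as np
import librosa

def auditory_feature_vector(y, sr=22050):
    mfcc     = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)
    chroma   = librosa.feature.chroma_stft(y=y, sr=sr)
    mel      = librosa.feature.melspectrogram(y=y, sr=sr)
    contrast = librosa.feature.spectral_contrast(y=y, sr=sr)
    tonnetz  = librosa.feature.tonnetz(y=librosa.effects.harmonic(y), sr=sr)
    # Mean-pool each feature matrix over the time axis and concatenate.
    return np.concatenate([f.mean(axis=1) for f in (mfcc, chroma, mel, contrast, tonnetz)])

y = np.random.randn(22050 * 2).astype(np.float32)    # placeholder 2-second signal
vec = auditory_feature_vector(y)
print(vec.shape)                                      # (193,) with these default settings
```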