Brain magnetic resonance imaging (MRI) is used to diagnose brain diseases, for example by detecting swelling and tumors. The quality of brain MR images is degraded by noise, usually salt-and-pepper and Gaussian noise, which is added to the images during acquisition. This noise makes it difficult for medical experts to diagnose diseases from brain MR images. We therefore propose a de-noising method that mixes concatenation and residual deep learning techniques, called the MCR de-noising method. The MCR method aims to remove as much salt-and-pepper and Gaussian noise as possible from brain MR images. It was trained and tested at noise levels from 2% to 20% for both salt-and-pepper and Gaussian noise. The experiments were carried out on publicly available brain MRI datasets, which are referenced in the experiments and results section. The Structural Similarity Index Measure (SSIM) and Peak Signal-to-Noise Ratio (PSNR) quantify the similarity between the images de-noised by the proposed MCR method and the original clean images, and the Mean Squared Error (MSE) measures the difference between the de-noised and the original images. The proposed MCR method achieves an SSIM of 0.9763, a PSNR of 84.3182, and an MSE of 0.0004 for salt-and-pepper noise and, similarly, an SSIM of 0.7402, a PSNR of 72.7601, and an MSE of 0.0041 for Gaussian noise at the highest noise level of 20%. Finally, we compare the MCR method with state-of-the-art de-noising filters such as the median and Wiener filters.
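The MSE and PSNR scores named in the abstract above follow standard definitions, which the minimal sketch below illustrates. It is not the authors' evaluation code: the function names, the flat-list image representation, and the `max_value` parameter (the assumed peak pixel value) are all hypothetical, and the abstract does not state which pixel scale was used.

```python
import math


def mse(reference, estimate):
    """Mean squared error between two equally sized flat pixel lists."""
    if len(reference) != len(estimate):
        raise ValueError("images must have the same number of pixels")
    return sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB; max_value is the assumed peak pixel value."""
    err = mse(reference, estimate)
    if err == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / err)


if __name__ == "__main__":
    clean = [0.0, 1.0, 0.5, 0.25]
    denoised = [0.1, 1.0, 0.5, 0.25]
    print(mse(clean, denoised), psnr(clean, denoised, max_value=1.0))
```

A lower MSE and a higher PSNR both indicate that the de-noised image is closer to the clean reference; SSIM is omitted here because it needs windowed local statistics.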
A single-channel speech enhancement method based on de-noising with the stationary wavelet transform is proposed. The approach processes the multi-resolution wavelet coefficients individually and then reconstructs the recovered signal. The time invariance of the stationary wavelet transform is particularly useful in speech de-noising. Experimental results show that the proposed algorithm can achieve an excellent balance between suppressing noise effectively and preserving as many characteristics of the original target signal as possible. The algorithm offers superior performance in speech noise suppression.
The problem of speech enhancement using threshold de-noising in the wavelet domain is considered. The appropriate decomposition level is another key factor in de-noising performance. This paper proposes a new wavelet-based de-noising scheme that significantly improves enhancement performance in the presence of additive white Gaussian noise. The proposed algorithm adaptively selects the optimal decomposition level of the wavelet transform according to the characteristics of the noisy speech. Experimental results demonstrate that the proposed algorithm outperforms the classical wavelet-based de-noising method and effectively improves the practicality of this kind of technique.
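To make the role of the decomposition level concrete, the sketch below shows plain wavelet threshold de-noising with a caller-chosen level. It is only an illustration under stated assumptions: it uses the Haar wavelet, a fixed soft threshold, and hypothetical function names, and it does not implement the adaptive level selection that the abstract proposes. The signal length must be divisible by 2 raised to the chosen level.

```python
import math

_S = 1.0 / math.sqrt(2.0)  # orthonormal Haar scaling factor


def haar_step(signal):
    """One Haar analysis step: returns (approximation, detail) coefficients."""
    approx = [(signal[i] + signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse_step(approx, detail):
    """Inverse of haar_step: interleaves the reconstructed sample pairs."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * _S)
        out.append((a - d) * _S)
    return out


def soft(value, threshold):
    """Soft thresholding: shrink the coefficient magnitude toward zero."""
    return math.copysign(max(abs(value) - threshold, 0.0), value)


def wavelet_denoise(signal, level, threshold):
    """Decompose to `level`, soft-threshold every detail band, reconstruct."""
    if len(signal) % (2 ** level) != 0:
        raise ValueError("signal length must be divisible by 2**level")
    approx = list(signal)
    details = []
    for _ in range(level):
        approx, detail = haar_step(approx)
        details.append(detail)
    details = [[soft(c, threshold) for c in band] for band in details]
    for band in reversed(details):
        approx = haar_inverse_step(approx, band)
    return approx
```

With a threshold of zero the signal is reconstructed unchanged; raising the level exposes coarser detail bands to thresholding, which is why the choice of level affects how much of the signal is smoothed away.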
Large Language Models (LLMs) are increasingly demonstrating their ability to understand natural language and solve complex tasks, especially through text generation. One relevant capability is contextual learning: the ability to receive natural-language instructions or task demonstrations and generate the expected outputs for test instances without additional training or gradient updates. In recent years, the popularity of social networking has provided a medium through which some users engage in offensive and harmful online behavior. In this study, we investigate the ability of different LLMs to detect such content, using strategies ranging from zero-shot and few-shot learning to fine-tuning. Our experiments show that LLMs can identify sexist and hateful online texts using zero-shot and few-shot approaches through information retrieval. Furthermore, the encoder-decoder model called Zephyr achieves the best results with the fine-tuning approach, scoring 86.811% on the Explainable Detection of Online Sexism (EDOS) test set and 57.453% on the Multilingual Detection of Hate Speech Against Immigrants and Women in Twitter (HatEval) test set. Finally, the evaluated models perform well in hate text detection, beating the best result on the HatEval task leaderboard. The error analysis shows that contextual learning had difficulty distinguishing between types of hate speech and figurative language, whereas the fine-tuned approach tends to produce many false positives.
Speech emotion recognition (SER) uses acoustic analysis to find features for emotion recognition and examines the variations in voice caused by emotions. Because the number of features obtained by acoustic analysis is extremely high, we introduce a hybrid filter-wrapper feature selection algorithm based on an improved equilibrium optimizer for constructing an emotion recognition system. The proposed algorithm implements multi-objective emotion recognition with the minimum number of selected features and the maximum accuracy. First, we use information gain and the Fisher score to sort the features extracted from the signals. Then, we employ a multi-objective ranking method to evaluate these features and assign them different importance; features with high rankings have a high probability of being selected. Finally, we propose a repair strategy to address the problem of duplicate solutions in multi-objective feature selection, which improves the diversity of solutions and helps avoid local traps. Using random forest and K-nearest neighbor classifiers, four English speech emotion datasets are employed to test the proposed algorithm (MBEO) alongside other multi-objective emotion identification techniques. The results show that MBEO performs well in terms of inverted generational distance, hypervolume, Pareto solutions, and execution time, and that it is appropriate for high-dimensional English SER.
Automatically detecting hate speech in social media forensics has emerged as a highly challenging task due to the complex nature of the language used on such platforms. Several methods currently exist for classifying hate speech, but they still suffer from ambiguity when differentiating between hateful and offensive content, and they also lack accuracy. The work proposed in this paper uses a combination of the Whale Optimization Algorithm (WOA) and Particle Swarm Optimization (PSO) to adjust the weights of two Multi-Layer Perceptrons (MLPs) for neutrosophic set classification. During MLP training, the WOA is employed to explore and determine the optimal set of weights, and the PSO algorithm then fine-tunes the weights to optimize the performance of the MLP. Two separate MLP models are employed: one is dedicated to predicting degrees of truth membership, while the other focuses on predicting degrees of false membership. The difference between these memberships quantifies uncertainty, indicating the degree of indeterminacy in the predictions. The experimental results indicate that our model outperforms previous work when evaluated on the Davidson dataset.
Machine Learning (ML) algorithms play a pivotal role in Speech Emotion Recognition (SER), although they face a formidable obstacle in accurately discerning a speaker's emotional state. Examining speakers' emotional states is important in a range of real-time applications, including virtual reality, human-robot interaction, emergency centers, and human behavior assessment. Accurately identifying emotions in SER relies on extracting relevant information from audio inputs. Previous SER studies have predominantly used short-time characteristics such as Mel Frequency Cepstral Coefficients (MFCCs) because they capture the periodic nature of audio signals effectively. Although these features may improve the ability to perceive and interpret emotional depictions, MFCCs have some limitations. This study therefore aims to tackle this issue by systematically selecting multiple audio cues, enhancing the classifier model's efficacy in accurately discerning human emotions. The dataset is taken from the EMO-DB database. Input speech is preprocessed using a 2D Convolutional Neural Network (CNN), which applies convolutional operations to spectrograms, as these provide a visual representation of how the frequency content of the audio signal changes over time. The next step is normalization of the spectrogram data, which is crucial for Neural Network (NN) training because it aids faster convergence. Then five auditory features, namely MFCCs, Chroma, Mel-Spectrogram, Contrast, and Tonnetz, are extracted from the spectrogram sequentially. The aim of feature selection is to retain only the dominant features by excluding irrelevant ones. In this paper, the Sequential Forward Selection (SFS) and Sequential Backward Selection (SBS) techniques are employed to select features from the multiple audio cues. Finally, the feature sets produced by the hybrid feature extraction methods are fed into a deep Bidirectional Long Short-Term Memory (Bi-LSTM) network to discern emotions. Because the deep Bi-LSTM can hierarchically learn complex features and increases model capacity by achieving more robust temporal modeling, it is more effective than a shallow Bi-LSTM at capturing the intricate tones of emotional content in speech signals. The effectiveness and resilience of the proposed SER model were evaluated experimentally and compared with state-of-the-art SER techniques. The model achieved accuracy rates of 90.92%, 93%, and 92% on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), the Berlin Database of Emotional Speech (EMO-DB), and the Interactive Emotional Dyadic Motion Capture (IEMOCAP) datasets, respectively. These findings indicate a marked improvement in the ability to identify emotional depictions in speech, showcasing the potential of the proposed model for advancing the SER field.
In air traffic control communications (ATCC), misunderstandings between pilots and controllers could result in fatal aviation accidents. Fortunately, advanced automatic speech recognition technology has emerged as a promising means of preventing miscommunications and enhancing aviation safety. However, most existing speech recognition methods merely incorporate external language models on the decoder side, leading to insufficient semantic alignment between the speech and text modalities during the encoding phase. Furthermore, it is challenging to model acoustic context dependencies over long distances because speech sequences are longer than text, especially for the extended ATCC data. To address these issues, we propose a speech-text multimodal dual-tower architecture for speech recognition. It employs cross-modal interactions to achieve close semantic alignment during the encoding stage and to strengthen its capability to model long-distance acoustic context dependencies. In addition, a two-stage training strategy is devised to derive semantics-aware acoustic representations effectively. The first stage focuses on pre-training the speech-text multimodal encoding module to enhance inter-modal semantic alignment and long-distance acoustic context dependencies. The second stage fine-tunes the entire network to bridge the input modality gap between the training and inference phases and to boost generalization performance. Extensive experiments demonstrate the effectiveness of the proposed speech-text multimodal speech recognition method on the ATCC and AISHELL-1 datasets. It reduces the character error rate to 6.54% and 8.73%, respectively, and exhibits substantial performance gains of 28.76% and 23.82% compared with the best baseline model. The case studies indicate that the obtained semantics-aware acoustic representations help in accurately recognizing terms with similar pronunciations but distinct semantics. The research provides a novel modeling paradigm for semantics-aware speech recognition in air traffic control communications, which could contribute to the advancement of intelligent and efficient aviation safety management.
Reporting is essential in language use and includes the re-expression of one's own or other people's words, opinions, psychological activities, and so on. Mastering the translation methods for reported speech in German academic papers is very important for improving the accuracy of academic translation. This study takes the translation of "Internationalization of German Universities" (Die Internationalisierung der deutschen Hochschulen), an academic paper on higher education, as an example to explore the translation methods for reported speech in German academic papers. It finds that word order conversion, part-of-speech conversion, and split translation can make the translation more accurate and fluent. This paper helps to clarify the rules and characteristics of translating reported speech in German academic papers and provides a reference for improving the quality of German-Chinese translation.
In recent years, the use of social networking sites has increased considerably in the Arab world. It has empowered individuals to express their opinions, especially on politics. Furthermore, various organizations operating in Arab countries have embraced social media in their day-to-day business activities at different scales, which is attributed to business owners' understanding of the importance of social media for business development. However, Arabic morphology is very complicated, with nearly 10,000 roots and more than 900 patterns that act as the basis for verbs and nouns. Hate speech on online social networking sites has become a worldwide issue that reduces the cohesion of civil societies. Against this background, the current study develops a Chaotic Elephant Herd Optimization with Machine Learning for Hate Speech Detection (CEHOML-HSD) model for the Arabic language. The CEHOML-HSD model concentrates mainly on identifying and categorizing Arabic text as hate speech or normal. To achieve this, the model follows several sub-processes. First, it performs data pre-processing with the help of the TF-IDF vectorizer. Second, the Support Vector Machine (SVM) model is used to detect and classify hate speech texts written in Arabic. Lastly, the CEHO approach is employed to fine-tune the parameters of the SVM. The CEHO approach is developed by combining chaotic functions with the classical EHO algorithm, and the design of the CEHO algorithm for parameter tuning constitutes the novelty of the work. An extensive experimental analysis was carried out to validate the enhanced performance of the proposed CEHOML-HSD approach. The comparative study established the superiority of the proposed CEHOML-HSD model over other approaches.
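The TF-IDF pre-processing step mentioned in the CEHOML-HSD abstract can be sketched as follows. This is one common textbook weighting (term frequency times the logarithm of inverse document frequency), not the authors' pipeline: the function name and whitespace tokenization are assumptions made for illustration, library vectorizers typically add smoothing and normalization, and real Arabic text would need proper morphological tokenization.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]  # naive tokenizer (assumption)
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append(
            {
                term: (count / total) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return vectors


if __name__ == "__main__":
    for vector in tfidf(["hate speech text", "normal text"]):
        print(vector)
```

Terms that occur in every document receive a weight of zero under this variant, so only terms that help separate documents contribute to the vectors passed on to a classifier such as an SVM.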
The teaching of English public speaking in universities aims to enhance oral communication ability, improve English communication skills, and expand English knowledge, and it occupies a core position in university English teaching. Against the background of second language acquisition theory, this article analyzes the important role and value of the theory in university English speech teaching and explores how to apply it in that setting. It aims to strengthen the cultivation of talent skilled in English and to provide a brief reference for improving English speech teaching in universities.
A noise reduction method for the infrared detector output signal during the dynamic calibration of thermocouples is studied. First, the deficiencies of classical filtering methods are analyzed, and wavelet analysis is introduced for signal de-noising during dynamic testing. Second, the theoretical basis of wavelet analysis, the choice of wavelet basis, and the determination of the decomposition level and threshold are analyzed. Finally, a de-noising experiment on the infrared detector signal is carried out on the MATLAB platform. The results indicate that the proposed wavelet de-noising method effectively removes fixed-frequency and high-frequency noise; furthermore, good synchronization is achieved between the de-noised signal and the useful signal components in the original signal, which is of great significance for thermocouple modeling analysis.
Using numerical simulation data of the forward differential propagation shift (ΦDP) of polarimetric radar, the principle and procedure of noise reduction by wavelet analysis are introduced in detail. Thanks to multiscale analysis, various types of noise can be identified according to their characteristics at different scales and suppressed at different resolutions by one of three strategies: a penalty threshold strategy, in which a fixed threshold value is applied; a default threshold strategy, in which the threshold value is determined by the noise intensity; or a ΦDP penalty threshold strategy, in which a special value is designed for ΦDP de-noising. Then a hard- or soft-threshold function, depending on the de-noising purpose, is selected to reconstruct the signal. Combining the three noise suppression strategies and the two signal reconstruction functions, and without loss of generality, two schemes are presented to verify the de-noising effect with dbN wavelets: (1) the penalty threshold strategy with the soft threshold function (PSS) and (2) the ΦDP penalty threshold strategy with the soft threshold function (PPSS). Furthermore, wavelet de-noising is compared with the mean, median, Kalman, and finite impulse response (FIR) methods using simulation data and two actual cases. The results suggest that both schemes perform well, especially when ΦDP data are simultaneously polluted by noise of various scales and types. A slight difference is that the PSS method retains more detail, whereas the PPSS method smooths the signal more successfully.
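The hard- and soft-threshold functions referred to in the ΦDP abstract have standard definitions, shown in the minimal sketch below. The function names are illustrative, and the threshold value itself would come from whichever threshold strategy is chosen; the specific strategies in the abstract are not reproduced here.

```python
def hard_threshold(coefficient, threshold):
    """Keep coefficients whose magnitude exceeds the threshold; zero the rest."""
    return coefficient if abs(coefficient) > threshold else 0.0


def soft_threshold(coefficient, threshold):
    """Zero small coefficients and shrink the remaining ones toward zero."""
    if coefficient > threshold:
        return coefficient - threshold
    if coefficient < -threshold:
        return coefficient + threshold
    return 0.0


if __name__ == "__main__":
    for c in (-3.0, -0.5, 0.5, 3.0):
        print(c, hard_threshold(c, 1.0), soft_threshold(c, 1.0))
```

Hard thresholding leaves surviving coefficients untouched, which preserves sharp features but leaves a jump at the threshold; soft thresholding is continuous and generally gives a smoother reconstruction at the cost of shrinking large coefficients.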
Gyro drift is not only the main error affecting a gyro's precision but also the primary factor affecting its reliability. Reducing zero drift and random drift is a key problem for the gyro output signal. A three-layer threshold de-noising algorithm based on wavelet decomposition is proposed to process the signal collected from a running fiber optic gyro (FOG). The coefficients are obtained from a three-layer wavelet packet decomposition. The high-frequency part that exceeds the wavelet packet threshold is set to zero, the nodes from which noise and interference have been filtered out are reconstructed, and the soft threshold function is constructed from the coefficients of the third-layer nodes. Compared with the forced de-noising method, the proposed wavelet packet de-noising method is more effective. Simulation results show that random drift compensation is improved by 13.1% and zero drift is reduced by 0.0526°/h.
In the present study of peak particle velocity (PPV) and frequency, an improved algorithm, principal empirical mode decomposition (PEMD), based on principal component analysis (PCA) and empirical mode decomposition (EMD), is proposed to address the poor filtering and de-noising caused by modal aliasing when EMD is used to decompose blasting vibration signals. Test results show that the frequency of the intrinsic mode function (IMF) components decomposed by PEMD decreases gradually and that the main frequency is unique, which eliminates modal aliasing. In the simulation experiment, the signal-to-noise ratio (SNR) and root mean square error (RMSE) of the signal de-noised by PEMD are the largest when compared with EMD and ensemble empirical mode decomposition (EEMD). The main frequency of the signal de-noised by PEMD is 75 Hz, which is closest to the frequency of the noiseless simulation signal. In geotechnical engineering blasting experiments, compared with EMD and EEMD, the signal de-noised by PEMD has the lowest level of distortion, and its frequency band is distributed in the range 0-64 Hz, which is closest to the frequency band of the blasting vibration signal. In addition, its proportion of noise energy is the lowest, at 1.8%.
The phase-frequency characteristics of approximately sinusoidal geomagnetic signals can be used for projectile roll positioning and other high-precision trajectory correction applications. The sinusoidal geomagnetic signal is deformed in exposed and magnetically contaminated environments. To recognize the roll information precisely and to separate the noise component effectively from the original geomagnetic sequence, we propose, on the basis of an error source analysis, a moving-horizon wavelet de-noising method for filtering the dual-observed geomagnetic signal. The captured rough roll frequency provides a reasonable basis for selecting the wavelet decomposition and reconstruction levels for the sampled sequence, and a moving horizon window guarantees real-time performance and a non-cumulative computational load. The complete geomagnetic data over the full ballistic range and three intercepted segments are used for performance assessment. The positioning performance of the moving-horizon wavelet de-noising method is compared with that of a band-pass filter. The results show that both noise reduction techniques improve positioning accuracy, while the wavelet de-noising method is always better than the band-pass filter. These results suggest that the proposed moving-horizon wavelet de-noising method for the dual-observed geomagnetic signal is more applicable to various launch conditions and offers better positioning performance.
With the rapid development of mechanical equipment, the field of mechanical health monitoring has entered the era of big data. Deep learning has achieved great success in processing large image and speech datasets thanks to its powerful modeling capabilities, and this has also influenced the field of mechanical fault diagnosis. Therefore, considering the characteristics of motor vibration signals (nonstationary and difficult to process) and of mechanical "big data", a deep-learning-based motor fault diagnosis method using a stacked de-noising auto-encoder (SDAE) is proposed. The frequency-domain signals obtained by the Fourier transform are used as the input to the network. The method extracts features adaptively and without supervision, removing the dependence of traditional machine learning methods on manual feature extraction. Supervised fine-tuning of the model is then carried out by backpropagation. The asynchronous motor in a Drivetrain Dynamics Simulator system was taken as the research object, the effectiveness of the proposed method was verified on a large amount of data, and the network output was visualized. The results show that the SDAE method is more efficient and more intelligent.
Because signals are prone to being mixed with noise, an adaptive signal de-noising system based on the recursive least squares (RLS) algorithm is introduced. The principle of adaptive filtering and the process flow of the RLS algorithm are described. Simulation plots of the adaptive de-noising system are obtained from a simulated example. Analysis and comparison show that RLS adaptive filtering can eliminate the noise and recover the useful signal relatively well, which demonstrates the validity of the method and the soundness of the system.
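The RLS update described in the abstract above can be sketched in a few lines. This is a generic textbook form of the algorithm under stated assumptions, not the system from the paper: the function name, the filter order, the forgetting factor `lam`, and the initialization constant `delta` are all illustrative choices. In an adaptive noise-cancellation setup, `x` would be a reference noise input, `d` the noisy primary signal, and the returned error sequence the de-noised output.

```python
import random


def rls_filter(x, d, order=2, lam=0.99, delta=100.0):
    """Recursive least squares: returns (final weights, a priori errors)."""
    w = [0.0] * order
    # Inverse correlation matrix estimate, initialised to delta * I.
    P = [[delta if i == j else 0.0 for j in range(order)] for i in range(order)]
    errors = []
    for n in range(len(x)):
        # Most recent `order` input samples, zero-padded at the start.
        u = [x[n - k] if n - k >= 0 else 0.0 for k in range(order)]
        Pu = [sum(P[i][j] * u[j] for j in range(order)) for i in range(order)]
        denom = lam + sum(u[i] * Pu[i] for i in range(order))
        gain = [v / denom for v in Pu]
        e = d[n] - sum(w[i] * u[i] for i in range(order))  # a priori error
        w = [w[i] + gain[i] * e for i in range(order)]
        uTP = [sum(u[i] * P[i][j] for i in range(order)) for j in range(order)]
        P = [
            [(P[i][j] - gain[i] * uTP[j]) / lam for j in range(order)]
            for i in range(order)
        ]
        errors.append(e)
    return w, errors


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.uniform(-1.0, 1.0) for _ in range(200)]
    # Desired signal produced by a known two-tap filter (illustrative).
    d = [0.5 * x[n] - 0.3 * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]
    weights, _ = rls_filter(x, d)
    print(weights)
```

The forgetting factor controls how quickly old samples are discounted: values close to 1 give smoother estimates, while smaller values track changing noise faster.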
Based on wavelet transform theory, a method for signal de-noising and for singularity detection and elimination is proposed; it reduces noise and expresses local singularities. Each singularity can also be detected and located through the local modulus maxima of the wavelet transform. Simulation experiments are conducted in MATLAB. The experimental results demonstrate that the proposed method is effective and feasible.
A novel synthetic aperture radar (SAR) image de-noising method based on local pixel grouping (LPG), principal component analysis (PCA), and the guided filter is proposed. The method consists of two steps. In the first step, the noisy image is processed by coarse filters, which suppress speckle effectively. The original SAR image is transformed into an additive noise model by a logarithmic transform with deviation correction. Then each pixel and its nearest neighbors are treated as a vector, and training samples are selected from the local window by LPG based on block similarity matching. The LPG method ensures that only similar sample patches are used in the local statistical calculation for estimating the PCA transform, so that the local features of the image are well preserved after coefficient shrinkage in the PCA domain. In the second step, guided filtering is applied, which effectively eliminates the small artifacts left over from the coarse filtering. Experimental results on simulated and real SAR images show that the proposed method outperforms state-of-the-art image de-noising methods in peak signal-to-noise ratio (PSNR), structural similarity (SSIM) index, and equivalent number of looks (ENL), as well as in perceived image quality.
Funding (single-channel speech enhancement using the stationary wavelet transform): Supported by the Education Foundation of Anhui Province (No. 2002kj003).
Abstract: The problem of speech enhancement using threshold de-noising in the wavelet domain is considered. The decomposition level is also a key factor in de-noising performance. This paper proposes a new wavelet-based de-noising scheme that significantly improves enhancement performance in the presence of additive white Gaussian noise. The proposed algorithm adaptively selects the optimal wavelet decomposition level according to the characteristics of the noisy speech. Experimental results demonstrate that the algorithm outperforms the classical wavelet-based de-noising method and effectively improves the practicality of this kind of technique.
Funding: This work is part of the research projects LaTe4PoliticES (PID2022-138099OBI00), funded by MICIU/AEI/10.13039/501100011033 and the European Regional Development Fund (ERDF)-A Way of Making Europe, and LT-SWM (TED2021-131167B-I00), funded by MICIU/AEI/10.13039/501100011033 and the European Union NextGenerationEU/PRTR. Mr. Ronghao Pan is supported by the Programa Investigo grant, funded by the Region of Murcia, the Spanish Ministry of Labour and Social Economy and the European Union-NextGenerationEU under the “Plan de Recuperación, Transformación y Resiliencia (PRTR).”
Abstract: Large Language Models (LLMs) are increasingly demonstrating their ability to understand natural language and solve complex tasks, especially through text generation. One relevant capability is contextual learning, the ability to take natural-language instructions or task demonstrations and generate the expected outputs for test instances without additional training or gradient updates. In recent years, the popularity of social networking has provided a medium through which some users engage in offensive and harmful online behavior. In this study, we investigate the ability of different LLMs to detect such behavior, with approaches ranging from zero-shot and few-shot learning to fine-tuning. Our experiments show that LLMs can identify sexist and hateful online texts using zero-shot and few-shot approaches through information retrieval. Furthermore, the encoder-decoder model called Zephyr achieves the best results with the fine-tuning approach, scoring 86.811% on the Explainable Detection of Online Sexism (EDOS) test set and 57.453% on the Multilingual Detection of Hate Speech Against Immigrants and Women in Twitter (HatEval) test set. Finally, the evaluated models perform well in hate text detection, as they beat the best result on the HatEval task leaderboard. The error analysis shows that contextual learning had difficulty distinguishing between types of hate speech and figurative language, whereas the fine-tuned approach tends to produce many false positives.
Abstract: Speech emotion recognition (SER) uses acoustic analysis to find features for emotion recognition and examines the variations in voice caused by emotions. The number of features obtained through acoustic analysis is extremely high, so we introduce a hybrid filter-wrapper feature selection algorithm based on an improved equilibrium optimizer for constructing an emotion recognition system. The proposed algorithm performs multi-objective emotion recognition with the minimum number of selected features and the maximum accuracy. First, we use information gain and the Fisher Score to sort the features extracted from the signals. Then, we employ a multi-objective ranking method to evaluate these features and assign different importance to them; features with high rankings have a high probability of being selected. Finally, we propose a repair strategy to address the problem of duplicate solutions in multi-objective feature selection, which improves the diversity of solutions and avoids falling into local traps. Using random forest and K-nearest neighbor classifiers, four English speech emotion datasets are employed to test the proposed algorithm (MBEO) alongside other multi-objective emotion identification techniques. The results show that MBEO performs well in inverted generational distance, hypervolume, Pareto solutions, and execution time, and that it is appropriate for high-dimensional English SER.
Abstract: Detecting hate speech automatically in social media forensics has emerged as a highly challenging task due to the complex nature of the language used on such platforms. Several methods currently exist for classifying hate speech, but they still suffer from ambiguity when differentiating between hateful and offensive content, and they also lack accuracy. The work proposed in this paper uses a combination of the Whale Optimization Algorithm (WOA) and Particle Swarm Optimization (PSO) to adjust the weights of two Multi-Layer Perceptrons (MLPs) for neutrosophic sets classification. During the training of each MLP, the WOA is employed to explore and determine the optimal set of weights. The PSO algorithm then fine-tunes the weights to optimize the performance of the MLP. In this approach, two separate MLP models are employed: one is dedicated to predicting degrees of truth membership, while the other focuses on predicting degrees of false membership. The difference between these memberships quantifies uncertainty, indicating the degree of indeterminacy in the predictions. The experimental results indicate the superior performance of our model compared to previous work when evaluated on the Davidson dataset.
Abstract: Machine Learning (ML) algorithms play a pivotal role in Speech Emotion Recognition (SER), although they encounter a formidable obstacle in accurately discerning a speaker's emotional state. Examining the emotional states of speakers is important in a range of real-time applications, including but not limited to virtual reality, human-robot interaction, emergency centers, and human behavior assessment. Accurately identifying emotions in the SER process relies on extracting relevant information from audio inputs. Previous studies on SER have predominantly used short-time characteristics such as Mel Frequency Cepstral Coefficients (MFCCs) because they capture the periodic nature of audio signals effectively. Although these traits may improve the ability to perceive and interpret emotional depictions appropriately, MFCCs have some limitations. This study therefore aims to tackle this issue by systematically picking multiple audio cues, enhancing the classifier model's efficacy in accurately discerning human emotions. The dataset is taken from the EMO-DB database. Input speech is preprocessed with a 2D Convolutional Neural Network (CNN), which applies convolutional operations to spectrograms, as these give a visual representation of how the frequency content of the audio signal changes over time. The next step is spectrogram data normalization, which is crucial for Neural Network (NN) training as it aids faster convergence. Then the five auditory features MFCCs, Chroma, Mel-Spectrogram, Contrast, and Tonnetz are extracted from the spectrogram sequentially. The aim of feature selection is to retain only the dominant features by excluding the irrelevant ones. In this paper, the Sequential Forward Selection (SFS) and Sequential Backward Selection (SBS) techniques are employed to select features from the multiple audio cues. Finally, the feature sets composed by the hybrid feature extraction methods are fed into a deep Bidirectional Long Short Term Memory (Bi-LSTM) network to discern emotions. Since the deep Bi-LSTM can hierarchically learn complex features and increases model capacity by achieving more robust temporal modeling, it is more effective than a shallow Bi-LSTM in capturing the intricate tones of emotional content in speech signals. The effectiveness and resilience of the proposed SER model were evaluated experimentally by comparing it to state-of-the-art SER techniques. The results indicated that the model achieved accuracy rates of 90.92%, 93%, and 92% on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), the Berlin Database of Emotional Speech (EMO-DB), and the Interactive Emotional Dyadic Motion Capture (IEMOCAP) datasets, respectively. These findings signify a prominent enhancement in the ability to identify emotional depictions in speech, showcasing the potential of the proposed model in advancing the SER field.
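The Sequential Forward Selection step mentioned in this abstract can be sketched as a greedy loop. This is an illustrative sketch only: `toy_score`, the per-feature gains, and the redundancy penalty are made-up stand-ins for a real classifier's validation accuracy, not values from the paper.

```python
def sequential_forward_selection(candidates, score_fn, k):
    """Greedily add the feature that most improves score_fn until k are chosen."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        # Try each remaining feature on top of the current selection.
        best = max(remaining, key=lambda f: score_fn(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical stand-alone gains for the five feature groups (illustrative only).
GAINS = {"mfcc": 0.50, "mel": 0.45, "chroma": 0.20, "contrast": 0.10, "tonnetz": 0.05}


def toy_score(subset):
    """Toy score: sum of gains, minus a penalty when two redundant groups co-occur."""
    score = sum(GAINS[f] for f in subset)
    if "mfcc" in subset and "mel" in subset:
        score -= 0.40  # assumed redundancy between MFCCs and the Mel-spectrogram
    return score


chosen = sequential_forward_selection(GAINS, toy_score, k=2)
print(chosen)
```

Sequential Backward Selection runs the same idea in reverse: it starts from the full feature set and repeatedly removes the feature whose removal hurts the score least.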
Funding: This research was funded by the Shenzhen Science and Technology Program (Grant No. RCBS20221008093121051), the General Higher Education Project of Guangdong Provincial Education Department (Grant No. 2020ZDZX3085), the China Postdoctoral Science Foundation (Grant No. 2021M703371), and the Post-Doctoral Foundation Project of Shenzhen Polytechnic (Grant No. 6021330002K).
Abstract: In air traffic control communications (ATCC), misunderstandings between pilots and controllers can result in fatal aviation accidents. Advanced automatic speech recognition technology has emerged as a promising means of preventing miscommunications and enhancing aviation safety. However, most existing speech recognition methods merely incorporate external language models on the decoder side, leading to insufficient semantic alignment between the speech and text modalities during the encoding phase. Furthermore, it is challenging to model long-distance acoustic context dependencies because speech sequences are longer than text, especially for extended ATCC data. To address these issues, we propose a speech-text multimodal dual-tower architecture for speech recognition. It employs cross-modal interactions to achieve close semantic alignment during the encoding stage and to strengthen its ability to model long-distance auditory context dependencies. In addition, a two-stage training strategy is devised to derive semantics-aware acoustic representations effectively. The first stage focuses on pre-training the speech-text multimodal encoding module to enhance inter-modal semantic alignment and long-distance aural context dependencies. The second stage fine-tunes the entire network to bridge the input modality gap between the training and inference phases and to boost generalization performance. Extensive experiments demonstrate the effectiveness of the proposed speech-text multimodal speech recognition method on the ATCC and AISHELL-1 datasets. It reduces the character error rate to 6.54% and 8.73%, respectively, and exhibits substantial performance gains of 28.76% and 23.82% compared with the best baseline model. The case studies indicate that the resulting semantics-aware acoustic representations help in accurately recognizing terms with similar pronunciations but distinct semantics. The research provides a novel modeling paradigm for semantics-aware speech recognition in air traffic control communications, which could contribute to the advancement of intelligent and efficient aviation safety management.
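The character error rate (CER) reported in this abstract is conventionally the character-level edit distance between reference and hypothesis, divided by the reference length. A minimal sketch under that assumption follows; the function names are illustrative, and the paper's exact scoring script is not given here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate of a hypothesis against a non-empty reference."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


print(cer("abcd", "abed"))
```

Because insertions count as errors, CER can exceed 1.0 when the hypothesis is much longer than the reference.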
Abstract: Reporting is essential in language use and includes the re-expression of one's own or other people's words, opinions, psychological activities, and so on. Grasping the translation methods for reported speech in German academic papers is very important for improving the accuracy of academic paper translation. This study takes the translation of “Internationalization of German Universities” (Die Internationalisierung der deutschen Hochschulen), an academic paper on higher education, as an example to explore the translation methods for reported speech in German academic papers. It finds that word order conversion, part-of-speech conversion, and split translation can make the translation more accurate and fluent. This paper helps to grasp the rules and characteristics of translating reported speech in German academic papers, and also provides a reference for improving the quality of German-Chinese translation.
Funding: Princess Nourah bint Abdulrahman University Researchers Supporting Project Number (PNURSP2024R263), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. This study is supported via funding from Prince Sattam bin Abdulaziz University Project Number (PSAU/2024/R/1445).
Abstract: In recent years, the usage of social networking sites has considerably increased in the Arab world. It has empowered individuals to express their opinions, especially in politics. Furthermore, various organizations that operate in the Arab countries have embraced social media in their day-to-day business activities at different scales. This is attributed to business owners' understanding of social media's importance for business development. However, Arabic morphology is complicated to process, with nearly 10,000 roots and more than 900 patterns that act as the basis for verbs and nouns. Hate speech on online social networking sites has become a worldwide issue that reduces the cohesion of civil societies. Against this background, the current study develops a Chaotic Elephant Herd Optimization with Machine Learning for Hate Speech Detection (CEHOML-HSD) model for the Arabic language. The CEHOML-HSD model mainly concentrates on identifying and categorizing Arabic text as hate speech or normal. To achieve this, the model follows several sub-processes. First, it pre-processes the data with the help of a TF-IDF vectorizer. Second, a Support Vector Machine (SVM) model is used to detect and classify hate speech texts in the Arabic language. Lastly, the CEHO approach is employed to fine-tune the parameters of the SVM. The CEHO approach is developed by combining chaotic functions with the classical EHO algorithm, and its design for parameter tuning constitutes the novelty of the work. An extensive experimental analysis was carried out to validate the enhanced performance of the proposed CEHOML-HSD approach. The comparative study established the superiority of the proposed CEHOML-HSD model over other approaches.
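The TF-IDF pre-processing stage described in this abstract can be sketched in a few lines. The sketch assumes one common variant (term frequency normalized by document length, inverse document frequency `log(N / df)`), which may differ from the vectorizer settings the authors used; the function name and the toy tokens are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Weight = (count / document length) * log(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Toy, already-tokenized documents (placeholders, not data from the paper).
docs = [["good", "morning", "friends"], ["bad", "morning"], ["good", "news"]]
for vector in tfidf(docs):
    print(vector)
```

Terms that occur in every document receive weight zero under this variant, which is why the resulting sparse vectors emphasize distinctive words before they reach a classifier such as an SVM.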
Abstract: The teaching of English speeches in universities aims to enhance oral communication ability, improve English communication skills, and expand English knowledge, and it occupies a core position in university English teaching. Against the background of second language acquisition theory, this article analyzes the role and value of this theory in university English speech teaching and explores how to apply it in that teaching. It aims to strengthen the cultivation of talent skilled in English and to provide a brief reference for improving English speech teaching in universities.
Abstract: A noise reduction method for the infrared detector output signal during dynamic calibration of a thermocouple is studied. Firstly, the deficiency of the classical filter method is analyzed and the application of wavelet analysis to signal de-noising during dynamic testing is introduced. Secondly, the theoretical basis of wavelet analysis, the choice of wavelet base, and the determination of the decomposition level and threshold are analyzed. Finally, the de-noising experiment for the infrared detector signal is carried out on the Matlab platform. The results indicate that the proposed wavelet de-noising method is effective in removing fixed-frequency and high-frequency noise; furthermore, good synchronization is achieved between the de-noised signal and the useful signal components in the original signal, which is of great significance to thermocouple modeling analysis.
Funding: Funded by the National Natural Science Foundation of China (Grant No. 41375038) and the China Meteorological Administration Special Public Welfare Research Fund (Grant Nos. GYHY201306040, GYHY201306075)
Abstract: Using numerical simulation data of the forward differential propagation phase shift (ΦDP) of polarimetric radar, the principle and procedure of noise reduction by wavelet analysis are introduced in detail. Thanks to multiscale analysis, various types of noise can be identified according to their characteristics at different scales and suppressed at different resolutions by one of three strategies: a penalty threshold strategy, in which a fixed threshold value is applied; a default threshold strategy, in which the threshold value is determined by the noise intensity; or a ΦDP penalty threshold strategy, in which a special value is designed for ΦDP de-noising. Then, a hard- or soft-threshold function, depending on the de-noising purpose, is selected to reconstruct the signal. Combining the three noise suppression strategies and the two signal reconstruction functions, and without loss of generality, two schemes are presented to verify the de-noising effect with dbN wavelets: (1) the penalty threshold strategy with the soft threshold function (PSS); (2) the ΦDP penalty threshold strategy with the soft threshold function (PPSS). Furthermore, wavelet de-noising is compared with the mean, median, Kalman, and finite impulse response (FIR) methods using simulation data and two actual cases. The results suggest that both schemes perform well, especially when ΦDP data are simultaneously polluted by noise of various scales and types. A slight difference is that the PSS method retains more detail, while the PPSS method smooths the signal more successfully.
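The hard- and soft-threshold functions mentioned in this abstract have standard textbook forms, sketched below for a list of wavelet coefficients. The threshold value and the function names are illustrative assumptions; the paper's penalty and ΦDP-specific threshold selection rules are not reproduced here.

```python
import math


def hard_threshold(coeffs, threshold):
    """Keep coefficients whose magnitude exceeds the threshold; zero the rest."""
    return [c if abs(c) > threshold else 0.0 for c in coeffs]


def soft_threshold(coeffs, threshold):
    """Zero small coefficients and shrink the remaining ones toward zero."""
    result = []
    for c in coeffs:
        magnitude = abs(c) - threshold
        result.append(math.copysign(magnitude, c) if magnitude > 0 else 0.0)
    return result


detail = [-3.0, -0.5, 0.2, 2.0]  # made-up detail coefficients
print(hard_threshold(detail, 1.0))
print(soft_threshold(detail, 1.0))
```

Hard thresholding leaves the surviving coefficients untouched, which preserves sharp features but can leave isolated spikes. Soft thresholding shrinks every survivor by the threshold, which gives a smoother reconstruction at the cost of some amplitude.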
Abstract: Gyro drift is not only the main error that influences a gyro's precision but also the primary factor that affects its reliability. Reducing zero drift and random drift is a key problem for the output of a gyro signal. A three-layer de-noising threshold algorithm based on wavelet decomposition is proposed to process the signal collected from a running fiber optic gyro (FOG). The coefficients are obtained from a three-layer wavelet packet decomposition. The high-frequency part that is greater than the wavelet packet threshold is set to zero, the nodes from which noise and interference have been filtered out are reconstructed, and the soft threshold function is constructed from the coefficients of the third-layer nodes. Compared with wavelet packet de-noising and the forced de-noising method, the proposed method is more effective. Simulation results show that the random drift compensation is enhanced by 13.1% and that zero drift is reduced by 0.0526°/h.
Funding: National Natural Science Foundation of China under Grant Nos. 52064015 and 51404111; Jiangxi Provincial Natural Science Foundation under Grant No. 20192BAB206017; Scientific Research Project of Jiangxi Provincial Education Department under Grant No. GJJ160643; the Program of Qingjiang Excellent Young Talents, Jiangxi University of Science and Technology under Grant No. JXUSTQJYX2016007.
Abstract: In the present study of peak particle velocity (PPV) and frequency, an improved algorithm, principal empirical mode decomposition (PEMD), based on principal component analysis (PCA) and empirical mode decomposition (EMD), is proposed. Its goal is to address the poor filtering and de-noising caused by modal aliasing when EMD decomposes blasting vibration signals. Test results showed that the frequency of the intrinsic mode function (IMF) components decomposed by PEMD gradually decreases and that the main frequency is unique, which eliminates modal aliasing. In the simulation experiment, the signal-to-noise ratio (SNR) and root mean square error (RMSE) of the signal de-noised by PEMD are the largest when compared to EMD and ensemble empirical mode decomposition (EEMD). The main frequency of the signal de-noised by PEMD is 75 Hz, which is closest to the frequency of the noiseless simulation signal. In geotechnical engineering blasting experiments, compared to EMD and EEMD, the signal de-noised by PEMD has the lowest level of distortion, and its frequency band is distributed in a range of 0-64 Hz, which is closest to the frequency band of the blasting vibration signal. In addition, the proportion of noise energy was the lowest, at 1.8%.
Funding: Funded by the National Natural Science Foundation of China (61201391).
Abstract: The phase-frequency characteristics of approximately sinusoidal geomagnetic signals can be used for projectile roll positioning and other high-precision trajectory correction applications. The sinusoidal geomagnetic signal deforms in an exposed and magnetically contaminated environment. In order to recognize the roll information precisely and to separate the noise component from the original geomagnetic sequence effectively, we propose, based on an analysis of the error sources, a moving-horizon wavelet de-noising method for filtering the dual-observed geomagnetic signal. In this method, the captured rough roll frequency provides a reasonable basis for selecting the wavelet decomposition and reconstruction levels for the sampled sequence, and a moving horizon window guarantees real-time performance and a non-cumulative computational load. The complete geomagnetic data over the full ballistic range and three intercepted segments are used for performance assessment. The positioning performance of the moving-horizon wavelet de-noising method is compared with that of a band-pass filter. The results show that both noise reduction techniques improve the positioning accuracy, while the wavelet de-noising method is always better than the band-pass filter. These results suggest that the proposed moving-horizon wavelet de-noising method for the dual-observed geomagnetic signal is more applicable to various launch conditions and gives better positioning performance.
Funding: This research is supported financially by the Natural Science Foundation of China (Grant Nos. 51505234, 51405241, 51575283).
Abstract: With the rapid development of mechanical equipment, the field of mechanical health monitoring has entered the era of big data. Deep learning has achieved great success in processing large volumes of image and speech data thanks to its powerful modeling capabilities, and this has also influenced the field of mechanical fault diagnosis. Therefore, based on the characteristics of motor vibration signals (nonstationary and difficult to deal with) and of mechanical 'big data', and drawing on deep learning, a motor fault diagnosis method based on a stacked de-noising auto-encoder (SDAE) is proposed. The frequency-domain signals obtained by the Fourier transform are used as input to the network. The method extracts features adaptively and without supervision, removing the dependence of traditional machine learning methods on manually extracted features. Supervised fine-tuning of the model is then carried out by backpropagation. The asynchronous motor in a Drivetrain Dynamics Simulator system was taken as the research object, the effectiveness of the proposed method was verified on a large amount of data, and the network output was visualized. The results show that the SDAE method is more efficient and more intelligent.
Funding: The Key Program of National Natural Science of China (No. U1261205); Shandong University of Science and Technology Research Fund (No. 2010KYTD101)
Abstract: Because noise is easily mixed into signals, an adaptive signal de-noising system based on the recursive least squares (RLS) algorithm is introduced. The principle of adaptive filtering and the process flow of the RLS algorithm are described. Simulation figures of the adaptive de-noising system are obtained from an example simulation. Analysis and comparison show that RLS adaptive filtering is capable of eliminating the noise and recovering the useful signal reasonably well, which demonstrates the validity of the method and the soundness of the system.
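The RLS update described in this abstract can be sketched in its standard textbook form. The sketch sets up an adaptive noise canceller under assumptions that are not from the paper: a reference noise input is available, the noise reaches the primary input through a made-up 2-tap path, and the filter order, forgetting factor, and initialization constant `delta` are illustrative.

```python
import math
import random


def rls_filter(inputs, desired, order=2, forgetting=0.99, delta=100.0):
    """Standard RLS adaptive FIR filter.

    Returns the final weights and the a priori error sequence
    e[n] = desired[n] - w[n-1] . x[n].
    """
    w = [0.0] * order
    # Inverse correlation matrix estimate, initialized to delta * identity.
    P = [[delta if i == j else 0.0 for j in range(order)] for i in range(order)]
    errors = []
    for n in range(len(inputs)):
        # Regressor: the most recent `order` input samples, zero-padded at the start.
        x = [inputs[n - i] if n - i >= 0 else 0.0 for i in range(order)]
        Px = [sum(P[i][j] * x[j] for j in range(order)) for i in range(order)]
        denom = forgetting + sum(x[i] * Px[i] for i in range(order))
        gain = [v / denom for v in Px]
        e = desired[n] - sum(w[i] * x[i] for i in range(order))
        w = [w[i] + gain[i] * e for i in range(order)]
        xP = [sum(x[i] * P[i][j] for i in range(order)) for j in range(order)]
        P = [[(P[i][j] - gain[i] * xP[j]) / forgetting for j in range(order)]
             for i in range(order)]
        errors.append(e)
    return w, errors


random.seed(0)
n_samples = 500
signal = [math.sin(2 * math.pi * 0.01 * n) for n in range(n_samples)]
noise = [random.uniform(-1.0, 1.0) for _ in range(n_samples)]
# Primary input: the signal plus noise that passed through a hypothetical 2-tap path.
primary = [signal[n] + 0.5 * noise[n] - 0.3 * (noise[n - 1] if n > 0 else 0.0)
           for n in range(n_samples)]
weights, cleaned = rls_filter(noise, primary, order=2)
print(weights)
```

In this arrangement the filter learns the noise path from the reference input, so the error sequence `cleaned` serves as the estimate of the useful signal. A forgetting factor closer to 1 averages over more samples and gives steadier weights, at the cost of slower tracking.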
Abstract: Based on wavelet transform theory, a method for signal de-noising and for singularity detection and elimination is proposed, which can reduce noise and express local singularities. Each singularity can also be detected and located through the local modulus maxima of the wavelet transform. Simulation experiments are conducted with MATLAB software. The experimental results demonstrate that the proposed method is effective and feasible.
Funding: Supported by the National Natural Science Foundation of China (62002208, 61572063, 61603225) and the Natural Science Foundation of Shandong Province (ZR2016FQ04).
Abstract: A novel synthetic aperture radar (SAR) image de-noising method based on local pixel grouping (LPG) principal component analysis (PCA) and the guided filter is proposed. The method contains two steps. In the first step, the noisy image is processed by coarse filters, which suppress the speckle effectively. The original SAR image is transformed into the additive noise model by a logarithmic transform with deviation correction. Then, a pixel and its nearest neighbors are taken as a vector, and training samples are selected from the local window by LPG based on similar-block matching. The LPG method ensures that only similar sample patches are used in the local statistical calculation for PCA transform estimation, so that the local features of the image are well preserved after coefficient shrinkage in the PCA domain. In the second step, guided filtering is applied, which effectively eliminates the small artifacts left over from the coarse filtering. Experimental results on simulated and real SAR images show that the proposed method outperforms state-of-the-art image de-noising methods in peak signal-to-noise ratio (PSNR), the structural similarity (SSIM) index, and the equivalent number of looks (ENL), as well as in perceived image quality.