期刊文献+
共找到9篇文章
< 1 >
每页显示 20 50 100
Comprehensive Analysis of Gender Classification Accuracy across Varied Geographic Regions through the Application of Deep Learning Algorithms to Speech Signals
1
作者 Abhishek Singhal Devendra Kumar Sharma 《Computer Systems Science & Engineering》 2024年第3期609-625,共17页
This article presents an exhaustive comparative investigation into the accuracy of gender identification across diverse geographical regions,employing a deep learning classification algorithm for speech signal analysi... This article presents an exhaustive comparative investigation into the accuracy of gender identification across diverse geographical regions,employing a deep learning classification algorithm for speech signal analysis.In this study,speech samples are categorized for both training and testing purposes based on their geographical origin.Category 1 comprises speech samples from speakers outside of India,whereas Category 2 comprises live-recorded speech samples from Indian speakers.Testing speech samples are likewise classified into four distinct sets,taking into consideration both geographical origin and the language spoken by the speakers.Significantly,the results indicate a noticeable difference in gender identification accuracy among speakers from different geographical areas.Indian speakers,utilizing 52 Hindi and 26 English phonemes in their speech,demonstrate a notably higher gender identification accuracy of 85.75%compared to those speakers who predominantly use 26 English phonemes in their conversations when the system is trained using speech samples from Indian speakers.The gender identification accuracy of the proposed model reaches 83.20%when the system is trained using speech samples from speakers outside of India.In the analysis of speech signals,Mel Frequency Cepstral Coefficients(MFCCs)serve as relevant features for the speech data.The deep learning classification algorithm utilized in this research is based on a Bidirectional Long Short-Term Memory(BiLSTM)architecture within a Recurrent Neural Network(RNN)model. 展开更多
关键词 Deep learning recurrent neural network voice signal mel frequency cepstral coefficients geographical area GENDER
下载PDF
Research on blind source separation of operation sounds of metro power transformer through an Adaptive Threshold REPET algorithm
2
作者 Liang Chen Liyi Xiong +2 位作者 Fang Zhao Yanfei Ju An Jin 《Railway Sciences》 2024年第5期609-621,共13页
Purpose–The safe operation of the metro power transformer directly relates to the safety and efficiency of the entire metro system.Through voiceprint technology,the sounds emitted by the transformer can be monitored ... Purpose–The safe operation of the metro power transformer directly relates to the safety and efficiency of the entire metro system.Through voiceprint technology,the sounds emitted by the transformer can be monitored in real-time,thereby achieving real-time monitoring of the transformer’s operational status.However,the environment surrounding power transformers is filled with various interfering sounds that intertwine with both the normal operational voiceprints and faulty voiceprints of the transformer,severely impacting the accuracy and reliability of voiceprint identification.Therefore,effective preprocessing steps are required to identify and separate the sound signals of transformer operation,which is a prerequisite for subsequent analysis.Design/methodology/approach–This paper proposes an Adaptive Threshold Repeating Pattern Extraction Technique(REPET)algorithm to separate and denoise the transformer operation sound signals.By analyzing the Short-Time Fourier Transform(STFT)amplitude spectrum,the algorithm identifies and utilizes the repeating periodic structures within the signal to automatically adjust the threshold,effectively distinguishing and extracting stable background signals from transient foreground events.The REPET algorithm first calculates the autocorrelation matrix of the signal to determine the repeating period,then constructs a repeating segment model.Through comparison with the amplitude spectrum of the original signal,repeating patterns are extracted and a soft time-frequency mask is generated.Findings–After adaptive thresholding processing,the target signal is separated.Experiments conducted on mixed sounds to separate background sounds from foreground sounds using this algorithm and comparing the results with those obtained using the FastICA algorithm demonstrate that the Adaptive Threshold REPET method achieves good separation effects.Originality/value–A REPET method with adaptive threshold is proposed,which adopts the dynamic threshold adjustment mechanism,adaptively calculates the threshold for blind source separation and improves the adaptability and robustness of the algorithm to the statistical characteristics of the signal.It also lays the foundation for transformer fault detection based on acoustic fingerprinting. 展开更多
关键词 TRANSFORMER Voiceprint recognition Blind source separation mel frequency cepstral coefficients(MFCC) Adaptive threshold
下载PDF
Extraction of novel features for emotion recognition
3
作者 李翔 郑宇 李昕 《Journal of Shanghai University(English Edition)》 CAS 2011年第5期479-486,共8页
Hilbert-Huang transform method has been widely utilized from its inception because of the superiority in varieties of areas. The Hilbert spectrum thus obtained is able to reflect the distribution of the signal energy ... Hilbert-Huang transform method has been widely utilized from its inception because of the superiority in varieties of areas. The Hilbert spectrum thus obtained is able to reflect the distribution of the signal energy in a number of scales accurately. In this paper, a novel feature called ECC is proposed via feature extraction of the Hilbert energy spectrum which describes the distribution of the instantaneous energy. The experimental results conspicuously demonstrate that ECC outperforms the traditional short-term average energy. Combination of the ECC with mel frequency cepstral coefficients (MFCC) delineates the distribution of energy in the time domain and frequency domain, and the features of this group achieve a better recognition effect compared with the feature combination of the short-term average energy, pitch and MFCC. Afterwards, further improvements of ECC are developed. TECC is gained by combining ECC with the teager energy operator, and EFCC is obtained by introducing the instantaneous frequency to the energy. In the experiments, seven status of emotion are selected to be recognized and the highest recognition rate 83.57% is achieved within the classification accuracy of boredom reaching 100%. The numerical results indicate that the proposed features ECC, TECC and EFCC can improve the performance of speech emotion recognition substantially. 展开更多
关键词 emotion recognition mel frequency cepstral coefficients (MFCC) feature extraction
下载PDF
Autonomous Surveillance of Infants’ Needs Using CNN Model for Audio Cry Classification
4
作者 Geofrey Owino Anthony Waititu +1 位作者 Anthony Wanjoya John Okwiri 《Journal of Data Analysis and Information Processing》 2022年第4期198-219,共22页
Infants portray suggestive unique cries while sick, having belly pain, discomfort, tiredness, attention and desire for a change of diapers among other needs. There exists limited knowledge in accessing the infants’ n... Infants portray suggestive unique cries while sick, having belly pain, discomfort, tiredness, attention and desire for a change of diapers among other needs. There exists limited knowledge in accessing the infants’ needs as they only relay information through suggestive cries. Many teenagers tend to give birth at an early age, thereby exposing them to be the key monitors of their own babies. They tend not to have sufficient skills in monitoring the infant’s dire needs, more so during the early stages of infant development. Artificial intelligence has shown promising efficient predictive analytics from supervised, and unsupervised to reinforcement learning models. This study, therefore, seeks to develop an android app that could be used to discriminate the infant audio cries by leveraging the strength of convolution neural networks as a classifier model. Audio analytics from many kinds of literature is an untapped area by researchers as it’s attributed to messy and huge data generation. This study, therefore, strongly leverages convolution neural networks, a deep learning model that is capable of handling more than one-dimensional datasets. To achieve this, the audio data in form of a wave was converted to images through Mel spectrum frequencies which were classified using the computer vision CNN model. The Librosa library was used to convert the audio to Mel spectrum which was then presented as pixels serving as the input for classifying the audio classes such as sick, burping, tired, and hungry. The study goal was to incorporate the model as an android tool that can be utilized at the domestic level and hospital facilities for surveillance of the infant’s health and social needs status all time round. 展开更多
关键词 Convolutional Neural Network (CNN) mel frequency cepstral coefficients (MFCCs) Rectified Linear Unit (ReLU) Activation Function Audio Analytics Deep Neural Network (DNN)
下载PDF
An Approach to Speech Emotion Classification Using k-NN and SVMs
5
作者 Disne SIVALINGAM 《Instrumentation》 2021年第3期36-45,共10页
The interaction between humans and machines has become an issue of concern in recent years.Besides facial expressions or gestures,speech has been evidenced as one of the foremost promising modalities for automatic emo... The interaction between humans and machines has become an issue of concern in recent years.Besides facial expressions or gestures,speech has been evidenced as one of the foremost promising modalities for automatic emotion recognition.Effective computing means to support HCI(Human-Computer Interaction)at a psychological level,allowing PCs to adjust their reactions as per human requirements.Therefore,the recognition of emotion is pivotal in High-level interactions.Each Emotion has distinctive properties that form us to recognize them.The acoustic signal produced for identical expression or sentence changes is essentially a direct result of biophysical changes,(for example,the stress instigated narrowing of the larynx)set off by emotions.This connection between acoustic cues and emotions made Speech Emotion Recognition one of the moving subjects of the emotive computing area.The most motivation behind a Speech Emotion Recognition algorithm is to observe the emotional condition of a speaker from recorded Speech signals.The results from the application of k-NN and OVA-SVM for MFCC features without and with a feature selection approach are presented in this research.The MFCC features from the audio signal were initially extracted to characterize the properties of emotional speech.Secondly,nine basic statistical measures were calculated from MFCC and 117-dimensional features were consequently obtained to train the classifiers for seven different classes(Anger,Happiness,Disgust,Fear,Sadness,Disgust,Boredom and Neutral)of emotions.Next,Classification was done in four steps.First,all the 117-features are classified using both classifiers.Second,the best classifier was found and then features were scaled to[-1,1]and classified.In the third step,the with or without feature scaling which gives better performance was derived from the results of the second step and the classification was done for each of the basic statistical measures separately.Finally,in the fourth step,the combination of statistical measures which gives better performance was derived using the forward feature selection method Experiments were carried out using k-NN with different k values and a linear OVA-based SVM classifier with different optimal values.Berlin emotional speech database for the German language was utilized for testing the planned methodology and recognition rates as high as 60%accomplished for the recognition of emotion from voice signal for the set of statistical measures(median,maximum,mean,Inter-quartile range,skewness).OVA-SVM performs better than k-NN and the use of the feature selection technique gives a high rate. 展开更多
关键词 mel frequency cepstral coefficients(MFCC) Fast Fourier Transformation(FFT) Discrete Cosine Transformation(DCT) k Nearest Neighbors(k-NN) Support Vector Machine(SVM) One-Vs-All(OVA)
下载PDF
Real Time Speech Based Integrated Development Environment for C Program
6
作者 Bharathi Bhagavathsingh Kavitha Srinivasan Mariappan Natrajan 《Circuits and Systems》 2016年第3期69-82,共14页
This Automatic Speech Recognition (ASR) is the process which converts an acoustic signal captured by the microphone to written text. The motivation of the paper is to create a speech based Integrated Development Envir... This Automatic Speech Recognition (ASR) is the process which converts an acoustic signal captured by the microphone to written text. The motivation of the paper is to create a speech based Integrated Development Environment (IDE) for C program. This paper proposes a technique to facilitate the visually impaired people or the person with arm injuries with excellent programming skills that can code the C program through voice input. The proposed system accepts the C program as voice input and produces compiled C program as output. The user should utter each line of the C program through voice input. First the voice input is recognized as text. The recognized text will be converted into C program by using syntactic constructs of the C language. After conversion, C program will be fetched as input to the IDE. Furthermore, the IDE commands like open, save, close, compile, run are also given through voice input only. If any error occurs during the compilation process, the error is corrected through voice input only. The errors can be corrected by specifying the line number through voice input. Performance of the speech recognition system is analyzed by varying the vocabulary size as well as number of mixture components in HMM. 展开更多
关键词 Automatic Speech Recognition Integrated Development Environment Hidden Markov Model mel frequency cepstral coefficients
下载PDF
Multi-Factor Authentication for Secured Financial Transactions in Cloud Environment
7
作者 D.Prabakaran Shyamala Ramachandran 《Computers, Materials & Continua》 SCIE EI 2022年第1期1781-1798,共18页
The rise of the digital economy and the comfort of accessing by way of user mobile devices expedite human endeavors in financial transactions over the Virtual Private Network(VPN)backbone.This prominent application of... The rise of the digital economy and the comfort of accessing by way of user mobile devices expedite human endeavors in financial transactions over the Virtual Private Network(VPN)backbone.This prominent application of VPN evades the hurdles involved in physical money exchange.The VPN acts as a gateway for the authorized user in accessing the banking server to provide mutual authentication between the user and the server.The security in the cloud authentication server remains vulnerable to the results of threat in JP Morgan Data breach in 2014,Capital One Data Breach in 2019,and manymore cloud server attacks over and over again.These attacks necessitate the demand for a strong framework for authentication to secure from any class of threat.This research paper,propose a framework with a base of EllipticalCurve Cryptography(ECC)to performsecure financial transactions throughVirtual PrivateNetwork(VPN)by implementing strongMulti-Factor Authentication(MFA)using authentication credentials and biometric identity.The research results prove that the proposed model is to be an ideal scheme for real-time implementation.The security analysis reports that the proposed model exhibits high level of security with a minimal response time of 12 s on an average of 1000 users. 展开更多
关键词 Cloud computing elliptical curve cryptography multi-factor authentication mel frequency cepstral coefficient privacy protection secured framework secure financial transactions
下载PDF
Comparison of Khasi Speech Representations with Different Spectral Features and Hidden Markov States
8
作者 Bronson Syiem Sushanta Kabir Dutta +1 位作者 Juwesh Binong Lairenlakpam Joyprakash Singh 《Journal of Electronic Science and Technology》 CAS CSCD 2021年第2期155-162,共8页
In this paper,we present a comparison of Khasi speech representations with four different spectral features and novel extension towards the development of Khasi speech corpora.These four features include linear predic... In this paper,we present a comparison of Khasi speech representations with four different spectral features and novel extension towards the development of Khasi speech corpora.These four features include linear predictive coding(LPC),linear prediction cepstrum coefficient(LPCC),perceptual linear prediction(PLP),and Mel frequency cepstral coefficient(MFCC).The 10-hour speech data were used for training and 3-hour data for testing.For each spectral feature,different hidden Markov model(HMM)based recognizers with variations in HMM states and different Gaussian mixture models(GMMs)were built.The performance was evaluated by using the word error rate(WER).The experimental results show that MFCC provides a better representation for Khasi speech compared with the other three spectral features. 展开更多
关键词 Acoustic model(AM) Gaussian mixture model(GMM) hidden Markov model(HMM) language model(LM) linear predictive coding(LPC) linear prediction cepstral coefficient(LPCC) mel frequency cepstral coefficient(MFCC) perceptual linear prediction(PLP)
下载PDF
An Efficient Approach for Segmentation, Feature Extraction and Classification of Audio Signals
9
作者 Muthumari Arumugam Mala Kaliappan 《Circuits and Systems》 2016年第4期255-279,共25页
Due to the presence of non-stationarities and discontinuities in the audio signal, segmentation and classification of audio signal is a really challenging task. Automatic music classification and annotation is still c... Due to the presence of non-stationarities and discontinuities in the audio signal, segmentation and classification of audio signal is a really challenging task. Automatic music classification and annotation is still considered as a challenging task due to the difficulty of extracting and selecting the optimal audio features. Hence, this paper proposes an efficient approach for segmentation, feature extraction and classification of audio signals. Enhanced Mel Frequency Cepstral Coefficient (EMFCC)-Enhanced Power Normalized Cepstral Coefficients (EPNCC) based feature extraction is applied for the extraction of features from the audio signal. Then, multi-level classification is done to classify the audio signal as a musical or non-musical signal. The proposed approach achieves better performance in terms of precision, Normalized Mutual Information (NMI), F-score and entropy. The PNN classifier shows high False Rejection Rate (FRR), False Acceptance Rate (FAR), Genuine Acceptance rate (GAR), sensitivity, specificity and accuracy with respect to the number of classes. 展开更多
关键词 Audio Signal Enhanced mel frequency cepstral Coefficient (EMFCC) Enhanced Power Normalized cepstral coefficients (EPNCC) Probabilistic Neural Network (PNN) Classifier
下载PDF
上一页 1 下一页 到第
使用帮助 返回顶部