Abstract: Smoothed cepstral peak prominence (CPPs) is a measurement of the distance from the prominent cepstral peak to the linear regression line directly beneath it. Variations in CPPs data acquisition and analysis complicate the choice of clinical cut-off values, and there are no agreed values for a specific voice disorder, such as hypokinetic dysarthria associated with Parkinson's disease (PD). This study examined CPPs in people with hypokinetic dysarthria associated with PD compared with healthy participants. Results demonstrated significant differences in the speech tasks of sustained vowel and connected speech, with the CPPs of connected speech more sensitive to dysphonia and to gender difference in PD participants. Male PD participants presented higher CPPs for sustained vowels and lower CPPs for connected speech than female PD participants. The findings imply that a consistent clinical application protocol is necessary, and that multiple acoustic measures are needed to ensure the accuracy of clinical decisions.
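The definition in the first sentence of this abstract can be sketched for a single frame as follows. This is only an illustration under stated assumptions: the function name, the 60 to 330 Hz pitch search range, the Hann window, and the dB scaling are choices made here, not the study's protocol, and the time and quefrency smoothing that turns CPP into CPPs is omitted.

```python
import numpy as np

def cepstral_peak_prominence(frame, sample_rate, f0_min=60.0, f0_max=330.0):
    """Unsmoothed CPP (dB) of one frame; parameter defaults are assumptions."""
    windowed = frame * np.hanning(len(frame))
    # Log-magnitude spectrum in dB (small floor avoids log of zero).
    log_spectrum = 20.0 * np.log10(np.abs(np.fft.fft(windowed)) + 1e-12)
    # Cepstrum: inverse transform of the log spectrum, expressed in dB.
    cepstrum_db = 20.0 * np.log10(np.abs(np.fft.ifft(log_spectrum)) + 1e-12)
    quefrency = np.arange(len(frame)) / sample_rate  # seconds

    # Search for the peak only where a plausible pitch period can lie.
    lo = int(np.floor(sample_rate / f0_max))
    hi = int(np.ceil(sample_rate / f0_min))
    region = slice(lo, hi + 1)
    peak = lo + int(np.argmax(cepstrum_db[region]))

    # Linear regression line through the cepstrum over the same region.
    slope, intercept = np.polyfit(quefrency[region], cepstrum_db[region], 1)
    # CPP: height of the peak above the regression line directly beneath it.
    return cepstrum_db[peak] - (slope * quefrency[peak] + intercept)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 16000
    t = np.arange(2048) / fs
    # Synthetic "voiced" frame: 160 Hz harmonic series plus faint noise.
    voiced = sum(np.sin(2 * np.pi * 160 * k * t) / k for k in range(1, 41))
    voiced = voiced + 0.01 * rng.standard_normal(t.size)
    noise = rng.standard_normal(t.size)
    print(cepstral_peak_prominence(voiced, fs), cepstral_peak_prominence(noise, fs))
```

A strongly periodic frame should yield a larger value than an aperiodic one, which is the property the measure relies on when it is used as an index of dysphonia.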
Funding: Supported by the National Natural Science Foundation of China (Nos. 61201252, 60775037), the Humanities and Social Sciences Foundation of the Ministry of Education (No. 10YJC870046), the Natural Science Research Key Project of Anhui Provincial Higher Education (No. KJ2011A128), and the Soft Science Project of Anhui Province (No. 11020503009).
Abstract: To support rapid automatic service composition and fulfill multi-quality of service (multi-QoS) demands, we propose a novel approach that realizes service composition automatically by means of a pre-filtering process. For a set of web services with similar functionality and different quality of service (QoS), a semantic service chain is defined and a corresponding algorithm is proposed to construct this data structure. A pre-filtering process is put forward to determine, before planning, whether a composite service exists, which avoids aborted planning. An optimal planning algorithm is proposed that chooses the most suitable service from many similar candidate services based on semantic service chains and multi-QoS values. The algorithms improve the correctness and automation of semantic web service composition. As an example, a concrete composition process is analyzed. Experimental results show the validity of the composition process.
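The two ideas named in this abstract, checking that a composition exists before planning and picking among similar candidates by multi-QoS values, might look roughly like the sketch below. Everything in it is hypothetical: the `Service` record, the forward-chaining reachability check, and the weighted-sum QoS score are illustrative stand-ins, not the paper's semantic service chain or its planning algorithm.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Service:
    """Hypothetical service record; QoS values assumed normalised, higher is better."""
    name: str
    inputs: frozenset
    outputs: frozenset
    qos: dict

def composition_exists(services, available, goal):
    """Pre-filter: can the goal concepts be produced at all from the available ones?"""
    known = set(available)
    changed = True
    while changed and not goal <= known:
        changed = False
        for service in services:
            # A service can fire once all of its inputs are known.
            if service.inputs <= known and not service.outputs <= known:
                known |= service.outputs
                changed = True
    return goal <= known

def best_candidate(candidates, weights):
    """Pick the functionally similar candidate with the best weighted QoS score."""
    return max(candidates, key=lambda s: sum(w * s.qos[k] for k, w in weights.items()))
```

Running the cheap reachability check first means the more expensive planner is only started when at least one composition can succeed.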
Funding: National Natural Science Foundation of China (No. 62001100).
Abstract: Spoofed speech of various kinds is a hidden danger for automatic speaker verification (ASV) systems. These threats can be classified into two categories, namely logical access (LA) and physical access (PA). To improve the identification capability of spoofed speech detection, this paper focuses on features. Firstly, following the idea of modifying constant-Q-based features, this work considered adding the variance or the mean to the constant-Q-based cepstral domain to obtain good performance. Secondly, linear frequency cepstral coefficients (LFCCs) performed comparably with constant-Q-based features. Finally, we proposed linear frequency variance-based cepstral coefficients (LVCCs) and linear frequency mean-based cepstral coefficients (LMCCs) for the identification of speech spoofing. LVCCs and LMCCs are obtained by adding the frame variance or the frame mean to the log magnitude spectrum on which LFCC features are based. The proposed features were evaluated on the ASVspoof 2019 dataset. The experimental results show that, compared with known hand-crafted features, LVCCs and LMCCs are more effective in resisting spoofed speech attacks.
Abstract: This article presents a comparative investigation into the accuracy of gender identification across diverse geographical regions, employing a deep learning classification algorithm for speech signal analysis. In this study, speech samples are categorized for both training and testing purposes based on their geographical origin. Category 1 comprises speech samples from speakers outside of India, whereas Category 2 comprises live-recorded speech samples from Indian speakers. Testing speech samples are likewise classified into four distinct sets, taking into consideration both geographical origin and the language spoken by the speakers. The results indicate a noticeable difference in gender identification accuracy among speakers from different geographical areas. When the system is trained using speech samples from Indian speakers, Indian speakers, who use 52 Hindi and 26 English phonemes in their speech, demonstrate a notably higher gender identification accuracy of 85.75% compared to speakers who predominantly use 26 English phonemes in their conversations. The gender identification accuracy of the proposed model reaches 83.20% when the system is trained using speech samples from speakers outside of India. In the analysis of speech signals, Mel Frequency Cepstral Coefficients (MFCCs) serve as the features for the speech data. The deep learning classification algorithm used in this research is based on a Bidirectional Long Short-Term Memory (BiLSTM) architecture within a Recurrent Neural Network (RNN) model.
Funding: China Academy of Railway Sciences Corporation Limited (2023YJ257).
Abstract: Purpose: The safe operation of the metro power transformer directly relates to the safety and efficiency of the entire metro system. Through voiceprint technology, the sounds emitted by the transformer can be captured continuously, thereby achieving real-time monitoring of the transformer's operational status. However, the environment surrounding power transformers is filled with various interfering sounds that intertwine with both the normal operational voiceprints and the faulty voiceprints of the transformer, severely impacting the accuracy and reliability of voiceprint identification. Therefore, effective preprocessing steps are required to identify and separate the sound signals of transformer operation, which is a prerequisite for subsequent analysis. Design/methodology/approach: This paper proposes an Adaptive Threshold Repeating Pattern Extraction Technique (REPET) algorithm to separate and denoise the transformer operation sound signals. By analyzing the Short-Time Fourier Transform (STFT) amplitude spectrum, the algorithm identifies and uses the repeating periodic structures within the signal to automatically adjust the threshold, effectively distinguishing and extracting stable background signals from transient foreground events. The REPET algorithm first calculates the autocorrelation matrix of the signal to determine the repeating period, then constructs a repeating segment model. Through comparison with the amplitude spectrum of the original signal, repeating patterns are extracted and a soft time-frequency mask is generated. Findings: After adaptive thresholding, the target signal is separated. Experiments that use this algorithm to separate background sounds from foreground sounds in mixed recordings, with results compared against those of the FastICA algorithm, demonstrate that the Adaptive Threshold REPET method achieves good separation. Originality/value: A REPET method with an adaptive threshold is proposed. It adopts a dynamic threshold adjustment mechanism, adaptively calculates the threshold for blind source separation, and improves the adaptability and robustness of the algorithm to the statistical characteristics of the signal. It also lays the foundation for transformer fault detection based on acoustic fingerprinting.
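The REPET steps listed in the methodology (repeating period from an autocorrelation, repeating segment model, comparison with the original amplitude spectrum, soft time-frequency mask) can be sketched on a magnitude spectrogram as below. This is a minimal illustration of plain REPET under assumptions made here: the function names are hypothetical, the period is taken as the strongest lag of a simple biased autocorrelation, and the paper's adaptive threshold is not specified in the abstract and is not reproduced.

```python
import numpy as np

def estimate_period(magnitude, max_lag=None):
    """Repeating period in frames from the autocorrelation of the power spectrogram."""
    power = magnitude ** 2
    n_frames = power.shape[1]
    max_lag = max_lag or n_frames // 3
    # Biased autocorrelation per frequency bin, averaged over bins; dividing
    # by n_frames favours the shortest of several equally strong lags.
    beat = np.array([
        np.mean(np.sum(power[:, :n_frames - lag] * power[:, lag:], axis=1)) / n_frames
        for lag in range(1, max_lag + 1)
    ])
    return int(np.argmax(beat)) + 1

def repet_soft_mask(magnitude, period, eps=1e-12):
    """Soft mask in [0, 1]; values near 1 mark the repeating background."""
    n_bins, n_frames = magnitude.shape
    n_segments = int(np.ceil(n_frames / period))
    padded = np.full((n_bins, n_segments * period), np.nan)
    padded[:, :n_frames] = magnitude
    # Repeating segment model: element-wise median over all period-long segments.
    model = np.nanmedian(padded.reshape(n_bins, n_segments, period), axis=1)
    # The repeating part cannot exceed the observed magnitude.
    repeating = np.minimum(np.tile(model, (1, n_segments))[:, :n_frames], magnitude)
    return repeating / (magnitude + eps)
```

Multiplying the mask with the complex STFT of the mixture and inverting it gives the background estimate, and `1 - mask` gives the transient foreground.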