Abstract: Smoothed cepstral peak prominence (CPPs) measures the distance from the prominent cepstral peak to the linear regression line directly beneath it. Variations in CPPs data acquisition and analysis complicate the choice of clinical cut-off values, and there are no agreed values for specific voice disorders such as hypokinetic dysarthria associated with Parkinson’s disease (PD). This study examined CPPs in people with hypokinetic dysarthria associated with PD compared with healthy participants. Results showed significant differences in both speech tasks, sustained vowel and connected speech, with CPPs of connected speech more sensitive to dysphonia and to gender differences in PD participants. Male PD participants presented higher CPPs for sustained vowels and lower CPPs for connected speech than female PD participants. The findings imply that a consistent clinical application protocol is necessary and that multiple acoustic measures are needed to ensure the accuracy of clinical decisions.
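The definition in this abstract (height of the cepstral peak above a regression line fitted to the cepstrum) can be illustrated with a minimal sketch. It computes an unsmoothed prominence for a single frame; the time and quefrency smoothing that gives CPPs its name is omitted. The function names, the 60 to 300 Hz search range, the choice to fit the regression line over that same range, and the synthetic test signals are all assumptions made for illustration, not the study's protocol.

```python
import cmath
import math
import random


def dft(x):
    """Naive O(N^2) DFT with a precomputed twiddle table (fine for short frames)."""
    n = len(x)
    tw = [cmath.exp(-2j * math.pi * m / n) for m in range(n)]
    return [sum(x[t] * tw[(k * t) % n] for t in range(n)) for k in range(n)]


def cepstrum_db(frame):
    """Cepstrum magnitude in dB of a Hann-windowed frame."""
    n = len(frame)
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    log_spec = [20 * math.log10(abs(c) + 1e-12) for c in dft(windowed)]
    return [20 * math.log10(abs(c) + 1e-12) for c in dft(log_spec)]


def cpp(frame, fs, f0_min=60.0, f0_max=300.0):
    """Return (peak prominence in dB, quefrency of the peak in seconds)."""
    ceps = cepstrum_db(frame)
    lo = int(fs / f0_max)                            # shortest pitch period, in samples
    hi = min(int(fs / f0_min), len(frame) // 2 - 1)  # longest pitch period, in samples
    qs = list(range(lo, hi + 1))
    ys = [ceps[q] for q in qs]
    # Least-squares regression line through the cepstrum (assumed fitting range: lo..hi).
    mq = sum(qs) / len(qs)
    my = sum(ys) / len(ys)
    slope = (sum((q - mq) * (y - my) for q, y in zip(qs, ys))
             / sum((q - mq) ** 2 for q in qs))
    intercept = my - slope * mq
    peak_q = max(qs, key=lambda q: ceps[q])
    return ceps[peak_q] - (slope * peak_q + intercept), peak_q / fs


# Synthetic check: a 100 Hz pulse train with slight noise versus pure noise.
rng = random.Random(0)
fs, n = 8000, 512
voiced = [(1.0 if t % 80 == 0 else 0.0) + 0.01 * rng.gauss(0, 1) for t in range(n)]
noise = [rng.gauss(0, 1) for _ in range(n)]
cpp_voiced, q_voiced = cpp(voiced, fs)
cpp_noise, _ = cpp(noise, fs)
```

In this toy setup the periodic signal produces a cepstral peak at the 10 ms pitch period and a larger prominence than the noise frame, which is the property that makes the measure usable as an index of dysphonia.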
Funding: National Natural Science Foundation of China (No. 62001100).
Abstract: Spoofed speech is a hidden danger for automatic speaker verification (ASV) systems. These threats fall into two categories: logical access (LA) and physical access (PA). To improve spoofed speech detection, this paper investigates features. First, following the idea of modifying constant-Q-based features, this work considered adding the variance or the mean to the constant-Q-based cepstral domain to obtain good performance. Second, linear frequency cepstral coefficients (LFCCs) performed comparably with constant-Q-based features. Finally, we proposed linear frequency variance-based cepstral coefficients (LVCCs) and linear frequency mean-based cepstral coefficients (LMCCs) for identifying spoofed speech. LVCCs and LMCCs are obtained by adding the frame variance or the frame mean to the log magnitude spectrum, building on LFCC features. The proposed features were evaluated on the ASVspoof 2019 dataset. The experimental results show that, compared with known hand-crafted features, LVCCs and LMCCs are more effective in resisting spoofed speech attacks.
Funding: This study was supported by the National Natural Science Foundation of China (No. 11774073) and the State Key Laboratory of Acoustics (No. SKLA201904).
Abstract: Passive target detection through shipping-radiated noise is a key technology in current underwater operations and is of great research value in civil and military fields. In this study, the stable spectral line component of shipping-radiated noise is used as the research object, and the classification of multisource targets is studied from the perspective of underwater channels. We use the channel impulse response function as the basis for classifying different targets. First, the underwater channel is estimated by the cepstrum. Then, the channel cepstral features carried by different spectral line components are extracted in turn. Finally, the spectral line components belonging to the same target are clustered by cepstral feature distance to classify the different targets. Simulation and experimental results verify the effectiveness of the proposed method.
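Why the cepstrum carries channel information can be shown with a toy two-path channel: an echo adds a periodic ripple to the log spectrum, which appears as a cepstral peak at the echo delay. The sketch below is only an illustration of that principle under assumed values (white-noise source, one echo with gain 0.8 and a 40-sample delay); the function names are hypothetical and this is not the estimation procedure of the study.

```python
import cmath
import math
import random


def real_cepstrum(x):
    """Real cepstrum: inverse DFT of the log magnitude spectrum (naive O(N^2))."""
    n = len(x)
    tw = [cmath.exp(-2j * math.pi * m / n) for m in range(n)]
    log_mag = [math.log(abs(sum(x[t] * tw[(k * t) % n] for t in range(n))) + 1e-12)
               for k in range(n)]
    # log_mag is real and symmetric, so the inverse DFT reduces to a cosine sum.
    return [sum(log_mag[k] * tw[(k * q) % n].real for k in range(n)) / n
            for q in range(n)]


def echo_delay(x, min_lag=5):
    """Quefrency (in samples) of the largest cepstral value beyond min_lag."""
    ceps = real_cepstrum(x)
    return max(range(min_lag, len(x) // 2), key=lambda q: ceps[q])


# Toy two-path channel: direct path plus one echo (all values are assumptions).
rng = random.Random(1)
n, delay, gain = 512, 40, 0.8
source = [rng.gauss(0, 1) for _ in range(n + delay)]
received = [source[t + delay] + gain * source[t] for t in range(n)]
estimated = echo_delay(received)
```

Signals that travelled through the same channel share this delay structure, which is the intuition behind grouping spectral line components by the distance between their cepstral features.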
Funding: Supported by the National Natural Science Foundation of China (Nos. 61902158, 61673108), the Science and Technology Program of Nantong (JC2018129, MS12018082), and the Top-notch Academic Programs Project of Jiangsu Higher Education Institutions (PPZY2015B135).
Abstract: Enhancing speech intelligibility in noisy environments remains one of the major challenges for hearing-impaired listeners in everyday life. Recently, machine-learning-based approaches to speech enhancement have shown great promise for improving speech intelligibility. Two key issues in these approaches are the acoustic features extracted from noisy signals and the classifiers used for supervised learning. This paper focuses on features. Multi-resolution power-normalized cepstral coefficients (MRPNCC) are proposed as a new feature to enhance speech intelligibility for hearing-impaired listeners. The new feature is constructed by combining four cepstra at different time–frequency (T–F) resolutions in order to capture both local and contextual information. MRPNCC vectors and binary masking labels, calculated from signals passed through a gammatone filterbank, are used to train a support vector machine (SVM) classifier, which aims to identify the binary masking values of the T–F units in the enhancement stage. The enhanced speech is synthesized from the estimated masking values and the Wiener-filtered T–F units. Objective experimental results demonstrate that the proposed feature is superior to the other compared features in terms of HIT-FA, STOI, HASPI and PESQ, and that the proposed algorithm not only improves speech intelligibility but also slightly improves speech quality. Subjective tests validate the effectiveness of the proposed algorithm for hearing-impaired listeners.
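The binary masking labels and the HIT-FA score mentioned in this abstract can be sketched as follows. The sketch assumes the common convention that a T–F unit is labelled 1 when its local SNR exceeds a criterion, and that HIT-FA is the hit rate on target-dominant units minus the false-alarm rate on noise-dominant units. The 0 dB criterion, the function names, and the 2×2 energy grids are illustrative assumptions; the paper's gammatone front end and SVM are not reproduced.

```python
import math


def ideal_binary_mask(speech_energy, noise_energy, lc_db=0.0):
    """Label each T-F unit 1 if its local SNR (dB) exceeds lc_db, else 0."""
    mask = []
    for s_row, n_row in zip(speech_energy, noise_energy):
        row = []
        for s, n in zip(s_row, n_row):
            snr_db = 10 * math.log10((s + 1e-12) / (n + 1e-12))
            row.append(1 if snr_db > lc_db else 0)
        mask.append(row)
    return mask


def hit_minus_fa(ideal, estimated):
    """HIT rate on units labelled 1 minus false-alarm rate on units labelled 0."""
    hits = false_alarms = ones = zeros = 0
    for i_row, e_row in zip(ideal, estimated):
        for i, e in zip(i_row, e_row):
            if i == 1:
                ones += 1
                hits += e
            else:
                zeros += 1
                false_alarms += e
    hit = hits / ones if ones else 0.0
    fa = false_alarms / zeros if zeros else 0.0
    return hit - fa


# Hypothetical per-unit energies (rows: frames, columns: frequency channels).
speech = [[4.0, 1.0], [1.0, 9.0]]
noise = [[1.0, 2.0], [3.0, 1.0]]
ibm = ideal_binary_mask(speech, noise)      # labels a classifier would be trained on
classifier_output = [[1, 1], [0, 1]]        # one false alarm, no misses
score = hit_minus_fa(ibm, classifier_output)
```

Subtracting the false-alarm rate penalizes a classifier that retains noise-dominant units, so a mask of all ones scores 0 rather than 1.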
Abstract: A seismic edge detection algorithm unmasks blurred discontinuities in an image, and its efficiency depends on the precision of the processing scheme adopted. Data-driven modeling is a fast machine learning scheme, a formal and automatic version of the long-standing empirical approach that can be used in many different contexts. Here, an algorithm that can identify masked connections and correlations from a set of observations is built and used. Geologic models of hydrocarbon reservoirs facilitate enhanced visualization, volumetric calculation, well planning, and prediction of fluid migration paths. To obtain new insights and test the mappability of a geologic feature, spectral decomposition techniques such as the Discrete Fourier Transform (DFT) and cepstral decomposition techniques such as the Complex Cepstral Transform (CCT) can be employed. Cepstral decomposition is a new approach that extends the widely used process of spectral decomposition, which is rigorous when analyzing very subtle stratigraphic plays and fractured reservoirs. This paper presents the results of applying the DFT and CCT to a two-dimensional, 50 Hz, low-impedance channel sand model representing a typical geologic environment around a prospective hydrocarbon zone largely trapped in various types of channel structures. While the DFT represents the frequency and phase spectra of a signal, assumes stationarity and analyticity, and highlights the average properties of its dominant portion, the CCT represents the quefrency and saphe cepstra of a signal in the quefrency domain. The transform filters the field data recorded in the time domain and recovers lost sub-seismic geologic information in the quefrency domain by separating source and transmission path effects. Our algorithm is based on fast Fourier transform (FFT) techniques, and the code was written in Matlab. It was developed from first principles, outside the oil industry’s interpretational platforms, using standard processing routines. The results of the algorithm, when implemented on both commercial and general platforms, were comparable. The cepstral properties of the channel model indicate that cepstral attributes can be a powerful tool in exploration problems for enhancing the visualization of small-scale anomalies and obtaining reliable estimates of wavelet and stratigraphic parameters. The practical relevance of this investigation is illustrated by sample spectral and cepstral attribute plots and by pseudo-sections of phase and saphe constructed from the model data. The cepstral attributes reveal more detail in terms of quefrency, as required for clearer imaging and better interpretation of subtle edges and discontinuities, sand-shale interbedding, and differences in lithology. These results benefit production because they serve as a basis for interpreting similar geologic situations in field data.
Abstract: This article presents a comparative investigation into the accuracy of gender identification across diverse geographical regions, employing a deep learning classification algorithm for speech signal analysis. In this study, speech samples are categorized for both training and testing purposes based on their geographical origin. Category 1 comprises speech samples from speakers outside of India, whereas Category 2 comprises live-recorded speech samples from Indian speakers. Testing speech samples are likewise classified into four distinct sets, taking into consideration both geographical origin and the language spoken by the speakers. The results indicate a noticeable difference in gender identification accuracy among speakers from different geographical areas. When the system is trained using speech samples from Indian speakers, Indian speakers, who use 52 Hindi and 26 English phonemes in their speech, achieve a notably higher gender identification accuracy of 85.75% than speakers who predominantly use 26 English phonemes in their conversations. The gender identification accuracy of the proposed model reaches 83.20% when the system is trained using speech samples from speakers outside of India. Mel Frequency Cepstral Coefficients (MFCCs) serve as the features for the speech data. The deep learning classification algorithm used in this research is based on a Bidirectional Long Short-Term Memory (BiLSTM) architecture within a Recurrent Neural Network (RNN) model.
Funding: China Academy of Railway Sciences Corporation Limited (2023YJ257).
Abstract: Purpose: The safe operation of the metro power transformer directly relates to the safety and efficiency of the entire metro system. Through voiceprint technology, the sounds emitted by the transformer can be monitored in real time, thereby achieving real-time monitoring of the transformer’s operational status. However, the environment surrounding power transformers is filled with interfering sounds that intertwine with both the normal and the faulty operational voiceprints of the transformer, severely affecting the accuracy and reliability of voiceprint identification. Effective preprocessing is therefore required to identify and separate the sound signals of transformer operation, which is a prerequisite for subsequent analysis. Design/methodology/approach: This paper proposes an Adaptive Threshold Repeating Pattern Extraction Technique (REPET) algorithm to separate and denoise transformer operation sound signals. By analyzing the Short-Time Fourier Transform (STFT) amplitude spectrum, the algorithm identifies the repeating periodic structures within the signal and uses them to adjust the threshold automatically, effectively distinguishing and extracting stable background signals from transient foreground events. The REPET algorithm first calculates the autocorrelation matrix of the signal to determine the repeating period, then constructs a repeating segment model. By comparison with the amplitude spectrum of the original signal, repeating patterns are extracted and a soft time-frequency mask is generated. Findings: After adaptive thresholding, the target signal is separated. Experiments that separate background sounds from foreground sounds in mixed recordings with this algorithm, compared against results obtained with the FastICA algorithm, demonstrate that the Adaptive Threshold REPET method achieves good separation. Originality/value: A REPET method with an adaptive threshold is proposed. It adopts a dynamic threshold adjustment mechanism, adaptively calculates the threshold for blind source separation, and improves the adaptability and robustness of the algorithm to the statistical characteristics of the signal. It also lays the foundation for transformer fault detection based on acoustic fingerprinting.
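The REPET steps named in this abstract (autocorrelation to find the repeating period, a repeating segment model, and a soft time-frequency mask) can be sketched on a tiny magnitude spectrogram. This is a minimal sketch of the basic REPET idea only: the adaptive threshold that the paper contributes is not described in the abstract and is not reproduced here. The function names, the median-based segment model, the one-third lag limit, and the toy spectrogram are assumptions for illustration.

```python
from statistics import median


def beat_spectrum(mag):
    """Mean over frequency bins of the unbiased autocorrelation of the power rows."""
    n_frames, n_bins = len(mag), len(mag[0])
    power = [[v * v for v in frame] for frame in mag]
    beat = []
    for lag in range(n_frames):
        total = 0.0
        for f in range(n_bins):
            acc = sum(power[t][f] * power[t + lag][f] for t in range(n_frames - lag))
            total += acc / (n_frames - lag)
        beat.append(total / n_bins)
    return beat


def repeating_period(beat):
    """Lag with the largest beat-spectrum value, limited to one third of the length
    so that at least three repetitions are available (an assumed search range)."""
    max_lag = len(beat) // 3
    return max(range(1, max_lag + 1), key=lambda lag: beat[lag])


def soft_mask(mag, period):
    """Soft mask in [0, 1]: share of each T-F unit explained by the repeating model."""
    n_frames, n_bins = len(mag), len(mag[0])
    # Repeating segment model: element-wise median over frames one period apart.
    segment = [[median(mag[t][f] for t in range(j, n_frames, period))
                for f in range(n_bins)] for j in range(period)]
    mask = []
    for t in range(n_frames):
        row = []
        for f in range(n_bins):
            repeating = min(segment[t % period][f], mag[t][f])
            row.append(repeating / mag[t][f] if mag[t][f] > 0 else 0.0)
        mask.append(row)
    return mask


# Toy spectrogram: a background repeating every 3 frames plus one transient event.
pattern = [[1.0, 5.0], [2.0, 1.0], [4.0, 2.0]]
mag = [list(pattern[t % 3]) for t in range(12)]
mag[7][0] += 3.0                      # transient foreground energy in frame 7, bin 0
beat = beat_spectrum(mag)
period = repeating_period(beat)
mask = soft_mask(mag, period)
```

Multiplying the mask by the mixture spectrogram gives an estimate of the stable background (here, the transformer hum), and the complement `1 - mask` gives the transient foreground. In the toy data only the unit holding the transient receives a mask value below 1.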