Auditory neuropathy (AN) was first reported 30 years ago, in 1979, when Davis and Hirsh presented the first case with normal or near-normal hearing thresholds but absent auditory brainstem responses. Many names have been given to the condition since then, including paradoxical hearing loss, brainstem auditory processing syndrome, central auditory dysfunction, neural synchrony disorder and neural dyssynchrony. The term auditory neuropathy was first introduced by Sininger and colleagues in 1995. A growing number of articles on AN have been published in recent years. The present short review and case report focuses on the most important clinical characteristics of AN, so that young physicians can recognize it and consequently make a correct diagnosis.
Perceptual Spectrum Distortion (PSD), a measure based on the properties of human hearing, is presented for measuring speech distortion. PSD computes a speech distortion distance by simulating human auditory properties and converting the short-time speech power spectrum into an auditory perceptual spectrum. Preliminary simulation experiments comparing PSD with the Itakura measure have been carried out. The results show that PSD is a preferable speech distortion measure and is more consistent with subjective assessments of speech quality.
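The abstract does not give the PSD formulas, so the following is only a generic sketch of the idea of mapping a short-time power spectrum onto a perceptual scale before measuring distance. Everything specific here is an assumption and not the authors' method: the Bark-scale approximation of Zwicker and Terhardt, the 18-band grouping, the cube-root compression, the Euclidean distance, and the function names `hz_to_bark`, `perceptual_spectrum` and `perceptual_distance`.

```python
import math


def hz_to_bark(f_hz):
    """Zwicker-Terhardt approximation of the Bark critical-band scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def perceptual_spectrum(power_spectrum, fs_hz, n_bands=18):
    """Group linearly spaced power bins (0 Hz .. fs/2) into Bark bands.

    Assumption: at least two bins, the first at 0 Hz and the last at fs/2.
    """
    n_bins = len(power_spectrum)
    bands = [0.0] * n_bands
    for k, power in enumerate(power_spectrum):
        f_hz = k * (fs_hz / 2.0) / (n_bins - 1)
        band = min(int(hz_to_bark(f_hz)), n_bands - 1)
        bands[band] += power
    # Cube-root compression as a rough stand-in for loudness perception
    # (an assumption, not taken from the abstract).
    return [b ** (1.0 / 3.0) for b in bands]


def perceptual_distance(power_a, power_b, fs_hz):
    """Euclidean distance between two perceptual spectra (hypothetical measure)."""
    a = perceptual_spectrum(power_a, fs_hz)
    b = perceptual_spectrum(power_b, fs_hz)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

In use, `power_a` and `power_b` would be the power spectra of the same short-time frame from the reference and the distorted speech, and the per-frame distances would be averaged over the utterance.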
The frequency following response (FFR) of the speech auditory brainstem response (speech-ABR) elicited by the speech syllable /da/ contains three distinct waves, named D, E and F, corresponding to the structure of the stimulus sound. The detection and characterization of FFRs are critical in the study and application of speech-ABRs. Conventional methods detect the wave latencies in the time domain by measuring the maximal amplitudes of the waveform within preset windows, an approach that suffers from the low quality of FFR waves. In this paper, we define an instantaneous energy (IE) spectrum based on the empirical mode decomposition (EMD) method (the EMD-IE method) to detect the FFR and measure the latencies of its waves. The results reveal that the FFRs are most evident on the second layer of the IE spectra, which would benefit the detection and measurement of FFRs in clinical practice.
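For orientation, the conventional time-domain approach mentioned in the abstract (taking the maximal amplitude inside a preset window) can be sketched as below. This is not the EMD-IE method. The function name `peak_latency_ms`, the use of the absolute amplitude, and the convention that sample 0 coincides with stimulus onset are assumptions made for illustration.

```python
def peak_latency_ms(waveform, fs_hz, window_ms):
    """Latency (ms) of the largest absolute amplitude inside a preset window.

    waveform  : sequence of samples, sample 0 assumed to be at stimulus onset
    fs_hz     : sampling rate in Hz
    window_ms : (start_ms, end_ms) search window, inclusive at both ends
    """
    start_ms, end_ms = window_ms
    lo = max(0, int(round(start_ms * fs_hz / 1000.0)))
    hi = min(len(waveform), int(round(end_ms * fs_hz / 1000.0)) + 1)
    if lo >= hi:
        raise ValueError("window lies outside the waveform")
    # Pick the sample with the largest magnitude, so peaks and troughs both count.
    peak_index = max(range(lo, hi), key=lambda i: abs(waveform[i]))
    return peak_index * 1000.0 / fs_hz
```

Because the result depends entirely on the largest sample in the window, a noisy or low-amplitude wave can easily shift the detected latency, which is the weakness the abstract attributes to this approach.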
The cocktail party problem, i.e., tracing and recognizing the speech of a specific speaker when multiple speakers talk simultaneously, is one of the critical problems yet to be solved to enable the wide application of automatic speech recognition (ASR) systems. In this overview paper, we review the techniques proposed in the last two decades for attacking this problem. We focus our discussion on the speech separation problem, given its central role in the cocktail party environment, and describe the conventional single-channel techniques such as computational auditory scene analysis (CASA), non-negative matrix factorization (NMF) and generative models; the conventional multi-channel techniques such as beamforming and multi-channel blind source separation; and the newly developed deep learning-based techniques such as deep clustering (DPCL), the deep attractor network (DANet), and permutation invariant training (PIT). We also present techniques developed to improve ASR accuracy and speaker identification in the cocktail party environment. We argue that effectively exploiting information in the microphone array, the acoustic training set, and the language itself, using a more powerful model together with better optimization objectives and techniques, will be the approach to solving the cocktail party problem.
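Of the deep learning techniques listed, permutation invariant training is the simplest to illustrate: since the order of the separated outputs is arbitrary, the loss is evaluated under every assignment of outputs to reference speakers and the smallest value is kept. The sketch below is a minimal utterance-level version in plain Python. The mean squared error criterion and the names `mse` and `pit_mse` are assumptions for illustration and are not taken from the paper.

```python
from itertools import permutations


def mse(a, b):
    """Mean squared error between two equal-length signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pit_mse(estimates, references):
    """Permutation-invariant MSE.

    Returns (loss, perm), where perm[i] is the index of the reference
    assigned to estimates[i] under the best assignment.
    """
    n = len(estimates)
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        # Average the per-source error under this output-to-speaker assignment.
        loss = sum(mse(estimates[i], references[p]) for i, p in enumerate(perm)) / n
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm
```

The exhaustive search visits n! assignments, which is cheap for the two or three speakers usually considered but grows quickly beyond that.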
Funding (speech-ABR FFR study): National Natural Science Foundation of China, grant number F61172033
Funding (cocktail party overview): supported by the Tencent and Shanghai Jiao Tong University Joint Project