Speech perception is essential for daily communication. Background noise or concurrent talkers, however, can make it challenging for listeners to track the target speech (i.e., the cocktail party problem). The present study reviews and compares existing findings on speech perception and unmasking in cocktail party listening environments in English and Mandarin Chinese. The review starts with an introduction followed by related concepts of auditory masking. The next two sections review factors that release speech perception from masking in English and Mandarin Chinese, respectively. The last section presents an overall summary of the findings with comparisons between the two languages. Future research directions with respect to the difference between the two languages' literatures on the reviewed topic are also discussed.
Objective: To contribute to clarifying the existence of subclinical hearing deficits associated with aging. Design: We studied and compared the auditory perceptual and electrophysiological performance of normal-hearing young and adult subjects (tonal audiometry, high-frequency tone thresholds, digit triplets in noise, and click-evoked auditory brainstem responses). Study sample: Forty-five normal-hearing volunteers were evaluated and divided into two groups according to age: 27 subjects in the "young group" (mean 22.1 years) and 18 subjects in the "adult group" (mean 42.22 years). Results: In the perceptual tests, the adult group presented significantly worse tonal thresholds at the high frequencies (12 and 16 kHz) and worse performance in the digit-triplet-in-noise tests. In the electrophysiological test using the auditory brainstem response technique, the adult group presented significantly lower wave I and wave V amplitudes and longer wave V latencies at the supra-threshold level. At the threshold level, we observed a significantly longer wave V latency in the adult group. In addition, in the partial correlation analysis controlling for hearing level, we observed a negative relationship between age and both speech-in-noise performance and high-frequency thresholds. No significant association was observed between age and the auditory brainstem response. Conclusion: The results are compatible with subclinical hearing loss associated with aging.
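For readers who want to reproduce this kind of analysis, a partial correlation controlling for hearing level can be computed by regressing the covariate out of both variables and correlating the residuals. A minimal sketch in Python; the data and the variable names (age, srt_din, hearing_level) are synthetic stand-ins for illustration, not the study's dataset.

```python
import numpy as np
from scipy import stats

def partial_corr(x, y, z):
    """Correlation between x and y after regressing covariate z out of both."""
    design = np.column_stack([np.ones_like(z), z])
    rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    ry = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return stats.pearsonr(rx, ry)

# Hypothetical data for 45 listeners: ages, digit-triplet SRTs, pure-tone averages
rng = np.random.default_rng(0)
age = rng.uniform(18, 50, 45)
hearing_level = 5 + 0.2 * age + rng.normal(0, 2, 45)
srt_din = -8 + 0.05 * age + 0.1 * hearing_level + rng.normal(0, 0.5, 45)

r, p = partial_corr(srt_din, age, hearing_level)
print(f"partial r = {r:.2f}, p = {p:.3f}")
```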
Based on the Motor Theory of speech perception, the interaction between the auditory and motor systems plays an essential role in speech perception. Since the Motor Theory was proposed, it has received remarkable attention in the field. However, each of the three hypotheses of the theory still needs further verification. In this review, we focus on how the auditory-motor anatomical and functional associations play a role in speech perception and discuss why previous studies could not reach an agreement, particularly on whether the motor system's involvement in speech perception is task-load dependent. Finally, we suggest that the function of the auditory-motor link is particularly useful for speech perception under adverse listening conditions and that the further revised Motor Theory is a potential solution to the "cocktail party" problem.
Realization of an intelligent human-machine interface requires us to investigate human mechanisms and learn from them. This study focuses on the communication between speech production and perception within the human brain and on realizing it in an artificial system. A physiological study based on electromyographic signals (Honda, 1996) suggested that speech communication in the human brain might be based on a topological mapping between speech production and perception, according to an analogous topology between motor and sensory representations. Following this hypothesis, this study first investigated the topologies of the vowel system across the motor, kinematic, and acoustic spaces by means of a model simulation, and then examined the linkage between vowel production and perception in a transformed auditory feedback (TAF) experiment. The model simulation indicated that there exists an invariant mapping from muscle activations (motor space) to articulations (kinematic space) via a coordinate consisting of force-dependent equilibrium positions, and that the mapping from the motor space to the kinematic space is unique. The motor-kinematic-acoustic deduction in the model simulation showed that the topologies were compatible from one space to another. In the TAF experiment, vowel production exhibited a compensatory response to a perturbation in the feedback sound. This implies that vowel production is controlled in reference to perception monitoring.
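The compensatory response observed in the TAF experiment can be illustrated with a toy feedback loop: on each trial, production is adjusted against the perceived error between the heard and the intended sound. The sketch below is a deliberately simplified stand-in with assumed gain and perturbation values, not the study's control model.

```python
import numpy as np

def simulate_taf(target=500.0, perturb=50.0, gain=0.3, n_trials=30):
    """Toy auditory-feedback loop: heard formant = produced + perturbation;
    each trial, production moves against the perceived error."""
    produced = target
    history = []
    for _ in range(n_trials):
        heard = produced + perturb           # transformed auditory feedback
        produced -= gain * (heard - target)  # compensatory adjustment
        history.append(produced)
    return np.array(history)

traj = simulate_taf()
print(f"final production: {traj[-1]:.1f} Hz (target 500 Hz, +50 Hz perturbation)")
```

At steady state the produced value settles at target minus perturbation, i.e., production shifts opposite to the feedback shift, which is the compensatory pattern the experiment reports.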
The aim of the study was to determine the development of syntax in the language development of children who are deaf or hard of hearing and who are taught new dynamic linguistic features with the help of computers. The sample consisted of 70 children who are deaf or hard of hearing, aged 7-17 years. The following variables were applied to assess language development: the total number of words used, the total number of different words used, and the correct and incorrect statements (sentences) of the respondents. We calculated the basic statistical parameters, from which it was found that the experimental computer-teaching program for children who are deaf or hard of hearing gave better results in the development of syntax. Canonical discriminant analysis also revealed a statistically significant difference in the applied variables between the control and experimental groups, at a significance level of p = 0.000. The results showed a significant improvement in the experimental group, and that the dynamic computer programming activities given to participants of the experimental group contribute to better linguistic competence in children who are deaf or hard of hearing.
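As an illustration of the kind of group separation tested here, a discriminant analysis over the four language variables can be sketched as follows; scikit-learn's linear discriminant analysis stands in for the canonical discriminant analysis used in the study, and all data and group means are hypothetical.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
# Hypothetical scores for 70 children: total words, different words,
# correct sentences, incorrect sentences
control = rng.normal([80, 40, 10, 8], 10, size=(35, 4))
experimental = rng.normal([110, 60, 16, 5], 10, size=(35, 4))
X = np.vstack([control, experimental])
y = np.array([0] * 35 + [1] * 35)  # 0 = control, 1 = experimental

lda = LinearDiscriminantAnalysis()
lda.fit(X, y)
print("classification accuracy:", lda.score(X, y))
print("discriminant weights:", lda.coef_.round(2))
```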
The Perception Spectrogram Structure Boundary (PSSB) parameter is proposed for speech endpoint detection as a preprocessing step for speech or speaker recognition. First, a hearing-perception-based speech enhancement is carried out. Then, a two-dimensional enhancement is performed on the sound spectrogram, exploiting the difference between the deterministic distribution characteristic of speech and the random distribution characteristic of noise. Finally, an endpoint decision is made using the PSSB parameter. Experimental results show that, in low-SNR environments from -10 dB to 10 dB, the proposed algorithm achieves higher accuracy than existing endpoint detection algorithms. A detection accuracy of 75.2% is reached even at the extremely low SNR of -10 dB. The algorithm is therefore suitable for speech endpoint detection in low-SNR environments.
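The PSSB algorithm itself is specific to this paper, but the surrounding endpoint-detection task can be illustrated with a crude spectrogram-energy detector; the sketch below is an assumed baseline for orientation, not the proposed method.

```python
import numpy as np
from scipy import signal

def detect_endpoints(x, fs, thresh_db=-30.0):
    """Crude endpoint detector: frames whose speech-band energy exceeds a
    threshold relative to the spectrogram peak are labelled as speech."""
    f, t, S = signal.spectrogram(x, fs, nperseg=256, noverlap=128)
    band = (f > 300) & (f < 3400)             # speech-dominant band
    frame_db = 10 * np.log10(S[band].sum(axis=0) + 1e-12)
    active = frame_db > frame_db.max() + thresh_db
    idx = np.flatnonzero(active)
    if idx.size == 0:
        return None
    return t[idx[0]], t[idx[-1]]              # (start, end) in seconds

# Hypothetical test: 1 s of noise with a tone burst in the middle
fs = 8000
x = 0.01 * np.random.randn(fs)
x[3000:5000] += np.sin(2 * np.pi * 1000 * np.arange(2000) / fs)
print(detect_endpoints(x, fs))
```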
Computer-aided pronunciation training (CAPT) technologies enable the use of automatic speech recognition to detect mispronunciations in second language (L2) learners' speech. In order to further facilitate learning, we aim to develop a principle-based method for grading the severity of mispronunciations. This paper presents an approach to gradation that is motivated by auditory perception. We have developed a computational method for generating a perceptual distance (PD) between two spoken phonemes. This is used to compute the auditory confusion of the native language (L1). PD is found to correlate well with the mispronunciations detected in a CAPT system for Chinese learners of English, i.e., L1 being Chinese (Mandarin and Cantonese) and L2 being US English. The results show that auditory confusion is indicative of pronunciation confusions in L2 learning. PD can also be used to grade the severity of errors (i.e., mispronunciations that confuse more distant phonemes are more severe) and accordingly prioritize the order of corrective feedback generated for the learners.
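The paper's PD is derived from an auditory model of phoneme confusion; as a rough illustration of the general idea, one can measure a distance between two phoneme tokens in a perceptually motivated spectral space. The mel-band feature and the synthetic "phoneme" signals below are illustrative assumptions, not the paper's method.

```python
import numpy as np

def perceptual_distance(x1, x2, fs, n_bands=24):
    """Illustrative 'perceptual distance': Euclidean distance between
    log energies in mel-spaced bands (a crude auditory spectrum)."""
    def mel_spectrum(x):
        spec = np.abs(np.fft.rfft(x)) ** 2
        freqs = np.fft.rfftfreq(len(x), 1 / fs)
        mel = 2595 * np.log10(1 + freqs / 700)          # Hz -> mel
        edges = np.linspace(mel.min(), mel.max(), n_bands + 1)
        return np.array([np.log(spec[(mel >= lo) & (mel < hi)].sum() + 1e-12)
                         for lo, hi in zip(edges[:-1], edges[1:])])
    return np.linalg.norm(mel_spectrum(x1) - mel_spectrum(x2))

# Hypothetical phoneme tokens: two synthetic vowel-like signals
fs, t = 16000, np.arange(16000) / 16000
tone_a = np.sin(2 * np.pi * 700 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)
tone_b = np.sin(2 * np.pi * 300 * t) + 0.5 * np.sin(2 * np.pi * 2300 * t)
print(perceptual_distance(tone_a, tone_b, fs))
```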
Older adults often find it difficult to perceive speech, especially in noisy conditions. Though hearing aids are among the rehabilitative devices available to older adults to alleviate hearing loss, some users experience annoyance with them and hence reject them, possibly due to circuitry noise and/or background noise. The acceptable noise level is a direct behavioural measure estimating how much noise a person can put up with while simultaneously listening to speech. It is a central auditory measure and is not influenced by age, gender, presentation level, or speaker. Using this measure, we can quantify the annoyance level experienced by an individual. This information is of utmost importance, and caution should be paid before setting the parameters in a hearing aid, especially for those who are unable to accept noise. In this review article, an attempt has been made to document how to optimize the hearing aid program by setting parameters such as the noise reduction circuit, microphone sensitivity, and gain. These adjustments might help reduce the rejection rate of hearing aids, especially among individuals who are annoyed by background noise.
Objective: To demonstrate the performance benefit of the Automatic Scene Classifier (SCAN) algorithm available in the Nucleus 6 (CP900 series) sound processor over the default processing algorithms of the previous-generation Nucleus 5 (CP810) and Freedom Hybrid sound processors. Methods: Eighty-two cochlear implant recipients (40 Nucleus 5 processor users and 42 Freedom Hybrid processor users) listened to and repeated AzBio sentences in noise with their current processor and with the Nucleus 6 processor. Results: When enabled, the SCAN algorithm yielded statistically significant non-inferior and superior performance compared with the Nucleus 5 and Freedom Hybrid sound processors programmed with ASC + ADRO. Conclusion: The results of these studies demonstrate the superior performance and clinical utility of the SCAN algorithm in the Nucleus 6 processor over the Nucleus 5 and Freedom Hybrid processors.
Objective: To evaluate the auditory function of an individual with genetically confirmed hemochromatosis. Methods: A 57-year-old male with mildly impaired sound detection thresholds underwent a range of behavioural, electroacoustic, and electrophysiologic assessments. These included the recording of otoacoustic emissions and auditory brainstem responses, measurement of monaural temporal resolution, and evaluation of binaural speech processing. Findings for this patient were subsequently compared with those of 80 healthy controls with similar audiometric thresholds. Results: The patient showed the three cardinal features of auditory neuropathy, presenting with evidence of normal cochlear outer hair cell function, disrupted neural activity in the auditory nerve/brainstem, and impaired temporal processing. His functional hearing ability (speech perception) was significantly affected and suggested a reduced capacity to use localization cues to segregate signals in the presence of background noise. Conclusion: We present the first case of an individual with hemochromatosis and auditory neuropathy. The findings for this patient highlight the need for careful evaluation of auditory function in individuals with the disorder.
Perception is the interaction interface between an intelligent system and the real world. Without sophisticated and flexible perceptual capabilities, it is impossible to create advanced artificial intelligence (AI) systems. For the next-generation AI, called 'AI 2.0', one of the most significant features will be that AI is empowered with intelligent perceptual capabilities that can simulate the human brain's mechanisms and are likely to surpass the human brain in terms of performance. In this paper, we briefly review the state-of-the-art advances across different areas of perception, including visual perception, auditory perception, speech perception, and perceptual information processing and learning engines. On this basis, we envision several R&D trends in intelligent perception for the forthcoming era of AI 2.0, including: (1) human-like and transhuman active vision; (2) auditory perception and computation in actual auditory settings; (3) speech perception and computation in natural interaction settings; (4) autonomous learning of perceptual information; (5) large-scale perceptual information processing and learning platforms; and (6) urban omnidirectional intelligent perception and reasoning engines. We believe these research directions should be highlighted in future plans for AI 2.0.
Background: Many factors interfere with a listener attempting to grasp speech in noisy environments. Spatial hearing, by which speech and noise can be spatially separated, may play a crucial role in speech recognition in the presence of competing noise. This study aimed to assess whether, and to what degree, spatial hearing benefits speech recognition in young normal-hearing participants in both quiet and noisy environments. Methods: Twenty-eight young participants were tested with the Mandarin Hearing In Noise Test (MHINT) in quiet and noisy environments. The assessment method was characterized by modifications of speech and noise configurations, as well as by changes of the speech presentation mode. The benefit of spatial hearing was measured as the speech recognition threshold (SRT) difference between speech condition 1 (SC1) and speech condition 2 (SC2). Results: There was no significant difference in SRT between SC1 and SC2 in quiet. SRT in SC1 was about 4.2 dB lower than that in SC2 in both the speech-shaped and four-babble noise conditions. SRTs measured in both SC1 and SC2 were lower in the speech-shaped noise condition than in the four-babble noise condition. Conclusion: Spatial hearing in young normal-hearing participants contributes to speech recognition in noisy environments, but provides no benefit to speech recognition in quiet, which may be due to auditory extrinsic redundancy offsetting the lack of spatial hearing.
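SRTs in tests like MHINT are typically estimated with an adaptive procedure that lowers the SNR after a correct response and raises it after an error; the spatial benefit is then simply the SRT difference between the two conditions (about 4.2 dB here). A minimal staircase sketch, with a hypothetical listener_correct(snr) oracle standing in for a real listener:

```python
import numpy as np

def estimate_srt(listener_correct, start_snr=0.0, step=2.0, n_trials=20):
    """Simple 1-up/1-down adaptive track: lower the SNR after a correct
    response, raise it after an error; average the last trials as the SRT."""
    snr, track = start_snr, []
    for _ in range(n_trials):
        track.append(snr)
        snr += -step if listener_correct(snr) else step
    return np.mean(track[-10:])

# Hypothetical listener: psychometric function centred at -5 dB SNR
rng = np.random.default_rng(2)
listener = lambda snr: rng.random() < 1 / (1 + np.exp(-(snr + 5)))
print(f"estimated SRT: {estimate_srt(listener):.1f} dB")
```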
Brain mechanisms of lexical-semantic processing have been well researched using the electroencephalography (EEG) technique with its high temporal resolution. However, the detailed brain dynamics regarding spatial connectivity and spectral characteristics remain to be clarified. For this reason, this study performed frequency-specific effective connectivity analysis on EEG recordings during the processing of real words and pseudowords. In addition, we introduced fMRI-based network templates into a representational similarity analysis to compare the functional differences between real words and pseudowords in different frequency bands. Our results revealed that real words rapidly activate the brain network for speech perception and are comprehended efficiently, especially when the first syllable of the real word has clear categorical features. In contrast, pseudowords were delayed in the initiation of speech perception and required a longer time span for meaning retrieval. The frequency-specific analysis showed that the theta, alpha, and beta rhythms contribute more to semantic processing than gamma oscillations. These results show that semantic processing is frequency-specific and time-dependent on word category.
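Frequency-specific EEG analyses of this kind start by decomposing the recording into the canonical bands. A minimal sketch of that preprocessing step, assuming a hypothetical 32-channel array sampled at 250 Hz; the connectivity and representational similarity stages of the study are not reproduced here.

```python
import numpy as np
from scipy import signal

BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30), "gamma": (30, 45)}

def band_filter(eeg, fs, lo, hi, order=4):
    """Zero-phase band-pass filtering along the time axis."""
    sos = signal.butter(order, [lo, hi], btype="bandpass", fs=fs, output="sos")
    return signal.sosfiltfilt(sos, eeg, axis=-1)

fs = 250
eeg = np.random.randn(32, 10 * fs)   # hypothetical 32-channel, 10 s recording
filtered = {name: band_filter(eeg, fs, lo, hi) for name, (lo, hi) in BANDS.items()}
print({name: x.shape for name, x in filtered.items()})
```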
This paper aims to examine the second language (L2) phonetic categorical perception (CP) pattern of Chinese learners of English regarding the contrast of dark /l/ and the vowel /?/. Three perception experiments were carried out progressively: a simple identification task, an AXB identification task, and a revised AX discrimination task. The study discovered a significant vowel-context difference in the perception of dark /l/ and the vowel /?/, in which high vowels stand out, and demonstrated that English proficiency as evaluated by standard examinations is not reflected in L2 phonetic discrimination. The study also proved the validity of adding reference stimuli in enhancing CP performance, but this improvement only benefits the identification tasks. The study helps to fill in the current knowledge gap concerning Chinese L2 learners' difficulty in distinguishing dark /l/ and the vowel /?/. The new finding contributes to a deeper understanding of the vowel-context effect on CP performance, as well as implications for second language teaching in exploring the connections between L2 speech perception and production.
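Categorical perception in identification tasks is commonly quantified by fitting a sigmoid to the proportion of one response category along the stimulus continuum: the 50% point gives the category boundary and the slope indexes how categorical the percept is. A sketch with made-up identification data for a hypothetical 7-step continuum:

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, boundary, slope):
    """Psychometric identification function along the stimulus continuum."""
    return 1 / (1 + np.exp(-slope * (x - boundary)))

# Hypothetical 7-step dark-/l/-to-vowel continuum: proportion of 'vowel' responses
steps = np.arange(1, 8)
p_vowel = np.array([0.05, 0.08, 0.20, 0.55, 0.85, 0.95, 0.98])

(boundary, slope), _ = curve_fit(logistic, steps, p_vowel, p0=[4.0, 1.0])
print(f"category boundary at step {boundary:.2f}, slope {slope:.2f}")
```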
After entering the peripheral auditory system, a sound undergoes many significant changes. The excitation pattern describes these changes psychoacoustically as an inner representation. This study investigates the relations between excitation patterns and their phonetic qualities for Chinese steady-state vowels. First, the peak positions of the envelopes of excitation patterns were measured on a database. The results demonstrated that each Chinese vowel has its own characteristic position for the representative peak of the excitation pattern. Second, to examine the sufficiency of these results, a series of experiments consisting of identification and evaluation tasks was conducted, in which spectral components of natural isolated vowels were manipulated to create particular excitation patterns. Subjects' responses to these stimuli show that the position of the representative peak of a vowel's excitation pattern plays a crucial role in its phonetic identity. The results suggest that the phonetic identity of a vowel is determined by the position of the representative peak of the excitation pattern it evokes, and that other peaks, if any, carry no phonetic meaning. Additionally, several phenomena in speech perception are discussed on the basis of this study.
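The analysis keys on the position of the largest peak in the excitation pattern. As a rough illustration, an excitation pattern can be approximated by summing the power spectrum within ERB-spaced bands and locating the peak band; the study's actual auditory filter shapes are not reproduced here, and the "vowel" below is a synthetic two-component stand-in.

```python
import numpy as np

def erb_peak(x, fs, n_channels=40):
    """Approximate excitation pattern: power summed in ERB-spaced
    rectangular bands; returns the centre frequency of the peak band."""
    spec = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    erb = 21.4 * np.log10(1 + 0.00437 * freqs)        # Hz -> ERB number
    edges = np.linspace(erb[1], erb[-1], n_channels + 1)
    power = [spec[(erb >= lo) & (erb < hi)].sum()
             for lo, hi in zip(edges[:-1], edges[1:])]
    centres = (edges[:-1] + edges[1:]) / 2
    return (10 ** (centres[np.argmax(power)] / 21.4) - 1) / 0.00437  # ERB -> Hz

# Hypothetical steady-state vowel: two harmonically related components
fs, t = 16000, np.arange(16000) / 16000
vowel = np.sin(2 * np.pi * 800 * t) + 0.3 * np.sin(2 * np.pi * 2400 * t)
print(f"representative peak near {erb_peak(vowel, fs):.0f} Hz")
```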
The perception of human languages is inherently a multi-modal process, in which audio information can be compensated by visual information to improve recognition performance. This phenomenon has been researched in English, German, Spanish, and other languages, but it has not yet been reported in Chinese. In our experiment, 14 syllables (/ba, bi, bian, biao, bin, de, di, dian, duo, dong, gai, gan, gen, gu/), extracted from the Chinese audiovisual bimodal speech database CAVSR-1.0, were pronounced by 10 subjects. The audio-only stimuli, audiovisual stimuli, and visual-only stimuli were recognized by 20 observers. The audio-only and audiovisual stimuli were both presented under 5 conditions: no noise, SNR 0 dB, -8 dB, -12 dB, and -16 dB. The experimental results were studied and the following conclusions for Chinese speech were reached. Human beings can recognize visual-only stimuli rather well. The place of articulation determines the visual distinctiveness. In noisy environments, audio information can be remarkably compensated by visual information, and as a result recognition performance is greatly improved.