Funding: The author received funding from the Sichuan Natural Science Foundation (2022NSFSC1892).
Abstract: The deployment of vehicle micro-motors has expanded with the progression of electrification and intelligent technologies. However, some micro-motors may exhibit design deficiencies, component wear, assembly errors, and other imperfections arising during the design or manufacturing phases. Consequently, these micro-motors may generate anomalous noises during operation, which substantially degrades the comfort of drivers and passengers. Automobile micro-motors vary widely in structure, giving rise to many distinct types of abnormal noise. To identify these diverse forms of abnormal noise, this research presents a novel approach based on a vibro-acoustic fusion convolutional neural network (VAF-CNN). The method deploys distinct network branches, each capturing different features from the multi-sensor data while accounting for the auditory perception characteristics of the human auditory system. The intermediate layer applies adaptive weighting of multi-sensor features, providing a calibration mechanism for the features from multiple sensors and further refining the features within each branch network. For optimal model performance, a feature fusion mechanism is implemented in the final layer. To validate the proposed approach, this paper first applies a data augmentation method inspired by a modified SpecAugment to the dataset of abnormal noise samples, covering scenarios both with and without in-vehicle interior noise; this mitigates the issue of limited sample availability. Comparative evaluations are then performed against a model built on single-sensor data and against other feature fusion models based on multi-sensor data. The experimental results demonstrate that the proposed method achieves higher recognition accuracy and greater resilience against interference. Moreover, it has notable practical significance in engineering, as it supports the targeted management of noise from vehicle micro-motors.
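As a rough illustration of the kind of architecture the abstract describes (the paper's exact layer sizes, weighting scheme, and class set are not given here), the following PyTorch sketch uses two branches, one for an acoustic spectrogram and one for a vibration spectrogram, whose features are scaled by learned adaptive sensor weights before a fused classification head. All module names and dimensions are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a two-branch vibro-acoustic fusion CNN (all dimensions are assumptions).
import torch
import torch.nn as nn

class Branch(nn.Module):
    """A small CNN branch that maps one sensor's spectrogram to a feature vector."""
    def __init__(self, out_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, out_dim)

    def forward(self, x):
        return self.fc(self.conv(x).flatten(1))

class VAFCNN(nn.Module):
    """Acoustic and vibration branches fused with adaptive (learned, softmax-normalized) sensor weights."""
    def __init__(self, n_classes=5, feat_dim=128):
        super().__init__()
        self.acoustic = Branch(feat_dim)
        self.vibration = Branch(feat_dim)
        self.sensor_logits = nn.Parameter(torch.zeros(2))  # adaptive per-sensor weights
        self.head = nn.Linear(2 * feat_dim, n_classes)

    def forward(self, x_ac, x_vib):
        w = torch.softmax(self.sensor_logits, dim=0)
        f_ac = w[0] * self.acoustic(x_ac)
        f_vib = w[1] * self.vibration(x_vib)
        return self.head(torch.cat([f_ac, f_vib], dim=1))

# Example: a batch of 4 spectrograms (1 x 64 x 128) per sensor.
model = VAFCNN()
logits = model(torch.randn(4, 1, 64, 128), torch.randn(4, 1, 64, 128))
print(logits.shape)  # torch.Size([4, 5])
```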
Abstract: Purpose: There is growing interest in the speech intelligibility and auditory perception of deaf children. The aim of the present study was to compare the speech intelligibility and auditory perception of pre-school children with Hearing Aids (HA), Cochlear Implants (CI), and Typical Hearing (TH). Methods: The research design was descriptive-analytic and comparative. The participants comprised 75 male pre-school children aged 4-6 years, recruited in 2017-2018 in Tehran, Iran. The participants were divided into three groups of 25 children each. The first and second groups were selected from pre-school children with HA and CI, respectively, using convenience sampling, while the third group was selected from pre-school children with TH by random sampling. All children completed the Speech Intelligibility Rating and Categories of Auditory Performance questionnaires. Results: The findings indicated that the mean scores of speech intelligibility and auditory perception of the group with TH were significantly higher than those of the other groups (P < 0.0001). The mean scores of speech intelligibility in the group with CI did not differ significantly from those of the group with HA (P < 0.38). Also, the mean scores of auditory perception in the group with CI were significantly higher than those of the group with HA (P < 0.002). Conclusion: The results showed that auditory perception in children with CI was significantly higher than in children with HA. This finding highlights the importance of cochlear implantation at a younger age and its significant impact on auditory perception in deaf children.
Abstract: This paper addresses the change in the JND (Just Noticeable Difference) of auditory perception under synchronous visual stimuli. Through psychoacoustic experiments, the loudness JND, subjective-duration JND, and pitch JND of pure tones were measured in an auditory-only mode and in a visual-auditory mode with visual stimuli differing in attributes such as color, illumination, quality, and moving state. Statistical analyses of the experimental data indicate that, compared with the JND in the auditory-only mode, the JND with visual stimuli is often larger. The average JND increments for subjective duration, pitch, and loudness are 45.1%, 14.8%, and 12.3%, respectively. The conclusion is that JND-based auditory discrimination ability often decreases with visual stimuli. The increment of the JND is affected by the attributes of the visual stimuli: if the visual stimuli make subjects feel more comfortable, the change in the JND of auditory perception is smaller.
Funding: This work was supported by the National Natural Science Foundation of China (No. 30711120563, No. 30670704, and No. 60535030).
Abstract: Similar to the visual dual-pathway model, neurophysiological studies in non-human primates have suggested that the dual-pathway model is also applicable for explaining auditory cortical processing, including the ventral "what" pathway for object identification and the dorsal "where" pathway for spatial localization. This review summarizes evidence from human neuroimaging studies supporting the dual-pathway model for auditory cortical processing in humans.
Funding: The National Natural Science Foundation of China (No. 60071029).
Abstract: The perceptual effect of phase information in speech has been studied by auditory subjective tests. Under the condition that the phase spectrum of the speech is changed while the amplitude spectrum is unchanged, the tests show that: (1) if the envelope of the reconstructed speech signal is unchanged, there is no perceptible auditory difference between the original speech and the reconstructed speech; (2) the auditory perception of the reconstructed speech depends mainly on the amplitude of the derivative of the additive phase; (3) with td defined as the maximum relative time shift between different frequency components of the reconstructed speech signal, the speech quality is excellent when td < 10 ms, good when 10 ms < td < 20 ms, fair when 20 ms < td < 35 ms, and poor when td > 35 ms.
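The abstract ties speech quality to td, the maximum relative time shift between frequency components. Under one common interpretation (an assumption here, not spelled out in the abstract), this corresponds to the spread of the group delay introduced by the additive phase, i.e. the spread of -dφ/dω across frequency. A minimal NumPy sketch under that assumption:

```python
# Hypothetical sketch: estimate td as the spread of the group delay of an additive phase term.
import numpy as np

fs = 16000                                    # sampling rate (Hz), an assumption
freqs = np.linspace(0, fs / 2, 257)           # analysis frequencies
omega = 2 * np.pi * freqs
add_phase = 0.01 * np.sin(omega / 2000.0)     # example additive phase (radians), illustrative only

group_delay = -np.gradient(add_phase, omega)       # -dφ/dω, seconds per frequency component
t_d = group_delay.max() - group_delay.min()        # max relative time shift between components

print(f"td = {t_d * 1e3:.3f} ms")   # per the abstract, < 10 ms would correspond to excellent quality
```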
Abstract: Background: Recent developments in virtual acoustic technology have enabled promising applications in the auditory sciences, especially in spatial perception. While conventional auditory spatial assessment using loudspeakers, interaural differences, and/or questionnaires is limited by the availability and cost of instruments, the virtual acoustic space identification (VASI) test has widespread applications in a spatial test battery, as it overcomes these constraints. Purpose: The lack of test-retest reliability data for the VASI test limits its direct application in auditory spatial assessment, which is explored in the present study. Methods: Data from 75 normal-hearing young adults (mean age: 25.11 y ± 4.65 SD) were collected in three sessions: baseline, within 15 min of baseline (intra-session), and one week after the baseline session (inter-session). Test-retest reliability was assessed using the intra-class correlation coefficient (ICC), coefficient of variation (CV), and cluster plots. Results: The results showed excellent reliability for both the accuracy and reaction time measures of the VASI, with ICC values of 0.93 and 0.87, respectively. The CV values for overall VASI accuracy and reaction time were 9.66% and 11.88%, respectively. This was complemented by the cluster plot analyses, which showed 93.33% and 96.00% temporal stability in the accuracy and reaction time measures, indicative of high test-retest reliability of the VASI test in auditory spatial assessment. Conclusions: The high temporal stability (test-retest reliability) of the VASI test validates its application in a spatial hearing test battery.
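For readers unfamiliar with the two reliability measures used above, the sketch below computes an ICC (via the pingouin library) and a within-subject CV for three repeated sessions. The data are simulated placeholders, not the study's data, and the exact ICC model the authors used is not stated in the abstract.

```python
# Hypothetical sketch: test-retest reliability via ICC and CV for repeated scores (simulated data).
import numpy as np
import pandas as pd
import pingouin as pg

rng = np.random.default_rng(0)
n_subjects = 75
true_score = rng.normal(80, 8, n_subjects)                    # latent per-subject accuracy (%)
sessions = {s: true_score + rng.normal(0, 3, n_subjects)      # baseline, intra-, inter-session
            for s in ["baseline", "intra", "inter"]}

df = (pd.DataFrame(sessions)
        .assign(subject=np.arange(n_subjects))
        .melt(id_vars="subject", var_name="session", value_name="score"))

# Intraclass correlation (pingouin reports several two-way models; ICC2 is absolute agreement).
icc = pg.intraclass_corr(data=df, targets="subject", raters="session", ratings="score")
print(icc[["Type", "ICC"]])

# Within-subject coefficient of variation across sessions, averaged over subjects (as a %).
wide = df.pivot(index="subject", columns="session", values="score")
cv = (wide.std(axis=1, ddof=1) / wide.mean(axis=1) * 100).mean()
print(f"mean within-subject CV = {cv:.2f}%")
```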
Abstract: Aim: To evaluate the hearing of children with congenital hypothyroidism (CH) and to analyze the parents' knowledge of the possible auditory impacts of the disease. Methods: A total of 263 parents/guardians were interviewed about aspects of CH and hearing. Audiological evaluation was performed on 80 participants, divided into two groups: with CH (n = 50) and without CH (n = 30). Clinical and laboratory CH data were obtained from medical records, and pure-tone auditory thresholds and acoustic reflexes were analyzed. The auditory data were compared between groups. Student's t-test and the Chi-square test were used for statistical analysis at a significance level of 5% (p < 0.05). Results: The majority (78%) of the parents were unaware that CH, when not treated early, is a potential risk to hearing. There was no correlation between socioeconomic class and the level of information about CH and hearing (p > 0.05; p = 0.026). There was a statistically significant difference between the auditory tone thresholds of the groups and between the intensity levels necessary to trigger the acoustic reflex. The group with CH presented the worst results (p < 0.05) and absence of the acoustic reflex under a normal tympanometric condition. Conclusions: Children with CH are more likely to develop damage to the auditory system involving retrocochlear structures compared with healthy children, and the disease may have been a risk factor for functional deficits without deteriorating hearing sensitivity. The possible impacts of CH on hearing, when not treated early, should be better publicized among the parents/guardians of this population.
Funding: Work supported by a grant of the University of Chile (UI-10/16) to EA.
Abstract: Background: The activation of the medial olivocochlear reflex reduces the cochlear gain, which is manifested perceptually as decreased auditory sensitivity. However, it has remained unclear whether the extent of this suppression varies according to the cochlear region involved. Here we aim to assess the magnitude of contralateral efferent suppression across the human cochlea, at low levels, and its impact on hearing sensitivity. Methods: Assuming that acoustic stimulation activates the contralateral medial olivocochlear reflex, we evaluated the magnitude of the suppressive effect as a function of frequency in 17 subjects with normal hearing. Absolute thresholds were measured for tone bursts of various durations (10, 100, and 500 ms) and frequencies (250, 500, 1000, 4000, and 8000 Hz) in the presence or absence of contralateral white noise at 60 dB SPL. Results: We found that contralateral noise raised the absolute threshold for the tone bursts evaluated. The effect was greater at lower than at higher frequencies (3.85 dB at 250 Hz vs. 2.22 dB at 8000 Hz). Conclusions: Our findings suggest that in humans, the magnitude of this suppression varies according to the cochlear region stimulated, with a greater effect towards the apex (lower frequencies) than the base (higher frequencies) of the cochlea.
Abstract: The hiding efficiency of traditional audio information hiding methods is often low because perceptual similarity cannot be guaranteed. This letter proposes a new audio information hiding method that exploits the auditory insensitivity to audio phase and hides information by modifying the local phase within the limits of auditory perception. The algorithm introduces "set 1" and "set 0" operations for each phase vector, so that after modification each phase lies on the boundary of a phase region: a phase lying on a "1" boundary results from the set-1 operation, and a phase lying on a "0" boundary results from the set-0 operation. The results show that, compared with the legacy method, the proposed method has better auditory similarity, larger information embedding capacity, and a lower code error rate. As a blind detection method, it suits application scenarios without channel interference.
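The abstract only outlines the "set 1"/"set 0" phase operations. As a generic stand-in (not the authors' exact algorithm), a phase-quantization embedding can be sketched as follows: each selected phase is snapped to the nearest grid point whose parity encodes the bit, and extraction recovers the parity. All parameters below (grid step, number of carriers) are illustrative assumptions.

```python
# Hypothetical sketch of phase-quantization data hiding (not the paper's exact algorithm).
import numpy as np

def embed_bits(phases, bits, step=np.pi / 8):
    """Snap each phase to the nearest grid point whose index parity encodes the bit.

    Even multiples of `step` act as "set 0" boundaries, odd multiples as "set 1" boundaries.
    """
    phases = np.asarray(phases, dtype=float)
    out = phases.copy()
    for i, b in enumerate(bits):
        k = np.round(phases[i] / step)
        if int(k) % 2 != b:                      # move to the nearest grid point of the right parity
            k += 1 if phases[i] / step > k else -1
        out[i] = k * step
    return out

def extract_bits(phases, n_bits, step=np.pi / 8):
    k = np.round(np.asarray(phases[:n_bits]) / step).astype(int)
    return (k % 2).tolist()

bits = [1, 0, 1, 1, 0]
carrier = np.random.uniform(-np.pi, np.pi, 5)    # stand-in for selected local phase values
stego = embed_bits(carrier, bits)
print(extract_bits(stego, 5))                    # -> [1, 0, 1, 1, 0]
```

The maximum phase perturbation is bounded by the grid step, which is the knob that would be tuned against the limits of auditory phase sensitivity.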
Abstract: Synesthesia is the "union of the senses," whereby two or more of the five senses that are normally experienced separately are involuntarily and automatically joined together in experience. For example, some synesthetes experience a color when they hear a sound or see a letter. In this paper, I examine two cases of synesthesia in light of the notions of "experiential parts" and "conscious unity." I first provide some background on the unity of consciousness and the question of experiential parts. I then describe two very different cases of synesthesia. Finally, I critically examine the cases in light of two central notions of "unity." I argue that there is good reason to think that the neural "vehicles" of conscious states are distributed widely and can include multiple modalities. I also argue that some synesthetic experiences do not really enjoy the same "object unity" associated with normal vision.
Funding: Supported by the Strategic Consulting Research Project of the Chinese Academy of Engineering (No. 2016-ZD-04-03).
Abstract: Perception is the interaction interface between an intelligent system and the real world. Without sophisticated and flexible perceptual capabilities, it is impossible to create advanced artificial intelligence (AI) systems. For the next-generation AI, called 'AI 2.0', one of the most significant features will be that AI is empowered with intelligent perceptual capabilities that can simulate the mechanisms of the human brain and are likely to surpass the human brain in terms of performance. In this paper, we briefly review the state-of-the-art advances across different areas of perception, including visual perception, auditory perception, speech perception, and perceptual information processing and learning engines. On this basis, we envision several R&D trends in intelligent perception for the forthcoming era of AI 2.0, including: (1) human-like and transhuman active vision; (2) auditory perception and computation in an actual auditory setting; (3) speech perception and computation in a natural interaction setting; (4) autonomous learning of perceptual information; (5) large-scale perceptual information processing and learning platforms; and (6) urban omnidirectional intelligent perception and reasoning engines. We believe these research directions should be highlighted in future plans for AI 2.0.
Funding: Supported by the National Natural Science Foundation of China (11174087).
Abstract: A binaural-loudness-model-based method for evaluating the spatial discrimination threshold of head-related transfer function (HRTF) magnitudes is proposed. As the input of the binaural loudness model, the HRTF magnitude variations caused by spatial position variations were first calculated from a high-resolution HRTF dataset. Then, three perceptually relevant parameters, namely the interaural loudness level difference, the binaural loudness level spectra, and the total binaural loudness level, were derived from the binaural loudness model. Finally, the spatial discrimination thresholds of HRTF magnitude were evaluated according to the just-noticeable differences of the above-mentioned perceptually relevant parameters. A series of psychoacoustic experiments was also conducted to obtain the spatial discrimination threshold of HRTF magnitudes. Results indicate that the threshold derived from the proposed binaural-loudness-model-based method is consistent with that obtained from the traditional psychoacoustic experiment, validating the effectiveness of the proposed method.
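As a simplified illustration of the input stage described above (the full binaural loudness model is not reproduced here), the sketch below computes the per-ear HRTF magnitude variation between two spatial positions and a crude interaural level difference; the synthetic "HRTF" arrays are placeholders, not a real high-resolution dataset, and a real implementation would feed these magnitudes into a binaural loudness model.

```python
# Hypothetical sketch: HRTF magnitude variation between two positions and a crude ILD.
import numpy as np

freqs = np.linspace(100, 16000, 256)               # analysis frequencies (Hz), placeholder

def fake_hrtf(azimuth_deg):                        # stand-in for a measured HRTF magnitude response
    return 1.0 + 0.3 * np.cos(np.deg2rad(azimuth_deg) + freqs / 4000.0)

h_left_a, h_right_a = fake_hrtf(30), fake_hrtf(-30)      # spatial position A
h_left_b, h_right_b = fake_hrtf(35), fake_hrtf(-35)      # position B (azimuth shifted by 5 degrees)

# Magnitude variation (dB) caused by the spatial position change, per ear.
delta_left_db = 20 * np.log10(h_left_b / h_left_a)
delta_right_db = 20 * np.log10(h_right_b / h_right_a)

# Crude interaural level difference at each position (a loudness model would refine this into
# an interaural loudness level difference and binaural loudness spectra).
ild_a = 20 * np.log10(h_left_a / h_right_a)
ild_b = 20 * np.log10(h_left_b / h_right_b)

print(f"max |delta H_left| = {np.max(np.abs(delta_left_db)):.2f} dB, "
      f"max |delta ILD| = {np.max(np.abs(ild_b - ild_a)):.2f} dB")
```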
Funding: Supported by the National Natural Science Foundation of China under Grant No. 60775007, the National Basic Research 973 Program of China under Grant No. 2005CB724301, and the Science and Technology Commission of Shanghai Municipality under Grant No. 08511501701.
Abstract: How to extract robust features is an important research topic in the machine learning community. In this paper, we investigate robust feature extraction for speech signals based on a tensor structure and develop a new method called constrained Nonnegative Tensor Factorization (cNTF). A novel feature extraction framework based on the cortical representation in the primary auditory cortex (A1) is proposed for robust speaker recognition. Motivated by the neural firing-rate model in A1, the speech signal is first represented as a general higher-order tensor; cNTF is then used to learn the basis functions from multiple interrelated feature subspaces and find a robust sparse representation of the speech signal. Computer simulations are given to evaluate the performance of our method, and comparisons with existing speaker recognition methods are also provided. The experimental results demonstrate that the proposed method achieves higher recognition accuracy in noisy environments.
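The constraints and the A1-inspired cortical representation of cNTF are not spelled out in the abstract. As a rough stand-in, the sketch below builds a third-order (utterance x frequency x time) nonnegative tensor from synthetic spectrogram-like data and factorizes it with a plain nonnegative PARAFAC from the tensorly library; the per-utterance loadings could then serve as features for a speaker recognition back end. The rank, tensor shape, and data are assumptions.

```python
# Hypothetical sketch: nonnegative tensor factorization of stacked spectrograms
# (plain NTF via tensorly, not the paper's constrained cNTF).
import numpy as np
import tensorly as tl
from tensorly.decomposition import non_negative_parafac

rng = np.random.default_rng(0)
n_utts, n_freq, n_frames = 20, 64, 100

# Stand-in "spectrograms": nonnegative magnitudes for each utterance.
X = tl.tensor(rng.random((n_utts, n_freq, n_frames)))

# Rank-8 nonnegative PARAFAC: factors[1] ~ spectral bases, factors[2] ~ temporal activations,
# factors[0] ~ per-utterance loadings usable as feature vectors.
weights, factors = non_negative_parafac(X, rank=8, n_iter_max=200, tol=1e-6)

utt_features = factors[0]          # shape (20, 8): one 8-dim feature vector per utterance
print(utt_features.shape)
```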