Funding: Supported in part by the National Natural Science Foundation of China (62176059, 62101136).
Abstract: Binaural rendering is of great interest to virtual reality and immersive media. Although humans naturally use their two ears to perceive the spatial information contained in sounds, binaural rendering is a challenging task for machines, since describing a sound field often requires multiple channels and even metadata about the sound sources. In addition, the perceived sound varies from person to person, even in the same sound field. Previous methods generally rely on individual head-related transfer function (HRTF) datasets and on optimization algorithms that act on HRTFs. In practical applications, existing methods have two major drawbacks. The first is the high cost of personalization, because traditional methods personalize the rendering by measuring individual HRTFs. The second is insufficient accuracy, because traditional methods are optimized to preserve the perceptually more important part of the information at the cost of discarding the rest. It is therefore desirable to develop techniques that achieve both personalization and accuracy at low cost. To this end, we focus on binaural rendering of Ambisonics and propose 1) a channel-shared encoder and channel-compared attention integrated into neural networks and 2) a loss function that quantifies interaural level differences to handle spatial information. To verify the proposed method, we collect and release the first paired Ambisonics-binaural dataset and introduce three metrics to evaluate the content accuracy and spatial accuracy of end-to-end methods. Extensive experimental results on the collected dataset demonstrate the superior performance of the proposed method and the shortcomings of previous methods.
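To make the spatial term concrete, the sketch below shows one way an interaural-level-difference loss can be written in PyTorch: the ILD is taken per time-frequency bin as the left/right level ratio in dB, and the loss is the mean absolute ILD mismatch between predicted and reference binaural signals. The STFT settings and the exact form of the loss are illustrative assumptions, not the paper's released implementation.

```python
import torch


def ild_loss(pred, ref, n_fft=1024, hop=256, eps=1e-8):
    """Penalize mismatches in interaural level difference (ILD).

    pred, ref: tensors of shape (batch, 2, samples) holding binaural audio.
    The ILD is computed per time-frequency bin as the left/right level ratio
    in dB.  This is a simplified sketch, not the loss defined in the paper.
    """
    def ild(x):
        window = torch.hann_window(n_fft, device=x.device)
        left = torch.stft(x[:, 0], n_fft, hop, window=window, return_complex=True).abs()
        right = torch.stft(x[:, 1], n_fft, hop, window=window, return_complex=True).abs()
        return 20.0 * torch.log10((left + eps) / (right + eps))

    return torch.mean(torch.abs(ild(pred) - ild(ref)))


if __name__ == "__main__":
    pred = torch.randn(4, 2, 16000)   # toy predicted binaural batch
    ref = torch.randn(4, 2, 16000)    # toy reference binaural batch
    print(ild_loss(pred, ref).item())
```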
Abstract: Background: Musical perception requires a host of skills. Instrumental musicians place greater emphasis on motor coordination, whereas vocal musicians rehearse vocal sounds. The study explored the differential advantages of musical background on binaural integration and interaction in musicians (instrumentalists, vocalists) compared with age-matched non-musicians. Methods: Eighty-six participants aged 20-40 y with normal hearing sensitivity underwent binaural tests in a standard group-comparison research design. The participants were divided into three groups: Group 1 included instrumentalists (n = 26, mean age: 17.73 ± 2.83 y), while Group 2 and Group 3 consisted of vocalists (n = 30, mean age: 19.30 ± 2.47 y) and non-musicians (n = 30, mean age: 18.20 ± 3.02 y), respectively. The binaural processes, namely integration (dichotic syllable test, DST; and virtual acoustic space identification, VASI) and interaction (interaural difference thresholds for time and level: ITD and ILD), were assessed in all participants. Results: Statistical analyses showed a main effect of musicianship. Bonferroni pairwise tests revealed that the musicians (instrumentalists and vocalists) outperformed (p < 0.05) non-musicians on all tests. The differential advantage of musical background was seen on the binaural integration tests, with instrumentalists performing better than vocalists on the VASI test and vice versa for the DST. No difference was observed between vocalists and instrumentalists on the interaction tasks (ITD and ILD) (p > 0.05). Conclusion: Musical-background-induced differential advantages can be reasonably noted in the binaural skills of instrumentalists and vocalists (compared with non-musicians).
Abstract: The binaural masking level difference (BMLD) is a psychoacoustic measure of binaural interaction and central auditory processing. The BMLD is the difference in hearing thresholds between homophasic and antiphasic conditions. The duration, phase and frequency of the stimuli can affect the BMLD. The main aim of the study is to evaluate the BMLD for stimuli of different durations and frequencies that could also be used in future electrophysiological studies. To this end we developed a GUI to present signals of different frequencies and variable duration and to determine the BMLD. Three durations and five frequencies are explored. The results of the study confirm that the hearing threshold in the antiphasic condition is lower than in the homophasic condition and that the differences are significant for signals of 18 ms and 48 ms duration. Future objective binaural processing studies will be based on 18 ms and 48 ms stimuli at the same frequencies as used in the current study.
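As a reminder of how the measure is obtained, the BMLD is the threshold improvement in the antiphasic (SπN0) condition relative to the homophasic (S0N0) condition. The short sketch below builds the two stimulus conditions and computes the BMLD from a pair of hypothetical thresholds; the probe frequency, noise and threshold values are placeholders, not data from this study.

```python
import numpy as np

fs = 48000
dur = 0.048                                  # 48 ms, one of the durations studied
t = np.arange(int(fs * dur)) / fs
tone = np.sin(2 * np.pi * 500 * t)           # 500 Hz probe (illustrative frequency)
noise = np.random.randn(t.size)              # masking noise, identical in both ears

# Homophasic (S0N0): tone in phase at the two ears.
s0n0 = np.stack([tone + noise, tone + noise])
# Antiphasic (SpiN0): tone inverted in one ear, noise unchanged.
spin0 = np.stack([tone + noise, -tone + noise])

# BMLD = homophasic threshold minus antiphasic threshold (both in dB).
threshold_s0n0, threshold_spin0 = 52.0, 41.0   # hypothetical adaptive-track results
bmld_db = threshold_s0n0 - threshold_spin0

print(s0n0.shape, spin0.shape)
print(f"BMLD = {bmld_db:.1f} dB")
```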
Abstract: Bilateral cochlear implants (CIs) improve speech intelligibility, speech perception in background noise, and sound localization in quiet and noisy situations. However, it is unclear whether these advantages essentially result from binaural integration of the acoustic stimuli delivered to each ear. In this study, we investigated the effectiveness of binaural integration with bilateral CIs using binaural hearing tests and a subjective auditory perceptual assessment. A 61-year-old bilateral CI user underwent the following four tests: the Japanese Hearing in Noise Test (HINT-J), the dichotic listening test (DLT), the Rapidly Alternating Speech Perception (RASP) test, and a subjective auditory perceptual assessment. The HINT-J score was significantly higher for bilateral CIs than for a unilateral CI. However, the DLT and the RASP test revealed contradictory results. The subjective auditory perceptual assessment revealed active and bright impressions for bilateral hearing, which were also noisier and stronger than those for unilateral hearing. The results of this study revealed that bilateral CIs improved speech perception in background noise and the auditory impression, although binaural integration abilities were not improved, probably because the patient was required to combine information from the two ears into a single percept in the DLT and the RASP test. More longitudinal data should be collected and analyzed in future studies to evaluate the long-term effects of bilateral CIs. Copyright © 2016, PLA General Hospital Department of Otolaryngology Head and Neck Surgery. Production and hosting by Elsevier (Singapore) Pte Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Funding: This work is supported by the National Natural Science Foundation of China (NSFC) under Grant Nos. 61571106, 61501169, and 41706103, and the Fundamental Research Funds for the Central Universities under Grant No. 2242013K30010.
Abstract: Speaker separation in complex acoustic environments is one of the most challenging tasks in speech separation. In practice, speakers very often remain stationary or move only slowly during normal communication. In this case, the spatial features of consecutive speech frames become highly correlated, which helps speaker separation by providing additional spatial information. To fully exploit this information, we design a separation system based on a recurrent neural network (RNN) with long short-term memory (LSTM) that effectively learns the temporal dynamics of the spatial features. In detail, an LSTM-based speaker separation algorithm is proposed that extracts the spatial features in each time-frequency (TF) unit and forms the corresponding feature vector. Speaker separation is then treated as a supervised learning problem in which a modified ideal ratio mask (IRM) serves as the training target during LSTM learning. Simulations show that the proposed system achieves attractive separation performance in noisy and reverberant environments. In particular, in untrained acoustic tests with limited priors, e.g., unmatched signal-to-noise ratio (SNR) and reverberation, the proposed LSTM-based algorithm still outperforms the existing DNN-based method in terms of PESQ and STOI, indicating that our method is more robust under untrained conditions.
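For reference, the conventional ideal ratio mask that the paper modifies can be sketched as follows; the modified IRM actually used for training is not reproduced here, and the exponent beta = 0.5 giving the usual square-root form is an assumption.

```python
import numpy as np


def ideal_ratio_mask(speech_spec, noise_spec, beta=0.5, eps=1e-10):
    """Conventional IRM per time-frequency unit.

    speech_spec, noise_spec: magnitude spectrograms of the target speaker and
    of everything else (interfering speech + noise), same shape (freq, frames).
    beta=0.5 gives the usual square-root form; the paper's modified IRM is
    not reproduced here.
    """
    s2 = speech_spec ** 2
    n2 = noise_spec ** 2
    return (s2 / (s2 + n2 + eps)) ** beta


if __name__ == "__main__":
    s = np.abs(np.random.randn(257, 100))   # toy target-speech spectrogram
    n = np.abs(np.random.randn(257, 100))   # toy interference spectrogram
    mask = ideal_ratio_mask(s, n)
    print(mask.shape, float(mask.min()), float(mask.max()))
```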
Funding: This work was supported by the National Natural Science Foundation of China (11674105) and the State Key Lab of Subtropical Building Science, South China University of Technology.
Abstract: Ambisonics is a family of spatial sound reproduction systems based on spatial harmonic decomposition and finite-order approximation of the sound field. Ambisonics signals were originally intended for loudspeaker reproduction. By applying head-related transfer function (HRTF) filters, binaural Ambisonics converts the Ambisonics signals for static or dynamic headphone reproduction. In the present work, the performance of static and dynamic binaural Ambisonics reproduction is evaluated and compared. The mean binaural pressure errors across target source directions are first analyzed. A virtual source localization experiment is then conducted, and localization performance is evaluated in terms of the percentages of front-back and up-down confusion and the mean angular error and dispersion of the localization results. The results indicate that binaural Ambisonics reproduction of insufficiently high order (for example, order 5-10) is unable to recreate the correct high-frequency magnitude spectra in the binaural pressures, degrading localization in static reproduction. Because a dynamic localization cue is included, dynamic binaural Ambisonics reproduction yields clearly better localization performance than static reproduction of the same order. Even third-order dynamic binaural Ambisonics reproduction exhibits adequate localization performance.
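A common way to realize static binaural Ambisonics reproduction is the virtual-loudspeaker approach: decode the Ambisonic signals to a virtual loudspeaker layout and convolve each feed with the HRIR pair for its direction. The sketch below assumes the decoding matrix and HRIRs are already available; it illustrates the general signal flow, not the rendering chain used in this study (dynamic reproduction would additionally rotate the sound field with head tracking).

```python
import numpy as np
from scipy.signal import fftconvolve


def binaural_from_ambisonics(ambi, decoder, hrirs):
    """Static binaural rendering via virtual loudspeakers.

    ambi    : (n_channels, n_samples) Ambisonic signals.
    decoder : (n_speakers, n_channels) decoding matrix for a virtual layout.
    hrirs   : (n_speakers, 2, hrir_len) left/right HRIRs per speaker direction.
    All three are assumed given; building the decoder and measuring HRIRs
    are outside this sketch.
    """
    feeds = decoder @ ambi                      # virtual loudspeaker signals
    out_len = ambi.shape[1] + hrirs.shape[2] - 1
    binaural = np.zeros((2, out_len))
    for k, feed in enumerate(feeds):
        for ear in (0, 1):
            binaural[ear] += fftconvolve(feed, hrirs[k, ear])
    return binaural


if __name__ == "__main__":
    ambi = np.random.randn(4, 48000)            # first-order (4-channel) example
    decoder = np.random.randn(8, 4) / 8.0       # placeholder 8-speaker decoder
    hrirs = np.random.randn(8, 2, 256)          # placeholder HRIRs
    print(binaural_from_ambisonics(ambi, decoder, hrirs).shape)
```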
Funding: This work was conducted in the laboratory of P. Jen at the University of Missouri-Columbia, MO, and was supported by a grant from the Human Frontier Science Program. The preparation of this manuscript is supported by the National Natural Science Foundation
Abstract: Using a combined closed- and free-field stimulation system, the binaurality and azimuth tuning of neurons in the auditory cortex of the big brown bat, Eptesicus fuscus, were studied. A variety of azimuth-tuning functions were demonstrated for the binaural neurons. The large majority of EE (contralateral and ipsilateral excitatory) neurons exhibited azimuth selectivity, with best azimuths (BA) at contralateral 30°-40°, some at ipsilateral 20°-40°, and preferred azimuth ranges (PAR, response amplitude >75% of maximum) between 8° and 40°. Sound source azimuth strongly modulates spike counts, with a mean modulation depth of 83.8% for EE neurons. EI (contralateral excitatory and ipsilateral inhibitory) neurons have simple azimuth tuning, with BAs located at contralateral 20°-40° and a broad PAR ranging from 30° to 55°. The present results suggest that the azimuth-tuning characteristics of binaural neurons in the auditory cortex of the bat are of significance for acoustic behaviour.
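For readers unfamiliar with the metrics, the best azimuth, preferred azimuth range and modulation depth can be read directly off a neuron's spike-count-versus-azimuth function, as in the toy computation below (the spike counts are invented for illustration, not data from the study).

```python
import numpy as np

# Hypothetical spike counts of one neuron at azimuths from -90 deg (ipsilateral)
# to +90 deg (contralateral), in 10-degree steps; values are illustrative only.
azimuths = np.arange(-90, 100, 10)
spikes = np.array([2, 3, 4, 5, 7, 9, 10, 12, 15, 20, 28, 36, 44, 50, 46, 38, 30, 22, 14])

best_azimuth = azimuths[np.argmax(spikes)]
# Preferred azimuth range (PAR): azimuths where the response exceeds 75% of maximum.
par = azimuths[spikes > 0.75 * spikes.max()]
# Modulation depth: how strongly azimuth modulates the spike count.
depth = 100.0 * (spikes.max() - spikes.min()) / spikes.max()

print(best_azimuth, par.min(), par.max(), round(depth, 1))
```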
Funding: Supported by the National Natural Science Foundation of China (11174087).
Abstract: A scheme for analyzing the timbre of spatial sound with a binaural auditory model is proposed, with Ambisonics taken as an example. Ambisonics is a spatial sound system based on physical sound field reconstruction. The errors and timbre coloration in the final reconstructed sound field depend on the spatial aliasing errors in both the recording and reproduction stages of Ambisonics. The binaural loudness level spectra of the Ambisonics reconstruction are calculated using Moore's revised loudness model and then compared with those of a real sound source, so as to evaluate the timbre coloration of Ambisonics quantitatively. The results indicate that, in the case of ideal independent signals, the high-frequency limit and the radius of the region without perceived timbre coloration increase with the order of Ambisonics. On the other hand, in the case of recording with a microphone array, once the high-frequency limit of the microphone array exceeds that of the sound field reconstruction, array recording has little influence on the binaural loudness level spectra, and thus on the timbre of the final reconstruction, up to the high-frequency limit of reproduction. Based on the binaural auditory model analysis, a scheme for optimizing the design of Ambisonics recording and reproduction is also suggested. A subjective experiment yields results consistent with those of the binaural model, verifying the effectiveness of the model analysis.
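A greatly simplified stand-in for this analysis is to compare band levels of the reconstructed and reference ear signals; the sketch below uses third-octave band RMS levels rather than Moore's loudness model, so it only illustrates the comparison step, not the loudness computation itself.

```python
import numpy as np
from scipy.signal import butter, sosfilt


def band_levels(x, fs, centers):
    """RMS level (dB) of signal x in third-octave bands around the given centers.

    Crude stand-in for the loudness model used in the paper: only band levels
    are compared, with no excitation pattern or loudness transform.
    """
    levels = []
    for fc in centers:
        lo, hi = fc / 2 ** (1 / 6), fc * 2 ** (1 / 6)
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        y = sosfilt(sos, x)
        levels.append(20 * np.log10(np.sqrt(np.mean(y ** 2)) + 1e-12))
    return np.array(levels)


if __name__ == "__main__":
    fs = 44100
    centers = 1000 * 2 ** (np.arange(-12, 13) / 3)   # ~63 Hz to ~16 kHz
    reference = np.random.randn(fs)        # toy ear signal, real source
    reconstructed = np.random.randn(fs)    # same ear, Ambisonics reconstruction
    coloration = band_levels(reconstructed, fs, centers) - band_levels(reference, fs, centers)
    print(np.round(coloration, 1))
```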
Abstract: Background: Diurnal changes can be defined as variations across the time of day in an individual's performance level on activities that involve physical and mental tasks. Objective: The current study aimed to evaluate the effect of diurnal changes on scores obtained for the dichotic consonant-vowel paradigm by young adults with normal hearing sensitivity. Method: Based on the Morningness-Eveningness Questionnaire of Horne & Ostberg, the subjects were divided into moderately morning, intermediate and moderately evening categories. The dichotic consonant-vowel tests were performed in the morning and in the evening, and the right-ear, left-ear and double-correct scores were compared between morning and evening for each category. Results: Significant diurnal changes were noted for the moderately morning and moderately evening categories, with morning-type individuals performing better in the morning and evening-type individuals performing better in the evening. The scores of intermediate individuals did not differ between the morning and evening test sessions. Conclusion: Diurnal change is a phenomenon associated with an individual's biological clock mechanism; attention and inhibitory control aid individuals in carrying out tasks that require sufficient physical and mental effort. The current study suggests that clinicians and researchers consider diurnal changes as an extraneous variable that could affect the reliability of dichotic consonant-vowel test results.
Abstract: A novel optimized wavelet packet algorithm is proposed to improve perception for people with sensorineural hearing impairment. In this work, we developed an optimized wavelet packet with biorthogonal wavelet basis functions using MATLAB code. Eight bands were created based on auditory filters of quasi-octave bandwidth. Evaluation was carried out through listening tests on seven subjects with bilateral mild to severe sensorineural hearing loss. The speech material for the listening tests consisted of a set of fifteen nonsense syllables in VCV context. The test results show that the proposed algorithm improves the recognition score, speech quality and overall feature transmission relative to the unprocessed signal. The response time is also significantly reduced.
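The processing idea can be sketched in Python with PyWavelets (the original work used MATLAB): a three-level wavelet packet decomposition with a biorthogonal wavelet gives eight bands whose coefficients can be weighted per listener before resynthesis. Note that this uniform split differs from the paper's quasi-octave auditory bands, and the wavelet name and gains below are placeholders.

```python
import numpy as np
import pywt

fs = 16000
x = np.random.randn(fs)                       # stand-in for a speech signal

# Three-level wavelet packet decomposition -> eight equal-width bands
# (the paper instead uses quasi-octave auditory bands).
wp = pywt.WaveletPacket(data=x, wavelet="bior3.7", mode="symmetric", maxlevel=3)
bands = wp.get_level(3, order="freq")         # eight frequency-ordered nodes
gains = [1.0, 1.2, 1.5, 2.0, 2.0, 1.5, 1.2, 1.0]   # hypothetical per-band gains

out = pywt.WaveletPacket(data=None, wavelet="bior3.7", mode="symmetric")
for node, g in zip(bands, gains):
    out[node.path] = g * node.data            # apply the band gain
processed = out.reconstruct(update=False)[: x.size]
print(processed.shape)
```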
Funding: Supported by the National Natural Science Foundation of China (Grant No. 10374031).
Abstract: Based on measurements from 52 Chinese subjects (26 males and 26 females), a high-spatial-resolution head-related transfer function (HRTF) database with corresponding anthropometric parameters is established. Using the database, cues relating to sound source localization, including the interaural time difference (ITD), interaural level difference (ILD), and the spectral features introduced by the pinna, are analyzed. Moreover, the statistical relationship between ITD and the anthropometric parameters is estimated. It is shown that the mean values of the maximum ITD for males and females are significantly different, as are those for Chinese and Western subjects; the differences in ITD are due to differences in individual anthropometric parameters. It is further shown that the spectral features introduced by the pinna depend strongly on the individual, and that at high frequencies (f ≥ 5.5 kHz) the HRTFs are left-right asymmetric. This work is instructive and helpful for future research on binaural hearing and for applications in virtual auditory displays.
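As an illustration of how the localization cues can be extracted from such a database, the sketch below estimates a broadband ITD from the cross-correlation peak of a left/right head-related impulse response pair and an ILD from their energy ratio; this is a generic estimator, not necessarily the procedure used to build the database.

```python
import numpy as np


def itd_ild_from_hrir(hrir_left, hrir_right, fs):
    """Broadband ITD (cross-correlation peak) and ILD (energy ratio in dB)
    from one pair of head-related impulse responses.  Illustrative estimator,
    not necessarily the database's exact analysis procedure."""
    xcorr = np.correlate(hrir_left, hrir_right, mode="full")
    lag = np.argmax(np.abs(xcorr)) - (len(hrir_right) - 1)
    itd_us = 1e6 * lag / fs                      # positive: left ear lags right
    ild_db = 10 * np.log10(np.sum(hrir_left ** 2) / np.sum(hrir_right ** 2))
    return itd_us, ild_db


if __name__ == "__main__":
    fs = 44100
    rng = np.random.default_rng(0)
    h = rng.standard_normal(256) * np.exp(-np.arange(256) / 32)   # toy HRIR shape
    left = np.concatenate([np.zeros(20), h])     # left ear arrives 20 samples later
    right = np.concatenate([h, np.zeros(20)])
    print(itd_ild_from_hrir(left, right, fs))
```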