The aim of the study was to evaluate the alterations in speech intelligibility in a cleft palate patient, before and after extending and modifying the palatal contour of the existing prosthesis using a correctable wax recording. An eight-year-old girl in second grade with a velopharyngeal defect, using an obturator, reported to the outpatient clinic complaining of a lack of clarity in her speech. The existing prosthesis lacked a speech bulb, so a speech bulb was added to the prosthesis and her speech was re-evaluated. Even with the speech bulb, she was unable to clearly pronounce vowels and words such as shoe, vision, and cheer. Palatography was therefore performed using a correctable wax technique, and the existing prosthesis was altered accordingly. Marked improvement in speech, mastication, and velopharyngeal function was achieved after this palatography-guided alteration of the prosthesis.
Purpose: There is growing interest in the speech intelligibility and auditory perception of deaf children. The aim of the present study was to compare the speech intelligibility and auditory perception of pre-school children with Hearing Aids (HA), Cochlear Implants (CI), and Typical Hearing (TH). Methods: The research design was descriptive-analytic and comparative. The participants comprised 75 male pre-school children aged 4-6 years, recruited in 2017-2018 from Tehran, Iran, and divided into three groups of 25 children each. The first and second groups were selected from pre-school children with HA and CI, respectively, using convenience sampling, while the third group was selected from pre-school children with TH by random sampling. All children completed the Speech Intelligibility Rating and Categories of Auditory Performance questionnaires. Results: The mean scores of speech intelligibility and auditory perception of the group with TH were significantly higher than those of the other groups (P<0.0001). The mean speech intelligibility scores of the group with CI did not differ significantly from those of the group with HA (P<0.38), whereas the mean auditory perception scores of the group with CI were significantly higher than those of the group with HA (P<0.002). Conclusion: Auditory perception in children with CI was significantly higher than in children with HA. This finding highlights the importance of cochlear implantation at a younger age and its significant impact on the auditory perception of deaf children.
Speech intelligibility enhancement in noisy environments remains one of the major challenges for the hearing impaired in everyday life. Recently, machine-learning based approaches to speech enhancement have shown great promise for improving speech intelligibility. Two key issues for these approaches are the acoustic features extracted from noisy signals and the classifiers used for supervised learning. This paper focuses on features. Multi-resolution power-normalized cepstral coefficients (MRPNCC) are proposed as a new feature to enhance speech intelligibility for the hearing impaired. The new feature is constructed by combining four cepstra at different time-frequency (T-F) resolutions in order to capture both local and contextual information. MRPNCC vectors and binary masking labels, calculated from signals passed through a gammatone filterbank, are used to train a support vector machine (SVM) classifier, which aims to identify the binary masking values of the T-F units in the enhancement stage. The enhanced speech is synthesized using the estimated masking values and Wiener-filtered T-F units. Objective experimental results demonstrate that the proposed feature is superior to competing features in terms of HIT-FA, STOI, HASPI, and PESQ, and that the proposed algorithm not only improves speech intelligibility but also slightly improves speech quality. Subjective tests validate the effectiveness of the proposed algorithm for the hearing impaired.
Funding: National Natural Science Foundation of China (Nos. 61902158, 61673108), the Science and Technology Program of Nantong (JC2018129, MS12018082), and the Top-notch Academic Programs Project of Jiangsu Higher Education Institutions (PPZY2015B135).
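The core of supervised T-F masking approaches like this one is to label each time-frequency unit as speech-dominant or noise-dominant and to train a classifier on features of the noisy signal. The sketch below only illustrates that general idea: the STFT front end, the log-power features, the synthetic chirp-plus-noise training signal, and the per-channel SVMs are placeholders, not the paper's gammatone/MRPNCC pipeline.

```python
# Minimal sketch of supervised time-frequency (T-F) binary masking,
# assuming clean and noise signals are available for training.
# A simple log-power feature stands in for the MRPNCC feature, and an
# STFT stands in for the gammatone filterbank.
import numpy as np
from scipy.signal import stft, chirp
from sklearn.svm import SVC

FS = 16000
LC = 0.0  # local SNR criterion (dB) defining the ideal binary mask

def spectrogram(x):
    _, _, X = stft(x, fs=FS, nperseg=512, noverlap=256)
    return np.abs(X) ** 2                        # power in each T-F unit

def ideal_binary_mask(clean, noise):
    snr = 10 * np.log10(spectrogram(clean) / (spectrogram(noise) + 1e-12) + 1e-12)
    return (snr > LC).astype(int)                # 1 = speech-dominant unit

# one synthetic training mixture (real systems use a large corpus)
t = np.arange(FS) / FS
clean = chirp(t, f0=100, f1=4000, t1=1.0)        # broadband sweep as "speech"
noise = 0.5 * np.random.default_rng(0).standard_normal(FS)

X_train = np.log(spectrogram(clean + noise) + 1e-12).T   # (frames, channels)
Y_train = ideal_binary_mask(clean, noise).T              # (frames, channels)

# one SVM per frequency channel, skipping channels with a constant mask
classifiers = {k: SVC(kernel="rbf").fit(X_train, Y_train[:, k])
               for k in range(Y_train.shape[1])
               if len(np.unique(Y_train[:, k])) > 1}
print(f"trained mask classifiers for {len(classifiers)} channels")
```

In the enhancement stage, the same per-channel classifiers would predict the mask for unseen noisy frames, and the retained units would be resynthesized into a waveform.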
As the primary means of communication, speech is essential for humans to interact and build connections in the social world. Speech intelligibility is critical in social communication; unintelligibility may lead to confusion, misunderstanding, and frustration. Many Chinese learners of English find it challenging to use English in social interaction and to reach mutual intelligibility with international interlocutors. This article analyzes the obstacles impeding Chinese EFL learners' speech intelligibility development from the perspectives of phonology (segmental and suprasegmental features) and pragmatics. Strategies are proposed to help Chinese learners address these phonological and pragmatic problems and improve their speech intelligibility in English communication.
This study examines the effect of speech level on intelligibility in different reverberation conditions, and explores the potential of loudness-based reverberation parameters proposed by Lee et al. [J. Acoust. Soc. Am., 131(2), 1194-1205 (2012)] to explain the effect of speech level on intelligibility in various reverberation conditions. Listening experiments were performed with three speech levels (LAeq of 55 dB, 65 dB and 75 dB) and three reverberation conditions (T20 of 1.0 s, 1.9 s and 4.0 s), and subjects listened to speech stimuli through headphones. Collected subjective data were compared with two conventional speech intelligibility parameters (Speech Intelligibility Index and Speech Transmission Index) and two loudness-based reverberation parameters (EDTN and TN). Results reveal that the effect of speech level on intelligibility changes with a room’s reverberation conditions, and that increased level results in reduced intelligibility in highly reverberant conditions. EDTN and TN explain this finding better than do STI and SII, because they consider many psychoacoustic phenomena important for the modeling of the effect of speech level varying with reverberation.
Aims: The purpose of this work is to formulate the requirements for future methods of searching for extra-terrestrial civilizations by use of the concepts of information theory and the theoretically grounded method. Methodology: To realize it, the number of dimensionless criteria contained in the International System of Units (SI) has been calculated. This value, without additional assumptions, allows us to present a formula for calculating the comparative uncertainty of the model of any physical phenomenon. Based on these formulas, the magnitude of the inevitable threshold of misunderstanding of two civilizations in the universe is determined. Results: New theoretical recommendations for choosing the most effective methods to search the techno signatures of extra-terrestrial civilizations are formulated. Conclusion: Using the calculated amount of information embedded in the model, we showed that the most promising methods for finding potential residents in the Universe should combine frequency radiation with thermal or electromagnetic quantities.
The paper’s purpose is to design and program a four-operation calculator that receives voice instructions and runs them in either a voice or text phase. The calculator simulates the work of a compiler. The paper is a practical example, programmed to support the claim that it is possible to construct a verbal compiler.
The relation between the speech intelligibility of Chinese and the speech transmission index (STI) is discussed, based on some useful properties of the modulation transfer function (MTF) and on results obtained from articulation tests under different signal-to-noise ratios.
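For reference, the standard route from the MTF to the STI is the classical Houtgast-Steeneken formulation below; the exact octave-band weights and correction terms vary between revisions of IEC 60268-16, so only the commonly cited core of the method is shown.

```latex
% Classical STI computation from the modulation transfer function m_k(F)
% of octave band k at modulation frequency F (14 frequencies, 0.63-12.5 Hz)
\mathrm{SNR}_{\mathrm{app}}(k,F) = 10 \log_{10} \frac{m_k(F)}{1 - m_k(F)}
    \quad \text{(clipped to } \pm 15~\mathrm{dB)}
\qquad
\mathrm{TI}(k,F) = \frac{\mathrm{SNR}_{\mathrm{app}}(k,F) + 15}{30}

\mathrm{MTI}_k = \frac{1}{14} \sum_{F} \mathrm{TI}(k,F)
\qquad
\mathrm{STI} = \sum_{k=1}^{7} w_k \, \mathrm{MTI}_k
```

Here each band's modulation transfer value is converted to an apparent signal-to-noise ratio, clipped, mapped to a transmission index, and averaged with octave-band weights w_k.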
Speech intelligibility (SI) is an important index for the design and assessment of halls intended for speech. The relationship between Chinese speech intelligibility scores in rooms and the speech transmission index (STI) under diotic listening conditions was studied in a previous paper using monaural room impulse responses obtained from the room acoustical simulation software Odeon. The present study employs simulated binaural room impulse responses and auralization to obtain subjective Chinese speech intelligibility scores using a rhyme test. The relationship between Chinese speech intelligibility scores and STI is built and validated in rooms for dichotic (binaural) listening. The results show a high correlation between Chinese speech intelligibility scores and STI under dichotic listening. The relationship between Chinese speech intelligibility scores and STI under diotic and dichotic listening conditions is also analyzed. Compared with diotic listening, dichotic (binaural) listening (the actual listening situation) provides a 2.7 dB signal-to-noise ratio benefit for Mandarin Chinese speech intelligibility. The STI method can predict and evaluate speech intelligibility for Mandarin Chinese in rooms under dichotic (binaural) listening.
Funding: National Natural Science Foundation of China (Grant No. 10774048).
Existing auditory computational models for evaluating speech intelligibility can only account for energetic masking; the effect of informational masking is rarely described in these models. This study aimed to build a computational model that incorporates the mechanism of informational masking. Several psychoacoustic experiments were conducted to test the effect of informational masking on speech intelligibility by manipulating the number of masking talkers, the speech rate, and the similarity of the F0 contour between target and masker. The results showed that the speech reception threshold for the target increased as the F0 contours of the masker became more similar to those of the target, suggesting that the difficulty in segregating the target harmonics from the masker harmonics may underlie the informational masking effect. Based on these studies, a new auditory computational model was built by introducing the auditory function of harmonic extraction into the traditional speech intelligibility index (SII) model, named the harmonic extraction (HF) model. The predictions of the HF model are highly consistent with the experimental results.
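For context, the core of the traditional SII that the proposed HF model extends is an importance-weighted sum of band audibilities. The simplified SNR-based form often quoted in the literature (omitting ANSI S3.5's masking and level-distortion corrections) is:

```latex
% Simplified SII: importance-weighted sum of band audibilities
\mathrm{SII} = \sum_i I_i \, A_i
\qquad
A_i = \min\!\left( \max\!\left( \frac{\mathrm{SNR}_i + 15}{30},\, 0 \right),\, 1 \right)
```

where I_i is the band-importance weight and A_i the audibility of band i.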
In order to investigate the influence of a dummy head on measuring speech intelligibility, objective and subjective speech intelligibility evaluation experiments were carried out for different spatial configurations of a target source and a noise source in the horizontal plane. The differences between standard STIPA measured without a dummy head and binaural STIPA measured with a dummy head were compared, and the correlation between subjective speech intelligibility and objective STIPA was analyzed. It is shown that the position of the sound source significantly affects binaural STIPA and the subjective intelligibility measured with a dummy head or in a real-life scenario. The standard STIPA is closer to the lower of the two binaural STIPA values. Speech intelligibility is higher for a single ear that is on the same side as the target source or on the side away from the noise source. Binaural speech intelligibility is always lowest when the target and noise sources are at the same place, but once they are separated the speech intelligibility increases sharply. It is also found that the subjective intelligibility measured with a dummy head or in a real-life scenario is uncorrelated with standard STIPA, but correlates highly with STIPA measured with a dummy head. The subjective intelligibility of a single ear correlates highly with the STIPA measured at the same ear, and binaural speech intelligibility is in good agreement with the higher of the two binaural STIPA values.
Funding: National Natural Science Foundation of China (11204278).
Diagnosing a baby’s feelings poses a challenge for both doctors and parents because babies cannot explain their feelings through expression or speech. Understanding the emotions of babies and their associated expressions during different sensations, such as hunger and pain, is a complicated task. In infancy, all communication and feelings are conveyed through cry-speech, a natural phenomenon. Several clinical methods can be used to diagnose a baby’s diseases, but non-clinical methods of diagnosing a baby’s feelings are lacking. In this study, we therefore aimed to identify babies’ feelings and emotions from their cries using a non-clinical method. Changes in the cry sound can be identified with our method and used to assess the baby’s feelings. We considered the frequency of the cries derived from the energy of the sound. The feelings represented by the infant’s cry are judged to correspond to particular sensations expressed by the child, using the optimal frequency for recognizing real-world audio. We used machine learning and artificial intelligence to distinguish cry tones in real time through feature analysis. The experimental group consisted of equal numbers of male and female babies, and we examined the relevance of the results against different parameters. The application produced real-time results after recognizing a child’s cry sounds. The novelty of our work is that, for the first time, we successfully derived the feelings of young children from their cry-speech, showing promise for end-user applications.
Funding: Deanship of Scientific Research, Najran University, Kingdom of Saudi Arabia, grant number NU/RC/SERC/11/5.
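Since the method is described only at a high level (energy and frequency analysis of the cry followed by real-time classification), the snippet below is just a generic illustration of that kind of pipeline; the feature set, the synthetic stand-in recordings, and the label set are assumptions, not the authors' implementation.

```python
# Generic sketch of cry-sound feature extraction and classification.
# The feature set (short-term energy, YIN pitch, MFCC means) and the label
# set are illustrative assumptions, not the paper's actual design.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

def cry_features(y, sr):
    rms = librosa.feature.rms(y=y)[0]                   # short-term energy
    f0 = librosa.yin(y, fmin=200, fmax=1000, sr=sr)     # cry pitch contour
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return np.hstack([rms.mean(), rms.std(), f0.mean(), f0.std(),
                      mfcc.mean(axis=1)])

# synthetic stand-ins for labelled cry recordings ("hunger" vs "pain")
sr = 16000
rng = np.random.default_rng(0)
recordings = [(0.1 * rng.standard_normal(sr), "hunger"),
              (0.1 * rng.standard_normal(sr), "pain")]

X = np.array([cry_features(y, sr) for y, _ in recordings])
labels = [lab for _, lab in recordings]
model = RandomForestClassifier(n_estimators=50).fit(X, labels)
```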
Machine Learning (ML) algorithms play a pivotal role in Speech Emotion Recognition (SER), although they encounter a formidable obstacle in accurately discerning a speaker's emotional state. The examination of speakers' emotional states is important in a range of real-time applications, including virtual reality, human-robot interaction, emergency centers, and human behavior assessment. Accurately identifying emotions in the SER process relies on extracting relevant information from audio inputs. Previous studies on SER have predominantly utilized short-time characteristics such as Mel Frequency Cepstral Coefficients (MFCCs) due to their ability to capture the periodic nature of audio signals effectively. Although such traits can help perceive and interpret emotional depictions, MFCCs have limitations. This study therefore tackles that issue by systematically selecting multiple audio cues, enhancing the classifier model's efficacy in accurately discerning human emotions. The utilized dataset is taken from the EMO-DB database. The input speech is preprocessed using a 2D Convolutional Neural Network (CNN), which applies convolutional operations to spectrograms, as they afford a visual representation of how the frequency content of the audio signal changes over time. The next step is normalization of the spectrogram data, which is crucial for Neural Network (NN) training as it aids faster convergence. Five auditory features, MFCCs, Chroma, Mel-Spectrogram, Contrast, and Tonnetz, are then extracted sequentially. The aim of feature selection is to retain only dominant features by excluding irrelevant ones; in this paper, the Sequential Forward Selection (SFS) and Sequential Backward Selection (SBS) techniques were employed to select among the multiple audio-cue features. Finally, the feature sets composed by the hybrid feature extraction methods are fed into a deep Bidirectional Long Short-Term Memory (Bi-LSTM) network to discern emotions. Since a deep Bi-LSTM can hierarchically learn complex features and increases model capacity through more robust temporal modeling, it is more effective than a shallow Bi-LSTM in capturing the intricate tones of emotional content present in speech signals. The effectiveness and resilience of the proposed SER model were evaluated by experiments comparing it to state-of-the-art SER techniques. The results indicated that the model achieved accuracy rates of 90.92%, 93%, and 92% on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), the Berlin Database of Emotional Speech (EMO-DB), and the Interactive Emotional Dyadic Motion Capture (IEMOCAP) datasets, respectively. These findings signify a prominent enhancement in the ability to identify emotional depictions in speech, showcasing the potential of the proposed model in advancing the SER field.
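The five auditory features named above map directly onto standard librosa extractors. A hedged sketch of that extraction stage (before feature selection and the Bi-LSTM) might look as follows; the sampling rate, frame parameters, and per-frame stacking are assumptions rather than the paper's exact configuration.

```python
# Sketch of the five-feature extraction stage described above
# (MFCCs, Chroma, Mel-spectrogram, Contrast, Tonnetz), per utterance.
import numpy as np
import librosa

def extract_features(path, sr=16000, n_mfcc=40):
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)
    mel = librosa.power_to_db(librosa.feature.melspectrogram(y=y, sr=sr))
    contrast = librosa.feature.spectral_contrast(y=y, sr=sr)
    tonnetz = librosa.feature.tonnetz(y=librosa.effects.harmonic(y), sr=sr)
    # stack frame-wise so the sequence can feed a (Bi-)LSTM: (frames, dims)
    n = min(m.shape[1] for m in (mfcc, chroma, mel, contrast, tonnetz))
    return np.vstack([mfcc[:, :n], chroma[:, :n], mel[:, :n],
                      contrast[:, :n], tonnetz[:, :n]]).T

# seq = extract_features("utterance.wav")   # hypothetical EMO-DB file
# seq.shape -> (frames, 40 + 12 + 128 + 7 + 6) feature dimensions
```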
A machine learning based speech enhancement method is proposed to improve the intelligibility of whispered speech. A binary mask estimated by a two-class support vector machine (SVM) classifier is used to synthesize the enhanced whisper. A novel noise-robust feature called Gammatone feature cosine coefficients (GFCCs), extracted by an auditory periphery model, is derived and used for the binary mask estimation. The intelligibility performance of the proposed method is evaluated and compared with traditional speech enhancement methods. Objective and subjective evaluation results indicate that the proposed method can effectively improve the intelligibility of whispered speech contaminated by noise. Compared with the power subtraction algorithm and the log-MMSE algorithm, neither of which improves intelligibility in lower signal-to-noise ratio (SNR) environments, the proposed method performs well in improving the intelligibility of noisy whisper. Additionally, the intelligibility of the enhanced whispered speech using the proposed method also outperforms that of the corresponding unprocessed noisy whispered speech.
Funding: National Natural Science Foundation of China (Nos. 61231002, 61273266, 51075068, 60872073, 60975017, 61003131), the Ph.D. Programs Foundation of the Ministry of Education of China (No. 20110092130004), the Science Foundation for Young Talents of the Educational Committee of Anhui Province (No. 2010SQRL018), and the 211 Project of Anhui University (No. 2009QN027B).
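The final step shared by this and the other masking-based methods above is applying the estimated binary mask to the noisy T-F representation and resynthesizing a waveform. A minimal STFT-domain version of that step is sketched below; the paper itself uses an auditory (gammatone) front end and an SVM-estimated mask, so the random mask and the STFT here are placeholders that only illustrate the principle.

```python
# Minimal sketch of mask application and resynthesis in the STFT domain.
# `estimated_mask` would come from the SVM classifier; here it is random
# so the snippet runs end to end.
import numpy as np
from scipy.signal import stft, istft

fs = 16000
noisy = np.random.default_rng(1).standard_normal(fs)   # stand-in noisy whisper

f, t, X = stft(noisy, fs=fs, nperseg=512, noverlap=256)
estimated_mask = (np.random.default_rng(2).random(X.shape) > 0.5).astype(float)

_, enhanced = istft(X * estimated_mask, fs=fs, nperseg=512, noverlap=256)
print(enhanced.shape)
```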
The acoustic environment of the classroom is one of the most important factors influencing teaching and learning for both teacher and students, so it is critical to ensure good speech intelligibility in classrooms. However, achieving an ideal classroom acoustic environment is not always easy, especially in large multimedia classrooms. In a real renovation project covering 39 multimedia classrooms at a university, seven typical rooms were selected, and simulation-based acoustic environment optimisation design and verification were performed for these classrooms. First, the acoustic and sound reinforcement design schemes were determined with the room acoustics software ODEON. Next, the effects of the optimisation design were analysed and the simulated and measured results were compared; the accuracy of using empirically reduced sound absorption coefficients was also examined. Finally, the recommended reverberation times (RTs) in multimedia classrooms corresponding to speech intelligibility were discussed, the effectiveness of the speech transmission index (STI) as a primary parameter for classroom acoustic environment control was considered, and the acoustic environment under unoccupied and occupied conditions was compared. The results revealed that although many factors influence classroom acoustic environment control, an adequate result can be expected when the appropriate method is applied. Considering visual requirements alongside the acoustic design also makes the classroom likely to have a good visual effect in addition to a good listening environment.
Funding: National Natural Science Foundation of China (Nos. 51778100, 11774266, 51878110, 51278078).
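While the renovation project relied on ODEON simulations, the first-order relation between room volume, surface absorption, and reverberation time that such designs work against is Sabine's formula. A small helper for that back-of-the-envelope estimate is shown below; the classroom dimensions and absorption coefficients are made-up illustrative values, not figures from the paper.

```python
# Sabine estimate of reverberation time: T60 = 0.161 * V / A,
# with A the total absorption in metric sabins (sum of S_i * alpha_i).
def sabine_t60(volume_m3, surfaces):
    """surfaces: iterable of (area_m2, absorption_coefficient) pairs."""
    absorption = sum(area * alpha for area, alpha in surfaces)
    return 0.161 * volume_m3 / absorption

# illustrative multimedia-classroom numbers (not from the paper)
classroom = [(200.0, 0.05),   # walls, painted concrete
             (120.0, 0.60),   # ceiling, absorptive panels
             (120.0, 0.10),   # floor
             (60.0, 0.30)]    # occupied seating area
print(f"estimated T60 = {sabine_t60(540.0, classroom):.2f} s")
```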
A heuristic theoretical optimal routing algorithm (TORA) is presented to achieve a location-aided quality of service (QoS) data-gathering structure in wireless sensor networks (WSNs). The construction of TORA is based on a swarm intelligence (SI) mechanism, namely ant colony optimization. First, an energy-efficient weight is designed based on flow distribution to divide the WSN into different functional regions, so that routing selection can self-adapt to asymmetric power configurations with lower latency. Then, the design of a novel heuristic factor and pheromone updating rule endows ant-like agents with the ability to detect the local network energy status and approach the theoretical optimal tree, improving adaptability and energy efficiency in route building. Simulation results show that, compared with some classic routing algorithms, TORA further minimizes the total communication energy cost and enhances QoS performance with low delay under data-gathering conditions.
Funding: National Natural Science Foundation of China (60802005, 50803016), the Science Foundation for Excellent Youth Scholars of East China University of Science and Technology (YH0157127), and the Undergraduate Innovation Experimentation Program of East China University of Science and Technology (X1033).
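The ant-colony transition rule underlying this kind of routing chooses the next hop with probability proportional to pheromone raised to a weight alpha times a heuristic desirability raised to beta. The sketch below shows only that generic selection and pheromone-update step; the inverse-distance heuristic is a placeholder for the paper's energy-aware heuristic factor, whose exact form is not reproduced here.

```python
# Generic ant-colony next-hop selection and pheromone update.
# The heuristic (1/distance) is a placeholder for TORA's energy-aware factor.
import random

ALPHA, BETA, RHO = 1.0, 2.0, 0.1  # pheromone weight, heuristic weight, evaporation

def choose_next_hop(neighbors, pheromone, distance):
    weights = [(pheromone[n] ** ALPHA) * ((1.0 / distance[n]) ** BETA)
               for n in neighbors]
    total = sum(weights)
    return random.choices(neighbors, weights=[w / total for w in weights])[0]

def update_pheromone(pheromone, path, deposit):
    for node in pheromone:            # evaporation everywhere
        pheromone[node] *= (1.0 - RHO)
    for node in path:                 # reinforcement along the found path
        pheromone[node] += deposit

# toy example: pick a next hop from one node toward the sink
neighbors = [1, 2, 3]
pheromone = {1: 1.0, 2: 1.5, 3: 0.8}
distance = {1: 4.0, 2: 2.5, 3: 6.0}
print("next hop:", choose_next_hop(neighbors, pheromone, distance))
```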
Based on the commonly used indicators for speech intelligibility, this work acoustically evaluates the two largest auditoria in the Faculty of Engineering, Helwan University, Cairo, Egypt, using experimental and digital simulation techniques. Design treatments were also suggested to improve the acoustic performance of the auditoria, and the impact of these treatments was checked using the simulation as well. The models analysed with the CATT software were first validated using the results of field work in the unoccupied rooms. The results showed that the acoustic quality of the two auditoria is far from optimal due to their improper acoustic characteristics and the high noise levels as well. The improvement proposals showed that altering the ceiling shape and adding efficient absorptive materials to the rear surfaces successfully reduced the excessive reverberation time to the optimal values, increased the early reflections, and eliminated the shadow zones. In addition, decreasing the noise levels by 20 dB by improving the window insulation noticeably improved the speech intelligibility at all receivers.