The aim of the study was to evaluate the alterations in speech intelligibility in a cleft palate patient before and after extending and modifying the palatal contour of the existing prosthesis using a correctable wax recording. An eight-year-old girl in second grade with a velopharyngeal defect, using an obturator, reported to the outpatient clinic complaining of a lack of clarity in her speech. The existing prosthesis lacked a speech bulb, so it was decided to add a speech bulb to the prosthesis and evaluate the speech. Even with the speech bulb, it was observed that she was unable to clearly pronounce vowels and words such as shoe, vision, and cheer. Hence, palatography was performed using a correctable wax technique and the existing prosthesis was altered accordingly. Great improvement in speech, mastication, and velopharyngeal function was achieved after this palatography-guided alteration of the existing prosthesis.
Purpose: There is a growing interest in the speech intelligibility and auditory perception of deaf children. The aim of the present study was to compare the speech intelligibility and auditory perception of pre-school children with Hearing Aids (HA), Cochlear Implants (CI), and Typical Hearing (TH). Methods: The research design was descriptive-analytic and comparative. The participants comprised 75 male pre-school children aged 4-6 years, recruited in 2017-2018 from Tehran, Iran. The participants were divided into three groups of 25 children each. The first and second groups were selected from pre-school children with HA and CI, respectively, using convenience sampling, while the third group was selected from pre-school children with TH by random sampling. All children completed the Speech Intelligibility Rating and Categories of Auditory Performance questionnaires. Results: The findings indicated that the mean scores of speech intelligibility and auditory perception of the group with TH were significantly higher than those of the other groups (P<0.0001). The mean scores of speech intelligibility in the group with CI did not significantly differ from those of the group with HA (P<0.38). Also, the mean scores of auditory perception in the group with CI were significantly higher than those of the group with HA (P<0.002). Conclusion: The results showed that auditory perception in children with CI was significantly higher than in children with HA. This finding highlights the importance of cochlear implantation at a younger age and its significant impact on auditory perception in deaf children.
Speech intelligibility enhancement in noisy environments is still one of the major challenges for the hearing impaired in everyday life. Recently, machine-learning-based approaches to speech enhancement have shown great promise for improving speech intelligibility. Two key issues in these approaches are the acoustic features extracted from noisy signals and the classifiers used for supervised learning. This paper focuses on features. Multi-resolution power-normalized cepstral coefficients (MRPNCC) are proposed as a new feature to enhance speech intelligibility for the hearing impaired. The new feature is constructed by combining four cepstra at different time-frequency (T-F) resolutions in order to capture both local and contextual information. MRPNCC vectors and binary masking labels, calculated from signals passed through a gammatone filterbank, are used to train a support vector machine (SVM) classifier, which aims to identify the binary masking values of the T-F units in the enhancement stage. The enhanced speech is synthesized using the estimated masking values and Wiener-filtered T-F units. Objective experimental results demonstrate that the proposed feature is superior to competing features in terms of HIT-FA, STOI, HASPI, and PESQ, and that the proposed algorithm not only improves speech intelligibility but also slightly improves speech quality. Subjective tests validate the effectiveness of the proposed algorithm for the hearing impaired.
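The binary-masking enhancement stage described above lends itself to a compact illustration. The Python sketch below shows how speech-dominant time-frequency units can be labeled and used to resynthesize speech; it is a minimal approximation that substitutes an STFT for the paper's gammatone filterbank and an ideal binary mask (the usual source of the SVM's training labels) for the MRPNCC-trained classifier, so all parameters are illustrative.

```python
# Minimal sketch of binary-mask speech enhancement. An STFT stands in for
# the gammatone filterbank, and the ideal binary mask (computed from known
# clean and noise signals, as typically done to generate training labels)
# stands in for the SVM's predicted mask.
import numpy as np
from scipy.signal import stft, istft

def ideal_binary_mask(clean, noise, fs=16000, lc_db=-5.0):
    """Label a T-F unit 1 when its local SNR exceeds the criterion lc_db."""
    _, _, S = stft(clean, fs=fs, nperseg=320)
    _, _, N = stft(noise, fs=fs, nperseg=320)
    snr_db = 10 * np.log10((np.abs(S) ** 2 + 1e-12) / (np.abs(N) ** 2 + 1e-12))
    return (snr_db > lc_db).astype(float)

def apply_mask(noisy, mask, fs=16000):
    """Resynthesize speech, keeping only units labeled speech-dominant."""
    _, _, Y = stft(noisy, fs=fs, nperseg=320)
    _, enhanced = istft(Y * mask, fs=fs, nperseg=320)
    return enhanced

# Usage with synthetic signals.
fs = 16000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 220 * t)
noise = 0.5 * np.random.randn(fs)
enhanced = apply_mask(clean + noise, ideal_binary_mask(clean, noise, fs), fs)
```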
As the primary means of communication, speech is an essential aspect of how humans interact and build connections in the social world. Speech intelligibility is critical in social communication; unintelligibility may lead to confusion, misunderstanding, and frustration. Many Chinese learners of English find it challenging to apply English in social interaction and to reach mutual intelligibility with international communicators. This article analyzes the obstacles impeding Chinese EFL learners' speech intelligibility development from the aspects of phonology (segmental and suprasegmental features) and pragmatics. Some strategies are proposed to help Chinese learners ameliorate phonology and pragmatics problems and improve their speech intelligibility in English communication.
This study examines the effect of speech level on intelligibility in different reverberation conditions, and explores the potential of the loudness-based reverberation parameters proposed by Lee et al. [J. Acoust. Soc. Am., 131(2), 1194-1205 (2012)] to explain the effect of speech level on intelligibility in various reverberation conditions. Listening experiments were performed with three speech levels (LAeq of 55 dB, 65 dB and 75 dB) and three reverberation conditions (T20 of 1.0 s, 1.9 s and 4.0 s), and subjects listened to speech stimuli through headphones. Collected subjective data were compared with two conventional speech intelligibility parameters (Speech Intelligibility Index and Speech Transmission Index) and two loudness-based reverberation parameters (EDTN and TN). Results reveal that the effect of speech level on intelligibility changes with a room's reverberation conditions, and that increased level results in reduced intelligibility in highly reverberant conditions. EDTN and TN explain this finding better than do STI and SII, because they consider many psychoacoustic phenomena important for modeling the effect of speech level varying with reverberation.
Aims: The purpose of this work is to formulate requirements for future methods of searching for extra-terrestrial civilizations using the concepts of information theory and a theoretically grounded method. Methodology: To realize this, the number of dimensionless criteria contained in the International System of Units (SI) was calculated. This value, without additional assumptions, allows us to present a formula for calculating the comparative uncertainty of the model of any physical phenomenon. Based on these formulas, the magnitude of the inevitable threshold of misunderstanding between two civilizations in the universe is determined. Results: New theoretical recommendations for choosing the most effective methods to search for the technosignatures of extra-terrestrial civilizations are formulated. Conclusion: Using the calculated amount of information embedded in the model, we showed that the most promising methods for finding potential residents of the Universe should combine frequency radiation with thermal or electromagnetic quantities.
The paper's purpose is to design and program a four-operation calculator that receives voice instructions and runs them in either a voice or text phase. The calculator simulates the work of a compiler. The paper is a practical example, programmed to support the claim that it is possible to construct a verbal compiler.
The relation between the speech intelligibility of Chinese and the speech transmission index (STI) is discussed, based on some useful properties of the modulation transfer function (MTF) and the results obtained by articulation tests under different signal-to-noise ratios.
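Since the STI is derived directly from the MTF, a short sketch may help make the mapping concrete. The Python fragment below implements the standard simplified MTF-to-STI conversion (apparent signal-to-noise ratio, clipping, and a weighted band average, in the spirit of IEC 60268-16 and without redundancy corrections); the band weights are uniform placeholders, not the Chinese-articulation weights such a study would derive.

```python
# Minimal sketch of the standard MTF-to-STI mapping. Band weights are
# placeholders; real implementations use standardized octave-band weights.
import numpy as np

def transmission_index(m):
    """Map MTF values m in (0, 1) to transmission indices in [0, 1]."""
    snr = 10 * np.log10(m / (1 - m))   # apparent SNR in dB
    snr = np.clip(snr, -15.0, 15.0)    # limit to +/- 15 dB
    return (snr + 15.0) / 30.0

def sti(mtf_per_band, band_weights):
    """Weighted average of per-band mean transmission indices."""
    ti = np.array([transmission_index(np.asarray(m)).mean()
                   for m in mtf_per_band])
    return float(np.dot(band_weights, ti))

# Usage: 7 octave bands, 14 modulation frequencies per band.
weights = np.full(7, 1.0 / 7.0)
mtf = [np.random.uniform(0.3, 0.9, 14) for _ in range(7)]
print(sti(mtf, weights))
```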
Speech intelligibility (SI) is an important index for the design and assessment of halls intended for speech. The relationship between Chinese speech intelligibility scores in rooms and the speech transmission index (STI) under the diotic listening condition was studied in a previous paper using monaural room impulse responses obtained from the room acoustical simulation software Odeon. The present study employs simulated binaural room impulse responses and auralization techniques to obtain subjective Chinese speech intelligibility scores using a rhyme test. The relationship between Chinese speech intelligibility scores and STI is built and validated in rooms using dichotic (binaural) listening. The result shows that there is a high correlation between Chinese speech intelligibility scores and STI under dichotic listening. The relationship between Chinese speech intelligibility scores and STI under diotic and dichotic listening conditions is also analyzed. Compared with diotic listening, dichotic (binaural) listening (an actual listening situation) provides a 2.7 dB signal-to-noise ratio improvement for Mandarin Chinese speech intelligibility. The STI method can predict and evaluate speech intelligibility for Mandarin Chinese in rooms under dichotic (binaural) listening.
Machine Learning (ML) algorithms play a pivotal role in Speech Emotion Recognition (SER), although they encounter a formidable obstacle in accurately discerning a speaker's emotional state. The examination of the emotional states of speakers holds significant importance in a range of real-time applications, including but not limited to virtual reality, human-robot interaction, emergency centers, and human behavior assessment. Accurately identifying emotions in the SER process relies on extracting relevant information from audio inputs. Previous studies on SER have predominantly utilized short-time characteristics such as Mel Frequency Cepstral Coefficients (MFCCs) due to their ability to capture the periodic nature of audio signals effectively. Although these traits may improve the ability to perceive and interpret emotional depictions appropriately, MFCCs have some limitations. This study therefore aims to tackle the aforementioned issue by systematically selecting multiple audio cues, enhancing the classifier model's efficacy in accurately discerning human emotions. The utilized dataset is taken from the EMO-DB database. Input speech is preprocessed using a 2D Convolutional Neural Network (CNN), which applies convolutional operations to spectrograms, as they afford a visual representation of the way the frequency content of the audio signal changes over time. The next step is spectrogram data normalization, which is crucial for Neural Network (NN) training as it aids faster convergence. Then the five auditory features (MFCCs, Chroma, Mel-Spectrogram, Contrast, and Tonnetz) are extracted from the spectrogram sequentially. The aim of feature selection is to retain only dominant features by excluding irrelevant ones. In this paper, the Sequential Forward Selection (SFS) and Sequential Backward Selection (SBS) techniques were employed for multiple-audio-cue feature selection. Finally, the feature sets composed from the hybrid feature extraction methods are fed into a deep Bidirectional Long Short-Term Memory (Bi-LSTM) network to discern emotions. Since a deep Bi-LSTM can hierarchically learn complex features and increases model capacity by achieving more robust temporal modeling, it is more effective than a shallow Bi-LSTM in capturing the intricate tones of emotional content present in speech signals. The effectiveness and resilience of the proposed SER model were evaluated by experiments comparing it to state-of-the-art SER techniques. The results indicated that the model achieved accuracy rates of 90.92%, 93%, and 92% on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), the Berlin Database of Emotional Speech (EMO-DB), and The Interactive Emotional Dyadic Motion Capture (IEMOCAP) datasets, respectively. These findings signify a prominent enhancement in the ability to identify emotional depictions in speech, showcasing the potential of the proposed model in advancing the SER field.
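The five-feature front end described above can be sketched compactly. The following Python fragment, using librosa, extracts the same five feature families and summarizes each over time; the file path is hypothetical, and the SFS/SBS selection stage and the Bi-LSTM classifier from the paper are omitted.

```python
# Minimal sketch of the five-feature front end (MFCCs, Chroma,
# Mel-Spectrogram, Contrast, Tonnetz) using librosa; feature selection
# and the Bi-LSTM classifier from the paper are not reproduced here.
import numpy as np
import librosa

def extract_features(path, sr=22050):
    y, sr = librosa.load(path, sr=sr)
    feats = [
        librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40),
        librosa.feature.chroma_stft(y=y, sr=sr),
        librosa.feature.melspectrogram(y=y, sr=sr),
        librosa.feature.spectral_contrast(y=y, sr=sr),
        librosa.feature.tonnetz(y=librosa.effects.harmonic(y), sr=sr),
    ]
    # Summarize each feature matrix by its per-coefficient mean over time,
    # yielding one fixed-length vector per utterance.
    return np.concatenate([f.mean(axis=1) for f in feats])

# Usage (hypothetical file): x = extract_features("emo_db/03a01Fa.wav")
```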
To explain the reactions of building occupants to their acoustical environments, meetings with the designers, walk-through surveys, and detailed acoustical measurements were conducted. The objective was to determine how design decisions affect office acoustical environments, and how to improve the acoustical design of 'green' office buildings. Design-performance criteria were established. Measurements were made of noise level, reverberation time, speech intelligibility index (SII), and noise isolation. Noise levels were atypically low in unoccupied buildings with no mechanical ventilation, but excessive in occupied buildings and in areas near external walls next to noisy external noise sources, especially with windows open for ventilation. Reverberation times were excessive in areas with large volumes and insufficient sound absorption. Speech intelligibility was generally adequate, but speech privacy was inadequate in shared and open-office areas, and into private offices with doors open for ventilation. Improving the acoustical design of 'green' buildings must include increasing the external-internal noise isolation and that between workplaces, and using adequate sound absorption to control reverberation and noise.
The existing auditory computational models for evaluating speech intelligibility can only account for energetic masking; the effect of informational masking is rarely described in these models. This study aimed to build a computational model that considers the mechanism of informational masking. Several psychoacoustic experiments were conducted to test the effect of informational masking on speech intelligibility by manipulating the number of masking talkers, the speech rate, and the similarity of the F0 contour between target and masker. The results showed that the speech reception threshold for the target increased as the F0 contours of the masker became more similar to those of the target, suggesting that the difficulty in segregating the target harmonics from the masker harmonics may underlie the informational masking effect. Based on these studies, a new auditory computational model was built by introducing the auditory function of harmonic extraction into the traditional speech intelligibility index (SII) model, named the harmonic extraction (HF) model. The predictions of the HF model are highly consistent with the experimental results.
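For context, the traditional SII computation that the HF model extends reduces, in its simplest form, to a band-importance-weighted average of band audibility. The Python sketch below shows that simplified core; the importance weights are uniform placeholders, and the harmonic-extraction front end described in the abstract is not modeled.

```python
# Minimal simplified SII: band audibility derived from band SNR, weighted
# by band-importance values. Weights here are uniform placeholders rather
# than the standardized band-importance functions.
import numpy as np

def sii(speech_db, noise_db, importance):
    snr = np.asarray(speech_db) - np.asarray(noise_db)
    # Audibility is 0 below -15 dB SNR and 1 above +15 dB SNR.
    audibility = np.clip((snr + 15.0) / 30.0, 0.0, 1.0)
    return float(np.dot(importance, audibility))

# Usage over 6 illustrative bands.
print(sii([60, 58, 55, 50, 45, 40], [50, 52, 50, 48, 44, 42], np.full(6, 1 / 6)))
```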
In order to investigate the influence of a dummy head on measuring speech intelligibility, objective and subjective speech intelligibility evaluation experiments were carried out for different spatial configurations of a target source and a noise source in the horizontal plane. The differences between standard STIPA measured without a dummy head and binaural STIPA measured with a dummy head were compared, and the correlation between subjective speech intelligibility and objective STIPA was analyzed. It is shown that the position of the sound source significantly affects binaural STIPA and the subjective intelligibility measured by a dummy head or in a real-life scenario. The standard STIPA is closer to the lower of the two binaural STIPA values. Speech intelligibility is higher for a single ear that is on the same side as the target source or on the side away from the noise source. Binaural speech intelligibility is always lowest when the target and noise sources are at the same place, but once they are apart the speech intelligibility increases sharply. It is also found that the subjective intelligibility measured by a dummy head or in a real-life scenario is uncorrelated with standard STIPA, but correlates highly with STIPA measured with a dummy head. The subjective intelligibility of a single ear correlates highly with STIPA measured at the same ear, and binaural speech intelligibility is in good agreement with the higher of the two binaural STIPA values.
Purpose: Our study aims to compare speech understanding in noise and spectral-temporal resolution skills with regard to the degree of hearing loss, age, hearing aid use experience, and gender of hearing aid users. Methods: Our study included sixty-eight hearing aid users aged 40-70 years, with bilateral mild to moderate symmetrical sensorineural hearing loss. The random gap detection test, the Turkish matrix test, and the spectral-temporally modulated ripple test were administered to the participants with bilateral hearing aids. The test results were compared statistically according to the different variables and the correlations were examined. Results: No statistically significant differences were observed in speech-in-noise recognition or spectral-temporal resolution between older and younger adult hearing aid users (p>0.05). No statistically significant difference was found among test outcomes with regard to different degrees of hearing loss (p>0.05). Higher temporal resolution performance was obtained in male participants and in participants with more hearing aid use experience (p<0.05). Significant correlations were found between the results of the speech-in-noise recognition, temporal resolution, and spectral resolution tests performed with hearing aids (p<0.05). Conclusion: Our findings emphasize the importance of regular hearing aid use and show that some auditory skills can be improved with hearing aids. The observed correlations among the speech-in-noise recognition, temporal resolution, and spectral resolution tests reveal that these skills should be evaluated as a whole to maximize the patient's communication abilities.
Diagnosing a baby's feelings poses a challenge for both doctors and parents because babies cannot explain their feelings through expression or speech. Understanding the emotions of babies and their associated expressions during different sensations, such as hunger and pain, is a complicated task. In infancy, all communication and feelings are propagated through cry-speech, which is a natural phenomenon. Several clinical methods can be used to diagnose a baby's diseases, but nonclinical methods of diagnosing a baby's feelings are lacking. As such, in this study, we aimed to identify babies' feelings and emotions through their cries using a nonclinical method. Changes in the cry sound can be identified with our method and used to assess the baby's feelings. We derived the frequency of the cries from the energy of the sound. The feelings represented by the infant's cry are judged to represent certain sensations expressed by the child, using the optimal frequency for the recognition of real-world audio sound. We used machine learning and artificial intelligence to distinguish cry tones in real time through feature analysis. The experimental group consisted of 50% male and 50% female babies, and we determined the relevance of the results against different parameters. This application produced real-time results after recognizing a child's cry sounds. The novelty of our work is that we, for the first time, successfully derived the feelings of young children through the child's cry-speech, showing promise for end-user applications.
In a speech recognition system, the acoustic model is an important underlying model, and its accuracy directly affects the performance of the entire system. This paper introduces the construction and training process of the acoustic model in detail, studies the Connectionist Temporal Classification (CTC) algorithm, which plays an important role in the end-to-end framework, and establishes a convolutional neural network (CNN) combined with a CTC acoustic model to improve the accuracy of speech recognition. This study uses a sound sensor, the ReSpeaker Mic Array v2.0.1, to convert the collected speech signals into text or corresponding speech signals to improve communication and reduce noise and hardware interference. The baseline acoustic model in this study faces challenges such as long training time, a high error rate, and a certain degree of overfitting. The model is trained through continuous design and improvement of the relevant parameters of the acoustic model, and an excellent model is finally selected according to the evaluation metrics, which reduces the error rate to about 18% and thus improves the accuracy. Finally, comparative verification was carried out on the selection of acoustic feature parameters, the selection of modeling units, and the speaker's speech rate, which further verified the excellent performance of the CTC-CNN_5+BN+Residual model structure. For the experiments, to train and verify the CTC-CNN baseline acoustic model, this study uses the THCHS-30 and ST-CMDS speech data sets as training data; after 54 epochs of training, the word error rate on the acoustic model training set is 31% and the word error rate on the test set is stable at about 43%. This experiment also considers the surrounding environmental noise. At a noise level of 80-90 dB, the accuracy rate is 88.18%, the worst performance among all levels. In contrast, at 40-60 dB, the accuracy was as high as 97.33% due to less noise pollution.
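To make the CTC-CNN idea concrete, the sketch below shows a small convolutional acoustic model trained with CTC loss in PyTorch; the layer sizes, vocabulary size, and feature dimensions are illustrative assumptions, not the CTC-CNN_5+BN+Residual configuration evaluated in the paper.

```python
# Minimal sketch of a CNN acoustic model trained with CTC loss. Sizes are
# illustrative; a real system would use speech features (e.g., fbanks) and
# a full vocabulary of modeling units.
import torch
import torch.nn as nn

class CTCCNN(nn.Module):
    def __init__(self, n_feats=80, n_classes=1000):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_feats, 256, kernel_size=3, padding=1),
            nn.BatchNorm1d(256), nn.ReLU(),
            nn.Conv1d(256, 256, kernel_size=3, padding=1),
            nn.BatchNorm1d(256), nn.ReLU(),
        )
        self.out = nn.Linear(256, n_classes + 1)  # +1 for the CTC blank

    def forward(self, x):  # x: (batch, time, n_feats)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)
        return self.out(h).log_softmax(dim=-1)

model = CTCCNN()
ctc = nn.CTCLoss(blank=1000, zero_infinity=True)
x = torch.randn(4, 200, 80)                # a batch of feature frames
targets = torch.randint(0, 1000, (4, 30))  # label sequences (no blanks)
logp = model(x).transpose(0, 1)            # CTC expects (T, N, C)
loss = ctc(logp, targets,
           torch.full((4,), 200, dtype=torch.long),
           torch.full((4,), 30, dtype=torch.long))
loss.backward()
```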
Speech plays an extremely important role in social activities. Many individuals suffer from a "speech barrier," which limits their communication with others. In this study, an improved speech recognition method is proposed that addresses the needs of speech-impaired and deaf individuals. A basic improved connectionist temporal classification convolutional neural network (CTC-CNN) architecture acoustic model was constructed by combining a speech database with a deep neural network. Acoustic sensors were used to convert the collected voice signals into text or corresponding voice signals to improve communication. The method can be extended to modern artificial intelligence techniques, with multiple applications such as meeting minutes, medical reports, and verbatim records for cars, sales, etc. For the experiments, a modified CTC-CNN was used to train an acoustic model, which showed better performance than earlier common algorithms. Thus a CTC-CNN baseline acoustic model was constructed and optimized, which reduced the error rate to about 18% and improved the accuracy rate.