Funding: This work was supported by the Regional Innovation Cooperation Project of Sichuan Province (Grant No. 2022YFQ0073).
Abstract: The Internet of Things (IoT) plays an essential role in current and future generations of information, network, and communication development and applications. This research focuses on vocal tract visualization and modeling, which are critical to realizing animation of the inner vocal tract. Such animation is applied in many fields, including speech training, speech therapy, speech analysis, and other applications related to speech production. This work constructs a geometric model from observations of Magnetic Resonance Imaging (MRI) data, providing a new method to annotate and construct 3D vocal tract organs. Compared with previous methods, the proposed method has two advantages. First, it uses a uniform construction protocol for all speech organs. Second, it can establish corresponding feature points between different speech organs. Fewer than three control parameters are sufficient to describe each speech organ accurately, with an accumulated contribution rate of more than 88%. After reconstruction, the model error is less than 1.0 mm. For Chinese MRI data, this is the first work on a 3D vocal tract model. It will promote theoretical research and development of the intelligent IoT for speech production-related issues.
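An "accumulated contribution rate" is usually the cumulative share of variance explained by the leading components of a statistical shape model. The abstract does not name the decomposition. The sketch below assumes plain PCA, with each organ's feature points flattened into one vector per MRI sample. It shows how the number of control parameters follows from an 88% threshold and how a shape is rebuilt from those parameters. All function names are hypothetical.

```python
import numpy as np


def pca_contribution(shapes):
    """PCA of flattened shape vectors (one row per MRI sample, e.g. the x, y, z
    coordinates of an organ's feature points). Returns (mean, components, rates),
    where rates[i] is the share of total variance carried by component i."""
    shapes = np.asarray(shapes, dtype=float)
    mean = shapes.mean(axis=0)
    # Rows of vt are orthonormal principal directions; s**2 is proportional
    # to the variance captured along each of them.
    _, s, vt = np.linalg.svd(shapes - mean, full_matrices=False)
    variances = s ** 2
    return mean, vt, variances / variances.sum()


def num_components_for(rates, target=0.88):
    """Smallest k whose cumulative contribution rate reaches `target`."""
    cumulative = np.cumsum(rates)
    return min(int(np.searchsorted(cumulative, target)) + 1, len(rates))


def reconstruct(mean, components, shape_vec, k):
    """Describe a shape by k control parameters, then rebuild it from them."""
    params = components[:k] @ (np.asarray(shape_vec, float) - mean)  # the k parameters
    return mean + params @ components[:k]
```

Comparing `reconstruct(...)` with the original shape gives a reconstruction error of the kind the abstract reports as below 1.0 mm. One example is the largest point-wise distance in millimetres, assuming coordinates are stored in millimetres.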
Abstract:
Background: This work aims to characterize the profile of stroke patients treated at a hospital in the Zona da Mata region of Minas Gerais, Brazil. It considers clinical vocal tract findings, type of stroke, and the age and gender of the patients.
Methodology: The clinical profiles of 133 patients with a clinical or tomographic diagnosis of stroke were analyzed, and the results are presented as percentages. Quantitative data were summarized as averages, associations were analyzed with the χ2 test, and for significance it was considered p
Results: Of all patients, 63 (47.4%) were women and the remaining 52.6% were men. Clinically, ischemic stroke was far more frequent (89.4%) than the hemorrhagic type (10.6%). Most patients were referred for computed tomography (86.5%), and they remained hospitalized for an average of 6.496 ± 7.372 days. Similar percentages of patients did (54.1%) and did not (49.6%) show any impairment of speech, language skills, or swallowing. Patients with stroke presented different types of disabilities. Men, with an average age of 69.8 ± 13.9 years, mostly presented ischemic stroke, and most patients with stroke had hemiplegia and abnormalities of the vocal tract, dysphasia, and aphasia. Older patients had ischemic strokes and presented with left hemiplegia, whereas younger patients had hemorrhagic strokes that caused right hemiplegia.
Conclusion: Our results provide important findings on the clinical evolution of the vocal tract in patients who suffered strokes during the analysis period. They support a better understanding of how the vocal tract of these patients evolved according to type of stroke, sex, and age. They also allow comparison with statistics from other periods, including future ones, available in the literature. The results also point to the difficulty of diagnosing stroke. They show a concern with immediate care but not with continued or multidisciplinary care. This creates an evident risk to life through dysphasia and increases permanent damage when appropriate work is not done with the patients.
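The Methodology reports percentages and χ2 tests of association. The sketch below covers both computations. It assumes a Pearson χ2 test of independence without continuity correction, since the abstract does not specify the variant. The helper names are hypothetical, and the only study data used are the sex counts stated in the Results.

```python
import numpy as np


def chi_square_independence(observed):
    """Pearson chi-square statistic for an r x c contingency table.

    Returns (statistic, degrees_of_freedom, expected_counts).
    No continuity correction is applied.
    """
    obs = np.asarray(observed, dtype=float)
    row_totals = obs.sum(axis=1, keepdims=True)     # shape (r, 1)
    col_totals = obs.sum(axis=0, keepdims=True)     # shape (1, c)
    expected = row_totals @ col_totals / obs.sum()  # counts under independence
    statistic = float(((obs - expected) ** 2 / expected).sum())
    dof = (obs.shape[0] - 1) * (obs.shape[1] - 1)
    return statistic, dof, expected


def percentage(count, total, digits=1):
    """Share of `total` represented by `count`, in percent, rounded."""
    return round(100.0 * count / total, digits)


# Sex distribution reported in the abstract: 63 women out of 133 patients.
print(percentage(63, 133), percentage(133 - 63, 133))  # 47.4 52.6
```

For p-values, `scipy.stats.chi2_contingency` returns the statistic, p-value, degrees of freedom, and expected counts. By default (`correction=True`) it applies Yates' continuity correction to 2 × 2 tables, so its statistic differs from the uncorrected one above.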
Funding: Supported by the National Natural Science Foundation of China (61071215), the Science and Technology Foundation of Suzhou (SYG201033), and the Pre-research Foundation of Soochow University (Q311901111, 14317399).
Abstract: Current fixed-value mapping methods (method_F) have weaknesses. To address them, a vocal tract system conversion method based on the universal background model (UBM) is proposed to improve speech conversion from Chinese whispered speech to normal speech. Because the UBM has numerous components, the errors produced by its acoustic probability density model cannot be ignored. Therefore, a method for choosing effective Gaussian mixture components is developed to optimize system performance. It is based on the summation of posterior probabilities under a minimum spectral distortion criterion. The proposed method (method_U) is analyzed and compared using a performance index (PI) based on the Itakura-Saito spectral distortion measure. Experiments show that method_U performs more stably than method_F across different speakers and phonemes, and that its average PI is better than that of method_F. Selecting effective Gaussian mixture components further improves the PI of method_U by 5.11%. Subjective listening tests also show that the proposed method can improve the clarity and intelligibility of the converted speech.
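The performance index above is built on the Itakura-Saito spectral distortion, and component selection relies on summed posterior probabilities. The sketch below covers both ingredients. It assumes diagonal-covariance Gaussians and power spectra sampled on a common frequency grid.

The selection rule here is a hypothetical simplification: keep the components with the largest summed posterior until a fraction `mass` of the total is covered. It omits the minimum-spectral-distortion part of the authors' criterion, and `mass` is an illustrative parameter, not a value from the paper.

```python
import numpy as np


def itakura_saito(p_ref, p_test, eps=1e-12):
    """Itakura-Saito distortion between two power spectra on the same grid:
    mean(P/P_hat - log(P/P_hat) - 1). It is zero only when the spectra match
    and, unlike a true distance, it is not symmetric."""
    ratio = (np.asarray(p_ref, float) + eps) / (np.asarray(p_test, float) + eps)
    return float(np.mean(ratio - np.log(ratio) - 1.0))


def gmm_posteriors(frames, weights, means, variances):
    """Posterior probability of each diagonal-covariance Gaussian for each frame.

    frames: (T, D); weights: (M,); means, variances: (M, D). Returns (T, M).
    """
    diff = frames[:, None, :] - means[None, :, :]                 # (T, M, D)
    log_lik = -0.5 * (np.log(2 * np.pi * variances).sum(axis=1)  # (M,)
                      + (diff ** 2 / variances).sum(axis=2))     # (T, M)
    log_joint = np.log(weights) + log_lik
    log_joint -= log_joint.max(axis=1, keepdims=True)            # avoid underflow
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)


def select_components(posteriors, mass=0.9):
    """Hypothetical rule: keep the components with the largest summed posterior
    until they cover `mass` of the total posterior mass; returns sorted indices."""
    totals = posteriors.sum(axis=0)
    order = np.argsort(totals)[::-1]
    cumulative = np.cumsum(totals[order]) / totals.sum()
    keep = min(int(np.searchsorted(cumulative, mass)) + 1, len(order))
    return np.sort(order[:keep])
```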
Abstract: This paper introduces a new speech synthesis algorithm for a Chinese text-to-speech system, based on the LMA (log magnitude approximation) filter. With this method, the system not only generates higher-quality speech but can also modify prosodic parameters more powerfully, yielding more natural and intelligible synthesized speech than earlier systems. First, the fundamental principles of the LMA filter and the construction of the synthesizer are presented. Then, the modification of acoustic parameters with this synthesizer is described. Finally, the system's performance is evaluated quantitatively and compared with KDTALK_1, a relatively successful PSOLA synthesizer.
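An LMA filter synthesizes speech from cepstral coefficients by approximating a filter of the form H(z) = exp(sum_n c_n z^-n). The sketch below is a minimal illustration under that assumption and covers only the cepstral side:

- extracting a real cepstrum from a frame;
- computing the smoothed log-magnitude envelope that a truncated cepstrum describes;
- deriving the minimum-phase coefficients whose exponential has that magnitude.

It does not implement the LMA filter structure itself, which approximates the exponential. All function names are hypothetical.

```python
import numpy as np


def real_cepstrum(frame, eps=1e-12):
    """Real cepstrum: inverse DFT of the log magnitude spectrum of a frame."""
    log_mag = np.log(np.abs(np.fft.fft(frame)) + eps)
    return np.fft.ifft(log_mag).real


def log_envelope(cep, omegas):
    """Log magnitude envelope from the first p+1 real-cepstrum coefficients:
    c[0] + 2 * sum_{n=1..p} c[n] * cos(n * w). Truncating p smooths the envelope."""
    cep = np.asarray(cep, dtype=float)
    n = np.arange(1, len(cep))
    return cep[0] + 2.0 * np.cos(np.outer(omegas, n)) @ cep[1:]


def min_phase_coefficients(cep):
    """Coefficients c_hat of the minimum-phase H(z) = exp(sum_n c_hat[n] z^-n)
    whose log magnitude equals log_envelope(cep, w): c_hat[0] = c[0], c_hat[n] = 2 c[n]."""
    c_hat = 2.0 * np.asarray(cep, dtype=float)
    c_hat[0] /= 2.0
    return c_hat
```

In a source-filter synthesizer of this general type, the spectral envelope is carried by the cepstral coefficients, while pitch comes from the excitation fed into the filter. The two can therefore be adjusted separately, which is what makes prosody modification convenient.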
Abstract: The geometric and biomechanical properties of the larynx strongly influence voice quality and efficiency. A physical understanding of phonation in pathological conditions is important for predicting how voice disorders can be treated with therapy and rehabilitation. Here, we present a continuum-based numerical model of phonation that accounts for the complex fluid-structure interactions occurring in the airway. The model incorporates a three-dimensional vocal fold geometry, muscle contractions, and viscoelastic properties to provide a realistic framework for phonation. Vocal fold motion is coupled to an unsteady compressible respiratory flow. This allows numerical simulations of normal and diseased phonation that establish clear relationships between actual laryngeal structures and model parameters such as muscle activity. As a pilot analysis of diseased phonation, we model vocal nodules, mass lesions that can appear bilaterally on the vocal folds. Comparing simulations with and without nodules shows how the lesions affect vocal fold motion and consequently degrade voice quality. Furthermore, we found that the minimum lung pressure required for voice production increases as the nodules move closer to the center of the vocal fold. Simulations with the developed model may thus provide essential insight into complex phonation phenomena and help elucidate the etiologic mechanisms of voice disorders.
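The abstract reports that the minimum lung pressure required for voice production rises as the nodules move toward the center of the vocal fold. It does not say how that minimum was located. One generic approach, assumed here, is a bisection over lung pressure in which each evaluation is a full simulation reporting whether self-sustained vocal fold oscillation occurs. The `oscillates` callable is a hypothetical stand-in for such a run, and the search assumes onset is monotonic in pressure.

```python
from typing import Callable


def minimum_onset_pressure(
    oscillates: Callable[[float], bool],
    low: float,
    high: float,
    tol: float = 1e-3,
) -> float:
    """Bisection for the smallest lung pressure at which `oscillates` returns True.

    Assumes oscillates(low) is False, oscillates(high) is True, and that onset
    is monotonic in pressure (no oscillation below the threshold, sustained
    oscillation above it). Returns an upper bound within `tol` of the threshold.
    """
    if oscillates(low) or not oscillates(high):
        raise ValueError("bracket must satisfy oscillates(low)=False, oscillates(high)=True")
    while high - low > tol:
        mid = 0.5 * (low + high)
        if oscillates(mid):
            high = mid  # threshold is at or below mid
        else:
            low = mid   # threshold is above mid
    return high
```

Repeating the search for each nodule position would give a threshold-versus-position trend like the one described. With a real simulator every call is expensive, so `tol` trades accuracy against the number of runs (about log2((high - low) / tol) simulations).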
Abstract: For more than twenty years, multiple physical parameters were monitored in an area of moderate seismicity in western Piedmont (NW Italy). At the same time, the behaviour of numerous domestic and wild animal species was observed. Together, these made it possible to distinguish unusual animal behaviours caused by local earthquake nucleation from those with other causes. In particular, observing the body and vocal language of dogs (Canis familiaris) in the same area made it possible not only to specify the different meanings of their vocal language in relation to their body language, but also to classify the minimal elements of that vocal language. In it, tonal and rhythmic sequences of sounds are linked together to form a semantic lexicon. Different groups of dogs use the same tonal and rhythmic vocal sequences in similar or identical situations. This led us to verify whether particular vocal sequences could be linked to precise physical anomalies preceding earthquakes. Physical anomalies due to earthquake nucleation or to hydrogeological destabilization can be identified thanks to continuous long-term monitoring of selected parameters. Moreover, the vocal language of dogs becomes more complex when the dogs live in an area with low population density. The correlation between certain vocal sequences and certain seismic precursors is also stronger when dogs live freely in yards or on farms, are in good health, and can establish strong social relationships within a group. When dogs are confined to the yards of houses that are far apart, they communicate with each other through a remarkable vocal language. It is full of questions and answers, imitations of sequences, and information about situations that may be harmful to them.
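The study relies on continuous long-term monitoring to identify physical anomalies but does not describe the detection method. As an illustration only, a rolling z-score detector is one simple, widely used option: each new sample is compared with the mean and standard deviation of the preceding window. The function name, window length, and threshold are hypothetical, not values from the study.

```python
import numpy as np


def rolling_zscore_anomalies(series, window=30, threshold=3.0):
    """Flag samples that deviate from the mean of the preceding `window`
    samples by more than `threshold` standard deviations of that window.
    The first `window` samples are never flagged (no history yet)."""
    x = np.asarray(series, dtype=float)
    flags = np.zeros(len(x), dtype=bool)
    for i in range(window, len(x)):
        past = x[i - window:i]
        std = past.std()
        if std > 0 and abs(x[i] - past.mean()) > threshold * std:
            flags[i] = True
    return flags
```

A fixed trailing window adapts to slow drifts in a monitored parameter. Real long-term records with seasonal or daily cycles would typically need those cycles removed first.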