Funding: The authors would like to acknowledge the Ministry of Electronics and Information Technology (MeitY), Government of India, for financial support through the scholarship for Palli Padmini during the research work, under the Visvesvaraya Ph.D. Scheme for Electronics and IT.
Abstract: The present work presents a statistical method to translate human voices across age groups, based on commonalities in the voices of blood relations. The age-translated voices have been naturalized by extracting blood-relation features (e.g., pitch, duration, and energy) using Mel-Frequency Cepstral Coefficients (MFCC), for the social compatibility of the voice-impaired. The system has been demonstrated using standard English and an Indian language. The voice samples for resynthesis were derived from 12 families, with member ages ranging from 8 to 80 years. The voice-age translation, performed using the Pitch Synchronous Overlap and Add (PSOLA) approach by modulating the extracted voice features, was validated by a perception test. The translated and resynthesized voices were correlated using the Linde-Buzo-Gray (LBG) and Kekre's Fast Codebook Generation (KFCG) algorithms. For translated voice targets, a strong correlation (θ > ~93% and θ > ~96%) was found with blood relatives, whereas a weak correlation range (θ < ~78% and θ < ~80%) was found between different families and between different genders within the same families. The study further subcategorized the sampling and synthesis of the voices into similar- or dissimilar-gender groups, using a support vector machine (SVM) to choose between the available voice samples. Finally, accuracies of ~96%, ~93%, and ~94% were obtained in identifying the gender of the voice sample, the age group of the samples, and the correlation between the original and converted voice samples, respectively. The results obtained were close to natural voice-sample features and are envisaged to easily facilitate a near-natural voice for the speech-impaired.
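The MFCC features central to the method above can be sketched with a minimal NumPy-only pipeline: frame the signal, take the power spectrum, apply a triangular mel filterbank, and keep the first cepstral coefficients of the DCT-II of the log energies. This is a generic illustration of the MFCC computation, not the authors' implementation; all frame sizes and filter counts below are common defaults chosen for the sketch.

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular mel-spaced filterbank over the rFFT bins."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising slope
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling slope
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_ceps=13):
    """Return an (n_frames x n_ceps) MFCC matrix for a 1-D signal."""
    # 1. Frame and window the signal.
    starts = range(0, len(signal) - frame_len + 1, hop)
    frames = np.array([signal[s:s + frame_len] * np.hamming(frame_len)
                       for s in starts])
    # 2. Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    # 3. Mel filterbank energies, floored to avoid log(0).
    energies = np.maximum(power @ mel_filterbank(n_filters, n_fft, sr).T, 1e-10)
    # 4. DCT-II of the log energies; keep the first n_ceps coefficients.
    n = np.arange(n_filters)
    basis = np.cos(np.pi / n_filters * (n[None, :] + 0.5) * n[:, None])
    return (np.log(energies) @ basis.T)[:, :n_ceps]
```

In a pipeline like the one described, such MFCC vectors would serve as the features that the LBG/KFCG codebooks and the SVM classifier operate on.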
Funding: The authors would like to acknowledge the Ministry of Electronics and Information Technology (MeitY), Government of India, for financial support through the scholarship for Palli Padmini during the research work, under the Visvesvaraya Ph.D. Scheme for Electronics and IT.
Abstract: The present system experimentally demonstrates the synthesis of syllables and words from tongue manoeuvres in multiple languages, captured by only four oral sensors. A prototype tooth model was used for the experimental demonstration of the system in the oral cavity. Based on the principle developed in a previous publication by the author(s), the proposed system has been implemented using oral-cavity features (tongue, teeth, and lips) alone, without the glottis and the larynx. The positions of the sensors in the proposed system were optimized based on articulatory (oral-cavity) gestures estimated by simulating the mechanism of human speech. The system has been tested on all English alphabet letters and several words with sensor-based input, along with an experimental demonstration of the developed algorithm, with limit switches, a potentiometer, and flex sensors emulating the tongue in an artificial oral cavity. The system produces the sounds of vowels, consonants, and words in English, along with the pronunciation of the meanings of their translations in four major Indian languages, all from oral-cavity mapping. The experimental setup also caters to gender mapping of the voice. The sound produced by the hardware has been validated by a perceptual test in which listeners verified the gender and the word of the speech sample, with ~98% and ~95% accuracy, respectively. Such a model may be useful for interpreting speech for those who are speech-disabled because of accidents, neurological disorders, spinal cord injury, or larynx disorders.
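The core idea of mapping discrete oral-sensor states to speech units can be sketched as a lookup from articulatory gestures to phoneme labels. The sensor names, states, and gesture-to-phoneme pairs below are illustrative assumptions for the sketch, not the authors' actual sensor calibration or mapping table.

```python
# Hypothetical mapping from four oral-sensor readings to phoneme labels.
# Keys: (tongue-tip contact, tongue-body height, lip rounding, jaw opening).
# All entries are illustrative, not the authors' measured gesture set.
PHONEME_MAP = {
    (1, 'high', 0, 'closed'): 't',   # tongue tip on alveolar ridge
    (0, 'high', 0, 'narrow'): 'i',   # high front unrounded vowel
    (0, 'low',  0, 'wide'):   'a',   # low open vowel
    (0, 'high', 1, 'narrow'): 'u',   # high back rounded vowel
    (1, 'mid',  0, 'narrow'): 'l',   # lateral approximant
}

def decode_gesture(tip, height, rounding, jaw):
    """Return the phoneme for one articulatory gesture, or None if unmapped."""
    return PHONEME_MAP.get((tip, height, rounding, jaw))

def decode_word(gestures):
    """Concatenate decoded phonemes into a word, skipping unmapped frames."""
    return ''.join(p for g in gestures if (p := decode_gesture(*g)))
```

In a full system, the decoded phoneme sequence would then drive a synthesizer (and, as in the setup described, a translation step into the target Indian language) rather than simply being returned as text.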