Abstract
Voice conversion transforms the speech of one speaker so that it sounds as though it were uttered by a different speaker, giving it the target speaker's vocal characteristics while preserving the original linguistic content. This paper studies voice conversion using an RBF neural network whose hidden layer is trained with a genetic algorithm (GA) to capture the nonlinear spectral-envelope mapping between speakers. The quality of the converted speech is evaluated both objectively and subjectively on six Mandarin monophthong vowel phonemes. The results show that the neural network approach achieves the desired conversion performance. They also show that, compared with the K-means method, training the network with a genetic algorithm strengthens its global optimization ability and reduces the average spectral distortion between the converted speech and the target speech by about 10%.
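The idea of searching for RBF hidden-layer centers with a genetic algorithm can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not give the paper's chromosome encoding, fitness function, genetic operators, or network size, and the synthetic vectors stand in for real source and target spectral features. The function names (`ga_train_centers`, `mean_distance`, and so on) are hypothetical.

```python
import numpy as np

def rbf_design(X, centers, width):
    # Gaussian activation of every input frame against every hidden-layer center.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

def hidden_with_bias(X, centers, width):
    # Hidden-layer outputs plus a constant bias column.
    return np.hstack([rbf_design(X, centers, width), np.ones((len(X), 1))])

def fit_output_weights(X, Y, centers, width):
    # Linear output layer solved by least squares for fixed centers.
    W, *_ = np.linalg.lstsq(hidden_with_bias(X, centers, width), Y, rcond=None)
    return W

def predict(X, centers, width, W):
    return hidden_with_bias(X, centers, width) @ W

def mean_distance(A, B):
    # Average Euclidean distance between paired feature vectors
    # (an illustrative stand-in for a spectral distortion measure).
    return float(np.linalg.norm(A - B, axis=1).mean())

def ga_train_centers(X, Y, n_centers=8, width=0.5, pop_size=20,
                     generations=30, seed=0):
    rng = np.random.default_rng(seed)

    def fitness(centers):
        W = fit_output_weights(X, Y, centers, width)
        return mean_distance(predict(X, centers, width, W), Y)

    # Each individual is one candidate set of centers, initialised from the data.
    pop = [X[rng.choice(len(X), n_centers, replace=False)]
           for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness)
        history.append(fitness(pop[0]))
        elite = pop[: pop_size // 2]          # elitist selection: keep the best half
        children = []
        while len(elite) + len(children) < pop_size:
            i, j = rng.choice(len(elite), 2, replace=False)
            mask = rng.random(n_centers) < 0.5            # uniform crossover
            child = np.where(mask[:, None], elite[i], elite[j])
            child = child + rng.normal(0.0, 0.05, child.shape)  # Gaussian mutation
            children.append(child)
        pop = elite + children
    pop.sort(key=fitness)
    history.append(fitness(pop[0]))
    best = pop[0]
    return best, fit_output_weights(X, Y, best, width), history

# Synthetic "source" and "target" feature vectors (not real speech data).
data_rng = np.random.default_rng(1)
X = data_rng.uniform(0.0, 1.0, (200, 4))
Y = np.sin(2.0 * X) + 0.1 * X ** 2

width = 0.5
centers, W, history = ga_train_centers(X, Y, width=width)
converted = predict(X, centers, width, W)
print("training distance per generation (first, last):", history[0], history[-1])
```

Because the best half of each generation is carried over unchanged, the best training distance in `history` never increases from one generation to the next. A K-means baseline would instead fix the centers by clustering the source vectors alone, without reference to the conversion error; this sketch does not reproduce the paper's reported 10% improvement.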
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
PKU Core Journal
2004, Issue 1, pp. 78-84 (7 pages)
Funding
National Natural Science Foundation of China (60172055, 60121302)
Frontier Research Project, Institute of Automation, Chinese Academy of Sciences (1M02J05)
Keywords
artificial intelligence
natural language processing
voice conversion
RBF neural network
genetic algorithm
line spectral frequency