Abstract
Voice cloning systems based on the speaker verification to multi-speaker text-to-speech (SVTTS) architecture describe the speaker-encoding feature with a d-vector. Because this feature extraction does not take the speech information of the entire sentence into account, the similarity of the cloned voice suffers. To address this problem, this paper proposes a voice cloning method based on x-vector speaker features. The method uses the x-vector as the embedding vector that characterizes the target speaker, concatenates it into the synthesizer, and generates the target speaker's voice through the vocoder. Experimental results show that embedding vectors extracted with the x-vector method achieve higher similarity. Compared with the traditional method, the proposed method improves the naturalness and similarity of the cloned voice by 0.32 and 0.14, respectively.
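The abstract states only that an x-vector replaces the d-vector and is concatenated into the synthesizer. As a rough illustration under stated assumptions, the sketch below shows two ideas involved: the statistics-pooling step that gives an x-vector network its utterance-level view (per-dimension mean and standard deviation over every frame of the sentence), and concatenation of one fixed speaker embedding onto every time step of the synthesizer's encoder output, as in SV2TTS-style Tacotron synthesizers. The function names, the plain-list vectors, and the choice of the encoder output as the concatenation point are hypothetical and not taken from the paper. A full x-vector network also applies TDNN layers before pooling and fully connected layers after it; both are omitted here.

```python
import math
from typing import List

Vector = List[float]


def stats_pooling(frames: List[Vector]) -> Vector:
    """Statistics pooling over all frames of one utterance (hypothetical helper).

    Returns the per-dimension mean followed by the per-dimension (population)
    standard deviation, so every frame of the sentence contributes.
    """
    if not frames:
        raise ValueError("need at least one frame")
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    return means + stds


def concat_speaker_embedding(encoder_outputs: List[Vector],
                             embedding: Vector) -> List[Vector]:
    """Append the same speaker embedding to every encoder time step
    (assumed concatenation point; hypothetical helper)."""
    return [list(step) + list(embedding) for step in encoder_outputs]


if __name__ == "__main__":
    # Two 2-dimensional frames -> means [2.0, 3.0], stds [1.0, 1.0]
    pooled = stats_pooling([[1.0, 2.0], [3.0, 4.0]])
    print(pooled)
    # Three 1-dimensional encoder steps, each extended by the 4-dim embedding
    print(concat_speaker_embedding([[0.1], [0.2], [0.3]], pooled))
```

Because the mean and standard deviation are taken over all frames, every part of the sentence contributes to the embedding; this whole-utterance view is the property the abstract contrasts with the d-vector.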
Authors
ZHANG Yaxin (张雅欣); ZHANG Lianhai (张连海)
(Zhongyuan Network Security Research Institute, Zhengzhou University, Zhengzhou 450001, China; Information Engineering University, Zhengzhou 450001, China)
Source
Journal of Information Engineering University (《信息工程大学学报》)
2020, No. 6, pp. 664-669 (6 pages)
Funding
Supported by the National Natural Science Foundation of China (Grant No. 61673395).