Abstract
Text-to-speech (TTS) synthesis is a key technology for human-machine speech communication, and the quality of the speech database is an important factor in TTS performance. Building a TTS speech database takes a long time (about six months), and the speaker's recording state (timbre and energy) is hard to keep consistent over that period, so the recorded database ends up with inconsistent energy. To address this, this paper proposes a speech energy balancing method consisting of two steps: time-domain envelope fluctuation detection and frame energy averaging. First, the energy and fluctuation parameters of a standard utterance are analyzed and used as a template, and the time-domain envelope fluctuation detection algorithm checks whether each speech sample awaiting adjustment is qualified. Finally, following the frame energy averaging criterion, the time-domain amplitude of every qualified sample is adjusted so that the overall energy of the database is as consistent as possible. Experimental results show that the proposed method effectively improves the quality of the TTS speech database and has practical engineering significance.
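The abstract names the two steps but gives no formulas, so the following is only a minimal sketch of how such a pipeline might look, not the paper's algorithm. Everything concrete in it is an assumption made for illustration: the function names, the frame length of 256 samples, the use of per-frame peak magnitude as the envelope, the coefficient of variation as the fluctuation measure, and the tolerance of 0.2.

```python
import math
import statistics


def frames(samples, frame_len):
    """Split samples into non-overlapping full frames; a trailing remainder is dropped."""
    out = [samples[i:i + frame_len]
           for i in range(0, len(samples) - frame_len + 1, frame_len)]
    if not out:
        raise ValueError("signal is shorter than one frame")
    return out


def mean_frame_energy(samples, frame_len=256):
    """Average, over all frames, of the per-frame mean squared amplitude."""
    energies = [sum(x * x for x in f) / frame_len for f in frames(samples, frame_len)]
    return statistics.fmean(energies)


def envelope_fluctuation(samples, frame_len=256):
    """Assumed fluctuation measure: coefficient of variation of a crude
    envelope taken as the peak magnitude of each frame."""
    envelope = [max(abs(x) for x in f) for f in frames(samples, frame_len)]
    mean = statistics.fmean(envelope)
    return statistics.pstdev(envelope) / mean if mean > 0 else 0.0


def is_qualified(samples, template_fluctuation, tolerance=0.2, frame_len=256):
    """Accept a sample whose envelope fluctuation is close to the template's.
    The tolerance value is hypothetical."""
    diff = abs(envelope_fluctuation(samples, frame_len) - template_fluctuation)
    return diff <= tolerance


def equalize(samples, target_energy, frame_len=256):
    """Scale the time-domain amplitude so the mean frame energy matches the target."""
    current = mean_frame_energy(samples, frame_len)
    if current == 0:
        return list(samples)  # silent input: nothing to scale
    gain = math.sqrt(target_energy / current)  # energy scales with gain squared
    return [x * gain for x in samples]


def tone(amplitude, n=2048, period=32):
    """Synthetic test signal standing in for recorded speech."""
    return [amplitude * math.sin(2 * math.pi * i / period) for i in range(n)]


template = tone(0.5)   # plays the role of the standard utterance
quiet = tone(0.1)      # a sample recorded at lower energy
target = mean_frame_energy(template)

if is_qualified(quiet, envelope_fluctuation(template)):
    balanced = equalize(quiet, target)
```

A single gain per utterance is used here because the abstract describes a time-domain amplitude adjustment; whether the paper applies one gain per utterance or a finer-grained adjustment is not stated.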
Source
Journal of Signal Processing (《信号处理》)
CSCD
Peking University Core Journals (北大核心)
2017, No. 2, pp. 229-235 (7 pages)
Keywords
speech synthesis (text to speech)
energy balance
time-domain envelope fluctuation detection