Research on Energy Balancing of Speech Database Samples in Speech Synthesis Systems (cited by: 4)

Voice Energy Balance Method for Text to Speech Database
Abstract: Text-to-speech (TTS) synthesis is a key technology for human-machine voice communication, and the quality of the speech database is an important factor in how well a TTS system performs. Recording a TTS speech database takes a long time, about six months. Throughout that period the speaker's recording state, in particular timbre and energy, should stay consistent, which is hard for a speaker to maintain, so the finished database ends up with inconsistent energy levels. To address this, the paper proposes a speech energy balancing method with two steps: time-domain envelope fluctuation detection and frame energy averaging. First, energy parameters and fluctuation parameters are extracted from standard (reference) speech to serve as a template, and the time-domain envelope fluctuation detection algorithm checks whether each speech sample awaiting adjustment is qualified. Then, following a frame energy averaging criterion, the time-domain amplitudes of all qualified samples are adjusted so that the overall energy of the speech database is as consistent as possible. Experimental results show that the proposed method effectively improves the quality of the TTS speech database and has practical engineering value.
Authors: Liu Wei (刘伟), Xie Jianzhi (谢建志)
Source: Journal of Signal Processing (《信号处理》), 2017, No. 2, pp. 229-235 (7 pages); indexed in CSCD and the Peking University core journal list
Keywords: speech synthesis (text to speech); energy balance; time-domain envelope fluctuation detection
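The abstract names the two steps but gives no formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the time-domain envelope is the sequence of per-frame RMS values, takes the "fluctuation parameter" to be the coefficient of variation of that envelope, treats a sample as qualified when its fluctuation lies within a relative tolerance of the template's, and performs frame energy averaging as a single gain per sample that makes its mean active-frame energy (mean-square amplitude) equal to the template's. All names and values (`FRAME_LEN`, `SILENCE_RATIO`, `TOLERANCE`, `envelope_params`, `is_qualified`, `balance`, `syllables`) are hypothetical. Only the Python standard library is used.

```python
import math

# Hypothetical parameters: the paper does not report its frame length,
# silence gate, or tolerance, so these values are illustrative only.
FRAME_LEN = 160       # 10 ms frames, assuming a 16 kHz sampling rate
SILENCE_RATIO = 0.05  # frames below 5% of the loudest frame's RMS count as silence
TOLERANCE = 0.5       # allowed relative deviation from the template's fluctuation


def frame_rms(signal, frame_len=FRAME_LEN):
    """Split the signal into non-overlapping frames; return each frame's RMS."""
    rms = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        rms.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return rms


def envelope_params(signal):
    """Return (mean frame energy, envelope fluctuation) over active frames.

    The frame-RMS sequence serves as the time-domain envelope. Fluctuation is
    its coefficient of variation, an assumed stand-in for the paper's
    fluctuation parameter. Frame energy here is the mean-square amplitude.
    """
    rms = frame_rms(signal)
    peak = max(rms, default=0.0)
    env = [r for r in rms if r > SILENCE_RATIO * peak]  # crude energy gate
    if not env:
        raise ValueError("signal has no active frames")
    mean_env = sum(env) / len(env)
    std_env = math.sqrt(sum((r - mean_env) ** 2 for r in env) / len(env))
    mean_energy = sum(r * r for r in env) / len(env)
    return mean_energy, std_env / mean_env


def is_qualified(signal, template_fluct, tol=TOLERANCE):
    """Step 1 (sketch): accept a sample whose envelope fluctuation is near the template's."""
    _, fluct = envelope_params(signal)
    return abs(fluct - template_fluct) <= tol * template_fluct


def balance(signal, template_energy):
    """Step 2 (sketch): apply one gain so mean active-frame energy matches the template."""
    energy, _ = envelope_params(signal)
    gain = math.sqrt(template_energy / energy)
    # Clip to [-1, 1], the usual range for normalized floating-point audio.
    return [max(-1.0, min(1.0, gain * x)) for x in signal]


def syllables(scale, low=0.5, n=3200, freq=200.0, rate=16000):
    """Hypothetical test signal: a tone whose level alternates every 800 samples."""
    return [
        scale * (1.0 if (i // 800) % 2 == 0 else low)
        * math.sin(2 * math.pi * freq * i / rate)
        for i in range(n)
    ]


if __name__ == "__main__":
    reference = syllables(0.5)  # stands in for the "standard speech" template
    template_energy, template_fluct = envelope_params(reference)
    quiet = syllables(0.1)      # same envelope shape, recorded too softly
    if is_qualified(quiet, template_fluct):
        adjusted = balance(quiet, template_energy)
        print(envelope_params(adjusted)[0], template_energy)
```

Because a uniform gain scales every frame's RMS by the same factor and the silence gate is relative to the loudest frame, the set of active frames does not change, so (ignoring clipping) the balanced sample's mean active-frame energy lands exactly on the template value. Averaging only over active frames keeps leading and trailing silence from diluting the estimate. A per-frame gain would be another possible reading of "frame energy averaging" that the abstract does not rule out.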
References: 3

Secondary references: 37


Articles sharing references: 27

Co-cited articles: 52

Citing articles: 4

Secondary citing articles: 27
