Predicting Viseme Parameters from Speech Based on Neural Network (基于神经网络由语音预测视位参数)

Cited by: 2

Abstract: Speech is produced by the coordinated action of several articulatory organs, so the movements of those organs are inherently related to the resulting speech. This paper studies three factors in predicting viseme parameters from speech with a neural network: the choice of input speech parameters, the time span of the input speech, and the structure of the network. The experimental results show that LPC coefficients plus short-time energy outperform the other speech parameters tested, that forward co-articulation has a stronger influence than backward co-articulation, and that delayed feedback improves the performance of the feed-forward neural network. Given that the experiments used unlimited-vocabulary continuous speech, the mean square error (MSE) of about 0.0114 is quite promising.
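The input representation named in the abstract (LPC coefficients plus short-time energy, taken over a span of neighbouring frames) can be illustrated with a minimal sketch. It is not the authors' implementation. The LPC order, the edge handling, and all function and parameter names (`frame_features`, `stack_context`, `n_past`, `n_future`) are assumptions made for illustration. The abstract does not say which side of the window corresponds to "forward" co-articulation, so the sketch only exposes the two sides as separate parameters.

```python
def short_time_energy(frame):
    """Mean squared amplitude of one speech frame."""
    return sum(x * x for x in frame) / len(frame)


def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of a frame (autocorrelation method)."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(frame, order):
    """LPC coefficients a[1..order] via the Levinson-Durbin recursion,
    using the convention x[n] ~ sum_k a[k] * x[n - k]."""
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)
    err = r[0]
    if err == 0.0:          # silent frame: return all-zero coefficients
        return a[1:]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
        if err <= 0.0:      # numerically degenerate frame: stop early
            break
    return a[1:]


def frame_features(frame, order=10):
    """Per-frame feature vector: LPC coefficients plus short-time energy.
    The order of 10 is an assumed value, not taken from the paper."""
    return lpc(frame, order) + [short_time_energy(frame)]


def stack_context(features, t, n_past, n_future):
    """Concatenate the feature vectors of frames t-n_past .. t+n_future,
    repeating the first or last frame at the utterance edges (assumption)."""
    last = len(features) - 1
    window = []
    for k in range(t - n_past, t + n_future + 1):
        window.extend(features[min(max(k, 0), last)])
    return window


def mean_squared_error(predicted, target):
    """MSE between predicted and reference viseme parameter vectors."""
    return sum((p - q) ** 2 for p, q in zip(predicted, target)) / len(predicted)
```

A network input for frame `t` would then be `stack_context(features, t, n_past, n_future)`. The delayed feedback mentioned in the abstract could be modelled by appending the previous frame's predicted viseme parameters to that vector, though the abstract does not specify the exact wiring.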

Authors: Wang Zhiming (王志明), Cai Lianhong (蔡莲红)

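Affiliations: Department of Computer Science, University of Science and Technology Beijing (北京科技大学计算机系); Department of Computer Science, Tsinghua University (清华大学计算机系)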

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list (北大核心), 2005, No. 6, pp. 1083-1087 (5 pages)

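Funding: Specialized Research Fund for the Doctoral Program of Higher Education (20010003049); University of Science and Technology Beijing University Fund (2004509180)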

Keywords: feed-forward neural network; viseme; linear predictive coding (LPC); line spectral frequency (LSF); real cepstrum (RCEP); reflection coefficient (RC); mel frequency cepstrum coefficient (MFCC); mean square error (MSE)

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
