Perceptual Domain Audio Coding Method Based on Gaussian Mixture Model

Abstract: Traditional perceptual audio coding schemes use psychoacoustic masking to reduce the coding bit rate, but their vocal-tract model plus excitation-signal approach struggles to deliver high-quality coding of both speech and audio signals at low and medium bit rates. To address this, a perceptual domain audio coding method based on the Gaussian Mixture Model (GMM) is proposed. The method uses a Gammatone filter bank to simulate the human auditory system, substitutes a multiplexed masking model to reduce the number of envelope pulses so that a structured model can be fitted, applies the Gauss-Newton algorithm to fit GMM parameters to the auditory envelope, and uses the GMM parameters in place of the audio signal features. Experimental results show that, compared with an audio coding method based on reconstruction from a sparse envelope representation, the proposed method scores 0.5 to 0.8 points higher in subjective tests and 5 to 10 points higher in objective tests. The decoded speech and most decoded music signals are restored close to the original audio signal, so the method can be used for high-quality speech and audio coding at low and medium bit rates.
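The abstract names Gauss-Newton fitting of Gaussian mixture parameters to an auditory envelope but gives no formulas. The sketch below is only an illustration of that general technique, not the authors' implementation. It assumes the envelope in one channel is modelled as a sum of Gaussian pulses, each with an amplitude, a centre, and a width, and it adds a step-halving safeguard that the abstract does not mention. All function and parameter names are hypothetical.

```python
import numpy as np

def gaussian_mixture(t, params):
    """Sum of K Gaussian pulses; params is flat: [a1, mu1, s1, a2, mu2, s2, ...]."""
    a, mu, s = np.asarray(params, dtype=float).reshape(-1, 3).T
    z = (t[None, :] - mu[:, None]) / s[:, None]
    return (a[:, None] * np.exp(-0.5 * z**2)).sum(axis=0)

def jacobian(t, params):
    """Partial derivatives of the model w.r.t. each parameter, shape (N, 3K)."""
    a, mu, s = np.asarray(params, dtype=float).reshape(-1, 3).T
    z = (t[None, :] - mu[:, None]) / s[:, None]
    g = np.exp(-0.5 * z**2)
    d_a = g                                  # d/d amplitude
    d_mu = a[:, None] * g * z / s[:, None]   # d/d centre
    d_s = a[:, None] * g * z**2 / s[:, None] # d/d width
    # Stack so that rows follow the flat ordering a1, mu1, s1, a2, ...
    return np.stack([d_a, d_mu, d_s], axis=1).reshape(-1, t.size).T

def gauss_newton_fit(t, envelope, init, iters=100, tol=1e-12):
    """Gauss-Newton least-squares fit with step halving (an added safeguard)."""
    p = np.asarray(init, dtype=float).copy()
    for _ in range(iters):
        r = envelope - gaussian_mixture(t, p)
        cost = np.sum(r**2)
        # Gauss-Newton step: least-squares solution of J * step = r
        step, *_ = np.linalg.lstsq(jacobian(t, p), r, rcond=None)
        if np.linalg.norm(step) < tol:
            break
        scale = 1.0
        while scale > 1e-4:
            trial = p + scale * step
            if np.sum((envelope - gaussian_mixture(t, trial))**2) < cost:
                break
            scale *= 0.5
        p = p + scale * step
    return p

# Synthetic, noise-free envelope with two pulses (illustrative values only).
t = np.linspace(0.0, 1.0, 200)
true_params = np.array([1.0, 0.3, 0.05, 0.6, 0.7, 0.08])
envelope = gaussian_mixture(t, true_params)
fitted = gauss_newton_fit(t, envelope, [0.9, 0.31, 0.055, 0.55, 0.69, 0.075])
```

In a codec along these lines, the fitted triples would be the quantities to quantize and transmit instead of the envelope samples. How the paper chooses the number of components and the initial guess is not stated in the abstract.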

Authors: Lü Yaping (吕亚平), Gao Ge (高戈), Chen Yi (陈怡), Zhang Kang (张康)

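Affiliations: National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University; School of Computer Science, Central China Normal University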

Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list. 2015, No. 10, pp. 265-269 (5 pages).

Funding: National Natural Science Foundation of China (614712710)

Keywords: human auditory system; perceptual domain audio coding; Gaussian Mixture Model (GMM); Gammatone filter bank; Gauss-Newton algorithm

Classification number: TN912 [Electronics and Telecommunications: Communication and Information Systems]

References (8)

1. Spanias A, Painter T. Audio Signal Processing and Coding [M]. New York, USA: John Wiley and Sons, 2012.
2. ISO. ISO/IEC 14496-3-2009 Coding of Audio-Visual Objects, Part 3: Audio [S]. 2009.
3. 3GPP. 3GPP TS 26.171-2002 Adaptive Multi-Rate Wideband (AMR-WB) Speech Codec, General Description [S]. 2002.
4. Smith E C, Lewicki M S. Efficient Auditory Coding [J]. Nature, 2006, 439(7079): 978-982.
5. Holters M. Automatic Parameter Optimization for a Perceptual Audio Codec [C]// Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing. Washington D.C., USA: IEEE Press, 2009: 13-16.
6. Strahl S. Sparse Gammatone Signal Model Optimized for English Speech Does not Match the Human Auditory Filters [J]. Brain Research, 2008, 1220(2): 224-233.
7. Mathews J H, Fink K D. Numerical Methods Using MATLAB [M]. 4th ed. (Chinese edition). Beijing: Publishing House of Electronics Industry, 2010.
8. Thiemann J. A Sparse Auditory Envelope Representation with Iterative Reconstruction for Audio Coding [D]. Montreal, Canada: McGill University, 2011.

