
Perceptual Domain Audio Coding Method Based on Gaussian Mixture Model
Abstract: Traditional perceptual audio coding schemes use psychoacoustic masking to reduce the coding bit rate, but their vocal tract model plus signal excitation approach has difficulty delivering high-quality coding of both speech and audio signals at low to medium bit rates. To address this, a perceptual domain audio coding method based on the Gaussian Mixture Model (GMM) is proposed. The method uses a Gammatone filter bank to model the human auditory system and applies a multiplexed masking model with replacement to reduce the number of envelope pulses, which makes the envelope suitable for fitting with a structured model. The Gauss-Newton algorithm then fits GMM parameters to the auditory envelope, and the GMM parameters are used in place of the audio signal features. Experimental results show that, compared with the audio coding method based on sparse envelope representation and reconstruction, the proposed method scores 0.5 to 0.8 points higher in subjective tests and 5 to 10 points higher in objective tests. Decoded speech and most decoded music signals are restored close to the original audio signal, so the method can be used for high-quality speech and audio coding at low to medium bit rates.
Source: Computer Engineering (《计算机工程》), 2015, No. 10, pp. 265-269 (5 pages); indexed in CAS, CSCD, and the Peking University Core Journals list
Funding: National Natural Science Foundation of China (614712710)
Keywords: human auditory system; perceptual domain audio coding; Gaussian Mixture Model (GMM); Gammatone filter bank; Gauss-Newton algorithm
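
The abstract's fitting step, in which the Gauss-Newton algorithm fits Gaussian mixture parameters to an auditory envelope, can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not give the model's parameterization, so the sketch assumes an unnormalized sum of Gaussians over time, with parameters (amplitude, center, width) per component. The function names `gaussian_sum`, `jacobian_row`, `solve`, `sse`, and `gauss_newton` are hypothetical, and the step-halving safeguard is an addition for numerical robustness rather than something the source describes. A synthetic envelope stands in for the output of a Gammatone filter bank channel.

```python
import math

def gaussian_sum(params, t):
    """Evaluate sum_k a_k * exp(-(t - mu_k)^2 / (2 * s_k^2)).
    params is a flat list [a1, mu1, s1, a2, mu2, s2, ...] (assumed layout)."""
    total = 0.0
    for k in range(0, len(params), 3):
        a, mu, s = params[k:k + 3]
        total += a * math.exp(-((t - mu) ** 2) / (2.0 * s * s))
    return total

def jacobian_row(params, t):
    """Partial derivatives of gaussian_sum with respect to each parameter at t."""
    row = []
    for k in range(0, len(params), 3):
        a, mu, s = params[k:k + 3]
        d = t - mu
        e = math.exp(-(d * d) / (2.0 * s * s))
        # d/da, d/dmu, d/ds of a * exp(-d^2 / (2 s^2))
        row += [e, a * e * d / (s * s), a * e * d * d / (s ** 3)]
    return row

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def sse(params, ts, ys):
    """Sum of squared errors between the envelope samples and the model."""
    return sum((y - gaussian_sum(params, t)) ** 2 for t, y in zip(ts, ys))

def gauss_newton(ts, ys, init, iters=50):
    """Gauss-Newton least-squares fit; step halving is an added safeguard."""
    params = list(init)
    n = len(params)
    for _ in range(iters):
        J = [jacobian_row(params, t) for t in ts]
        r = [y - gaussian_sum(params, t) for t, y in zip(ts, ys)]
        # Normal equations: (J^T J) step = J^T r
        JtJ = [[sum(row[i] * row[j] for row in J) for j in range(n)] for i in range(n)]
        Jtr = [sum(row[i] * ri for row, ri in zip(J, r)) for i in range(n)]
        step = solve(JtJ, Jtr)
        err = sse(params, ts, ys)
        scale = 1.0
        while scale > 1e-6:
            trial = [p + scale * d for p, d in zip(params, step)]
            if sse(trial, ts, ys) <= err:
                params = trial
                break
            scale *= 0.5
        else:
            break  # no step length reduced the error; stop iterating
    return params

# Synthetic two-pulse envelope (illustrative values, not data from the paper).
ts = [0.1 * i for i in range(81)]
true_params = [1.0, 2.0, 0.5, 0.6, 5.0, 0.8]
ys = [gaussian_sum(true_params, t) for t in ts]
fitted = gauss_newton(ts, ys, [0.9, 2.1, 0.6, 0.5, 4.9, 0.9])
print([round(p, 3) for p in fitted])
```

In a codec of the kind the abstract describes, the fitted parameters rather than the envelope samples would be what is quantized and transmitted, which is where the bit rate saving would come from. How the authors initialize the fit and choose the number of components is not stated in the abstract.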
