
Split vector quantization for sinusoidal amplitude and frequency

Abstract: In this paper, we propose applying a tree structure to the sinusoidal parameters. The proposed sinusoidal coder aims to find the coded sinusoidal parameters by minimizing a likelihood function in a least squares (LS) sense. From a rate-distortion standpoint, we address the problem of how to allocate the available bits among different frequency bands to code the sinusoids in each frame. To further analyze the quantization behavior of the proposed method, we assess its quantization performance against other methods: the short-time Fourier transform (STFT) based coder commonly used for speech enhancement or separation, and the line spectral frequency (LSF) coder used in speech coding. Through extensive simulations, we show that the proposed quantizer yields less spectral distortion as well as higher perceived quality for signals re-synthesized from the coded parameters in a model-based approach, compared with previous STFT-based methods. The proposed method lowers the complexity and, owing to its tree structure, enables rapid search. It provides flexibility for use in many speaker-independent applications by finding the most likely frequency vectors selected from a list of frequency candidates. Therefore, the proposed quantizer can be considered an attractive candidate for model-based speech applications in both speaker-dependent and speaker-independent scenarios.
Source: Journal of Zhejiang University-Science C (Computers and Electronics) (SCIE, EI), 2011, Issue 2, pp. 140-154 (15 pages).
Keywords: Short-time Fourier transform; Split vector quantization; Sinusoidal modeling; Spectral distortion
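
To illustrate the split vector quantization idea summarized in the abstract, the sketch below (not code from the paper; the band split, codebook sizes, and distortion measure are illustrative assumptions) quantizes a sinusoidal amplitude vector band by band: each sub-vector is matched to its own small codebook in a least-squares (nearest-neighbor) sense, with unequal codebook sizes standing in for per-band bit allocation, and a simple log-amplitude distortion is reported for the coded result.

```python
import numpy as np

def split_vq(amplitudes, codebooks):
    """Quantize an amplitude vector by splitting it into per-band
    sub-vectors, each matched to its own codebook in a least-squares
    (nearest-neighbor) sense. Returns the coded vector and indices."""
    coded, indices, start = [], [], 0
    for cb in codebooks:                        # cb: (num_entries, sub_dim)
        sub = amplitudes[start:start + cb.shape[1]]
        errs = np.sum((cb - sub) ** 2, axis=1)  # LS error per codeword
        best = int(np.argmin(errs))
        indices.append(best)
        coded.append(cb[best])
        start += cb.shape[1]
    return np.concatenate(coded), indices

def log_amplitude_distortion_db(a, a_hat, eps=1e-12):
    """Simple per-frame log-amplitude distortion (dB); an illustrative
    stand-in for the spectral distortion measure discussed in the paper."""
    d = 20.0 * (np.log10(np.maximum(a, eps)) - np.log10(np.maximum(a_hat, eps)))
    return float(np.sqrt(np.mean(d ** 2)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    amps = np.abs(rng.normal(size=8))           # toy sinusoidal amplitudes
    # Two hypothetical frequency bands of 4 amplitudes each; unequal
    # codebook sizes mimic a rate-distortion driven bit allocation.
    codebooks = [np.abs(rng.normal(size=(16, 4))),   # ~4 bits for band 1
                 np.abs(rng.normal(size=(8, 4)))]    # ~3 bits for band 2
    coded, idx = split_vq(amps, codebooks)
    print("codeword indices:", idx)
    print("distortion (dB):", log_amplitude_distortion_db(amps, coded))
```

In the coder described by the abstract, a tree-structured codebook would replace this exhaustive nearest-neighbor search, which is what provides the rapid search capability the authors report.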

