
Split vector quantization for sinusoidal amplitude and frequency

Abstract: In this paper, we propose applying a tree structure to the sinusoidal parameters. The proposed sinusoidal coder aims to find the coded sinusoidal parameters by minimizing a likelihood function in a least squares (LS) sense. From a rate-distortion standpoint, we address the problem of how to allocate the available bits among different frequency bands to code the sinusoids in each frame. To further analyze the quantization behavior of the proposed method, we assess its quantization performance against other methods: the short-time Fourier transform (STFT) based coder commonly used for speech enhancement or separation, and the line spectral frequency (LSF) coder used in speech coding. Through extensive simulations, we show that the proposed quantizer yields less spectral distortion as well as higher perceived quality for signals re-synthesized from the coded parameters in a model-based approach, compared with previous STFT-based methods. The proposed method lowers the complexity and, owing to its tree structure, enables rapid search. It provides flexibility for use in many speaker-independent applications by finding the most likely frequency vectors selected from a list of frequency candidates. Therefore, the proposed quantizer can be considered an attractive candidate for model-based speech applications in both speaker-dependent and speaker-independent scenarios.
Source: Journal of Zhejiang University-Science C (Computers and Electronics) (SCIE, EI), 2011, Issue 2, pp. 140-154 (15 pages).
Keywords: Short-time Fourier transform; Split vector quantization; Sinusoidal modeling; Spectral distortion
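
To illustrate the split vector quantization idea summarized in the abstract, the sketch below (not code from the paper; the band split, codebook sizes, and distortion measure are illustrative assumptions) quantizes a sinusoidal amplitude vector band by band: each sub-vector is matched to its own small codebook in a least-squares (nearest-neighbor) sense, with unequal codebook sizes standing in for per-band bit allocation, and a simple log-amplitude distortion is reported for the coded result.

```python
import numpy as np

def split_vq(amplitudes, codebooks):
    """Quantize an amplitude vector by splitting it into per-band
    sub-vectors, each matched to its own codebook in a least-squares
    (nearest-neighbor) sense. Returns the coded vector and indices."""
    coded, indices, start = [], [], 0
    for cb in codebooks:                        # cb: (num_entries, sub_dim)
        sub = amplitudes[start:start + cb.shape[1]]
        errs = np.sum((cb - sub) ** 2, axis=1)  # LS error per codeword
        best = int(np.argmin(errs))
        indices.append(best)
        coded.append(cb[best])
        start += cb.shape[1]
    return np.concatenate(coded), indices

def log_amplitude_distortion_db(a, a_hat, eps=1e-12):
    """Simple per-frame log-amplitude distortion (dB); an illustrative
    stand-in for the spectral distortion measure discussed in the paper."""
    d = 20.0 * (np.log10(np.maximum(a, eps)) - np.log10(np.maximum(a_hat, eps)))
    return float(np.sqrt(np.mean(d ** 2)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    amps = np.abs(rng.normal(size=8))           # toy sinusoidal amplitudes
    # Two hypothetical frequency bands of 4 amplitudes each; unequal
    # codebook sizes mimic a rate-distortion driven bit allocation.
    codebooks = [np.abs(rng.normal(size=(16, 4))),   # ~4 bits for band 1
                 np.abs(rng.normal(size=(8, 4)))]    # ~3 bits for band 2
    coded, idx = split_vq(amps, codebooks)
    print("codeword indices:", idx)
    print("distortion (dB):", log_amplitude_distortion_db(amps, coded))
```

In the coder described by the abstract, a tree-structured codebook would replace this exhaustive nearest-neighbor search, which is what provides the rapid search capability the authors report.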

