
Audio Classification Based on Minimum Average Complexity Vector Quantization (cited by: 1)
Abstract: The concept of average complexity is first introduced, and a method for computing the minimum average complexity (MAC) is derived from the information entropy formula. Using MAC as the criterion, a vector quantization (VQ) tree is constructed for the audio data, which yields the distribution of the audio feature vectors in the feature space. Because different kinds of audio have different distributions, an unknown audio clip can be classified by comparing how closely its distribution in the feature space matches the distributions of the known, previously trained audio classes. Experiments show that the method is highly adaptable and computationally efficient.
Source: Journal of Wuhan University (Natural Science Edition); indexed in CAS, CSCD, and the Peking University Core Journals list; 2005, No. 1, pp. 69-73 (5 pages)
Funding: National Natural Science Foundation of China (grant 10371033); major project funding from the National 211 Project
Keywords: average complexity; split; vector quantization tree; feature space; distribution; distance
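The pipeline outlined in the abstract (partition the feature space with a VQ tree, record how each audio class distributes over the leaves, then assign an unknown clip to the class with the closest distribution) can be sketched roughly as below. This is only an illustration under stated assumptions: the record does not give the paper's definition of average complexity, so the split rule here (always split the most populated cell) is a stand-in and not the MAC criterion; the L1 histogram distance, the 2-D synthetic features, and the class names "speech" and "music" are likewise hypothetical.

```python
import math
import random

def mean(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def split_cell(vectors, iters=10):
    """Split one cell in two with a few 2-means (LBG-style) iterations."""
    c = mean(vectors)
    a = max(vectors, key=lambda v: sq_dist(v, c))  # seed 1: farthest from the mean
    b = max(vectors, key=lambda v: sq_dist(v, a))  # seed 2: farthest from seed 1
    for _ in range(iters):
        left = [v for v in vectors if sq_dist(v, a) <= sq_dist(v, b)]
        right = [v for v in vectors if sq_dist(v, a) > sq_dist(v, b)]
        if not left or not right:
            break
        a, b = mean(left), mean(right)
    return left, right

def build_codebook(vectors, n_leaves):
    """Grow leaves by repeated binary splits; return one centroid per leaf."""
    cells = [list(vectors)]
    while len(cells) < n_leaves:
        # ASSUMPTION: stand-in split rule, not the paper's MAC criterion.
        cells.sort(key=len)
        biggest = cells.pop()
        left, right = split_cell(biggest)
        if not left or not right:  # cell cannot be split any further
            cells.append(biggest)
            break
        cells += [left, right]
    return [mean(c) for c in cells]

def histogram(vectors, codebook):
    """Fraction of vectors whose nearest centroid is each leaf."""
    counts = [0] * len(codebook)
    for v in vectors:
        k = min(range(len(codebook)), key=lambda i: sq_dist(v, codebook[i]))
        counts[k] += 1
    return [c / len(vectors) for c in counts]

def entropy(p):
    """Shannon entropy (bits) of a leaf-occupancy distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def l1_distance(p, q):
    return sum(abs(x - y) for x, y in zip(p, q))

def classify(unknown, class_hists, codebook):
    h = histogram(unknown, codebook)
    return min(class_hists, key=lambda name: l1_distance(h, class_hists[name]))

# Synthetic 2-D "feature vectors" for two hypothetical classes.
rng = random.Random(0)
def sample(cx, cy, n):
    return [[rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)] for _ in range(n)]

train = {"speech": sample(0.0, 0.0, 100), "music": sample(5.0, 5.0, 100)}
codebook = build_codebook(train["speech"] + train["music"], n_leaves=4)
hists = {name: histogram(vs, codebook) for name, vs in train.items()}
label = classify(sample(5.0, 5.0, 50), hists, codebook)
print(label, {name: round(entropy(h), 3) for name, h in hists.items()})
```

Real audio would replace the synthetic vectors with per-frame features, and the split rule would be replaced by the criterion defined in the paper itself.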

