
Entropy-based initial/final segmentation for Chinese whispered speech (cited by: 34)
Abstract: Initial/final (IF) segmentation of whispered speech is a preprocessing step for whispered speech recognition and for the reconstruction of normal speech from whispers. Because whispered speech is produced differently from normal speech, and whispered initials and finals are both unvoiced, the segmentation methods commonly used for normal speech no longer apply. Based on an analysis of the articulatory and acoustic characteristics of Chinese whispered speech, and on the initial-to-final transition patterns visible in wideband spectrograms, a new segmentation method is proposed. Speech endpoints are detected with an information-entropy function, and the initial/final boundary is decided by combining the symmetric relative entropy, the initial duration, and the normalized spectral center of gravity. In tests on two groups of 380 Chinese monosyllabic whispers at 2-10 dB SNR, the correct segmentation rate is 87.9% for the female data and 90.3% for the male data, which is higher than that of the frequency-domain method, the clustering method, and the spectral-flatness method. The experiments show that the relative-entropy method can serve as preprocessing for whispered speech recognition and conversion, and that it improves the quality of normal speech converted from Chinese whispers.
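Source: Acta Acustica (《声学学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2005, No. 1, pp. 69-75 (7 pages)
Funding: National Natural Science Foundation of China (Grant No. 60272037)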
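
The abstract names three frame-level quantities (an entropy function, a symmetric relative entropy, and a normalized spectral center of gravity) without giving their formulas. The sketch below uses common textbook definitions of these quantities; it is an assumption-laden illustration, not the paper's algorithm. All function names, the `EPS` constant, and the naive DFT helper are hypothetical choices made for this example, and the paper's thresholds and decision logic are not reproduced.

```python
import cmath
import math

EPS = 1e-12  # guards against log(0); an illustrative choice, not from the paper


def power_spectrum(frame):
    """Naive DFT power spectrum (bins 0..N/2) of one frame of samples."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def normalize(power):
    """Turn a power spectrum into a probability distribution over bins."""
    total = sum(power) + EPS * len(power)
    return [(p + EPS) / total for p in power]


def spectral_entropy(power):
    """Shannon entropy (in nats) of the normalized spectrum.

    Flat, noise-like spectra give high values; peaky spectra give low values.
    """
    return -sum(p * math.log(p) for p in normalize(power))


def symmetric_relative_entropy(power_a, power_b):
    """Symmetric Kullback-Leibler divergence, KL(a||b) + KL(b||a)."""
    pa, pb = normalize(power_a), normalize(power_b)
    return sum((a - b) * math.log(a / b) for a, b in zip(pa, pb))


def normalized_spectral_centroid(power):
    """Power-weighted mean bin index, scaled to the range [0, 1]."""
    total = sum(power)
    if total <= 0.0 or len(power) < 2:
        return 0.0
    centroid_bin = sum(k * p for k, p in enumerate(power)) / total
    return centroid_bin / (len(power) - 1)
```

Under these definitions, a large symmetric relative entropy between the spectra of neighboring frames marks a change in spectral shape, which is the kind of cue an initial/final boundary detector could combine with duration and spectral-centroid information.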



