Application of formant instantaneous characteristics to speech recognition and speaker identification

Abstract: This paper proposes a new phase feature derived from formant instantaneous characteristics for speech recognition (SR) and speaker identification (SI) systems. Using the Hilbert transform (HT), the formant characteristics can be represented by the instantaneous frequency (IF) and the instantaneous bandwidth, together termed formant instantaneous characteristics (FIC). To explore the importance of FIC in both SR and SI, the paper derives different features from FIC for the SR and SI systems. When these new features are combined with conventional parameters, a higher identification rate is achieved than with Mel-frequency cepstral coefficient (MFCC) parameters alone. The experimental results show that the new features are effective characteristic parameters and can complement conventional parameters for SR and SI.
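To make the Hilbert-transform step concrete, the following is a minimal sketch of how an instantaneous frequency and an instantaneous bandwidth could be computed from one band-limited frame. It is not the authors' feature extraction, which the abstract does not describe. It assumes a single formant component has already been isolated by band-pass filtering, builds the analytic signal by zeroing negative-frequency DFT bins, and uses one common bandwidth definition, |a'(t)/a(t)| / (2π) with a(t) the envelope. All function names, the 8 kHz sampling rate, and the 1 kHz test tone are illustrative assumptions.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) DFT; adequate for one short analysis frame."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def analytic_signal(x):
    """Return x + j*HT(x) by suppressing the negative-frequency DFT bins."""
    n = len(x)
    weights = [0.0] * n
    weights[0] = 1.0                      # keep DC once
    for k in range(1, (n + 1) // 2):
        weights[k] = 2.0                  # double the positive frequencies
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist bin once
    spectrum = dft(x)
    return dft([s * w for s, w in zip(spectrum, weights)], inverse=True)


def instantaneous_frequency(z, fs):
    """IF in Hz from the phase step between consecutive analytic samples."""
    return [cmath.phase(z[i + 1] * z[i].conjugate()) * fs / (2 * math.pi)
            for i in range(len(z) - 1)]


def instantaneous_bandwidth(z, fs):
    """Assumed definition: |d/dt ln a(t)| / (2*pi), with a(t) = |z(t)|."""
    log_env = [math.log(abs(v)) for v in z]
    return [abs(log_env[i + 1] - log_env[i]) * fs / (2 * math.pi)
            for i in range(len(z) - 1)]


# Illustrative input: a steady 1 kHz tone sampled at 8 kHz (hypothetical values).
fs = 8000.0
frame = [math.cos(2 * math.pi * 1000.0 * t / fs) for t in range(256)]
z = analytic_signal(frame)
inst_freq = instantaneous_frequency(z, fs)
inst_bw = instantaneous_bandwidth(z, fs)
print(min(inst_freq), max(inst_freq), max(inst_bw))
```

For a steady tone the envelope is constant, so the bandwidth track stays near zero while the frequency track sits at the tone frequency. On real speech the frame would first be band-pass filtered around each formant, and frames that are not periodic within the window will show edge effects from the DFT-based transform.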

Authors: 侯丽敏 (Hou Limin), 胡晓宁 (Hu Xiaoning), 谢娟敏 (Xie Juanmin)

Affiliation: Key Laboratory of Specialty Fiber Optics and Optical Access Networks

Source: Journal of Shanghai University (English Edition) (上海大学学报（英文版）), CAS, 2011, No. 2, pp. 123-127 (5 pages)

Funding: Project supported by the National Natural Science Foundation of China (Grant No. 60903186) and the Shanghai Leading Academic Discipline Project (Grant No. J50104)

Keywords: instantaneous frequency (IF), Hilbert transform (HT), speech recognition, speaker identification, Mel-frequency cepstral coefficients (MFCC)

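Classification: TN912.34 [Electronics and Telecommunications: Communication and Information Systems]; S781.1 [Agricultural Sciences: Wood Science and Technology]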


