Journal Article

Spectral Stability Feature Based Novel Method for Discriminating Speech and Laughter (cited by: 3)
Abstract: This paper proposes a novel method that uses spectral stability as a feature parameter to discriminate speech from laughter. Analysis of the spectral stability of the two signal types shows that the value for speech is clearly smaller than that for laughter, indicating that spectral stability can serve as a discriminating feature. The discrimination performance of Spectral Stability (SS), Mel-Frequency Cepstral Coefficients (MFCC), Perceptual Linear Prediction (PLP), and pitch is compared under identical experimental conditions. The results show that spectral stability achieves discrimination accuracies of 90.74% and 73.63% in the speaker-dependent and speaker-independent cases respectively, and that its discrimination power is superior to that of the other feature parameters.
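The record does not give the paper's exact formula for spectral stability, so the following is only an illustrative sketch: here "stability" is assumed to be the reciprocal of the mean frame-to-frame Euclidean distance between log-magnitude spectra, so that a signal whose spectrum changes little from frame to frame scores higher.

```python
import numpy as np

def spectral_stability(signal, frame_len=512, hop=256, eps=1e-10):
    """Hypothetical spectral-stability score (not the paper's formula):
    reciprocal of the mean frame-to-frame Euclidean distance between
    log-magnitude spectra of windowed frames."""
    window = np.hanning(frame_len)
    frames = [signal[i:i + frame_len] * window
              for i in range(0, len(signal) - frame_len + 1, hop)]
    spectra = [np.log(np.abs(np.fft.rfft(f)) + eps) for f in frames]
    dists = [np.linalg.norm(spectra[k + 1] - spectra[k])
             for k in range(len(spectra) - 1)]
    return 1.0 / (np.mean(dists) + eps)

# Toy check: a steady 440 Hz tone has a nearly constant spectrum,
# while white noise varies strongly from frame to frame, so the
# tone should score higher on this measure.
fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 440 * t)
noise = np.random.default_rng(0).standard_normal(fs)
print(spectral_stability(tone) > spectral_stability(noise))  # True
```

Under this sketch, discriminating speech from laughter would reduce to thresholding (or feeding a classifier with) the per-utterance stability score; the paper's reported result is that laughter yields the larger value.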
Source: Journal of Electronics & Information Technology (EI, CSCD, Peking University Core), 2008, No. 6, pp. 1359-1362 (4 pages)
Funding: Supported by the National Natural Science Foundation of China (Grant 60572141)
Keywords: spontaneous speech recognition; speech/laughter discrimination; spectral stability; speech events
Related Literature

References (13)

  • 1 Rose R C and Riccardi G. Modeling disfluency and background events in ASR for a natural language understanding task. In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, Phoenix, AZ, USA, March 15-19, 1999, Vol.1: 341-344.
  • 2 Stouten F, Duchateau J, Martens J P, et al. Coping with disfluencies in spontaneous speech recognition: acoustic detection and linguistic context manipulation. Speech Communication, 2006, 48: 1590-1606.
  • 3 Chinese Linguistic Data Consortium. http://www.chineseldc.org/resourse.asp.
  • 4 Cai R, Lu L, Zhang H J, and Cai L H. Highlight sound effects detection in audio stream. In Proc. IEEE International Conference on Multimedia and Expo, Baltimore, USA, July 6-9, 2003, Vol.3: 37-40.
  • 5 Lockerd A and Mueller F. LAFCam: leveraging affective feedback camcorder. In Proc. CHI 2002 Conference on Human Factors in Computing Systems, Minneapolis, USA, 2002: 574-575.
  • 6 Kennedy L S and Ellis D P W. Laughter detection in meetings. In NIST ICASSP 2004 Meeting Recognition Workshop, Montreal, Canada, 2004: 11-14.
  • 7 Ito A, Wang Xinyue, Suzuki M, and Makino S. Smile and laughter recognition using speech processing and face recognition from conversation video. In Proc. 2005 International Conference on Cyberworlds, Nanyang Executive Centre, Singapore, November 23-25, 2005: 437-444.
  • 8 Truong K P and van Leeuwen D A. Automatic discrimination between laughter and speech. Speech Communication, 2007, 49(2): 144-158.
  • 9 Hermansky H. Perceptual linear predictive (PLP) analysis of speech. Journal of the Acoustical Society of America, 1990, 87(4): 1738-1752.
  • 10 Sun Xuejing. Pitch determination and voice quality analysis using subharmonic-to-harmonic ratio. In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, Florida, USA, May 2002, Vol.1: 333-336.

Co-cited References (27)

  • 1 Stouten F, Duchateau J, Martens J P, et al. Coping with disfluencies in spontaneous speech recognition: acoustic detection and linguistic context manipulation [J]. Speech Communication, 2006, 48(11): 1590-1606.
  • 2 Cai R, Lu L, Zhang H J, et al. Highlight sound effects detection in audio stream [C]//Proceedings of IEEE International Conference on Multimedia and Expo. Baltimore: IEEE, 2003: 37-40.
  • 3 Kennedy L S, Ellis D P W. Laughter detection in meetings [C]//Proceedings of NIST ICASSP 2004 Meeting Recognition Workshop. Montreal: National Institute of Standards and Technology, 2004: 118-121.
  • 4 Knox M T, Mirghafori N. Automatic laughter detection using neural networks [C]//Proceedings of InterSpeech. Antwerp: International Speech Communication Association, 2007: 2973-2976.
  • 5 Laskowski K, Schultz T. Detection of laughter-in-interaction in multichannel close-talk microphone recordings of meetings [C]//Proceedings of the 5th International Workshop on Machine Learning for Multimodal Interaction. Utrecht: Springer-Verlag, 2008: 149-160.
  • 6 Knox M T, Morgan N, Mirghafori N. Getting the last laugh: automatic laughter segmentation in meetings [C]//Proceedings of InterSpeech. Brisbane: International Speech Communication Association, 2008: 797-800.
  • 7 Garg G, Ward N. Detecting filled pauses in tutorial dialogs [R]. El Paso: Department of Computer Science, University of Texas at El Paso, 2006: 1-9.
  • 8 Audhkhasi K, Kandhway K, Deshmukh O D, et al. Formant-based technique for automatic filled-pause detection in spontaneous spoken English [C]//Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing. Taipei: IEEE, 2009: 4857-4860.
  • 9 Li Y X, He Q H, Kwong S, et al. Characteristics-based effective applause detection for meeting speech [J]. Signal Processing, 2009, 89(8): 1625-1633.
  • 10 Carter A. Automatic acoustic laughter detection [D]. Staffordshire: Department of Electronic Engineering, Keele University, 2000.

Citing Literature (3)

Secondary Citing Literature (7)
