Abstract
To analyze the differences in pronunciation between pathological and normal speakers, this paper proposes a speech recognition method that combines a fused speech feature with a random forest to classify normal speech and dysarthric speech, with the aim of providing scientific and objective evidence for medical diagnosis and treatment. First, using the pathological speech database developed by the University of Toronto as the corpus, five types of prosodic features and the Mel frequency cepstrum coefficients (MFCC) are extracted; their statistical features are then calculated to form the fusion feature; finally, a random forest is used as the classifier. The results show that, compared with any single type of feature, the proposed fusion feature significantly improves recognition performance: combined with the random forest classifier, classification accuracy reaches 99.21% for male voices, 98.97% for female voices, and 98.00% overall. The study also finds that patients pronounce words or short phrases more accurately than sentences.
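The fusion step described in the abstract (frame-level features reduced to statistics and concatenated into one vector per utterance) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which statistics are computed, so mean, standard deviation, minimum, and maximum are used here as placeholders, and the track names `energy` and `pitch_hz` as well as the helper names `summarize` and `fuse_features` are hypothetical.

```python
from statistics import mean, pstdev


def summarize(track):
    """Collapse one frame-level track into [mean, std, min, max].

    The choice of these four statistics is an assumption, not the paper's.
    """
    return [mean(track), pstdev(track), min(track), max(track)]


def fuse_features(prosodic_tracks, mfcc_frames):
    """Concatenate per-track statistics into one utterance-level vector.

    prosodic_tracks: dict mapping a feature name to its per-frame values.
    mfcc_frames: list of frames, each a list of cepstral coefficients.
    """
    fused = []
    for name in sorted(prosodic_tracks):   # fixed order keeps vectors comparable
        fused.extend(summarize(prosodic_tracks[name]))
    for coeff_track in zip(*mfcc_frames):  # transpose: one track per coefficient
        fused.extend(summarize(coeff_track))
    return fused


# Toy utterance: 2 hypothetical prosodic tracks and 3 frames of 2 MFCCs.
prosody = {"energy": [0.2, 0.4, 0.6], "pitch_hz": [110.0, 120.0, 130.0]}
mfcc = [[1.0, -1.0], [2.0, -2.0], [3.0, -3.0]]
vector = fuse_features(prosody, mfcc)
print(len(vector))  # 4 statistics x (2 prosodic + 2 MFCC tracks) = 16
```

Because every utterance yields a vector of the same length regardless of its duration, vectors built this way could be stacked into a feature matrix and passed to any random forest implementation for the normal-versus-dysarthric classification.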
Authors
李东
张雪英
段淑斐
闫密密
LI Dong; ZHANG Xueying; DUAN Shufei; YAN Mimi (College of Information Engineering, Taiyuan Univ. of Technology, Taiyuan 030024, China)
Source
《西安电子科技大学学报》
EI
CAS
CSCD
PKU Core Journal
2018, No. 3, pp. 149-155 (7 pages)
Journal of Xidian University
Funding
National Natural Science Foundation of China (61371193)
Shanxi Province Applied Basic Research Youth Fund (201601D202045)
Keywords
prosodic feature
Mel frequency cepstrum coefficient
fusion feature
random forest
dysarthria recognition