Abstract
Voiceprint, an important human biometric feature, can be used to identify diseases such as Parkinson's disease, but existing patient voiceprint datasets are few and contain limited samples. HR-DCGAN (High Resolution Deep Convolutional Generative Adversarial Network) is therefore proposed for sample augmentation, after which a deep learning method is used to distinguish Parkinson's patients from healthy people. HR-DCGAN generates high-resolution spectrograms by increasing the number of network layers and incorporating feature matching, and spectrograms with high similarity are selected according to the Structural Similarity Index (SSIM) to augment the samples. A VGG16 network is constructed to extract voiceprint features and classify them, which effectively improves recognition accuracy, and Dropout is used to suppress over-fitting and provide regularization. Comparative experiments with multiple feature extraction methods and multiple classification methods were performed on the Sakar dataset. The results show that the HR-DCGAN-VGG16 hybrid model achieves the highest voiceprint recognition accuracy, 90.5%, with a specificity of 91%; it can effectively distinguish Parkinson's patients from healthy people and addresses early, efficient screening of Parkinson's patients when only a small amount of voiceprint data is available.
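The SSIM-based selection step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes spectrograms are flattened to lists of intensities in [0, 1], uses a simplified single-window (global) SSIM rather than the usual sliding-window variant, and the names `global_ssim`, `filter_by_ssim`, and the 0.8 threshold are hypothetical choices made only for illustration.

```python
def _mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM between two equally sized images (flat lists).

    Assumption: one global window over the whole image, not the
    sliding-window SSIM that image libraries typically compute.
    """
    if len(x) != len(y) or not x:
        raise ValueError("images must be non-empty and the same size")
    # Stabilizing constants from the standard SSIM definition.
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mu_x, mu_y = _mean(x), _mean(y)
    var_x = _mean([(a - mu_x) ** 2 for a in x])
    var_y = _mean([(b - mu_y) ** 2 for b in y])
    cov = _mean([(a - mu_x) * (b - mu_y) for a, b in zip(x, y)])
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def filter_by_ssim(generated, references, threshold=0.8):
    """Keep generated images whose best SSIM against any reference
    reaches the threshold (the threshold value is a hypothetical choice)."""
    kept = []
    for candidate in generated:
        best = max(global_ssim(candidate, ref) for ref in references)
        if best >= threshold:
            kept.append(candidate)
    return kept


# Toy data standing in for flattened spectrograms.
reference = [0.0, 0.25, 0.5, 0.75, 1.0, 0.5]
close_sample = [0.02, 0.24, 0.5, 0.77, 0.98, 0.5]   # small perturbation
far_sample = [1.0 - v for v in reference]           # inverted intensities
augmented = filter_by_ssim([close_sample, far_sample], [reference])
```

In this toy run the slightly perturbed sample is retained while the inverted one is rejected. For real spectrogram images, a windowed SSIM from an image-processing library would be the more usual choice.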
Authors
WANG Juan (王娟)
XU Zhi-jing (徐志京)
School of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2019, Issue 9, pp. 2026-2032 (7 pages)
Funding
Supported by the National Natural Science Foundation of China (61673259)