Abstract
Sound classification has broad applications in medical diagnosis, scene analysis, speech recognition, and ecological environment analysis. Traditional sound classifiers are mostly based on neural networks, which have clear limitations in accuracy, model configuration, parameter tuning, and data pre-processing. To address these limitations, this paper proposes Deep LightGBM, an improved LightGBM-based deep learning model that builds on the deep forest approach. The model improves classification accuracy and generalization while keeping the model simple, and it reduces the method's dependence on parameter settings. On the UrbanSound8K dataset, the proposed model reaches a classification accuracy of 95.84% when sound features are extracted with the vector method. When the vector features are fused with features extracted by a convolutional neural network (CNN) and the new model is trained on the fused features, accuracy reaches 97.67%. The experimental results show that the feature extraction approach combined with Deep LightGBM yields a model that is easy to tune, achieves high accuracy, shows no over-fitting, and generalizes well.
Authors
李行健
汤心溢
张瑞
LI Xingjian; TANG Xinyi; ZHANG Rui (Key Laboratory of Infrared System Detection and Imaging Technology, Shanghai Institute of Technical Physics, Chinese Academy of Sciences, Shanghai 200083, China; School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China; University of Chinese Academy of Sciences, Beijing 100049, China)
Source
《声学技术》
CSCD
Peking University Core Journal
2022, Issue 6, pp. 871-877 (7 pages)
Technical Acoustics