Presentation of Acoustic Characteristics with Network Models for Dialect Identification

Cited by: 2
Abstract

Objective: To investigate how well speech-recognition network models represent acoustic information for dialect identification, and to select the best single model for automatic dialect classification.

Methods: Four typical neural network models for acoustic feature extraction, SOM (self-organizing feature map), RNN (recurrent neural network), LSTM (long short-term memory network) and CNN (convolutional neural network), were each implemented in simulation with Python. A dataset of typical dialects from 13 cities in Jiangsu Province (6036 speech samples from 105 speakers) was split into training, validation and test sets at a ratio of 6:2:2. The test set was then cut into subsets of 3 s and 10 s duration, and white noise was added to each to produce subsets with signal-to-noise ratios (SNR) of 3 dB and 10 dB. This yielded four test sets of 1207 samples each. Appropriate classifiers were chosen to evaluate the four models in training, validation and testing, and each model's ability to extract features for dialect identification was assessed on the test sets of differing SNR and duration.

Results: With normalized data and network parameters, confusion matrices were obtained from the outputs of the four models on the four test sets, and Macro-F1 and Micro-F1 scores, which are suited to evaluating multi-class problems, were computed from them. LSTM and CNN performed significantly better than SOM and RNN. SOM was clearly more sensitive to the SNR of the test samples and had poor identification accuracy on the 3 dB test set. RNN had improved identification accuracy but represented the key information in long samples inadequately. LSTM achieved the best scores under the multi-class PRF (precision, recall, F-measure) metrics, 93.1% (Macro-F1) and 92.7% (Micro-F1) on the 10 dB/10 s test set, overcoming this weakness of RNN through its characteristic unit structure. CNN gave stable identification accuracy that was not easily affected by the length of the speech fragment, and it showed better noise robustness at low SNR, which suits substandard recordings. Through the nonlinear transformations of convolution and pooling, CNN has good nonlinear expressive power and fits the information representation in dialect classification well, although it is poorly suited to real-time representation of the material being identified.

Conclusions: The LSTM+CNN framework represents dialect acoustic information well and is highly robust, so it can support the further development and application of automatic dialect identification. Audio sample duration and SNR nevertheless remain key to the identification accuracy of any model, whether single or combined from two or more.
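The Macro-F1 and Micro-F1 scores reported in the abstract are derived from confusion matrices. The following is a minimal sketch of that calculation, assuming the standard definitions of the two averages; the function name `f1_scores` and the 3-class toy matrix are illustrative assumptions and are not taken from the paper, whose task covers dialects from 13 cities.

```python
def f1_scores(confusion):
    """Return (macro_f1, micro_f1) for a square confusion matrix.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j. (Hypothetical helper, not the
    authors' code.)
    """
    n = len(confusion)
    per_class_f1 = []
    tp_total = fp_total = fn_total = 0

    for k in range(n):
        tp = confusion[k][k]
        # Samples of class k that were assigned to another class.
        fn = sum(confusion[k]) - tp
        # Samples of other classes that were assigned to class k.
        fp = sum(row[k] for row in confusion) - tp

        denom = 2 * tp + fp + fn
        per_class_f1.append(2 * tp / denom if denom else 0.0)

        tp_total += tp
        fp_total += fp
        fn_total += fn

    # Macro average: unweighted mean of the per-class F1 scores.
    macro = sum(per_class_f1) / n

    # Micro average: F1 computed from counts pooled over all classes.
    micro_denom = 2 * tp_total + fp_total + fn_total
    micro = 2 * tp_total / micro_denom if micro_denom else 0.0
    return macro, micro


if __name__ == "__main__":
    # Toy 3-class confusion matrix, for illustration only.
    toy = [
        [9, 1, 0],
        [2, 3, 0],
        [1, 0, 4],
    ]
    macro_f1, micro_f1 = f1_scores(toy)
    print(f"Macro-F1 = {macro_f1:.3f}, Micro-F1 = {micro_f1:.3f}")
```

Macro-F1 weights every dialect class equally, so a poorly recognized minority class lowers it noticeably, whereas Micro-F1 pools the counts and, for single-label classification, equals overall accuracy. Reporting both, as the abstract does, shows whether performance is uniform across classes.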
Authors: SHEN Xiaohu; JIN Tian; LI Jiawei; HAN Chunrun (Department of Forensic Science and Technology, Jiangsu Police Institute, Nanjing 210031, China; Evidence Identification Center, Jiangsu Provincial Public Security Bureau, Nanjing 210031, China)
Source: Forensic Science and Technology (《刑事技术》), 2021, No. 3, pp. 234-240 (7 pages)
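Funding: Applied Innovation Program of the Ministry of Public Security (2020YYCXHNST046); Open Project of the National Engineering Laboratory for Crime Scene Evidence Tracing Technology (2018NELKFKT10).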
Keywords: dialect identification; acoustic model; acoustic characteristics; automatic classifier