Presentation of Acoustic Characteristics with Network Models for Dialect Identification

Original title: 方言识别网络模型的声学信息表征研究 (cited by: 2)

Abstract

Objective: To study how well speech-recognition network models represent acoustic information, and to screen out the best single model for automatic dialect classification.

Methods: Four typical neural network models for acoustic feature extraction, SOM (self-organizing feature map), RNN (recurrent neural network), LSTM (long short-term memory network) and CNN (convolutional neural network), were each implemented in simulation with Python. A dataset of typical dialects from 13 cities in Jiangsu Province (6036 speech samples from 105 speakers) was divided into training, validation and test sets at a ratio of 6:2:2. The test set was then cut into subsets of 3-second and 10-second clips, and white noise was added to each to produce signal-to-noise ratios (SNR) of 3 dB and 10 dB. This yielded 4 test sets of 1207 samples each. Appropriate classifiers were chosen to evaluate the four models in training, validation and testing. For dialect identification, each model was assessed on its ability to extract features from test sets of differing SNR and duration.

Results: With normalized data and network parameters, confusion matrices were obtained from the outputs of the 4 models on the 4 test sets, and from them the Macro-F1 and Micro-F1 scores, which are suited to evaluating multi-class problems. Under the multi-class PRF evaluation metrics, LSTM and CNN performed significantly better than SOM and RNN. SOM was clearly more sensitive to the SNR of the test samples and identified poorly on the 3 dB test set. RNN improved identification accuracy but could not adequately represent the key information in long samples. LSTM achieved the best scores, 93.1% (Macro-F1) / 92.7% (Micro-F1) on the 10 dB/10 s test set; its characteristic structural unit overcomes this shortcoming of RNN. CNN's accuracy was stable and not easily affected by the length of the speech clips, and it showed better noise robustness under low SNR, as with substandard recordings. Through the nonlinear transformations of convolution and pooling, CNN has good nonlinear expressive ability and fits the information representation needed for dialect classification well, although it is not suited to real-time representation of the material being identified.

Conclusions: An LSTM+CNN framework represents dialect acoustic information well and is strongly robust, and can support further development of automatic dialect classification applications. Audio sample duration and SNR remain key to improving the identification accuracy of any model, whether single or combined from two or more.
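The abstract names two computations without showing them: mixing white noise into a clip at a target SNR, and deriving Macro-F1 and Micro-F1 from a confusion matrix. The following is a minimal sketch of both in plain Python, not the authors' implementation. The function names `add_white_noise` and `macro_micro_f1`, the Gaussian noise source and the fixed random seed are assumptions made for illustration only.

```python
import math
import random


def add_white_noise(signal, snr_db, rng=None):
    """Return the signal plus Gaussian white noise scaled to a target SNR (dB).

    Assumption: the noise is zero-mean Gaussian; the paper only says "white noise".
    """
    rng = rng or random.Random(0)  # fixed seed (assumed) for reproducibility
    signal_power = sum(x * x for x in signal) / len(signal)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    noise_power = sum(n * n for n in noise) / len(noise)
    # SNR(dB) = 10 * log10(P_signal / P_noise); solve for the noise power we need.
    target_noise_power = signal_power / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / noise_power)
    return [x + scale * n for x, n in zip(signal, noise)]


def macro_micro_f1(confusion):
    """Compute (Macro-F1, Micro-F1) from a square confusion matrix.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    k = len(confusion)
    per_class_f1 = []
    tp_sum = fp_sum = fn_sum = 0
    for c in range(k):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(k)) - tp  # predicted c, truly other
        fn = sum(confusion[c]) - tp                       # truly c, predicted other
        denom = 2 * tp + fp + fn
        per_class_f1.append(2 * tp / denom if denom else 0.0)
        tp_sum += tp
        fp_sum += fp
        fn_sum += fn
    # Macro-F1 averages the per-class F1 scores with equal weight per class.
    macro = sum(per_class_f1) / k
    # Micro-F1 pools the counts over all classes before computing F1.
    micro_denom = 2 * tp_sum + fp_sum + fn_sum
    micro = 2 * tp_sum / micro_denom if micro_denom else 0.0
    return macro, micro


if __name__ == "__main__":
    # Toy 3-class confusion matrix (made-up numbers, not data from the paper).
    toy = [[5, 1, 0], [2, 3, 1], [0, 0, 4]]
    print(macro_micro_f1(toy))
```

For single-label multi-class data such as this, Micro-F1 equals overall accuracy, whereas Macro-F1 weights every dialect class equally regardless of its sample count, which is why the two scores can diverge when classes are imbalanced.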

Authors: 申小虎, 金恬, 李佳蔚, 韩春润 (SHEN Xiaohu; JIN Tian; LI Jiawei; HAN Chunrun)

Affiliations: Department of Forensic Science and Technology, Jiangsu Police Institute, Nanjing 210031, China; Evidence Identification Center, Jiangsu Provincial Public Security Bureau, Nanjing 210031, China

Source: 《刑事技术》 (Forensic Science and Technology), 2021, No. 3, pp. 234-240 (7 pages)

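Funding: Application Innovation Program of the Ministry of Public Security (2020YYCXHNST046); Open Project of the National Engineering Laboratory for Scene Evidence Traceability Technology (现场物证溯源技术国家工程实验室) (2018NELKFKT10).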

Keywords: dialect identification; acoustic model; acoustic characteristics; automatic classifier

Classification code: D793.2 [Politics and Law: Chinese and Foreign Political Systems]
