
A Chinese Dialect Identification Model Based on Key Feature Extraction from Phonetic Posteriorgram
Abstract: Different dialects often pronounce the same character differently, so the probability distributions of the phonemes occurring in different dialects differ considerably. This is an important manifestation of dialect variation. To exploit it, a dialect identification model based on phonetic posteriorgram analysis is proposed. The model introduces the Convolutional Block Attention Module (CBAM) to extract key features from the phonetic posteriorgram, and uses the Emphasized Channel Attention, Propagation and Aggregation in TDNN (ECAPA-TDNN) model to aggregate these features and apply attentive pooling to obtain sentence-level features. To further increase inter-class distance, the Additive Angular Margin (AAM) loss is introduced. Experimental results show that the model achieves higher classification accuracy than traditional models and that each of the above improvements contributes to the gain in accuracy.

Existing dialect identification models rarely make use of phonemic features, while different dialects pronounce the same words differently, which leads to large differences in the probability distributions of the phonemes contained in different dialects. To address these issues, this paper proposes a dialect identification model based on phonetic posteriorgram features. Because conventional attention analysis covers only a single dimension, the model uses the self-attention mechanism of the Convolutional Block Attention Module (CBAM), which attends along both the channel and the spatial dimensions, to extract key features from the frame-level phonetic posteriorgram. To make full use of the information in the model's intermediate layers and avoid losing dialect information, the Emphasized Channel Attention, Propagation and Aggregation in TDNN (ECAPA-TDNN) model is used to capture long-range information in the frame-level features and to obtain effective sentence-level features through feature aggregation and attentive statistics pooling. Finally, to avoid relying on a single loss function, the Additive Angular Margin (AAM) loss is introduced on top of the cross-entropy loss. It replaces the decision boundary with a decision region, maximizing the inter-class distance between dialects and optimizing the classification decision. Experimental results on the Aishell2 and Datatang-Dialect datasets show that the proposed model outperforms traditional models, and ablation experiments demonstrate that each improvement (the phonetic posteriorgram features, the convolutional block attention module, ECAPA-TDNN, and the additive angular margin loss) contributes to the gain in identification accuracy.
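As a rough illustration of the last two stages described in the abstract (attentive statistics pooling of frame-level features into a sentence-level vector, then the AAM loss), the following pure-Python sketch may help. It is not the authors' implementation. It omits the CBAM and TDNN layers entirely, assumes one scalar attention score per frame (ECAPA-TDNN itself uses channel- and context-dependent attention), and uses a scale s = 30 and margin m = 0.2 as illustrative values that the abstract does not specify. All function and variable names (`softmax`, `attentive_stats_pooling`, `l2_normalize`, `aam_softmax_loss`, `dialect_weights`) are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    mx = max(xs)
    exps = [math.exp(x - mx) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attentive_stats_pooling(frames, scores):
    """Pool T frame-level vectors (each of length D) into one sentence-level
    vector [weighted mean, weighted std] of length 2*D.

    Simplification (assumption): one scalar attention score per frame.
    """
    weights = softmax(scores)  # attention weights over frames, sum to 1
    dim = len(frames[0])
    mean = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    # Weighted variance via E[x^2] - E[x]^2, clamped before the square root.
    var = [sum(w * f[d] ** 2 for w, f in zip(weights, frames)) - mean[d] ** 2
           for d in range(dim)]
    std = [math.sqrt(max(v, 1e-12)) for v in var]
    return mean + std

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def aam_softmax_loss(embedding, class_weights, target, s=30.0, m=0.2):
    """Additive angular margin softmax loss for a single sample.

    Class j gets logit s*cos(theta_j); the target class gets s*cos(theta_y + m).
    s and m are illustrative assumptions, not values from the paper.
    Simplification: no special handling when theta_y + m exceeds pi.
    """
    e = l2_normalize(embedding)
    logits = []
    for j, w in enumerate(class_weights):
        cos = sum(a * b for a, b in zip(e, l2_normalize(w)))
        cos = max(-1.0, min(1.0, cos))  # guard acos against rounding error
        if j == target:
            cos = math.cos(math.acos(cos) + m)  # add the margin to the true-class angle
        logits.append(s * cos)
    # Cross-entropy on the scaled logits, computed via log-sum-exp.
    mx = max(logits)
    log_sum = mx + math.log(sum(math.exp(z - mx) for z in logits))
    return log_sum - logits[target]

# Toy usage: 2 frames of 2-D features, equal attention scores.
frames = [[1.0, 2.0], [3.0, 4.0]]
pooled = attentive_stats_pooling(frames, scores=[0.0, 0.0])
print(pooled)  # [2.0, 3.0, 1.0, 1.0]

# Three hypothetical dialect classes with made-up weight vectors.
dialect_weights = [[1.0, 1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0, 1.0],
                   [-1.0, -1.0, 0.0, 0.0]]
loss_with_margin = aam_softmax_loss(pooled, dialect_weights, target=0)
loss_no_margin = aam_softmax_loss(pooled, dialect_weights, target=0, m=0.0)
# The margin makes the true class harder to satisfy, so the loss is larger.
print(loss_with_margin > loss_no_margin)  # True
```

Because the margin is added to the angle of the true class only, an embedding must sit closer to its own class direction than plain softmax would require. This is the sense in which the abstract describes the decision boundary being widened into a decision region.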
Authors: 冯罡 (FENG Gang), 陈宁 (CHEN Ning) (School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China)
Source: Journal of East China University of Science and Technology (Natural Science Edition), 2023, No. 6, pp. 900-906 (7 pages). Indexed in CAS, CSCD, and the Peking University core journals list.
Funding: General Program of the National Natural Science Foundation of China (61771196).
Keywords: dialect identification; phonetic feature; self-attention mechanism; ECAPA-TDNN; feature extraction