Abstract
The tongue is a key articulator in human speech production, and dimensionality reduction of its shape during articulation can help linguists analyze pronunciation patterns. Principal Component Analysis (PCA) is currently the most commonly used method for reducing the dimensionality of tongue contours. In recent years, deep-learning-based autoencoders have been widely used for data compression and have been shown to outperform PCA at dimensionality reduction. However, autoencoders require a large number of training samples, and because the tongue is hidden inside the oral cavity, large volumes of tongue movement data are difficult to obtain; conventional autoencoders therefore cannot be applied directly to tongue contour modeling. To address this, the paper proposes a two-stage autoencoder dimensionality reduction method for small-sample tongue motion contour data. In the first stage, an Active Shape Model (ASM) is used to generate a large amount of physiological deformation data of tongue contours, on which a general tongue contour reconstruction model is built using a conventional autoencoder. In the second stage, a dimensionality reduction layer is added to the first-stage model to compress and analyze the tongue contour data. The experiments use 240 vowel tongue shapes obtained from X-ray films of human speech and compare the proposed method with conventional PCA. The results show that the vowel tongue position map produced by the proposed method is better separated on the two-dimensional plane than that of PCA, and that the method achieves better tongue shape dimensionality reduction and reconstruction.
Authors
徐正丽
肖素芳
简敏
杨明浩
XU Zhengli; XIAO Sufang; JIAN Min; YANG Minghao (Guilin University of Electronic Technology, Guilin, Guangxi, 541004, China; Institute of Automation of the Chinese Academy of Sciences, Beijing, 100190, China)
Source
《广西科学》
CAS
Peking University Core Journals (北大核心)
2023, Issue 4, pp. 745-753 (9 pages)
Guangxi Sciences
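Funding
Supported by the National Natural Science Foundation of China (71463010, 22180155466)
Guangxi Science and Technology Program (2021GXNSFBA220048, 桂科AB21220038)
Guilin Science and Technology Program (2023010123).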
Keywords
deep neural network
autoencoder
Principal Component Analysis (PCA)
tongue contour
hidden units