Abstract
Automatic segmentation of continuous Chinese speech is fundamental to speech recognition; an accurate segmentation method can replace manual labeling of Chinese syllables. Traditional automatic segmentation techniques for continuous Chinese speech, such as dual-threshold endpoint detection and cepstrum-based endpoint detection, do not perform well enough to meet the needs of speech recognition. This paper analyzes the continuous speech signal at multiple levels, in the time domain, frequency domain, and cepstral domain, and combines endpoint detection, spectral analysis, and cepstral analysis to detect syllable segmentation points, yielding a multi-level segmentation method for continuous speech. Compared with traditional endpoint detection methods based on dual thresholds and the cepstrum, the proposed method raises single-character segmentation accuracy to 92.8%.
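For readers unfamiliar with the dual-threshold baseline named in the abstract, the following is a minimal sketch of the general idea, not the authors' implementation. It assumes an energy-only variant (classic formulations often also use the zero-crossing rate), and the function names, frame sizes, and threshold values are hypothetical choices made for illustration.

```python
def short_time_energy(samples, frame_len, hop):
    """Sum of squared samples per frame; a trailing partial frame is dropped."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame))
    return energies


def dual_threshold_segments(energies, high, low):
    """Return a list of (start, end) frame ranges, end exclusive.

    A segment must contain at least one frame whose energy exceeds `high`.
    Its boundaries are then extended outward for as long as the energy
    stays above the lower threshold `low`.
    """
    segments = []
    i, n = 0, len(energies)
    while i < n:
        if energies[i] > high:
            # Extend backward over frames that exceed the low threshold.
            start = i
            while start > 0 and energies[start - 1] > low:
                start -= 1
            # Extend forward over frames that exceed the low threshold.
            end = i + 1
            while end < n and energies[end] > low:
                end += 1
            segments.append((start, end))
            i = end  # resume the scan after this segment
        else:
            i += 1
    return segments


# Synthetic example: silence, a soft onset, a loud core, a soft decay, silence.
samples = [0.0] * 20 + [0.5] * 10 + [1.0] * 20 + [0.5] * 10 + [0.0] * 20
energies = short_time_energy(samples, frame_len=10, hop=10)
segments = dual_threshold_segments(energies, high=5.0, low=1.0)
```

In this sketch the high threshold decides whether speech is present at all, while the low threshold decides where it begins and ends, so the soft onset and decay frames are kept in the segment. Such a detector separates speech from silence; it cannot by itself split adjacent syllables that are joined without a pause, which is the gap the multi-level method in the abstract addresses with spectral and cepstral analysis.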
Authors
LI Qi (李琦)
ZHANG Erhua (张二华)
School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094
Source
Computer & Digital Engineering (《计算机与数字工程》)
2023, Issue 4, pp. 959-964 (6 pages)
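Funding
Supported by the 13th Five-Year Plan Equipment Pre-Research Field Fund of the Equipment Development Department of the Central Military Commission (No. 61403120102).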
Keywords
speech segmentation
endpoint detection
spectrogram
dual-threshold method (voice activity detection, VAD)
frequency-band energy