Abstract
Objective: To investigate how current AI-synthesized speech differs from genuine human speech in voiceprint examination. Methods: Speech samples from two AI virtual anchors and from the human speaker on whom each was modeled (its prototype) were collected and examined from a voiceprint identification perspective in two respects: auditory perception and spectrographic analysis. Results: On listening, the synthesized speech still showed a lack of emotion and naturalness, as well as misplaced phrase breaks. Given the relative stability of the high-frequency formants in the speech used in the experiment, the differences between the synthesized speech and its prototype appeared mainly in the formants above 4 kHz, with some syllables already differing above 3 kHz. In addition, the consonant-vowel transition was missing in some syllables of the synthesized speech. Conclusion: At the current level of technology, the handling of prosody in synthesized speech still needs improvement, and auditory analysis can serve as a reference in voiceprint examination for identifying synthesized speech. Spectrographic analysis can reveal differences between synthesized and human speech in the high-frequency region of the spectrogram and in the consonant-vowel transitions of some syllables.
Authors
张学海
杨璐铭
ZHANG Xuehai; YANG Luming (Forensic Science Center of Guangdong Provincial Public Security Bureau, Guangzhou 510050, China)
Source
《中国司法鉴定》
2022, Issue 2, pp. 69-72 (4 pages in total)
Chinese Journal of Forensic Sciences
Keywords
AI virtual announcer
synthetic speech
voiceprint identification