Speech Segmentation Based on Dual Input of Preprocessed DOA Estimates and Fundamental Frequency

Abstract: Speech segmentation is an important component of speech separation systems and plays a key role in many applications, such as source estimation, automatic speech recognition in multi-speaker environments, and multi-source target tracking. Segmenting overlapping speech has long been the central problem in this work. In practice, speech captured by microphones indoors usually contains reverberation and noise, which degrade the quality of the received signal, reduce the accuracy of the estimated direction-of-arrival (DOA) features, and thereby degrade the segmentation of overlapping multi-source speech. To address the poor robustness of existing multi-source segmentation methods to noise and reverberation, a preprocessing method is proposed that removes conspicuous abnormal noise and reverberation from the speech signal. The method processes the raw speech signal with a generalized sidelobe canceller (GSC) combined with a post-filter implemented as a Wiener filter. This suppresses reverberation and noise, improves speech quality, and in turn makes the DOA feature estimates more accurate. Finally, multiple hypothesis tracking (MHT) is used to track each speaker's fundamental frequency (F0) and DOA features simultaneously to perform the segmentation. Taking overlapping multi-source speech as the test case, 16 meeting recordings from the AMI corpus were analyzed statistically. The results show that the average hit rate (HIT) improves by 2.10% compared with the method without preprocessing.
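
The preprocessing stage described in the abstract can be illustrated with a minimal NumPy sketch. It assumes a Griffiths-Jim style GSC on an array whose channels are already time-aligned to the target DOA (a fixed delay-and-sum beamformer, an adjacent-channel-difference blocking matrix, and an NLMS adaptive canceller), followed by a textbook per-bin Wiener gain H = ξ/(1 + ξ). The function names `gsc_nlms` and `wiener_gain`, the tap count, step size, spectral floor, and the synthetic 4-microphone signal are hypothetical illustrations; the abstract does not give the paper's array geometry, post-filter noise estimator, or parameters.

```python
import numpy as np


def gsc_nlms(x, taps=16, mu=0.1, eps=1e-8):
    """Griffiths-Jim style generalized sidelobe canceller (illustrative sketch).

    x: array of shape (M, N). The target is assumed to be already time-aligned
    across the M channels (steering to the target DOA is done beforehand).
    Returns (d, e): the fixed-beamformer output and the GSC output.
    """
    m_ch, n = x.shape
    d = x.mean(axis=0)              # fixed beamformer: delay-and-sum of the aligned channels
    b = x[1:] - x[:-1]              # blocking matrix: adjacent differences cancel the aligned target
    w = np.zeros((m_ch - 1, taps))  # one adaptive FIR filter per blocked channel
    e = np.zeros(n)
    for i in range(n):
        # tap-delay line holding b[:, i], b[:, i-1], ..., zero-padded at the start
        lo = max(0, i - taps + 1)
        seg = b[:, lo:i + 1][:, ::-1]
        u = np.zeros((m_ch - 1, taps))
        u[:, :seg.shape[1]] = seg
        y = np.sum(w * u)           # estimate of the interference leaking through d
        e[i] = d[i] - y             # GSC output = fixed beamformer minus noise estimate
        w += mu * e[i] * u / (np.sum(u * u) + eps)  # NLMS weight update
    return d, e


def wiener_gain(p_noisy, p_noise, floor=0.05):
    """Per-bin Wiener gain H = xi / (1 + xi), floored at `floor`.

    p_noisy: PSD of the beamformer output in each frequency bin.
    p_noise: residual-noise PSD estimate in each bin (how it is estimated is an
    assumption left open here, not the paper's estimator).
    """
    p_noisy = np.asarray(p_noisy, dtype=float)
    p_noise = np.maximum(np.asarray(p_noise, dtype=float), 1e-12)
    xi = np.maximum(p_noisy / p_noise - 1.0, 0.0)  # power-subtraction a priori SNR estimate
    return np.maximum(xi / (1.0 + xi), floor)      # spectral floor avoids zeroing bins


# Hypothetical test signal: 4 microphones, target already aligned on every
# channel, broadband interferer reaching channel m with an m-sample delay.
rng = np.random.default_rng(0)
n = 20000
s = np.sin(2 * np.pi * 0.01 * np.arange(n))
v = 2.0 * rng.standard_normal(n + 4)
x = np.stack([s + v[4 - m:4 - m + n] for m in range(4)])

d, e = gsc_nlms(x)
half = n // 2  # score only the second half, after the NLMS filter has adapted
err_fbf = np.mean((d[half:] - s[half:]) ** 2)
err_gsc = np.mean((e[half:] - s[half:]) ** 2)
print(f"fixed beamformer error power: {err_fbf:.3f}, GSC error power: {err_gsc:.3f}")
print("Wiener gains:", wiener_gain([1.0, 10.0, 100.0], 1.0))
```

Because the blocking matrix removes the aligned target exactly, the adaptive filter can only subtract interference, so the GSC output error should fall below that of the fixed beamformer. With finite taps and a broadband interferer the cancellation is partial rather than complete, which is one reason a Wiener post-filter is applied afterwards.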

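Authors: WANG Mei (王玫); CHENG Jiali (成家礼) (Ministry of Education Key Laboratory of Cognitive Radio and Information Processing, Guilin University of Electronic Technology, Guilin 541004, China; College of Physics and Electronic Information Engineering, Guilin University of Technology, Guilin 541006, China)

Institutions: Key Laboratory of Cognitive Radio and Information Processing, Ministry of Education (jointly built by the province and the ministry), Guilin University of Electronic Technology; College of Physics and Electronic Information Engineering, Guilin University of Technology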

Source: Journal of Guilin University of Electronic Technology (《桂林电子科技大学学报》), 2024, No. 4, pp. 348-354 (7 pages)

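Funding: National Natural Science Foundation of China (62071135); Natural Science Foundation of Guangxi (2019GXNSFBA245103); Graduate Education Innovation Program of Guilin University of Electronic Technology (2021YCXS037).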

Keywords: speech segmentation; generalized sidelobe canceller; Wiener filter; direction of arrival; multiple hypothesis tracking; fundamental frequency

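Classification numbers (CLC): TN911.7 [Electronics and Telecommunications: Communication and Information Systems]; TP391.4 [Automation and Computer Technology: Computer Application Technology]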