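Abstract: In voice applications that transmit data in packets (VoIP, Voice over Internet Protocol), the receiver must first buffer incoming packets and hold them for a certain time before playing them out. This compensates for the unpredictable delay that packets experience in the network, reduces jitter in the call, and yields satisfactory call quality. This paper studies dynamic playout delay algorithms, aiming to keep the playout delay as small as possible while also keeping the packet loss rate as low as possible. It proposes an effective dynamic playout delay algorithm that tracks the network delay of recently arrived packets to estimate their approximate distribution function, and then uses this information together with a delay spike detection algorithm to predict the playout delay of the next talkspurt. Experimental results show that the algorithm achieves the best balance between playout delay and packet loss rate, making it an ideal and effective algorithm.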
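The idea described in the abstract (track recent delays, approximate their distribution, and combine that with spike detection to choose the playout delay for the next talkspurt) can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the class name `PlayoutDelayEstimator`, the parameters `window`, `loss_target`, and `spike_jump`, the use of an empirical quantile over a sliding window, and the simple jump-based spike rule are all assumptions made for illustration.

```python
from collections import deque


class PlayoutDelayEstimator:
    """Illustrative sketch only; names and rules here are hypothetical."""

    def __init__(self, window=100, loss_target=0.01, spike_jump=0.05):
        self.delays = deque(maxlen=window)  # recent network delays (seconds)
        self.loss_target = loss_target      # tolerated fraction of late packets
        self.spike_jump = spike_jump        # delay jump (seconds) that starts a spike
        self.in_spike = False
        self.last_delay = None

    def _quantile(self):
        # Empirical quantile of the window: the smallest delay that leaves
        # at most loss_target of the windowed packets arriving too late.
        ordered = sorted(self.delays)
        allowed_late = int(self.loss_target * len(ordered))
        return ordered[len(ordered) - 1 - allowed_late]

    def observe(self, delay):
        """Record the network delay of a newly arrived packet."""
        if self.last_delay is not None:
            if not self.in_spike and delay - self.last_delay > self.spike_jump:
                # Sudden large increase: assume a delay spike has begun.
                self.in_spike = True
            elif self.in_spike and self.delays and delay <= self._quantile():
                # Delay is back within the normal range: the spike is over.
                self.in_spike = False
        if not self.in_spike:
            # Spike samples are kept out of the window so they do not
            # distort the estimated delay distribution.
            self.delays.append(delay)
        self.last_delay = delay

    def next_talkspurt_delay(self):
        """Playout delay to apply at the start of the next talkspurt."""
        if self.last_delay is None:
            raise ValueError("no packets observed yet")
        if self.in_spike:
            # During a spike, follow the most recent delay directly.
            return self.last_delay
        return self._quantile()
```

In this sketch the playout delay is changed only between talkspurts, by calling `next_talkspurt_delay()` during a silence period, so the adjustment does not disturb speech that is already playing. A smaller `loss_target` gives a larger playout delay and fewer late packets, which is the trade-off the abstract refers to.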
Funding: Supported by the Tencent and Shanghai Jiao Tong University Joint Project.
Abstract: The cocktail party problem, i.e., tracing and recognizing the speech of a specific speaker when multiple speakers talk simultaneously, is one of the critical problems yet to be solved to enable the wide application of automatic speech recognition (ASR) systems. In this overview paper, we review the techniques proposed in the last two decades to attack this problem. We focus our discussion on the speech separation problem, given its central role in the cocktail party environment, and describe the conventional single-channel techniques such as computational auditory scene analysis (CASA), non-negative matrix factorization (NMF), and generative models; the conventional multi-channel techniques such as beamforming and multi-channel blind source separation; and the newly developed deep learning-based techniques such as deep clustering (DPCL), the deep attractor network (DANet), and permutation invariant training (PIT). We also present techniques developed to improve ASR accuracy and speaker identification in the cocktail party environment. We argue that effectively exploiting the information in the microphone array, the acoustic training set, and the language itself, using more powerful models and better optimization objectives and techniques, will be the approach to solving the cocktail party problem.