8 articles found
Research on audio model generation technology based on a hierarchical federated framework
1
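Authors: 王健宗, 张旭龙, 姜桂林, 程宁, 肖京. 《智能系统学报》 (CSCD, Peking University Core), 2024, Issue 5, pp. 1331-1339 (9 pages)
Focusing on next-generation audio generation technology for audio models, this work builds a federated training framework for audio models that performs audio representation learning on very large-scale audio data and provides efficient, robust solutions for downstream audio tasks. It proposes four components: a federated learning framework suited to audio models that addresses data heterogeneity, communication efficiency, and privacy protection; a contrastive-learning-based pre-training method that learns semantic features from <audio, text description> pairs to improve the model's generalization and diversity; a prompt-learning-based fine-tuning method for audio generation that uses a small amount of labeled data to improve the model's adaptability and customizability; and a distributed optimization algorithm that compresses the model, reducing its complexity and resource consumption and improving deployment and runtime efficiency. In experiments on the downstream task of sound-effect conversion, the proposed method reaches a speech-quality mean opinion score (MOS) of 3.81. The results indicate that the method performs well on the sound-effect conversion task.
Keywords: audio model; federated learning framework; audio representation learning; data heterogeneity; privacy protection; contrastive learning; prompt learning; model compression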
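The abstract above does not spell out the aggregation rule of the proposed framework. As a rough illustration of the server-side step that federated learning frameworks commonly build on, here is a minimal sketch of plain federated averaging (FedAvg). It is not the authors' hierarchical method; the function name `federated_average` and the dictionary-of-lists parameter layout are assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical parameter layout: layer name -> flat list of weights.
Weights = Dict[str, List[float]]


def federated_average(updates: List[Tuple[Weights, int]]) -> Weights:
    """Average client parameters, weighting each client by its sample count."""
    if not updates:
        raise ValueError("need at least one client update")
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    # Start from zeros with the same shape as the first client's parameters.
    first_weights = updates[0][0]
    averaged = {name: [0.0] * len(vals) for name, vals in first_weights.items()}

    for weights, n in updates:
        share = n / total  # this client's fraction of all training samples
        for name, vals in weights.items():
            for i, v in enumerate(vals):
                averaged[name][i] += share * v
    return averaged


if __name__ == "__main__":
    clients = [({"w": [1.0, 2.0]}, 1), ({"w": [3.0, 4.0]}, 3)]
    print(federated_average(clients))
```

Only parameters leave each client in this scheme; the raw audio stays local, which is the privacy property the abstract refers to.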
Research on an ACL-based Bluetooth audio application model and algorithm (Cited by: 1)
2
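Authors: 郭昌建, 吴永忠. 《计算机技术与发展》, 2008, Issue 9, pp. 68-71, 75 (5 pages)
The paper discusses the Bluetooth protocol architecture, the core protocols, the Bluetooth transport mechanisms, the composition of the hardware modules, and several multimedia audio formats. To address inherent shortcomings of the typical Bluetooth audio application model based on synchronous SCO links, it proposes a new Bluetooth audio application model, BRTAAM, based on asynchronous ACL links, together with the corresponding BRTATP algorithm. It describes the algorithm's synchronization principle and data packet format in detail and closes by identifying goals for further research.
Keywords: Bluetooth; audio application model; algorithm; asynchronous link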
A brief introduction to the Audio Definition Model
3
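Authors: 张静琦, 李薰春. 《电声技术》, 2020, Issue 9, pp. 24-27 (4 pages)
As digital audio technology continues to develop, internationally shared audio metadata is becoming increasingly important for exchanging audio files and for immersive user experiences. The paper therefore briefly introduces the relevant standards and recommendations of the ITU Radiocommunication Sector (ITU-R) on Audio Definition Model metadata and describes how the metadata is applied in broadcasting with advanced audio systems.
Keywords: Audio Definition Model; metadata; advanced audio system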
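Analysis of an intelligent broadcast signal monitoring system with video/audio content comparison and abnormal-state monitoring (Cited by: 1)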
4
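Authors: 陈思平, 司佳, 袁洲, 杨艺西. 《科技资讯》, 2023, Issue 21, pp. 42-45 (4 pages)
The rapid spread of the Internet has intensified competition among programs. To safeguard program quality, avoid anomalies during broadcast, and protect the viewing experience, broadcasters have stepped up optimization of their video and audio content. Engineers have in turn begun studying video/audio content comparison and abnormal-state monitoring in depth, applying intelligent techniques to build broadcast signal monitoring systems that monitor the signal more effectively and detect anomalies promptly, so that the final presentation of the content is assured. The paper analyzes the problems of traditional monitoring systems, discusses a monitoring model based on video/audio comparison, and on that basis focuses on an intelligent broadcast signal monitoring system with video/audio content comparison and abnormal-state monitoring, with the aim of offering a reference for the application and development of intelligent monitoring systems.
Keywords: audio comparison monitoring model; signal node; intelligent broadcast signal monitoring system; audio content comparison; abnormal-state monitoring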
Segregation of voiced and unvoiced components from residual of speech signal (Cited by: 1)
5
Authors: JO Cheol-woo, KIM Jae-hee. 《Journal of Central South University》 (SCIE, EI, CAS), 2012, Issue 2, pp. 496-503 (8 pages)
In conventional source-filter models, voiced and unvoiced components are considered independently. In practice, however, it is difficult to separate the source into two parts: an actual source is a mixture of the two, and the ratio varies with the content or the intention of the speaker. This work investigates separating the voiced and unvoiced components under different source models. Source signals were modeled from the residual signal obtained by inverse filtering. Three different source models were assumed, and the parameters of each model were optimized against the original speech signal using a genetic algorithm. The resulting parameters were compared in terms of the mel-cepstral distance to the original signal and the spectrogram and spectral envelope of the synthesized signal. The optimization method achieves an improvement of 15% for the Klatt model, but there is little improvement in the modified residual case.
Keywords: voice source model; synthesis; optimization; genetic algorithm
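The abstract above compares models by mel-cepstral distance but does not give the formula. One widely used definition is the mel-cepstral distortion in dB, $\frac{10}{\ln 10}\sqrt{2\sum_d (c_d - \hat{c}_d)^2}$, computed per frame over the cepstral coefficients with the zeroth (energy) term excluded. The sketch below assumes that definition; the paper may use a different one, and the function name `mel_cepstral_distortion` is made up for this example.

```python
import math
from typing import Sequence


def mel_cepstral_distortion(ref: Sequence[float], syn: Sequence[float]) -> float:
    """Per-frame mel-cepstral distortion in dB (assumed definition).

    Both inputs are mel-cepstral coefficient vectors for one frame, with the
    zeroth (energy) coefficient already removed by the caller.
    """
    if len(ref) != len(syn):
        raise ValueError("coefficient vectors must have the same length")
    squared_error = sum((r - s) ** 2 for r, s in zip(ref, syn))
    return (10.0 / math.log(10.0)) * math.sqrt(2.0 * squared_error)


if __name__ == "__main__":
    original = [0.5, -0.2, 0.1]
    synthesized = [0.4, -0.1, 0.1]
    print(f"{mel_cepstral_distortion(original, synthesized):.3f} dB")
```

In an optimization loop such as the genetic algorithm the abstract mentions, a distance like this, averaged over frames, could serve as the fitness to minimize.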
SCALABLE PERCEPTUAL AUDIO REPRESENTATION WITH AN ADAPTIVE THREE TIME-SCALE SINUSOIDAL SIGNAL MODEL
6
Author: Al-Moussawy Raed. 《Journal of Electronics(China)》, 2004, Issue 3, pp. 213-221 (9 pages)
This work is concerned with the development and optimization of a signal model for scalable perceptual audio coding at low bit rates. A complementary two-part signal model consisting of Sines plus Noise (SN) is described. The paper presents a fundamental enhancement to the sinusoidal modeling component. The enhancement involves an audio signal scheme based on carrying out overlap-add sinusoidal modeling at three successive time scales: large, medium, and small. The sinusoidal modeling is done in an analysis-by-synthesis overlap-add manner across the three scales using psychoacoustically weighted matching pursuit. The sinusoidal modeling residual at the first scale is passed to the smaller scales to allow for the modeling of various signal features at appropriate resolutions. This approach greatly helps to correct the pre-echo inherent in the sinusoidal model and improves the perceptual audio quality over the author's previous work on sinusoidal modeling while using the same number of sinusoids. The most obvious application for the SN model is in scalable, high-fidelity audio coding and signal modification.
Keywords: multiresolution sinusoidal modeling; parametric audio coding; low-rate audio coding; signal modifications
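To make the analysis-by-synthesis idea in the abstract above concrete, the following is a minimal sketch of greedy matching pursuit over a dictionary of sinusoids on a single frame. It is a simplification under stated assumptions: there is no psychoacoustic weighting, no overlap-add, and only one time scale, so it is not the paper's three-scale scheme. The function name `matching_pursuit_sines` and the candidate-frequency grid are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def matching_pursuit_sines(
    signal: Sequence[float], freqs: Sequence[float], n_atoms: int
) -> Tuple[List[Tuple[float, float, float]], List[float]]:
    """Greedily pick sinusoids (freq in cycles/sample, cos amp, sin amp).

    Each step fits a cos/sin pair at every candidate frequency to the current
    residual by least squares, keeps the pair that removes the most energy,
    and subtracts it (analysis by synthesis).
    """
    n = len(signal)
    residual = list(signal)
    atoms: List[Tuple[float, float, float]] = []
    for _ in range(n_atoms):
        best = None  # (energy removed, freq, cos amplitude, sin amplitude)
        for f in freqs:
            cos_t = [math.cos(2 * math.pi * f * t) for t in range(n)]
            sin_t = [math.sin(2 * math.pi * f * t) for t in range(n)]
            cc = sum(c * c for c in cos_t)
            ss = sum(s * s for s in sin_t)
            cs = sum(c * s for c, s in zip(cos_t, sin_t))
            rc = sum(r * c for r, c in zip(residual, cos_t))
            rs = sum(r * s for r, s in zip(residual, sin_t))
            det = cc * ss - cs * cs
            if det < 1e-9:  # degenerate pair, e.g. f = 0 makes sin_t all zero
                continue
            a = (rc * ss - rs * cs) / det  # solve the 2x2 normal equations
            b = (rs * cc - rc * cs) / det
            energy = a * rc + b * rs  # energy of the projection
            if best is None or energy > best[0]:
                best = (energy, f, a, b)
        if best is None:
            break
        _, f, a, b = best
        residual = [
            r - a * math.cos(2 * math.pi * f * t) - b * math.sin(2 * math.pi * f * t)
            for t, r in enumerate(residual)
        ]
        atoms.append((f, a, b))
    return atoms, residual


if __name__ == "__main__":
    n = 64
    x = [math.cos(2 * math.pi * 4 / n * t) + 0.3 * math.sin(2 * math.pi * 10 / n * t)
         for t in range(n)]
    grid = [k / n for k in range(1, n // 2)]
    picked, res = matching_pursuit_sines(x, grid, n_atoms=2)
    print(picked)
```

A perceptually weighted variant would replace the plain energy criterion with one that discounts components masked by louder neighbors; how the paper does this is not stated in the abstract.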
STATISTICAL FEATURE OF PITCH FREQUENCY DISTRIBUTIONS FOR ROBUST SPEAKER IDENTIFICATION
7
Authors: Zhang Linghua, Zheng Baoyu, Yang Zhen. 《Journal of Electronics(China)》, 2005, Issue 4, pp. 437-442 (6 pages)
This letter proposes an effective and robust speech feature extraction method based on statistical analysis of Pitch Frequency Distributions (PFD) for speaker identification. Compared with the conventional cepstrum, PFD is relatively insensitive to Additive White Gaussian Noise (AWGN), but on its own it does not perform well for speaker identification, even in clean environments. To compensate for this shortcoming, PFD and the conventional cepstrum are combined to make the final decision, instead of relying on one kind of feature alone. Experimental results indicate that the hybrid approach gives a marked improvement for text-independent speaker identification in noisy environments corrupted by AWGN.
Keywords: speaker identification; feature extraction; pitch frequency; Gaussian Mixture Model (GMM)
Improved Sinusoid Analysis and Post-Processing in Parametric Audio Coding
8
Authors: 周宏, 陈健. 《Journal of Shanghai Jiaotong University (Science)》 (EI), 2003, Issue 2, pp. 163-168 (6 pages)
This paper proposes improvements to a low-bit-rate parametric audio coder with a sinusoid model as its kernel. First, we propose a new method to effectively order and select the perceptually most important sinusoids: the sinusoid that contributes most to the reduction of the overall noise-to-mask ratio (NMR) is chosen. Combined with our improved parametric psychoacoustic model and advanced peak riddling techniques, the number of sinusoids required can be greatly reduced and the coding efficiency greatly enhanced. A lightweight version is also given to reduce the amount of computation with only a small sacrifice in performance. Second, we propose two enhancement techniques for sinusoid synthesis: bandwidth enhancement and line enhancement. With little overhead, the effective bandwidth can be extended by one more octave, and the timbre tends to sound much brighter, thicker, and more beautiful.
Keywords: parametric audio coding; sinusoid; post-processing