
An Incremental Learning Method for Constructing a Voice Dictionary (cited by: 1)

An Incremental Learning Approach in Voice Compression via Sparse Dictionary Learning
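Abstract (translated from the Chinese): The explosive growth of voice data makes storage and transmission very difficult, and existing methods struggle to handle massive volumes of frequency-domain voice data in real time. This paper therefore proposes an incremental-learning method for constructing a voice dictionary. The method first applies the short-time Fourier transform (STFT) to the time-domain voice signal to obtain the spectral magnitudes of each window, then projects the high-dimensional vectors into a low-dimensional space and linearly fits the current window vector with a small number of basis vectors from the dictionary. The current window vector is then stored as the identifiers of those basis vectors together with the fitting coefficients; window vectors that cannot be fitted are processed and added to the dictionary, which realizes the incremental learning. Decompression reconstructs the data by linearly combining the dictionary entries specified in the user's request. Experimental results show that the method greatly compresses the voice spectral envelope and suits real-time, high-sampling-rate streaming voice data under bandwidth constraints; compared with similar algorithms, it substantially reduces the storage space and transmission bandwidth of the signal while preserving reconstruction quality.

Abstract (English, as published): The explosive growth of audio streams makes storage and transmission difficult, and many existing methods cannot achieve a high compression ratio while preserving quality. To address this problem, the proposed method compresses the amplitude spectrum of voice by constructing a dynamic sparse voice dictionary based on incremental learning. It first calculates amplitude spectrum envelopes via the Short-Time Fourier Transform (STFT), then uses a dictionary to fit each envelope by projecting high-dimensional vectors onto several 2D planes. It also minimizes the number of dictionary items, so it can store the parameters of linear interpolation instead of the spectra. If the fitting step fails, it stores that window of the spectrum directly. Using the dictionary and the linear interpolation parameters, it can reconstruct the spectrum efficiently during decompression. Experimental results show that, compared with other methods, the proposed method gives a high compression ratio and better decompression accuracy, and is suited to encoding live voice streams with high sampling rates.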
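The pipeline outlined in the abstract (STFT magnitudes, fitting each window with a few dictionary vectors, storing identifiers and coefficients, growing the dictionary when the fit fails) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the projection or fitting details, so the sketch substitutes a greedy atom selection followed by a least-squares fit, skips the low-dimensional projection step entirely, and treats "processing" of an unfittable window as simple unit-length normalization. The function names and the parameters `win`, `hop`, `tol`, and `max_atoms` are all hypothetical.

```python
import numpy as np

def stft_magnitudes(signal, win=256, hop=128):
    """Return one magnitude spectrum per Hann-windowed frame (assumed STFT settings)."""
    window = np.hanning(win)
    n_frames = 1 + (len(signal) - win) // hop
    frames = np.stack([signal[i * hop:i * hop + win] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def fit_with_dictionary(vec, dictionary, max_atoms=2):
    """Greedily pick up to max_atoms atoms, then least-squares fit vec.

    Stand-in for the paper's fitting step. Returns (ids, coefs, relative_error).
    """
    if not dictionary:
        return [], np.zeros(0), 1.0
    D = np.stack(dictionary, axis=1)          # shape: (dim, n_atoms)
    residual = vec.copy()
    ids, coefs = [], np.zeros(0)
    for _ in range(min(max_atoms, D.shape[1])):
        scores = np.abs(D.T @ residual)       # correlation with the residual
        if ids:
            scores[ids] = -1.0                # never pick the same atom twice
        ids.append(int(np.argmax(scores)))
        coefs, *_ = np.linalg.lstsq(D[:, ids], vec, rcond=None)
        residual = vec - D[:, ids] @ coefs
    err = np.linalg.norm(residual) / (np.linalg.norm(vec) + 1e-12)
    return ids, coefs, err

def encode(frames, tol=0.1, max_atoms=2):
    """Store each frame as (atom ids, coefficients); grow the dictionary on failure."""
    dictionary, records = [], []
    for vec in frames:
        norm = np.linalg.norm(vec)
        if norm == 0.0:                       # silent frame: nothing to store
            records.append(([], np.zeros(0)))
            continue
        ids, coefs, err = fit_with_dictionary(vec, dictionary, max_atoms)
        if err > tol:                         # cannot be fitted: learn a new atom
            dictionary.append(vec / norm)     # assumption: normalize before adding
            ids, coefs = [len(dictionary) - 1], np.array([norm])
        records.append((ids, coefs))
    return dictionary, records

def decode(dictionary, records, dim):
    """Rebuild each frame as a linear combination of the referenced atoms."""
    out = np.zeros((len(records), dim))
    for k, (ids, coefs) in enumerate(records):
        for i, c in zip(ids, coefs):
            out[k] += c * dictionary[i]
    return out
```

Under these assumptions, `encode(stft_magnitudes(signal))` returns the learned dictionary plus one small record per window, and `decode` reverses it. Each stored record replaces a full magnitude spectrum with at most `max_atoms` identifiers and coefficients, which is where the compression would come from; phase is not handled here.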
Authors: Teng Shao-hua; Song Huan; Huo Ying-xiang; Zhang Wei (School of Computers, Guangdong University of Technology, Guangzhou 510006, China)
Source: Journal of Guangdong University of Technology (《广东工业大学学报》), CAS, 2018, No. 3, pp. 29-36 (8 pages)
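Funding: National Natural Science Foundation of China (61402118, 61673123, 61603100, 61702110); Guangdong Provincial Science and Technology Plan Projects (2015B090901016, 2016B010108007); Guangdong Provincial Department of Education Projects (粤教高函[2018]1号, 粤教高函[2015]113号, 粤教高函[2014]97号); Guangzhou Science and Technology Plan Projects (201604020145, 2016201604030034, 201508010067, 201604046017)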
Keywords: voice compression; voice decompression; real-time processing; streaming data; incremental learning; sparse dictionary learning
