On‐device audio‐visual multi‐person wake word spotting

Abstract: Audio‐visual wake word spotting is a challenging multi‐modal task that uses visual information from lip motion patterns to supplement acoustic speech and improve overall detection performance. However, most audio‐visual wake word spotting models are suitable only for simple single‐speaker scenarios and are computationally expensive. Further development is hindered by complex multi‐person scenarios and by the computational limits of mobile environments. In this paper, a novel audio‐visual model is proposed for on‐device multi‐person wake word spotting. Firstly, an attention‐based audio‐visual voice activity detection module is presented, which generates an attention score matrix of audio and visual representations to derive the active speaker representation. Secondly, knowledge distillation is introduced to transfer knowledge from a large model to the on‐device model, keeping the on‐device model small. Moreover, a new audio‐visual dataset, PKU‐KWS, is collected for sentence‐level multi‐person wake word spotting. Experimental results on the PKU‐KWS dataset show that this approach outperforms previous state‐of‐the‐art methods.
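The two mechanisms named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' architecture: the scaled dot‐product scoring, the softmax over candidate speakers, the temperature‐scaled KL distillation loss, and all function names, shapes, and parameters below are assumptions chosen for illustration only.

```python
import numpy as np


def log_softmax(x, axis=-1):
    """Numerically stable log-softmax."""
    shifted = x - x.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def active_speaker_representation(audio, visual):
    """Hypothetical attention between audio and per-person lip features.

    audio:  (T, D) frame-level audio features.
    visual: (N, T, D) lip-motion features for N candidate people.
    Returns the (T, N) attention score matrix and a (T, D) fused
    visual representation weighted toward the likely active speaker.
    """
    d = audio.shape[-1]
    # Assumed scoring: scaled dot product of each audio frame with the
    # time-aligned visual frame of every candidate person.
    logits = np.einsum("td,ntd->tn", audio, visual) / np.sqrt(d)
    scores = np.exp(log_softmax(logits, axis=-1))  # each row sums to 1
    fused = np.einsum("tn,ntd->td", scores, visual)
    return scores, fused


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Assumed distillation loss: KL(teacher || student) on softened outputs."""
    log_p_t = log_softmax(teacher_logits / temperature)
    log_p_s = log_softmax(student_logits / temperature)
    kl = (np.exp(log_p_t) * (log_p_t - log_p_s)).sum(axis=-1).mean()
    # Multiplying by temperature**2 keeps gradient magnitudes comparable
    # across temperatures, a common convention.
    return kl * temperature ** 2


rng = np.random.default_rng(0)
audio = rng.normal(size=(5, 8))       # 5 frames, 8-dim features
visual = rng.normal(size=(3, 5, 8))   # 3 people in the scene
scores, fused = active_speaker_representation(audio, visual)
teacher = rng.normal(size=(4, 2))     # e.g. wake word vs. non-wake word
student = rng.normal(size=(4, 2))
loss = distillation_loss(student, teacher)
```

In this sketch each row of `scores` is a distribution over the people in view for one frame, and `loss` is zero only when the student's softened predictions match the teacher's.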

Authors: Yidi Li, Guoquan Wang, Zhan Chen, Hao Tang, Hong Liu

Affiliations: Key Laboratory of Machine Perception; College of Computer and Information; Computer Vision Lab

Source: CAAI Transactions on Intelligence Technology (智能技术学报（英文）), indexed in SCIE and EI, 2023, Issue 4, pp. 1578-1589 (12 pages)

Funding: Supported by the National Key R&D Program of China (No. 2020AAA0108904) and the Science and Technology Plan of Shenzhen (No. JCYJ20200109140410340).

Keywords: audio‐visual fusion; human‐computer interfacing; speech processing

Classification: TP3 [Automation and Computer Technology: Computer Science and Technology]
