
Research of Infant Crying Detection Method Based on Audio and Video Fusion
Abstract The recognition accuracy of unimodal methods for infant crying detection has become difficult to improve, while the volume of infant-related video data keeps growing. Against this background, the paper proposes a bimodal CNN+3DCNN+LSTM audio-video fusion method for detecting infant crying, with the aim of further improving the recognition rate. The authors first build an audio-video dataset for binary classification (crying vs. non-crying) of infants in complex environments, then design seven comparison experiments on this dataset and compare them with the CNN+3DCNN+LSTM fusion network. The experiments show that the fusion method achieves an F1-score of 93.2%, which is 5.3% higher than the best unimodal score and 4.3% higher than the multimodal network baseline. This demonstrates the feasibility of audio-video fusion for infant crying recognition.
Authors LIU Peng; ZHOU Xianwei; GONG Qixu; YU Songsen (School of Software, South China Normal University, Foshan 528225)
Source Computer & Digital Engineering (《计算机与数字工程》), 2023, No. 7, pp. 1534-1539 (6 pages)
Keywords infant crying; audio and video fusion; deep learning; multimodal network
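The abstract reports its results as an F1-score on a binary crying / non-crying task. As a reminder of what that metric measures, the following is a minimal sketch of the standard F1 computation (the harmonic mean of precision and recall). The function name `f1_score` and the labels below are illustrative assumptions; they are not the paper's code or data.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class (here assumed to be 1 = crying)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and/or recall is 0 or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for eight clips (1 = crying, 0 = non-crying).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

score = f1_score(y_true, y_pred)
print(f"F1 = {score:.3f}")  # prints "F1 = 0.750"
```

With these made-up labels there are 3 true positives, 1 false positive, and 1 false negative, so precision and recall are both 0.75 and so is F1. The abstract does not say how the reported 93.2% was averaged or which class was treated as positive, so this sketch should not be read as the authors' evaluation procedure.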
