Abstract
Existing steganalysis methods for instant communication are mainly based on supervised learning classifiers. These methods require a large amount of complex manual pre-processing before deployment, and their accuracy degrades when the distributions of the training and testing data sets differ. To address these problems, this paper first proposes a new semi-supervised hybrid detection model aimed at covert channels in instant voice communication. The model involves no manual selection or annotation of a training data set, which removes the complex manual pre-processing and the poor applicability of existing detectors. The paper then designs a self-learning, multi-criteria fusion module that automatically generates a pseudo-labelled data set. The confidence level and the representative level of this set jointly determine the performance of the instant voice communication steganalysis system, and the distribution mismatch between training and testing data seen in voice steganalysis does not arise. Finally, experiments on low bit-rate speech streams, which are common carriers in instant voice communication, show that under mismatched conditions the method is clearly more accurate than both supervised and unsupervised detection methods, and that it is less affected than supervised methods when the training and testing distributions differ. The experiments also show that the method can be applied to several kinds of speech codecs.
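The abstract says that pseudo-labelled samples are chosen by fusing a confidence criterion and a representativeness criterion, but it does not say how either criterion is computed or combined. The Python sketch below shows one generic way such a selection step could look. It is not the paper's method, and every name in it is hypothetical. It rests on several assumptions. Some initial detector, not shown, supplies a stego probability `p` for each speech segment in `probs`. Confidence is taken as |2p - 1|, the distance from the 0.5 decision boundary. Representativeness is the mean similarity 1 / (1 + distance) to the `k` nearest neighbours in feature space, used as a local-density proxy. The two scores are fused as a weighted sum controlled by `weight`.

```python
import math
from typing import List, Sequence, Tuple


def confidence(p: float) -> float:
    """Hypothetical confidence: distance of a stego probability from 0.5, scaled to [0, 1]."""
    return abs(2.0 * p - 1.0)


def representativeness(i: int, feats: Sequence[Sequence[float]], k: int) -> float:
    """Hypothetical representativeness: mean 1 / (1 + distance) to the k nearest other samples."""
    dists = sorted(math.dist(feats[i], feats[j]) for j in range(len(feats)) if j != i)
    nearest = dists[:k]
    if not nearest:  # a single sample has no neighbours
        return 0.0
    return sum(1.0 / (1.0 + d) for d in nearest) / len(nearest)


def select_pseudo_labels(
    probs: Sequence[float],
    feats: Sequence[Sequence[float]],
    per_class: int,
    k: int = 3,
    weight: float = 0.5,
) -> List[Tuple[int, int]]:
    """Return (sample index, pseudo-label) pairs, label 1 = stego, 0 = cover.

    Samples are ranked by a weighted sum of confidence and representativeness,
    and at most `per_class` samples are kept for each predicted label.
    """
    fused = []
    for i, p in enumerate(probs):
        score = weight * confidence(p) + (1.0 - weight) * representativeness(i, feats, k)
        fused.append((score, i, 1 if p >= 0.5 else 0))
    fused.sort(reverse=True)  # highest fused score first

    picked: List[Tuple[int, int]] = []
    counts = {0: 0, 1: 0}
    for _score, i, label in fused:
        if counts[label] < per_class:
            picked.append((i, label))
            counts[label] += 1
    return picked


if __name__ == "__main__":
    # Toy data: two tight clusters plus one ambiguous point and one outlier.
    probs = [0.95, 0.9, 0.55, 0.1, 0.05, 0.48]
    feats = [[1.0, 1.0], [1.1, 0.9], [3.0, 3.0], [-1.0, -1.0], [-1.1, -0.9], [5.0, -5.0]]
    print(sorted(select_pseudo_labels(probs, feats, per_class=2, k=2)))
```

In the toy run, the ambiguous sample (p = 0.55) and the isolated outlier are left out of the pseudo-labelled set. Confident samples that sit in dense clusters are kept. Capping the count per class keeps the pseudo-labelled set balanced. The abstract does not say whether the original module does this.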
Source
《清华大学学报(自然科学版)》
EI
CAS
CSCD
Peking University Core Journal
2015, No. 11, pp. 1246-1252 (7 pages)
Journal of Tsinghua University(Science and Technology)
Funding
Key Program of the National Natural Science Foundation of China (U1405254, 61472092)
China Postdoctoral Science Foundation (2015M581101)
Keywords
instant voice communication
semi-supervised learning
steganalysis
confidence level
representative level