Abstract
Deep learning models sometimes misclassify data from unknown classes as belonging to known classes; such unknown-class data are called out-of-distribution data. In fields such as bioinformatics, healthcare, autonomous driving, and network security, this kind of misclassification can have serious consequences. This paper briefly introduces network traffic identification and classification techniques as well as out-of-distribution data, and proposes a method for detecting out-of-distribution data in test samples. Based on the characteristics of out-of-distribution data, two models are trained and the likelihood ratio of their outputs is used to decide whether a sample is out-of-distribution. The method is tested on the public Moore network traffic data set and four self-collected data sets, where its identification accuracy reaches up to 92.3%.
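The abstract does not say which two models are trained or how the decision threshold is chosen, so the following is only a minimal sketch of the general likelihood-ratio idea, not the authors' implementation. It assumes two diagonal Gaussian density models: one fitted to features of known traffic, and a broader "background" model fitted to noise-perturbed copies of the same features. The synthetic data, the perturbation scale, the 5% threshold quantile, and all function names are hypothetical choices made for illustration.

```python
import numpy as np


def fit_diag_gaussian(x):
    """Fit a diagonal Gaussian and return (mean, variance) per feature."""
    mean = x.mean(axis=0)
    var = x.var(axis=0) + 1e-6  # small floor keeps the log-density finite
    return mean, var


def log_likelihood(x, params):
    """Per-sample log-density under a diagonal Gaussian."""
    mean, var = params
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var, axis=1)


def likelihood_ratio_scores(x, in_model, background_model):
    """Log-likelihood ratio; low values suggest out-of-distribution samples."""
    return log_likelihood(x, in_model) - log_likelihood(x, background_model)


rng = np.random.default_rng(0)

# Synthetic stand-in for feature vectors of known traffic classes (assumption).
train = rng.normal(0.0, 1.0, size=(2000, 8))
# Hypothetical perturbation used to fit the broader background model.
noisy = train + rng.normal(0.0, 3.0, size=train.shape)

in_model = fit_diag_gaussian(train)
background_model = fit_diag_gaussian(noisy)

# Threshold chosen so that about 5% of training samples fall below it (assumption).
train_scores = likelihood_ratio_scores(train, in_model, background_model)
threshold = np.quantile(train_scores, 0.05)

# Held-out known samples and shifted samples standing in for unknown classes.
known = rng.normal(0.0, 1.0, size=(500, 8))
unknown = rng.normal(6.0, 1.0, size=(500, 8))

known_flagged = likelihood_ratio_scores(known, in_model, background_model) < threshold
unknown_flagged = likelihood_ratio_scores(unknown, in_model, background_model) < threshold

print("known flagged as OOD:  ", known_flagged.mean())
print("unknown flagged as OOD:", unknown_flagged.mean())
```

In this sketch a sample is flagged as out-of-distribution when the known-traffic model explains it worse, relative to the background model, than it explains most of the training data. Real traffic features and learned deep models would replace the Gaussian stand-ins.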
Authors
ZHUO Zihan (卓子寒)
LYU Xinrun (吕欣润)
LIU Likun (刘立坤)
CHE Jiazhen (车佳臻)
YU Xiangzhan (余翔湛)
YE Lin (叶麟)
ZHANG Xiaohui (张晓慧)
Affiliations: National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029, China; School of Cyberspace Science, Faculty of Computing, Harbin Institute of Technology, Harbin 150001, China
Source
Radio Engineering (《无线电工程》)
Peking University Core Journal (北大核心)
2022, No. 8, pp. 1322-1329 (8 pages)
Funding
General Program of the National Natural Science Foundation of China (61872111).
Keywords
deep learning
out-of-distribution data
machine learning
likelihood ratio