Malicious webshells currently pose tremendous threats to cloud security. Most relevant studies and open webshell datasets treat malicious webshell defense as a binary classification problem, that is, identifying whether a webshell is malicious or benign. However, fine-grained multi-classification is urgently needed to enable precise responses and active defenses against malicious webshell threats. This paper introduces a malicious webshell family dataset named MWF to facilitate research on webshell multi-classification. The dataset contains 1359 malicious webshell samples originally obtained from the cloud servers of Alibaba Cloud. Each sample is provided with a family label, and samples of the same family generally present similar characteristics or behaviors. The dataset has a total of 78 families and 22 outliers. Moreover, this paper introduces the human–machine collaboration process adopted to remove benign or duplicate samples, address privacy issues, and determine the family of each sample. This paper also compares the distinguishing features of the MWF dataset with those of previous datasets and summarizes potential application areas in cloud security and in generalized sequence, graph, and tree data analytics and visualization.
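One routine step in building a family-labeled dataset like this is discarding duplicate samples before counting how many samples each family holds. The sketch below is illustrative only and is not the authors' human–machine collaboration process. It assumes a hypothetical in-memory layout of `(content_bytes, family_label)` pairs and a hypothetical `"outlier"` label. It also covers only byte-identical copies via SHA-256 hashing. Near-duplicates such as renamed variables or reformatted whitespace would slip through.

```python
import hashlib
from collections import Counter


def dedupe_exact(samples):
    """Keep the first sample for each distinct SHA-256 digest of its raw bytes.

    `samples` is a list of (content_bytes, family_label) pairs; this layout is a
    hypothetical illustration, not the MWF dataset's actual file format.
    """
    seen = set()
    kept = []
    for content, family in samples:
        digest = hashlib.sha256(content).hexdigest()
        if digest not in seen:  # first time this exact byte sequence appears
            seen.add(digest)
            kept.append((content, family))
    return kept


def family_counts(samples, outlier_label="outlier"):
    """Count samples per family label, skipping the (hypothetical) outlier label."""
    return Counter(family for _, family in samples if family != outlier_label)


if __name__ == "__main__":
    # Neutral placeholder contents; not real webshell samples.
    demo = [
        (b"<?php /* sample A */ ?>", "family_a"),
        (b"<?php /* sample A */ ?>", "family_a"),  # exact duplicate, dropped
        (b"<?php /* sample B */ ?>", "family_b"),
        (b"<?php /* sample C */ ?>", "outlier"),
    ]
    unique = dedupe_exact(demo)
    print(len(unique), dict(family_counts(unique)))
```

Hashing is cheap and order-preserving here, so the first occurrence of each sample keeps its label. Any label conflicts between identical copies would still need a separate, human-in-the-loop decision.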
Funding: the National Natural Science Foundation of China (Nos. 62272480 and 62072470).