The explosive growth ofmalware variants poses a major threat to information security. Traditional anti-virus systems based on signatures fail to classify unknown malware into their corresponding families and to detect...The explosive growth ofmalware variants poses a major threat to information security. Traditional anti-virus systems based on signatures fail to classify unknown malware into their corresponding families and to detect new kinds of malware pro- grams. Therefore, we propose a machine learning based malware analysis system, which is composed of three modules: data processing, decision making, and new malware detection. The data processing module deals with gray-scale images, Opcode n-gram, and import fimctions, which are employed to extract the features of the malware. The decision-making module uses the features to classify the malware and to identify suspicious malware. Finally, the detection module uses the shared nearest neighbor (SNN) clustering algorithm to discover new malware families. Our approach is evaluated on more than 20 000 malware instances, which were collected by Kingsoft, ESET NOD32, and Anubis. The results show that our system can effectively classify the un- known malware with a best accuracy of 98.9%, and successfully detects 86.7% of the new malware.展开更多
Mobile device manufacturers are rapidly producing miscellaneous Android versions worldwide. Simultaneously, cyber criminals are executing malicious actions, such as tracking user activities, stealing personal data, an...Mobile device manufacturers are rapidly producing miscellaneous Android versions worldwide. Simultaneously, cyber criminals are executing malicious actions, such as tracking user activities, stealing personal data, and committing bank fraud. These criminals gain numerous benefits as too many people use Android for their daily routines, including important communications. With this in mind, security practitioners have conducted static and dynamic analyses to identify malware. This study used static analysis because of its overall code coverage, low resource consumption, and rapid processing. However, static analysis requires a minimum number of features to efficiently classify malware. Therefore, we used genetic search(GS), which is a search based on a genetic algorithm(GA), to select the features among 106 strings. To evaluate the best features determined by GS, we used five machine learning classifiers, namely, Na?ve Bayes(NB), functional trees(FT), J48, random forest(RF), and multilayer perceptron(MLP). Among these classifiers, FT gave the highest accuracy(95%) and true positive rate(TPR)(96.7%) with the use of only six features.展开更多
基金Project supported by the Natiooal Natural Science Foundation of China (No. 61303264) and the National Basic Research Program (973) of China (Nos. 2012CB315906 and 0800065111001)
文摘The explosive growth ofmalware variants poses a major threat to information security. Traditional anti-virus systems based on signatures fail to classify unknown malware into their corresponding families and to detect new kinds of malware pro- grams. Therefore, we propose a machine learning based malware analysis system, which is composed of three modules: data processing, decision making, and new malware detection. The data processing module deals with gray-scale images, Opcode n-gram, and import fimctions, which are employed to extract the features of the malware. The decision-making module uses the features to classify the malware and to identify suspicious malware. Finally, the detection module uses the shared nearest neighbor (SNN) clustering algorithm to discover new malware families. Our approach is evaluated on more than 20 000 malware instances, which were collected by Kingsoft, ESET NOD32, and Anubis. The results show that our system can effectively classify the un- known malware with a best accuracy of 98.9%, and successfully detects 86.7% of the new malware.
基金supported by the Ministry of Science,Technology and Innovation of Malaysia,under the Grant e Science Fund(No.01-01-03-SF0914)
文摘Mobile device manufacturers are rapidly producing miscellaneous Android versions worldwide. Simultaneously, cyber criminals are executing malicious actions, such as tracking user activities, stealing personal data, and committing bank fraud. These criminals gain numerous benefits as too many people use Android for their daily routines, including important communications. With this in mind, security practitioners have conducted static and dynamic analyses to identify malware. This study used static analysis because of its overall code coverage, low resource consumption, and rapid processing. However, static analysis requires a minimum number of features to efficiently classify malware. Therefore, we used genetic search(GS), which is a search based on a genetic algorithm(GA), to select the features among 106 strings. To evaluate the best features determined by GS, we used five machine learning classifiers, namely, Na?ve Bayes(NB), functional trees(FT), J48, random forest(RF), and multilayer perceptron(MLP). Among these classifiers, FT gave the highest accuracy(95%) and true positive rate(TPR)(96.7%) with the use of only six features.