Mobile device manufacturers are rapidly releasing a wide variety of Android versions worldwide. At the same time, cyber criminals are carrying out malicious actions such as tracking user activities, stealing personal data, and committing bank fraud. These criminals profit because so many people rely on Android for their daily routines, including important communications. With this in mind, security practitioners have conducted static and dynamic analyses to identify malware. This study used static analysis because of its overall code coverage, low resource consumption, and rapid processing. However, static analysis requires a minimal set of features to classify malware efficiently. Therefore, we used genetic search (GS), which is a search based on a genetic algorithm (GA), to select features from among 106 strings. To evaluate the best features determined by GS, we used five machine learning classifiers, namely, Naïve Bayes (NB), functional trees (FT), J48, random forest (RF), and multilayer perceptron (MLP). Among these classifiers, FT gave the highest accuracy (95%) and true positive rate (TPR) (96.7%) using only six features.
Funding: Supported by the Ministry of Science, Technology and Innovation of Malaysia under the eScience Fund grant (No. 01-01-03-SF0914).
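To illustrate the idea of genetic search over a pool of 106 candidate string features, here is a minimal sketch in Python. It is not the authors' implementation: the population size, number of generations, crossover and mutation rates, the tournament selection scheme, and the names `genetic_search`, `toy_fitness`, and `INFORMATIVE` are all assumptions made for illustration. In a real pipeline the fitness function would score a feature subset by training and validating a classifier; here a toy fitness stands in, rewarding six hypothetical "informative" feature indices and penalizing subset size.

```python
import random

# Hypothetical indices of "informative" features, used only by the toy fitness.
INFORMATIVE = {3, 17, 42, 58, 77, 101}


def toy_fitness(mask):
    """Toy stand-in for classifier accuracy: reward informative features,
    penalize every selected feature so that small subsets are preferred."""
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in INFORMATIVE)
    return hits - 0.1 * sum(mask)


def genetic_search(n_features, fitness, pop_size=20, generations=30,
                   crossover_rate=0.6, mutation_rate=0.033, seed=0):
    """Return the best feature mask (list of 0/1) found by a simple GA.

    All parameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Each individual is a bit mask: 1 keeps the feature, 0 drops it.
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    def pick():
        # Tournament selection: the fitter of two distinct random individuals.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            if rng.random() < crossover_rate:
                # Single-point crossover between the two parents.
                cut = rng.randrange(1, n_features)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation applied independently to each position.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = children
        gen_best = max(population, key=fitness)
        # Keep the best mask seen in any generation.
        if fitness(gen_best) > fitness(best):
            best = gen_best
    return best


if __name__ == "__main__":
    best_mask = genetic_search(106, toy_fitness)
    selected = [i for i, bit in enumerate(best_mask) if bit]
    print("selected feature indices:", selected)
    print("fitness:", toy_fitness(best_mask))
```

Replacing `toy_fitness` with a function that trains a classifier on the masked feature columns and returns its validation accuracy would bring the sketch closer to the setup described in the abstract, at the cost of a much slower fitness evaluation.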