Abstract
Program comprehension supports many scenarios, such as legacy system re-engineering and malware detection. Mobile app functionality classification aims to identify the main functionality of a mobile app by analyzing its runtime behavior. Because of the dynamic runtime environment and the diversity of development frameworks, mobile app behavior patterns are usually highly complex, which makes functionality classification challenging. In this paper, we analyze the execution traces of mobile apps to classify their functionalities automatically. Building on a formal definition of the mobile app functionality classification problem, we propose a systematic framework named RaT (Run-and-Tell) to guide the design of trace-driven mobile app functionality classification solutions. Guided by RaT, we introduce two behavior representation methods. One is based on statistical characteristics of execution traces and the other on semantic features extracted from them. We combine these two kinds of behavior representations with four neural-network-based mobile app functionality classifiers (MLP, FCN, ResNet, and LSTM), which yields eight different solutions for mobile app functionality classification. To build the evaluation dataset, we used program instrumentation to collect 876 execution traces from 17 Android apps on Google Play. These apps span 3 app categories and cover 13 different functionalities. Experimental results show that the RaT-based solutions using semantics-based behavior representations achieve an average inter-category classification accuracy of 73.2% on the collected dataset, significantly outperforming the baselines.
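The abstract names a behavior representation based on statistical characteristics of execution traces but does not define its features. The sketch below makes two assumptions that are not the authors' stated design. First, a trace is treated as a sequence of invoked method names. Second, the statistical representation is taken to be a normalized call-frequency vector over a vocabulary built from training traces. The function names and method strings are hypothetical and chosen for illustration only.

```python
from collections import Counter
from typing import List, Sequence


def build_vocabulary(traces: Sequence[Sequence[str]]) -> List[str]:
    """Collect every distinct method name seen in the (hypothetical) training
    traces and sort the names so that vector positions are stable."""
    return sorted({method for trace in traces for method in trace})


def statistical_representation(trace: Sequence[str],
                               vocabulary: Sequence[str]) -> List[float]:
    """Map one execution trace to a call-frequency vector over `vocabulary`.

    Each entry is the share of in-vocabulary calls that went to that method.
    Methods outside the vocabulary are ignored. A trace with no in-vocabulary
    calls maps to the zero vector.
    """
    counts = Counter(trace)
    hits = [counts[method] for method in vocabulary]
    total = sum(hits)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [h / total for h in hits]


if __name__ == "__main__":
    # Hypothetical traces; real traces would come from app instrumentation.
    train = [
        ["Activity.onCreate", "Camera.open", "Camera.takePicture"],
        ["Activity.onCreate", "MediaPlayer.start"],
    ]
    vocab = build_vocabulary(train)
    vec = statistical_representation(
        ["Activity.onCreate", "Camera.open", "Camera.open", "Location.get"], vocab
    )
    print([round(x, 3) for x in vec])  # → [0.333, 0.667, 0.0, 0.0]
```

A fixed-length vector like this is a natural input for a feed-forward classifier such as an MLP. A sequence model such as an LSTM would more naturally consume the ordered trace itself.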
Authors
MA Chao
LI Chun-Tung
CAO Jian-Nong
CAI Hua-Qian
WU Li-Bing
SHI Xiao-Chuan
Affiliations
School of Cyber Science and Engineering, Wuhan University, Wuhan 430072
Shenzhen Research Institute, The Hong Kong Polytechnic University, Shenzhen, Guangdong 518057
Department of Computing, The Hong Kong Polytechnic University, Hong Kong 000000
School of Electronics Engineering and Computer Science, Peking University, Beijing 100871
Source
Chinese Journal of Computers
Indexed in: EI, CAS, CSCD, PKU Core Journals (Peking University)
2022, No. 9, pp. 1997-2013 (17 pages)
Funding
Key Research and Development Program of Hubei Province (2021BAA039)
Key-Area Research and Development Program of Guangdong Province (2020B010164002)
Keywords
program comprehension
mobile app functionality classification
execution trace
behavior representation
neural network