Abstract
As machine learning is increasingly applied to autonomous decision-making across many sectors of society, concerns about potential vulnerabilities in machine learning frameworks are growing. However, owing to their complex implementations, systematic and automated testing of these frameworks is a formidable task. Existing research on testing machine learning frameworks remains immature in generating valid test data: the generated inputs fail legality validation and therefore cannot reach the targeted vulnerabilities. This paper proposes ConFL, a constraint-guided fuzzer for machine learning frameworks. ConFL automatically extracts constraints from framework source code without any prior knowledge. Guided by these constraints, ConFL can generate valid inputs that pass validation and reach deeper code logic in the framework. In addition, this paper designs an operator grouping and scheduling technique to improve fuzzing efficiency. To demonstrate the effectiveness of ConFL, we evaluate it primarily on the TensorFlow framework. The experiments show that, compared with existing state-of-the-art (SOTA) tools, ConFL covers more lines of code and generates more valid test data; on the same version of TensorFlow, ConFL detects more known vulnerabilities. Furthermore, ConFL discovered 84 previously unknown vulnerabilities across different versions of TensorFlow, all of which were fixed by the vendor and assigned CVE IDs, including 3 of critical severity and 13 of high severity. Finally, we also conducted generality tests on PyTorch and PaddlePaddle, in which 7 vulnerabilities have been found so far.
The increasing integration of machine learning (ML) in various sectors for decision-making automation brings to light significant concerns regarding the vulnerabilities in ML frameworks. Such vulnerabilities pose a considerable risk, potentially undermining the integrity and reliability of ML applications in critical areas. Testing these frameworks, however, is notably challenging due to their complex implementations. The intricacy of these systems often masks vulnerabilities, making them difficult to detect with conventional methods. Historically, fuzzing ML frameworks has been met with limited success. The primary challenge in this area has been the effective extraction of input constraints and the generation of valid inputs. Traditional approaches often result in prolonged fuzzing periods, which are not only inefficient but also insufficient in reaching the deeper, more complex execution paths where critical vulnerabilities might lie. In response to these challenges, our paper introduces ConFL (Constraint Fuzzy Lop), a novel, constraint-guided fuzzer designed specifically for ML frameworks. ConFL marks a significant advancement in the field of ML framework testing. Its ability to automatically extract constraints from source code is a groundbreaking feature. This automation is particularly beneficial as it eliminates the need for prior knowledge of the framework's inner workings, thus democratizing the testing process. The constraint-guided approach of ConFL is instrumental in generating valid inputs that are more likely to pass through the initial layers of verification in ML frameworks. This capability enables ConFL to delve deeper into the operator code's pathways, thus uncovering vulnerabilities that would otherwise remain hidden from traditional testing methods. Moreover, ConFL innovates with a unique grouping technique designed to enhance fuzzing efficiency. This technique organizes the testing process in a more structured manner, allowing for a more thorough and systematic exploration of the framework's vulnerabilities. Our evaluation of ConFL's performance, primarily on the TensorFlow framework, has yielded impressive results. ConFL demonstrates a superior capability in covering more code lines and generating a greater number of valid inputs compared to state-of-the-art (SOTA) fuzzers. This increased efficiency is crucial in the practical application of fuzzing in ML frameworks, as it translates to more robust and secure ML applications. In the realm of known vulnerabilities within the TensorFlow framework, ConFL has shown exceptional prowess. It has successfully detected a larger number of vulnerabilities than existing fuzzers. But perhaps more importantly, ConFL has identified 84 previously unknown vulnerabilities across various versions of TensorFlow. These newly discovered vulnerabilities, which include 3 of critical severity and 13 of high severity, have been significant enough to warrant new CVE (Common Vulnerabilities and Exposures) IDs. The versatility of ConFL is further demonstrated by its application to other ML frameworks such as PyTorch and PaddlePaddle. In these frameworks, ConFL has already identified 7 vulnerabilities, indicating its potential as a universal tool for ML framework testing. In conclusion, ConFL represents a significant step forward in securing ML frameworks. Its automated, constraint-guided approach not only makes the fuzzing process more efficient but also more effective in uncovering deep-seated vulnerabilities. As ML continues to permeate various sectors, tools like ConFL will be vital in ensuring the security and reliability of ML-driven systems.
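
To make the constraint-guided idea above concrete, the following is a minimal, hypothetical Python sketch, not ConFL's actual implementation: it hand-codes the kind of constraints that could be mined from one operator's validation logic and uses them to sample inputs that survive validity checks. The operator choice (tf.raw_ops.Conv2D), the sampled shape ranges, and the sample_conv2d_inputs helper are all illustrative assumptions; ConFL itself extracts such constraints automatically from framework source code.

    # Hypothetical sketch of constraint-guided input generation.
    # Assumed constraints for tf.raw_ops.Conv2D: both inputs are rank-4
    # tensors, the channel dimensions match, and the kernel fits the input.
    import random
    import tensorflow as tf

    def sample_conv2d_inputs():
        # Sample batch and spatial dimensions within small illustrative bounds.
        n, h, w = (random.randint(1, 4) for _ in range(3))
        c_in = random.randint(1, 8)           # constraint: in_channels must match
        kh = random.randint(1, min(h, 3))     # constraint: kernel height <= input height
        kw = random.randint(1, min(w, 3))     # constraint: kernel width  <= input width
        c_out = random.randint(1, 8)
        x = tf.random.uniform([n, h, w, c_in])         # rank-4 input, NHWC layout
        f = tf.random.uniform([kh, kw, c_in, c_out])   # rank-4 filter
        return x, f

    x, f = sample_conv2d_inputs()
    # Because the inputs satisfy the extracted constraints, the call passes
    # the operator's validity checks and exercises the convolution kernel itself.
    y = tf.raw_ops.Conv2D(input=x, filter=f,
                          strides=[1, 1, 1, 1], padding="VALID")

By contrast, unconstrained random inputs (for example, a filter whose channel dimension does not match the input's) are rejected by the operator's shape checks before the kernel code is ever reached, which is why constraint-free fuzzers spend most of their budget on shallow validation paths.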
Authors
LIU Zhao, ZOU Quan-Chen, YU Tian, WANG Xuan, ZHANG De-Yue, MENG Guo-Zhu, CHEN Kai (AI Security Lab, Qihoo 360, Beijing 100015; Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100195)
Source
Chinese Journal of Computers (《计算机学报》)
Indexed in: EI, CAS, CSCD, Peking University Core Journals
2024, No. 5, pp. 1120-1137 (18 pages)
Funding
Supported by the National Science and Technology Innovation 2030 Major Program "New Generation Artificial Intelligence" (2020AAA0104300).
Keywords
machine learning framework
constraint extraction
operator testing
fuzzing
vulnerability detection