Mutation-based greybox fuzzing has been one of the most prevalent techniques for security vulnerability discovery and a great deal of research work has been proposed to improve both its efficiency and effectiveness.Mu...Mutation-based greybox fuzzing has been one of the most prevalent techniques for security vulnerability discovery and a great deal of research work has been proposed to improve both its efficiency and effectiveness.Mutation-based greybox fuzzing generates input cases by mutating the input seed,i.e.,applying a sequence of mutation operators to randomly selected mutation positions of the seed.However,existing fruitful research work focuses on scheduling mutation operators,leaving the schedule of mutation positions as an overlooked aspect of fuzzing efficiency.This paper proposes a novel greybox fuzzing method,PosFuzz,that statistically schedules mutation positions based on their historical performance.PosFuzz makes use of a concept of effective position distribution to represent the semantics of the input and to guide the mutations.PosFuzz first utilizes Good-Turing frequency estimation to calculate an effective position distribution for each mutation operator.It then leverages two sampling methods in different mutating stages to select the positions from the distribution.We have implemented PosFuzz on top of AFL,AFLFast and MOPT,called Pos-AFL,-AFLFast and-MOPT respectively,and evaluated them on the UNIFUZZ benchmark(20 widely used open source programs)and LAVA-M dataset.The result shows that,under the same testing time budget,the Pos-AFL,-AFLFast and-MOPT outperform their counterparts in code coverage and vulnerability discovery ability.Compared with AFL,AFLFast,and MOPT,PosFuzz gets 21%more edge coverage and finds 133%more paths on average.It also triggers 275%more unique bugs on average.展开更多
Security vulnerability is one of the root causes of cyber-security threats.To discover vulnerabilities and fix them in advance,researchers have proposed several techniques,among which fuzzing is the most widely used o...Security vulnerability is one of the root causes of cyber-security threats.To discover vulnerabilities and fix them in advance,researchers have proposed several techniques,among which fuzzing is the most widely used one.In recent years,fuzzing solutions,like AFL,have made great improvements in vulnerability discovery.This paper presents a summary of the recent advances,analyzes how they improve the fuzzing process,and sheds light on future work in fuzzing.Firstly,we discuss the reason why fuzzing is popular,by comparing different commonly used vulnerability discovery techniques.Then we present an overview of fuzzing solutions,and discuss in detail one of the most popular type of fuzzing,i.e.,coverage-based fuzzing.Then we present other techniques that could make fuzzing process smarter and more efficient.Finally,we show some applications of fuzzing,and discuss new trends of fuzzing and potential future directions.展开更多
Tackling binary program analysis problems has traditionally implied manually defining rules and heuristics,a tedious and time consuming task for human analysts.In order to improve automation and scalability,we propose...Tackling binary program analysis problems has traditionally implied manually defining rules and heuristics,a tedious and time consuming task for human analysts.In order to improve automation and scalability,we propose an alternative direction based on distributed representations of binary programs with applicability to a number of downstream tasks.We introduce Bin2vec,a new approach leveraging Graph Convolutional Networks(GCN)along with computational program graphs in order to learn a high dimensional representation of binary executable programs.We demonstrate the versatility of this approach by using our representations to solve two semantically different binary analysis tasks–functional algorithm classification and vulnerability discovery.We compare the proposed approach to our own strong baseline as well as published results,and demonstrate improvement over state-of-the-art methods for both tasks.We evaluated Bin2vec on 49191 binaries for the functional algorithm classification task,and on 30 different CWE-IDs including at least 100 CVE entries each for the vulnerability discovery task.We set a new state-of-the-art result by reducing the classification error by 40%compared to the source-code based inst2vec approach,while working on binary code.For almost every vulnerability class in our dataset,our prediction accuracy is over 80%(and over 90%in multiple classes).展开更多
基金This research was supported by National Key R&D Program of China(2022YFB3103900)National Natural Science Foundation of China(62032010,62202462)Strategic Priority Research Program of the CAS(XDC02030200).
文摘Mutation-based greybox fuzzing has been one of the most prevalent techniques for security vulnerability discovery and a great deal of research work has been proposed to improve both its efficiency and effectiveness.Mutation-based greybox fuzzing generates input cases by mutating the input seed,i.e.,applying a sequence of mutation operators to randomly selected mutation positions of the seed.However,existing fruitful research work focuses on scheduling mutation operators,leaving the schedule of mutation positions as an overlooked aspect of fuzzing efficiency.This paper proposes a novel greybox fuzzing method,PosFuzz,that statistically schedules mutation positions based on their historical performance.PosFuzz makes use of a concept of effective position distribution to represent the semantics of the input and to guide the mutations.PosFuzz first utilizes Good-Turing frequency estimation to calculate an effective position distribution for each mutation operator.It then leverages two sampling methods in different mutating stages to select the positions from the distribution.We have implemented PosFuzz on top of AFL,AFLFast and MOPT,called Pos-AFL,-AFLFast and-MOPT respectively,and evaluated them on the UNIFUZZ benchmark(20 widely used open source programs)and LAVA-M dataset.The result shows that,under the same testing time budget,the Pos-AFL,-AFLFast and-MOPT outperform their counterparts in code coverage and vulnerability discovery ability.Compared with AFL,AFLFast,and MOPT,PosFuzz gets 21%more edge coverage and finds 133%more paths on average.It also triggers 275%more unique bugs on average.
基金supported in part by the National Natural Science Foundation of China(Grant No.6177230861472209,and U1736209)Young Elite Scientists Spon-sorship Program by CAST(Grant No.2016QNRC001)award from Tsinghua Information Science And Technology National Laboratory.
文摘Security vulnerability is one of the root causes of cyber-security threats.To discover vulnerabilities and fix them in advance,researchers have proposed several techniques,among which fuzzing is the most widely used one.In recent years,fuzzing solutions,like AFL,have made great improvements in vulnerability discovery.This paper presents a summary of the recent advances,analyzes how they improve the fuzzing process,and sheds light on future work in fuzzing.Firstly,we discuss the reason why fuzzing is popular,by comparing different commonly used vulnerability discovery techniques.Then we present an overview of fuzzing solutions,and discuss in detail one of the most popular type of fuzzing,i.e.,coverage-based fuzzing.Then we present other techniques that could make fuzzing process smarter and more efficient.Finally,we show some applications of fuzzing,and discuss new trends of fuzzing and potential future directions.
文摘Tackling binary program analysis problems has traditionally implied manually defining rules and heuristics,a tedious and time consuming task for human analysts.In order to improve automation and scalability,we propose an alternative direction based on distributed representations of binary programs with applicability to a number of downstream tasks.We introduce Bin2vec,a new approach leveraging Graph Convolutional Networks(GCN)along with computational program graphs in order to learn a high dimensional representation of binary executable programs.We demonstrate the versatility of this approach by using our representations to solve two semantically different binary analysis tasks–functional algorithm classification and vulnerability discovery.We compare the proposed approach to our own strong baseline as well as published results,and demonstrate improvement over state-of-the-art methods for both tasks.We evaluated Bin2vec on 49191 binaries for the functional algorithm classification task,and on 30 different CWE-IDs including at least 100 CVE entries each for the vulnerability discovery task.We set a new state-of-the-art result by reducing the classification error by 40%compared to the source-code based inst2vec approach,while working on binary code.For almost every vulnerability class in our dataset,our prediction accuracy is over 80%(and over 90%in multiple classes).