Existing software bug localization models treat source files as natural language, which loses the syntactic and structural information of the source code. A bug localization model based on the syntactic and semantic information of source code is proposed. First, the abstract syntax tree (AST) is divided by node category to obtain a statement sequence. Each statement tree is encoded into a vector to capture lexical and syntactic knowledge at the statement level. Second, the source code is transformed into a vector representation using the sequence naturalness of the statements. This avoids the gradient vanishing and explosion problems caused by a large AST when the AST is used to represent source code. Finally, the correlation between bug reports and source files is analyzed comprehensively from three aspects (syntax, semantics, and text) to locate the buggy code. Experiments show that, compared with other standard models, the proposed model improves bug localization performance, with advantages in mean reciprocal rank (MRR), mean average precision (MAP), and Top N Rank.
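The three metrics named in this abstract can be illustrated with a minimal sketch. It assumes each bug report yields a ranked list of file names plus a set of truly buggy files. The function names and the convention of dividing average precision by the number of buggy files are assumptions for illustration, not taken from the paper.

```python
from typing import Dict, List, Set, Tuple


def reciprocal_rank(ranked: List[str], buggy: Set[str]) -> float:
    """Return 1/rank of the first buggy file, or 0.0 if none is ranked."""
    for position, name in enumerate(ranked, start=1):
        if name in buggy:
            return 1.0 / position
    return 0.0


def average_precision(ranked: List[str], buggy: Set[str]) -> float:
    """Mean of precision@k over the positions k that hold a buggy file.

    Assumption: the sum is divided by the total number of buggy files.
    """
    if not buggy:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for position, name in enumerate(ranked, start=1):
        if name in buggy:
            hits += 1
            precision_sum += hits / position
    return precision_sum / len(buggy)


def top_n_hit(ranked: List[str], buggy: Set[str], n: int) -> bool:
    """True if at least one buggy file appears in the first n results."""
    return any(name in buggy for name in ranked[:n])


def evaluate(queries: List[Tuple[List[str], Set[str]]], n: int) -> Dict[str, float]:
    """Average the per-report metrics over all bug reports."""
    count = len(queries)
    if count == 0:
        raise ValueError("at least one bug report is required")
    return {
        "MRR": sum(reciprocal_rank(r, b) for r, b in queries) / count,
        "MAP": sum(average_precision(r, b) for r, b in queries) / count,
        "TopN": sum(top_n_hit(r, b, n) for r, b in queries) / count,
    }


if __name__ == "__main__":
    # Hypothetical rankings for two bug reports.
    reports = [
        (["a.java", "b.java", "c.java"], {"b.java"}),
        (["x.java", "y.java", "z.java"], {"x.java", "z.java"}),
    ]
    print(evaluate(reports, n=1))
```

Here Top N is reported as the fraction of bug reports with at least one buggy file among the first N results; some studies report the raw count instead.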
For projects with thousands of files, locating bugs is time-consuming and labor-intensive. Bug reports, as a potential resource for locating bugs in source code, have been used to design automatic tools that address this problem. Existing information retrieval (IR)-based bug localization methods rely heavily on the similarity score between a bug report and historical reports. Because deep learning methods show great advantages in calculating semantic text similarity, we combine the transformer network with IR-based bug localization methods to design a novel approach to bug localization, TSLocator. In TSLocator, we propose five new features between bug reports and source code. We use SVMRank to model the relation between all six features and the actual buggy file. Given a new bug report, TSLocator automatically calculates the features and linearly weights them to produce a suspiciousness score for every candidate file. TSLocator then recommends a list of suspicious files ranked by this score. The experimental results show that TSLocator outperforms existing methods in accuracy and performance of bug localization.
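The linear weighting step described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not TSLocator's implementation: the feature values and weights below are hypothetical, and in the described approach the weights would come from a trained SVMRank model.

```python
from typing import Dict, List, Sequence, Tuple


def suspiciousness(features: Sequence[float], weights: Sequence[float]) -> float:
    """Linear combination of one file's feature values with the weights."""
    if len(features) != len(weights):
        raise ValueError("feature and weight vectors must have the same length")
    return sum(f * w for f, w in zip(features, weights))


def rank_files(
    file_features: Dict[str, Sequence[float]], weights: Sequence[float]
) -> List[Tuple[str, float]]:
    """Score every candidate file and sort from most to least suspicious."""
    scored = [
        (name, suspiciousness(values, weights))
        for name, values in file_features.items()
    ]
    # Sort by descending score; ties are broken by file name for determinism.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical weights for six features (placeholders, not learned values).
    weights = [0.3, 0.2, 0.2, 0.1, 0.1, 0.1]
    # Hypothetical feature vectors for three candidate files.
    candidates = {
        "Parser.java": [0.9, 0.1, 0.4, 0.0, 0.2, 0.3],
        "Lexer.java": [0.2, 0.8, 0.1, 0.5, 0.0, 0.1],
        "Main.java": [0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
    }
    for name, score in rank_files(candidates, weights):
        print(f"{name}\t{score:.3f}")
```

Because the score is a plain weighted sum, each feature's contribution to a file's rank can be read directly from its weight.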
Funding: supported by the National Key R&D Program of China (2018YFB1702700).