Abstract: Code smell detection is essential for improving software quality, enhancing software maintainability, and decreasing the risk of faults and failures in a software system. In this paper, we propose a code smell prediction approach based on machine learning techniques and software metrics. The local interpretable model-agnostic explanations (LIME) algorithm was further used to explain the machine learning models' predictions and improve their interpretability. The datasets obtained from Fontana et al. were restructured and used to build binary-label and multi-label datasets. The results of 10-fold cross-validation show that tree-based algorithms (mainly Random Forest) perform better than kernel-based and network-based algorithms. Feature selection based on a genetic algorithm enhances the accuracy of these machine learning algorithms by selecting the most relevant features in each dataset. Moreover, parameter optimization based on the grid search algorithm significantly enhances the accuracy of all these algorithms. Finally, machine learning techniques show high potential for predicting code smells, which contributes to detecting these smells and enhancing software quality.
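The combination of 10-fold cross-validation and grid search mentioned in this abstract can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the paper's Random Forest and other learners are not reproduced here, a toy k-nearest-neighbour classifier stands in for them, and the data, the single "method length" metric, and all function names are hypothetical.

```python
from statistics import fmean

def knn_predict(train, x, k):
    """Majority vote among the k training rows closest to x (one metric)."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0

def cv_accuracy(data, k, folds=10):
    """Mean accuracy over `folds` interleaved train/test splits."""
    scores = []
    for i in range(folds):
        test = data[i::folds]  # every folds-th row is held out
        train = [row for j, row in enumerate(data) if j % folds != i]
        hits = sum(knn_predict(train, x, k) == y for x, y in test)
        scores.append(hits / len(test))
    return fmean(scores)

def grid_search(data, grid, folds=10):
    """Return the candidate value with the best cross-validated accuracy."""
    return max(grid, key=lambda k: cv_accuracy(data, k, folds))

# Synthetic (method length, is_smelly) rows; the values are made up.
data = [(loc, int(loc > 60)) for loc in range(10, 110, 2)]
data[5] = (20, 1)  # one deliberately noisy label
best_k = grid_search(data, grid=[1, 3, 5])
print(best_k, round(cv_accuracy(data, best_k), 3))
```

The same pattern applies to any learner and any parameter grid: each candidate setting is scored only on folds it was not trained on, and the setting with the best mean score is kept.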
Abstract: Software systems have been employed in many fields as a means to reduce human effort; consequently, stakeholders are interested in more updates to their capabilities. Code smells are one of the obstacles facing the software industry. They are characteristics of software source code that indicate a deeper problem in design. These smells appear not only in the design but also in the software implementation. Code smells introduce bugs, affect software maintainability, and lead to higher maintenance costs. Uncovering code smells can be formulated as an optimization problem of finding the best detection rules. Although researchers have recommended different techniques to improve the accuracy of code smell detection, these methods are still unstable and need to be improved. Previous research has sought to discover only a few smell types at a time (three or five) and did not set rules for detecting their types. Our research improves code smell detection by applying a search-based technique; we use the Whale Optimization Algorithm as a classifier to find ideal detection rules. In applying this algorithm, the Fisher criterion is used as a fitness function to maximize the ratio of between-class distance to within-class variance. The proposed framework adopts if-then detection rules during the software development life cycle. Those rules identify the smell types in both medium and large projects. Experiments are conducted on five open-source software projects to discover nine smell types that commonly appear in code. The proposed detection framework achieves an average of 94.24% precision and 93.4% recall. These values are better than those of other search-based algorithms in the same field. The proposed framework improves code smell detection, which increases software quality while minimizing maintenance effort, time, and cost. Additionally, the resulting classification rules are analyzed to find the software metrics that differentiate the nine code smells.
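As a rough illustration of the fitness function described in this abstract, the two-class Fisher criterion for a single software metric is the squared gap between class means divided by the sum of class variances. The abstract does not give the paper's exact formulation, so the sketch below is an assumption, and the function name and sample values are hypothetical.

```python
from statistics import fmean, pvariance

def fisher_score(smelly, clean):
    """Two-class Fisher criterion for one metric:
    squared distance between class means over summed class variances."""
    between = (fmean(smelly) - fmean(clean)) ** 2
    within = pvariance(smelly) + pvariance(clean)
    if within == 0:
        # Both classes are constant: perfectly separated or indistinguishable.
        return float("inf") if between > 0 else 0.0
    return between / within

# Hypothetical lines-of-code values for methods as split by two candidate rules.
good_split = fisher_score([120, 130, 125], [20, 25, 30])
poor_split = fisher_score([120, 25, 130], [20, 125, 30])
print(good_split > poor_split)
```

A search procedure such as the Whale Optimization Algorithm would call a function of this kind on each candidate rule and prefer rules with higher scores, since a higher score means the rule separates smelly from clean code more cleanly.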
Funding: Anhui Provincial Natural Science Foundation (2008085MF189, 1908085MF206); National Natural Science Foundation of China (No. 61402007); the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry.
Abstract: Code smell is the product of improper design and operation, and it may be introduced in many situations. It causes serious problems for further software development and maintenance. Currently, most code smell detection methods rely on a single type of software data, which limits their ability to detect code smells with complex definitions and characteristics. In this paper, an approach that applies multi-dimensional software data is proposed. A complex network was built from structural data and historical version data, and code smell instances were determined by searching the network. We designed two smell detection strategies and evaluated them on four open-source projects. The results demonstrate that the proposed method achieves F-measures that are 23% and 15% higher on Shotgun Surgery and Parallel Inheritance Hierarchy, respectively, than existing mainstream detection methods. Code smell detection based on multi-dimensional software data and a complex network is effective, and this method of processing multi-dimensional software data is also applicable to data-driven software research.
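The idea of combining structural data with version history in one network can be sketched as follows. This is not the paper's detection strategy, which the abstract does not specify. It is a minimal sketch that assumes Shotgun Surgery shows up as a class whose changes repeatedly co-occur with changes in many of the classes that depend on it; the function name, thresholds, and sample data are all hypothetical.

```python
from collections import Counter
from itertools import permutations

def shotgun_surgery_candidates(depends_on, commits, min_support=2, min_spread=3):
    """Flag classes whose changes repeatedly ripple into many dependent classes."""
    # Historical layer: how often each ordered pair of classes changed together.
    co_change = Counter()
    for changed in commits:
        co_change.update(permutations(sorted(set(changed)), 2))

    # Structural layer: map each class to the classes that depend on it.
    dependents = {}
    for src, dst in depends_on:
        dependents.setdefault(dst, set()).add(src)

    flagged = []
    for cls, users in dependents.items():
        # Keep only dependents that co-changed with cls often enough.
        rippled = {u for u in users if co_change[(cls, u)] >= min_support}
        if len(rippled) >= min_spread:
            flagged.append(cls)
    return sorted(flagged)

# Made-up dependency edges (dependent, dependency) and commit file sets.
depends_on = [("A", "Core"), ("B", "Core"), ("C", "Core"), ("A", "Util")]
commits = [{"Core", "A", "B", "C"}, {"Core", "A", "B", "C"}, {"Util"}, {"A", "Util"}]
print(shotgun_surgery_candidates(depends_on, commits))
```

Neither layer alone would single out the class here: the dependency edges only show who uses it, and the commit history only shows what changed together. Intersecting the two is what the search over the combined network amounts to in this simplified form.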