Software systems have been employed in many fields as a means to reduce human effort; consequently, stakeholders are interested in more updates of their capabilities. Code smells arise as one of the obstacles in the software industry. They are characteristics of software source code that indicate a deeper problem in design. These smells appear not only in the design but also in software implementation. Code smells introduce bugs, affect software maintainability, and lead to higher maintenance costs. Uncovering code smells can be formulated as an optimization problem of finding the best detection rules. Although researchers have recommended different techniques to improve the accuracy of code smell detection, these methods are still unstable and need to be improved. Previous research has sought to discover only a few smell types at a time (three or five types) and did not set rules for detecting their types. Our research improves code smell detection by applying a search-based technique; we use the Whale Optimization Algorithm as a classifier to find ideal detection rules. In applying this algorithm, the Fisher criterion is used as a fitness function to maximize the between-class distance over the within-class variance. The proposed framework adopts if-then detection rules during the software development life cycle. Those rules identify the smell types for both medium and large projects. Experiments are conducted on five open-source software projects to discover nine smell types that commonly appear in code. The proposed detection framework has an average of 94.24% precision and 93.4% recall. These values are better than those of other search-based algorithms in the same field. The proposed framework improves code smell detection, which increases software quality while minimizing maintenance effort, time, and cost. Additionally, the resulting classification rules are analyzed to find the software metrics that differentiate the nine code smells.
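The Fisher criterion and the if-then rules mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the single-metric form of the criterion, the `loc` metric, and the threshold of 100 are all assumptions chosen for illustration, and the Whale Optimization search itself is not shown.

```python
from statistics import mean, pvariance


def fisher_score(class_a, class_b):
    """Fisher criterion for one metric: squared distance between the
    class means divided by the sum of the within-class variances."""
    between = (mean(class_a) - mean(class_b)) ** 2
    within = pvariance(class_a) + pvariance(class_b)
    if within == 0:
        # Degenerate case: no spread inside either class.
        return float("inf") if between > 0 else 0.0
    return between / within


def precision_recall(predicted, actual):
    """Precision and recall of boolean predictions against boolean labels."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def long_method_rule(loc, threshold=100):
    """Hypothetical if-then rule: IF lines of code > threshold THEN smelly."""
    return loc > threshold


# Made-up metric values (lines of code per method), for illustration only.
loc_smelly = [120, 150, 135]
loc_clean = [20, 35, 25, 40]

score = fisher_score(loc_smelly, loc_clean)
predictions = [long_method_rule(v) for v in loc_smelly + loc_clean]
labels = [True] * len(loc_smelly) + [False] * len(loc_clean)
precision, recall = precision_recall(predictions, labels)
```

A higher `fisher_score` means the metric separates smelly from clean samples more cleanly, which is why a search procedure could use it as a fitness value when choosing rule thresholds.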
Code smell detection is essential for improving software quality, enhancing software maintainability, and decreasing the risk of faults and failures in the software system. In this paper, we propose a code smell prediction approach based on machine learning techniques and software metrics. The local interpretable model-agnostic explanations (LIME) algorithm is further used to explain the machine learning model's predictions and improve interpretability. The datasets obtained from Fontana et al. were restructured and used to build binary-label and multi-label datasets. The results of 10-fold cross-validation show that tree-based algorithms (mainly Random Forest) perform better than kernel-based and network-based algorithms. The genetic-algorithm-based feature selection methods enhance the accuracy of these machine learning algorithms by selecting the most relevant features in each dataset. Moreover, the parameter optimization techniques based on the grid search algorithm significantly enhance the accuracy of all these algorithms. Finally, machine learning techniques have high potential for predicting code smells, which helps detect these smells and enhance the software's quality.
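The 10-fold cross-validation protocol mentioned in this abstract can be sketched in plain Python as follows. This is only an illustration of the evaluation loop: `k_fold_indices`, `cross_validate`, and `train_majority` are hypothetical names, and the majority-class learner is a stand-in for the Random Forest and other models the paper evaluates.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds.
    Assumes n_samples >= k so that no fold is empty."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(features, labels, train_fn, k=10, seed=0):
    """Mean accuracy over k folds, each fold held out exactly once."""
    folds = k_fold_indices(len(labels), k, seed)
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in range(len(labels)) if i not in held]
        # Train only on the samples outside the held-out fold.
        model = train_fn([features[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(model(features[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k


def train_majority(train_x, train_y):
    """Stand-in learner (an assumption): predicts the most common label."""
    majority = max(set(train_y), key=train_y.count)
    return lambda x: majority
```

Calling `cross_validate(features, labels, train_majority)` returns the mean held-out accuracy; swapping `train_majority` for a function that fits a real classifier gives the same protocol with that model.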