Abstract
1- Introduction
2- Related works
3- Background on feature selection methods
4- Proposed methodology
5- PROMISE datasets for software fault prediction
6- Performance measure
7- Experimental results
8- Result analysis
9- Conclusion and future work
References
Abstract
Software fault prediction (SFP) is typically used to predict faults in software components. Machine learning techniques (e.g., classification) are widely used to tackle this problem. With the huge amount of data that can be obtained by mining historical software repositories, some features (metrics) are likely to be uncorrelated with the faults; such features mislead the learning algorithm and thus decrease its performance. One possible solution for eliminating those metrics is feature selection (FS). In this paper, a novel FS approach is proposed to enhance the performance of a layered recurrent neural network (L-RNN), which is used as a classification technique for the SFP problem. Three different wrapper FS algorithms (i.e., Binary Genetic Algorithm (BGA), Binary Particle Swarm Optimization (BPSO), and Binary Ant Colony Optimization (BACO)) were employed iteratively. To assess the performance of the proposed approach, 19 real-world software projects from the PROMISE repository are investigated and the experimental results are discussed. The area under the receiver operating characteristic curve (ROC-AUC) is used as the performance measure. The results are compared, in terms of AUC, with other state-of-the-art approaches, including Naïve Bayes (NB), Artificial Neural Network (ANN), logistic regression (LR), k-nearest neighbors (k-NN), and C4.5 decision trees. Our results demonstrate that the proposed approach can outperform these existing methods.
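To make the idea of a wrapper FS evaluated by AUC concrete, the following minimal Python sketch scores a binary feature mask by the AUC obtained on the selected metrics only. It is an illustration under stated assumptions, not the procedure used in this paper: the function names (`roc_auc`, `apply_mask`, `wrapper_fitness`), the toy data, and the scoring function `sum` are hypothetical, and `sum` merely stands in for a trained classifier such as the L-RNN.

```python
from typing import Callable, List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a faulty module (label 1) receives a
    higher score than a clean module (label 0); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one faulty and one clean module")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def apply_mask(rows: Sequence[Sequence[float]],
               mask: Sequence[int]) -> List[List[float]]:
    """Keep only the metrics whose mask bit is 1."""
    return [[v for v, keep in zip(row, mask) if keep] for row in rows]


def wrapper_fitness(mask: Sequence[int],
                    rows: Sequence[Sequence[float]],
                    labels: Sequence[int],
                    score_fn: Callable[[List[float]], float]) -> float:
    """Fitness of a binary feature mask: AUC of score_fn on the selected metrics.

    Assumption: score_fn is a stand-in for a classifier; a real wrapper
    would train the classifier on the reduced data and score held-out modules.
    """
    if not any(mask):
        return 0.0  # an empty feature subset cannot rank anything
    reduced = apply_mask(rows, mask)
    return roc_auc(labels, [score_fn(r) for r in reduced])


# Toy data (hypothetical): metric 0 tracks the faults, metric 1 is noise.
rows = [[1, 9], [2, 1], [8, 2], [9, 8]]
labels = [0, 0, 1, 1]

informative_only = wrapper_fitness([1, 0], rows, labels, sum)  # 1.0
noise_only = wrapper_fitness([0, 1], rows, labels, sum)        # 0.5
all_metrics = wrapper_fitness([1, 1], rows, labels, sum)       # 0.875
```

In this toy setting, dropping the uncorrelated metric raises the AUC from 0.875 to 1.0. A binary search algorithm such as BGA, BPSO, or BACO would explore the space of masks using a fitness of this kind.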
Introduction
Software Fault Prediction (SFP) is the process of predicting the fault-prone modules in future releases of the software being developed, based on predefined software metrics or historical fault datasets (from previous projects) (Catal, 2011; Porter & Selby, 1990). The SFP process becomes easier with the adoption of Agile Software Development (ASD) (Fowler & Highsmith, 2001) methodologies (e.g., Agile Unified Process, Extreme Programming, Scrum and Kanban) rather than traditional software development methodologies (e.g., the waterfall model (Royce, 1987)) (Hoda, Salleh, Grundy, & Tee, 2017; Stavru, 2014). In ASD, the incremental delivery of the software opens the door to adapting rapidly to volatile requirements and increases the opportunities for collaboration between business owners and software developers (Hoda et al., 2017). Moreover, adopting ASD methodologies allows software engineering activities (maintenance, review, refactoring or testing) to be conducted in parallel with the development process. Predicting faults in software subsystems (modules, components, classes, etc.) in the earlier stages (before delivering them to the end user) plays a vital role in reducing the time and effort required to complete the project, since it reduces the number of modules to be processed in each activity and eliminates unnecessary effort in finding faults during the development process. The importance of SFP comes from the fact that delivering a software version with some faults will affect the subsequent versions, because the different versions of a software product are closely related.