Abstract
This research introduces the Recursive General Regression Neural Network Oracle (R-GRNN Oracle) and demonstrates it on several binary classification datasets. The traditional GRNN Oracle (Masters et al., 1998) combines the predictive powers of several machine learning classifiers by weighing the amount of error each contributes to the final predictions: as the classifiers evaluate the dataset, each is assigned a weight based on the percentage of errors it contributes. The proposed R-GRNN Oracle enhances the GRNN Oracle by nesting an oracle within an oracle, where the inner oracle acts as a classifier with its own predictions and error contribution. By combining the inner oracle with other classifiers, the R-GRNN Oracle produces superior results. The classifiers considered in this study are Support Vector Machine (SVM), Multilayer Perceptron (MLP), Probabilistic Neural Network (PNN), Gaussian Naïve Bayes (GNB), K-Nearest Neighbor (KNN), and Random Forest (RF). To demonstrate the effectiveness of the proposed approach, several datasets were used, the primary one being the publicly available Spambase dataset. The predictions of SVM, MLP, KNN, and RF were used to create the first GRNN Oracle, which was then combined with the high-performing SVM and RF to create the second oracle, the R-GRNN Oracle. The combined recursive model achieved 93.24% accuracy under 10-fold cross-validation, higher than the 91.94% of the inner GRNN Oracle and the 91.29% of RF, the best-performing stand-alone classifier. The R-GRNN Oracle was not only the most accurate but also had the highest AUC, sensitivity, specificity, precision, and F1-score (97.99%, 91.86%, 94.40%, 93.28%, and 92.57%, respectively).
The research contribution of this paper is to introduce the concept of recursion (a concept not yet fully explored in machine learning models and applications) and to test this structure's ability to further enhance the performance of the traditional oracle. The recursive model was also applied to several other datasets: the Human Resources, Bank Marketing, and Monoclonal Gammopathy of Undetermined Significance (MGUS) datasets. The results of these implementations are summarized in this paper.
1. Introduction
The world today produces huge and complex amounts of data every second. In this era of big data, advanced analytic methods can extract valuable information, patterns, trends, and associations to provide meaningful insights. Processing such data manually would be impractical if not impossible; hence, these processes must be automated. Tasks too complex for humans to code and process directly require machine learning, which helps analyze big data by designing algorithms that learn patterns in the data to make predictions. It is a branch of artificial intelligence that teaches machines to learn from experience and adapt. Successful data mining requires effective machine learning techniques. Data mining is defined as the process of discovering properties and extracting valuable information from large, incomplete, and noisy raw data stored in databases, data warehouses, or other information repositories; in data mining, the data is stored electronically and processed by computers (Witten, Frank, Hall, & Pal, 2016). It is about solving problems by analyzing data already present in databases, and its tasks include association rule learning, clustering, classification, and regression (Esfandiari, Babavalian, Moghadam, & Tabar, 2014). It is applied across disciplines and industries such as manufacturing, customer relationship management, fraud detection, banking, marketing, and healthcare. The General Regression Neural Network Oracle (GRNN Oracle), developed by Masters et al. in 1998, combines the predictions of individually trained classifiers and outputs one superior prediction.
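The core idea behind the oracle, weighting each classifier by its share of the errors and then nesting the combined model as one more classifier, can be illustrated with a minimal sketch. This is not the authors' GRNN-based implementation (which uses a General Regression Neural Network to learn the weighting); it is a simplified error-weighted vote, with all function names and the toy data invented for illustration.

```python
import numpy as np

def error_weights(predictions, y_true):
    """Assign each classifier a weight inversely related to its error
    rate on held-out labels (simplified stand-in for the GRNN weighting)."""
    errors = np.array([np.mean(p != y_true) for p in predictions])
    inv = 1.0 / (errors + 1e-9)          # small epsilon avoids division by zero
    return inv / inv.sum()               # normalize weights to sum to 1

def weighted_vote(predictions, weights):
    """Combine binary (0/1) predictions by a weighted majority vote."""
    score = sum(w * p for w, p in zip(weights, predictions))
    return (score >= 0.5).astype(int)

# Toy validation labels and three classifiers' predictions on them
y = np.array([1, 0, 1, 1, 0, 1])
preds = [np.array([1, 0, 1, 1, 0, 1]),   # strong classifier
         np.array([1, 0, 0, 1, 0, 1]),   # one error
         np.array([0, 1, 0, 0, 1, 0])]   # weak classifier

# Inner "oracle": weighted combination of the base classifiers
inner = weighted_vote(preds, error_weights(preds, y))

# Recursion: treat the inner oracle as one more classifier and
# combine it with the strongest base learners, as the R-GRNN does
outer_preds = [preds[0], preds[1], inner]
outer = weighted_vote(outer_preds, error_weights(outer_preds, y))
```

On this toy data, the weak classifier receives a near-zero weight, so both the inner and outer combinations recover the correct labels; the point of the sketch is only the structure, an ensemble whose members can themselves be ensembles.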