Abstract
The Artificial Neural Networks research field is among the most active areas in Artificial Intelligence. Training a neural network is an NP-hard optimization problem that presents several theoretical and computational limitations. In optimization, continuation refers to a homotopy transformation of the fitness function that is used to obtain successively simpler versions of that function and thereby improve convergence. In this paper we propose an approach to Artificial Neural Network training based on optimization by continuation and meta-heuristic algorithms. The goal is to reduce the overall execution time of training without negatively affecting accuracy. We use continuation together with Particle Swarm Optimization, the Firefly Algorithm and Cuckoo Search to train neural networks on public benchmark datasets. The continuation variants of the studied meta-heuristic algorithms reduce the execution time required to complete training by about 5–30%, without a statistically significant loss of accuracy compared with the standard variants of the meta-heuristics.
Introduction
In recent years we have seen the efforts of the optimization field to solve very complex and challenging tasks. At the same time, success in this area has led researchers and practitioners to address much larger instances and more difficult classes of problems, especially in the meta-heuristic area [1,2]. Training a neural network is an NP-hard optimization problem [3] that usually involves thousands or even millions of parameters. Additionally, the fitness function of Artificial Neural Networks (ANNs) is highly non-convex [4] and shows a poor correspondence between its local and global structure [5]. Moreover, the number of local minima grows exponentially with the number of parameters of the network. In fact, many of these local minima appear to be saddle points rather than true minima [6], which reduces the convergence speed of optimization, especially for gradient-based training algorithms. In general, second-order methods have not obtained better results than first-order methods in this area [5]. These facts become even more problematic as we move to modern deep learning architectures.
ANNs are widely used in image recognition [7], signal processing [8], speech [9] and several other fields. Indeed, neural networks play a major role in pattern recognition and in learning representations from data. The most popular training algorithm is Stochastic Gradient Descent (SGD), a gradient-based first-order method that has been proven to converge to local minima in the parameter space. In general, first- and second-order optimization methods require the calculation of derivatives and Hessians, for which Automatic Reverse Mode Differentiation (ARMD) has to be used [10]. The parallel implementation of such methods faces important challenges due to the inherently sequential nature of ARMD.
Recently, several meta-heuristic algorithms have been actively used in ANN training. Meta-heuristic algorithms have features that help them escape local minima and increase the probability of global convergence. The literature reports nature-inspired meta-heuristics such as Cuckoo Search [11,12], the Firefly Algorithm [13,14], Wolf Search Optimization [15,16], Particle Swarm Optimization (PSO) [17,18] and others.
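To make the combination of these two ingredients concrete before the formal treatment in Section 3, the following minimal sketch (not the implementation evaluated in this work) trains a tiny network on the XOR problem with a plain PSO loop driven by a simple continuation schedule, where the fitness function is progressively "un-smoothed" until the original loss is optimized. The network size, the Gaussian-smoothing scheme and the PSO constants are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: XOR
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

N_HIDDEN = 4
DIM = 2 * N_HIDDEN + N_HIDDEN + N_HIDDEN + 1   # weights and biases of a 2-4-1 network

def unpack(theta):
    # Split the flat parameter vector into layer weights and biases.
    i = 0
    W1 = theta[i:i + 2 * N_HIDDEN].reshape(2, N_HIDDEN); i += 2 * N_HIDDEN
    b1 = theta[i:i + N_HIDDEN]; i += N_HIDDEN
    W2 = theta[i:i + N_HIDDEN].reshape(N_HIDDEN, 1); i += N_HIDDEN
    b2 = theta[i]
    return W1, b1, W2, b2

def forward(theta, X):
    W1, b1, W2, b2 = unpack(theta)
    h = np.tanh(X @ W1 + b1)
    return (1.0 / (1.0 + np.exp(-(h @ W2 + b2)))).ravel()

def fitness(theta, sigma=0.0, n_samples=8):
    # Mean squared error; for sigma > 0 the loss is smoothed by averaging over
    # Gaussian perturbations of the parameters (one simple way, assumed here,
    # to build the "easier" members of a continuation sequence).
    if sigma == 0.0:
        return np.mean((forward(theta, X) - y) ** 2)
    noise = rng.normal(0.0, sigma, size=(n_samples, theta.size))
    return np.mean([np.mean((forward(theta + n, X) - y) ** 2) for n in noise])

def pso(fit, dim, n_particles=30, iters=60, w=0.72, c1=1.49, c2=1.49, init=None):
    # Plain global-best PSO; constants are common textbook values, not the paper's.
    pos = rng.uniform(-1, 1, (n_particles, dim)) if init is None else init.copy()
    vel = np.zeros_like(pos)
    pbest, pbest_val = pos.copy(), np.array([fit(p) for p in pos])
    gbest = pbest[np.argmin(pbest_val)].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos += vel
        vals = np.array([fit(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return pos, gbest

# Continuation schedule: start on a heavily smoothed fitness, finish on the
# original one, reusing the swarm obtained at the previous stage.
swarm = None
for sigma in (0.5, 0.2, 0.0):
    swarm, best = pso(lambda t: fitness(t, sigma), DIM, init=swarm)

print("final MSE:", fitness(best, 0.0))
print("predictions:", np.round(forward(best, X), 2))

Each stage hands its swarm to the next, smoother-to-harder stage; this reuse of intermediate solutions across a sequence of progressively more faithful fitness functions is the mechanism exploited by the continuation variants of the meta-heuristics studied in the remainder of the paper.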