Abstract
1. Introduction
2. Related works
3. Experimental setting
4. Results
5. Conclusion
Declaration of Competing Interest
Acknowledgments
Appendix A. Visualisation of the steps of the proposed approach
Appendix B. External dataset benchmarks
Appendix C. Learning curves for CNN networks
Appendix D. CIFAR-10 significance tests
Appendix E. CIFAR-100 significance tests
Appendix F. ILSVRC-2012 significance tests
Appendix G. Best parameters for classification algorithms on CIFAR-10 intermediate data
Appendix H. Best parameters for classification algorithms on CIFAR-100 intermediate data
Appendix I. Best parameters of classifiers on high-level image features
References
Abstract
Over the course of research on convolutional neural network (CNN) architectures, few modifications have been made to the fully connected layers at the ends of the networks. In image classification, these layers produce the final classification result from the output of the last layer of high-level image filters. Before the breakthrough of CNNs, these image filters were handcrafted, and any classification algorithm could be applied to their output. Because neural networks use gradient descent to learn their weights with respect to the classification error, fully connected layers are a natural choice for CNNs. But a question arises: are fully connected layers in a CNN superior to other classification algorithms? In this work, we benchmark different classification algorithms on CNNs by removing the existing fully connected classifiers. The flattened output of the last convolutional layer is then used as the input for multiple benchmark classification algorithms. To ensure the generalisability of the findings, numerous CNNs are trained on CIFAR-10, CIFAR-100, and a subset of ILSVRC-2012 with 100 classes. The experimental results reveal that multiple classification algorithms, namely logistic regression, support vector machines, eXtreme gradient boosting, random forests and K-nearest neighbours, are capable of outperforming fully connected neural networks. Furthermore, which classification algorithm performs best depends on the underlying CNN structure and the nature of the classification problem. For classification problems with many classes, or for CNNs that produce many high-level image features, other classification algorithms are likely to perform better than fully connected neural networks. It is therefore advisable to benchmark multiple classification algorithms on the high-level image features produced by the CNN layers to improve classification performance.
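The procedure summarised in the abstract (drop the fully connected classifier, flatten the output of the last convolutional layer, and score several classifiers on the resulting vectors) can be illustrated with a minimal sketch. It is not the authors' implementation. The feature maps are synthetic stand-ins for real CNN outputs, all function names are hypothetical, and only K-nearest neighbours is taken from the list of benchmarked algorithms. The nearest-centroid classifier is an extra stand-in chosen to keep the example dependency-free. In practice, library implementations of logistic regression, support vector machines, gradient boosting or random forests would be plugged into the same loop.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Tuple

FeatureMap = List[List[List[float]]]  # one image: (channels, height, width)
Vector = List[float]
Predictor = Callable[[Vector], int]


def flatten(feature_map: FeatureMap) -> Vector:
    """Flatten one (C, H, W) feature map into a single feature vector."""
    return [v for channel in feature_map for row in channel for v in row]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_fit(X: List[Vector], y: List[int], k: int = 3) -> Predictor:
    """K-nearest neighbours: majority vote among the k closest training vectors."""
    def predict(x: Vector) -> int:
        nearest = sorted(range(len(X)), key=lambda i: sq_dist(X[i], x))[:k]
        return Counter(y[i] for i in nearest).most_common(1)[0][0]
    return predict


def centroid_fit(X: List[Vector], y: List[int]) -> Predictor:
    """Nearest centroid (illustrative stand-in, not one of the paper's classifiers)."""
    centroids: Dict[int, Vector] = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x: Vector) -> int:
        return min(centroids, key=lambda c: sq_dist(centroids[c], x))
    return predict


def synthetic_feature_maps(
    n_per_class: int, n_classes: int, shape: Tuple[int, int, int] = (4, 2, 2), seed: int = 0
) -> Tuple[List[FeatureMap], List[int]]:
    """Assumption: random maps whose mean depends on the class, replacing real CNN outputs."""
    rng = random.Random(seed)
    c, h, w = shape
    maps, labels = [], []
    for label in range(n_classes):
        for _ in range(n_per_class):
            maps.append(
                [[[rng.gauss(2.0 * label, 1.0) for _ in range(w)] for _ in range(h)] for _ in range(c)]
            )
            labels.append(label)
    return maps, labels


def benchmark(
    train_maps: List[FeatureMap], train_y: List[int],
    test_maps: List[FeatureMap], test_y: List[int],
) -> Dict[str, float]:
    """Fit each classifier on flattened features and return its test accuracy."""
    X_train = [flatten(m) for m in train_maps]
    X_test = [flatten(m) for m in test_maps]
    classifiers = {"knn": knn_fit, "nearest_centroid": centroid_fit}
    scores = {}
    for name, fit in classifiers.items():
        predict = fit(X_train, train_y)
        correct = sum(predict(x) == t for x, t in zip(X_test, test_y))
        scores[name] = correct / len(test_y)
    return scores


train_maps, train_y = synthetic_feature_maps(20, 3, seed=0)
test_maps, test_y = synthetic_feature_maps(10, 3, seed=1)
scores = benchmark(train_maps, train_y, test_maps, test_y)
for name, accuracy in scores.items():
    print(f"{name}: {accuracy:.2f}")
```

With a real network, `synthetic_feature_maps` would be replaced by a forward pass through the trained convolutional layers, and the accuracy of the original fully connected classifier would serve as the baseline against which each entry in `scores` is compared.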
Introduction
Computer vision is a sub-field of artificial intelligence and computer science that enables computers to develop a visual perception of real-world entities (Szeliski, 2010). This is achieved by automatically extracting, analysing, and understanding information from input images. The field of computer vision can be divided into multiple sub-areas, each of which focuses on specific information in the image data: classification, localisation, detection, semantic segmentation, and instance segmentation (see Fig. 1). This paper focuses on algorithms applied to the task of image classification. Before the dominance of neural networks in computer vision research, any classification algorithm could be used to distinguish the classes based on the output of manually designed feature extractors (filters) (LeCun, Bottou, Bengio, & Haffner, 1998). The emergence of convolutional neural networks in computer vision produced a shift from hand-designed feature extractors to automatically generated feature extractors trained with backpropagation.
Computer vision has progressed considerably since the breakthrough of artificial neural networks (ANNs). This development was fostered by increases in available computational power and training data. However, it was only in 2012, when a research team from the University of Toronto constructed AlexNet (Krizhevsky, Sutskever, & Hinton, 2012) for the ImageNet Large Scale Visual Recognition Challenge (ILSVRC-2012), that a convolutional neural network (CNN) architecture outperformed traditional approaches in the classification and localisation tasks (Russakovsky et al., 2015). After that, CNN architectures were improved in many ways, found their way into multiple fields of research, and were successfully applied in industry (Szegedy, Vanhoucke, Ioffe, Shlens, & Wojna, 2016).