Abstract
Semantic segmentation performs pixel-wise classification for given images and can be widely used in autonomous driving, robotics, medical diagnostics, etc. Recent approaches have made rapid progress in semantic segmentation. However, these supervised methods rely heavily on large-scale datasets to generalize well, which imposes several constraints. First, human annotation of pixel-level segmentation masks is laborious and time-consuming, which makes training data relatively expensive and makes it hard to deal with urgent tasks in dynamic environments. Second, the performance of these data-hungry methods degrades when only a few training examples are available. To overcome these limitations of supervised semantic segmentation methods, this paper proposes a generalized meta-learning framework named Meta-Seg. It consists of a meta-learner and a base-learner. Specifically, the meta-learner learns a good initialization and a parameter update strategy from a distribution of few-shot semantic segmentation tasks. In principle, the base-learner can be any semantic segmentation model, and it achieves fast adaptation (i.e., updating its parameters within a few iterations) under the guidance of the meta-learner. In this work, the successful semantic segmentation model FCN8s is integrated into Meta-Seg. Experiments on the well-known few-shot semantic segmentation dataset PASCAL-5i show that Meta-Seg is a promising framework for few-shot semantic segmentation. In addition, this method can serve as a reference for related research on meta-learning for semantic segmentation.
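To make the framework concrete, below is a minimal, runnable sketch of the bi-level optimization described above, assuming a MAML/Meta-SGD-style formulation in PyTorch: the meta-learned initialization and the learnable per-parameter step sizes play the roles of the meta-learner's "good initialization" and "parameter update strategy", while the inner loop is the base-learner's fast adaptation. The tiny two-layer convolutional network and the random tensors are stand-ins for FCN8s and real PASCAL-5i tasks; all names and loop counts are illustrative assumptions, not the authors' actual implementation.

    import torch
    import torch.nn.functional as F

    def forward(params, x):
        # Functional forward pass of a 2-layer conv segmenter (stand-in for FCN8s).
        x = F.relu(F.conv2d(x, params[0], params[1], padding=1))
        return F.conv2d(x, params[2], params[3], padding=1)  # per-pixel class logits

    # Meta-learned initialization (the meta-learner's "good initialization").
    params = [torch.randn(8, 3, 3, 3) * 0.1, torch.zeros(8),
              torch.randn(2, 8, 3, 3) * 0.1, torch.zeros(2)]
    for p in params:
        p.requires_grad_(True)
    # Meta-learned per-parameter step sizes (the "parameter update strategy").
    alphas = [torch.full_like(p, 0.01, requires_grad=True) for p in params]
    meta_opt = torch.optim.Adam(params + alphas, lr=1e-3)

    for meta_step in range(100):
        # One few-shot task: support/query images with binary foreground masks
        # (random stand-ins for a sampled PASCAL-5i episode).
        sup_x, sup_y = torch.randn(2, 3, 32, 32), torch.randint(0, 2, (2, 32, 32))
        qry_x, qry_y = torch.randn(2, 3, 32, 32), torch.randint(0, 2, (2, 32, 32))
        # Inner loop: the base-learner adapts fast, within a few iterations.
        fast = params
        for _ in range(3):
            loss = F.cross_entropy(forward(fast, sup_x), sup_y)
            grads = torch.autograd.grad(loss, fast, create_graph=True)
            fast = [w - a * g for w, a, g in zip(fast, alphas, grads)]
        # Outer loop: the meta-learner updates the initialization and step sizes
        # so that the adapted weights generalize to the query set.
        meta_loss = F.cross_entropy(forward(fast, qry_x), qry_y)
        meta_opt.zero_grad()
        meta_loss.backward()
        meta_opt.step()

Because the inner-loop gradients are taken with create_graph=True, the outer update differentiates through the adaptation itself, which is what allows the meta-learner to shape both the initialization and the update rule.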
Introduction
In recent years, deep learning, especially convolutional networks [1], has made significant breakthroughs in many visual understanding tasks, including image classification [2]–[5], object detection [6]–[13] and semantic segmentation [14]–[21]. One crucial reason driving this development is the availability of large-scale datasets such as ImageNet [22] that enable the training of deep networks. Semantic segmentation aims to assign a class label to each pixel in an image. As shown in Fig. 1 (a), deep convolutional networks for semantic segmentation require a large amount of annotated data to be robust, and they still face the challenge of overfitting in the few-shot regime. However, data labeling is expensive and laborious, particularly for dense prediction tasks, e.g., semantic segmentation, instance segmentation and panoptic segmentation. Hence, weakly supervised semantic segmentation methods [23]–[27] have been proposed to reduce the burden of data annotation. These methods only relax the dependency on dense annotations and still require plenty of training images. In addition, once a segmentation model is trained, it is difficult to use it to predict new classes. In contrast, humans can easily segment a novel concept from a scene even with few samples.