STDP-based spiking deep convolutional neural networks

Article title: STDP-based spiking deep convolutional neural networks for object recognition
Journal/Conference: Neural Networks
Related fields of study: Computer Engineering, Information Technology
Related specializations: Artificial Intelligence, Computer Networks
Keywords: Spiking neural network, STDP, Deep learning, Object recognition, Temporal coding
Manuscript type: Research Article
DOI: https://doi.org/10.1016/j.neunet.2017.12.005
Affiliation: Department of Computer Science, School of Mathematics, Statistics and Computer Science, University of Tehran, Iran
Length of English article: 12 pages
Publisher: Elsevier
Presentation type: Journal
Indexing: ISI
Publication year: 2018
Impact factor: 8.446 (2017)
H-index: 121 (2019)
SJR: 2.359 (2017)
ISSN: 0893-6080
Quartile: Q1 (2017)
English article format: PDF
Translation status: not yet translated
Price of English article: free
Is this a baseline ("base") article: No
Product code: E10736
Table of contents (English)

Abstract

1- Introduction

2- Proposed spiking deep neural network

3- Results

4- Discussion

5- Supporting information

References

Excerpt from the article (English)

Abstract

Previous studies have shown that spike-timing-dependent plasticity (STDP) can be used in spiking neural networks (SNN) to extract visual features of low or intermediate complexity in an unsupervised manner. These studies, however, used relatively shallow architectures, and only one layer was trainable. Another line of research has demonstrated – using rate-based neural networks trained with back-propagation – that having many layers increases the recognition robustness, an approach known as deep learning. We thus designed a deep SNN, comprising several convolutional (trainable with STDP) and pooling layers. We used a temporal coding scheme where the most strongly activated neurons fire first, and less activated neurons fire later or not at all. The network was exposed to natural images. Thanks to STDP, neurons progressively learned features corresponding to prototypical patterns that were both salient and frequent. Only a few tens of examples per category were required and no label was needed. After learning, the complexity of the extracted features increased along the hierarchy, from edge detectors in the first layer to object prototypes in the last layer. Coding was very sparse, with only a few thousand spikes per image, and in some cases the object category could be reasonably well inferred from the activity of a single higher-order neuron. More generally, the activity of a few hundred such neurons contained robust category information, as demonstrated using a classifier on the Caltech 101, ETH-80, and MNIST databases. We also demonstrate the superiority of STDP over other unsupervised techniques such as random crops (HMAX) or auto-encoders. Taken together, our results suggest that the combination of STDP with latency coding may be a key to understanding the way that the primate visual system learns, its remarkable processing speed and its low energy consumption.
These mechanisms are also interesting for artificial vision systems, particularly for hardware solutions.
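The learning rule sketched in the abstract can be illustrated with a minimal, order-based STDP update: the weight change depends only on whether the presynaptic spike precedes the postsynaptic one, not on the exact time difference. This is a hedged sketch, not the paper's exact implementation; the function name, the learning rates `a_plus`/`a_minus`, and the multiplicative soft bound `w*(1-w)` are illustrative assumptions.

```python
def stdp_update(w, t_pre, t_post, a_plus=0.004, a_minus=0.003):
    """Simplified, order-based STDP (illustrative parameters).

    If the presynaptic spike arrives at or before the postsynaptic
    spike, the synapse is potentiated; otherwise it is depressed.
    The w * (1 - w) factor softly keeps weights inside [0, 1].
    """
    if t_pre <= t_post:
        dw = a_plus * w * (1.0 - w)    # potentiation: pre fired first
    else:
        dw = -a_minus * w * (1.0 - w)  # depression: pre fired too late
    return max(0.0, min(1.0, w + dw))  # hard clip as a safety bound
```

Because only spike order matters, such a rule is cheap to apply in a latency-coded regime where each neuron fires at most once per image.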

Introduction

Primate’s visual system solves the object recognition task through hierarchical processing along the ventral pathway of the visual cortex (DiCarlo, Zoccolan, & Rust, 2012). Through this hierarchy, the visual preference of neurons gradually increases from oriented bars in primary visual cortex (V1) to complex objects in inferotemporal cortex (IT), where neural activity provides a robust, invariant, and linearly-separable object representation (DiCarlo & Cox, 2007; DiCarlo et al., 2012). Despite the extensive feedback connections in the visual cortex, the first feed-forward wave of spikes in IT (∼ 100–150 ms post-stimulus presentation) appears to be sufficient for crude object recognition (Hung, Kreiman, Poggio, & DiCarlo, 2005; Liu, Agam, Madsen, & Kreiman, 2009; Thorpe, Fize, Marlot, et al., 1996). During the last decades, various computational models have been proposed to mimic this hierarchical feed-forward processing (Fukushima, 1980; LeCun & Bengio, 1998; Lee, Grosse, Ranganath, & Ng, 2009; Masquelier & Thorpe, 2007; Serre, Wolf, Bileschi, Riesenhuber, & Poggio, 2007). Despite the limited successes of the early models (Ghodrati, Farzmahdi, Rajaei, Ebrahimpour, & Khaligh-Razavi, 2014; Pinto, Barhomi, Cox, & DiCarlo, 2011), recent advances in deep convolutional neural networks (DCNN) led to high performing models (Krizhevsky, Sutskever, & Hinton, 2012; Simonyan & Zisserman, 2014; Zeiler & Fergus, 2014). Beyond the high precision, DCNNs can tolerate object variations as humans do (Kheradpisheh, Ghodrati, Ganjtabesh, & Masquelier, 2016a, b), use IT-like object representations (Cadieu et al., 2014; Khaligh-Razavi & Kriegeskorte, 2014), and match the spatiotemporal dynamics of the ventral visual pathway (Cichy, Khosla, Pantazis, Torralba, & Oliva, 2016). 
Although the architecture of DCNNs is somehow inspired by the primate’s visual system (LeCun, Bengio, & Hinton, 2015) (a hierarchy of computational layers with gradually increasing receptive fields), they totally neglect the actual neural processing and learning mechanisms in the cortex. The computing units of DCNNs send floating-point values to each other which correspond to their activation level, whereas biological neurons communicate with each other by sending electrical impulses (i.e., spikes). The amplitude and duration of all spikes are almost the same, so they are fully characterized by their emission time. Interestingly, mean spike rates are very low in the primate visual system (perhaps only a few hertz; Shoham, O’Connor, & Segev, 2006). Hence, neurons appear to fire a spike only when they have to send an important message, and some information can be encoded in their spike times. Such spike-time coding leads to fast and extremely energy-efficient neural computation in the brain (the whole human brain consumes only about 10–20 W of energy; Maass, 2002).
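The spike-time coding idea above can be sketched as an intensity-to-latency conversion, where stronger activations fire earlier and inactive units never fire. The linear mapping, the function name, and the `t_max` parameter below are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def intensity_to_latency(activations, t_max=10.0):
    """Latency coding sketch: map activations in [0, 1] to spike times.

    The strongest input (1.0) fires at time 0, weaker inputs fire
    later, and units with zero activation never fire (infinite
    latency). Each unit emits at most one spike.
    """
    a = np.asarray(activations, dtype=float)
    times = np.full(a.shape, np.inf)          # default: no spike at all
    fired = a > 0
    times[fired] = t_max * (1.0 - a[fired])   # stronger -> earlier
    return times
```

For example, `intensity_to_latency([1.0, 0.5, 0.0])` yields spike times `[0.0, 5.0, inf]`: only the relative order of first spikes carries the stimulus information.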