Intelligent computing applications are now widely used across many domains, including retail. Analyzing customer behaviour has become crucial for both customers and retailers. In this regard, remote gaze estimation using deep learning has shown promising results for analyzing customer behaviour in retail owing to its scalability, robustness, low cost, and non-interruptive nature. This study presents a three-stage, three-attention-based deep convolutional neural network for remote gaze estimation in retail using image data. In the first stage, we design a mechanism to estimate the 3D gaze of the subject using image data and monocular depth estimation. The second stage presents a novel three-attention mechanism to estimate the gaze in the wild from field-of-view, depth range, and object channel attentions. The third stage generates the gaze saliency heatmap from the output attention map of the second stage. We train and evaluate the proposed model on the benchmark GOO-Real dataset and compare the results with baseline models. Further, we adapt our model to real retail environments by introducing a novel dataset, Retail Gaze. Extensive experiments demonstrate that our approach significantly improves remote gaze target estimation performance on the GOO-Real and Retail Gaze datasets.
In today’s world, retail stores are becoming smarter thanks to the availability of large volumes of data and the ability to analyze them autonomously. Even with the rise of online shopping, most physical retail stores use smart applications in the purchasing process. Several techniques and devices have been introduced to automate the shopping process and analyze shoppers’ behaviour inside stores. At the same time, the shopping experience is a key factor in the success of a retail business, as it affects customer satisfaction, customer purchase probability, and customer loyalty.
To improve the shopping experience and maximize business profits, it is essential to capture and analyze customers’ behaviour without interfering with their natural shopping journey. Various solutions have been introduced for customer behaviour analysis in retail using developments in computer vision technology. Examples include counting people and detecting hot spots in retail and public settings, and tracking shoppers’ emotions. However, the existing solutions capture only coarse touch-points of a shopper’s journey and are vulnerable to unconstrained environment settings. With the adoption of computer vision technologies in gaze estimation, eye tracking-based solutions for customer behaviour analysis in retail have also emerged. Moreover, there are solutions based on virtual reality devices and head-mounted displays, wearable eye tracker-based solutions, and non-intrusive 3D eye tracking solutions. However, these solutions do not fully satisfy retailers because of the high cost of 3D eye tracking, the poor scalability of wearable and head-mounted display-based solutions, and the manual calibration that eye tracking systems require.
Remote gaze saliency estimation in retail is a novel concept with significant potential for building innovative retail stores. In this study, we investigated the application of remote gaze saliency estimation to non-interruptive, low-cost, and scalable customer behaviour analysis in retail. We proposed a Depth-based Dual Attention model, a three-stage, three-attention-based deep CNN for gaze saliency estimation from back-head images in the wild. We developed four design solutions to comprehensively represent the parameters of the gaze saliency estimation problem in retail. We also introduced the novel object channel and depth-rebasing components as hand-designed features, which were developed in our two preceding model architectures and then combined in the final model.
Extensive quantitative and qualitative analysis on the benchmark GOO-Real dataset demonstrates the superiority of the proposed models and the importance of the hand-designed components we introduced. Our proposed solution improved angular error by 33% over the current best work in the literature. Furthermore, we introduced Retail Gaze, a real-world retail gaze saliency estimation dataset, to ensure the validity and applicability of our proposed solution in real retail environments. The proposed solution achieved an angular error of 15.3° on the Retail Gaze dataset, which demonstrates that it performs favourably in real retail environments.
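For readers unfamiliar with the angular error metric reported above, the following minimal sketch shows one common way it is computed in gaze-following work: as the angle between the direction from the head position to the predicted gaze point and the direction from the head position to the annotated gaze point. The exact evaluation protocol used in this study is not restated in this section, so the 2D image-coordinate formulation and the names `angular_error_deg`, `head`, `predicted`, and `target` are illustrative assumptions.

```python
import math


def angular_error_deg(head, predicted, target):
    """Angle in degrees between the head->predicted and head->target directions.

    All arguments are (x, y) points in the same 2D coordinate frame.
    This is an illustrative definition, not necessarily the authors' exact protocol.
    """
    # Direction vectors from the head position to each gaze point.
    px, py = predicted[0] - head[0], predicted[1] - head[1]
    tx, ty = target[0] - head[0], target[1] - head[1]

    # The angle is undefined if either gaze point coincides with the head position.
    if math.hypot(px, py) == 0.0 or math.hypot(tx, ty) == 0.0:
        raise ValueError("gaze point coincides with head position")

    # atan2(cross, dot) avoids the rounding problems of acos near 0 and 180 degrees.
    cross = px * ty - py * tx
    dot = px * tx + py * ty
    return abs(math.degrees(math.atan2(cross, dot)))


if __name__ == "__main__":
    # Perpendicular gaze directions give a 90 degree error.
    print(angular_error_deg((0.0, 0.0), (1.0, 0.0), (0.0, 1.0)))
```

Under this definition, a dataset-level figure such as the one reported for Retail Gaze would be the mean of this per-sample angle over the test set.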