Human Gaze Assisted Artificial Intelligence: A Review

Human gaze reveals a wealth of information about a person's internal cognitive state. As a result, gaze-related research has increased significantly in computer vision, natural language processing, decision learning, and robotics in recent years. We provide a high-level overview of the research efforts in these fields, covering the collection of human gaze datasets, the modeling of gaze behaviors, and the use of gaze information in various applications, with the goal of enhancing communication between these research areas. We conclude by discussing future challenges and potential applications that work toward a common goal of human-centered artificial intelligence.
