A Multimodal Information Collector for Content-Based Image Retrieval System

Explicit relevance feedback requires the user to explicitly refine the search queries in content-based image retrieval (CBIR). This may become laborious or even impossible due to the ever-increasing volume of digital image databases. We present a multimodal information collector that unobtrusively records and asynchronously transmits the user's implicit relevance feedback on a displayed image to the remote CBIR server, which uses it to assist in retrieving relevant images. The modalities of user interaction include eye movements, pointer tracks and clicks, keystrokes, and audio including speech. The client-side information collector has been implemented as a browser extension in JavaScript and integrated with an existing CBIR server. We verify its functionality by evaluating the performance of the gaze-enhanced CBIR system in online image tagging tasks.
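The abstract describes a client-side collector that records interaction events and transmits them asynchronously to the server. A minimal sketch of such a collector is shown below; the `FeedbackBuffer` class, the `/feedback` endpoint, and the batch size are hypothetical illustrations, not the paper's actual implementation. The batching logic is kept separate from the browser APIs so it is self-contained:

```javascript
// Sketch of a client-side implicit-feedback collector (assumed design,
// not the paper's code). Interaction events are buffered per modality
// and flushed asynchronously in batches.
class FeedbackBuffer {
  constructor(flushSize = 50) {
    this.flushSize = flushSize; // batch size before an automatic flush
    this.events = [];           // pending events
    this.flushed = [];          // flushed batches (stand-in for network sends)
  }

  // Record one interaction event: modality name, payload, timestamp.
  record(modality, payload, timestamp = Date.now()) {
    this.events.push({ modality, payload, timestamp });
    if (this.events.length >= this.flushSize) this.flush();
  }

  // Hand the accumulated batch to a sender callback; the default sender
  // just stores the batch locally so the sketch runs outside a browser.
  flush(send = (batch) => this.flushed.push(batch)) {
    if (this.events.length === 0) return;
    send(this.events.splice(0)); // empty the buffer atomically
  }
}

// In a browser extension, DOM listeners would feed the buffer, e.g.:
//   const buffer = new FeedbackBuffer();
//   document.addEventListener('mousemove', (e) =>
//     buffer.record('pointer', { x: e.clientX, y: e.clientY }));
//   document.addEventListener('keydown', (e) =>
//     buffer.record('keyboard', { key: e.key }));
// and flush() would POST each batch to a hypothetical server endpoint:
//   fetch('/feedback', { method: 'POST', body: JSON.stringify(batch) });
```

Decoupling event capture from transmission in this way is what allows the collector to remain unobtrusive: the page's interaction handlers only append to a local buffer, while network traffic happens asynchronously in batches.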
