ICE: Enabling Non-Experts to Build Models Interactively for Large-Scale Lopsided Problems

Quick interaction between a human teacher and a learning machine presents numerous benefits and challenges when working with web-scale data. The human teacher guides the machine towards accomplishing the task of interest. The learning machine leverages big data to find examples that maximize the training value of its interaction with the teacher. When the teacher is restricted to labeling examples selected by the machine, this problem is an instance of active learning. When the teacher can provide additional information to the machine (e.g., suggestions on what examples or predictive features should be used) as the learning task progresses, then the problem becomes one of interactive learning. To accommodate the two-way communication channel needed for efficient interactive learning, the teacher and the machine need an environment that supports an interaction language. The machine can access, process, and summarize more examples than the teacher can see in a lifetime. Based on the machine's output, the teacher can revise the definition of the task or make it more precise. Both the teacher and the machine continuously learn and benefit from the interaction. We have built a platform to (1) produce valuable and deployable models and (2) support research on both the machine learning and user interface challenges of the interactive learning problem. The platform relies on a dedicated, low-latency, distributed, in-memory architecture that allows us to construct web-scale learning machines with quick interaction speed. The purpose of this paper is to describe this architecture and demonstrate how it supports our research efforts. Preliminary results are presented as illustrations of the architecture but are not the primary focus of the paper.
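As a minimal sketch of the active learning special case described above, where the teacher only labels examples chosen by the machine, the following toy loop uses uncertainty sampling on a lopsided one-dimensional pool (about 5% positives). It is an illustration under stated assumptions, not the platform's implementation: the function names, the threshold classifier, the simulated teacher, and the 0.95 cutoff are all hypothetical.

```python
def train_threshold(labeled):
    """Fit a 1-D threshold midway between the largest negative and the
    smallest positive labeled example (assumes separable labels)."""
    positives = [x for x, y in labeled if y]
    negatives = [x for x, y in labeled if not y]
    return (max(negatives) + min(positives)) / 2.0


def most_uncertain(pool, labeled_ids, threshold):
    """Return the index of the unlabeled example closest to the boundary."""
    candidates = [i for i in range(len(pool)) if i not in labeled_ids]
    return min(candidates, key=lambda i: abs(pool[i] - threshold))


def active_learning(pool, teacher, seed_ids, rounds):
    """Alternate between the machine choosing an example and the teacher
    labeling it, retraining after every label."""
    labeled = {i: teacher(pool[i]) for i in seed_ids}
    threshold = train_threshold([(pool[i], y) for i, y in labeled.items()])
    for _ in range(rounds):
        i = most_uncertain(pool, labeled, threshold)
        labeled[i] = teacher(pool[i])  # the only step that costs teacher time
        threshold = train_threshold([(pool[j], y) for j, y in labeled.items()])
    return threshold, labeled


# Hypothetical lopsided task: only values >= 0.95 are positive.
pool = [i / 1000 for i in range(1000)]


def teacher(x):
    """Simulated teacher standing in for a human labeler."""
    return x >= 0.95


# Seed with one known negative (index 0) and one known positive (index 999).
threshold, labeled = active_learning(pool, teacher, seed_ids=[0, 999], rounds=15)
print(threshold, len(labeled))
```

In this toy setting the machine asks for 17 labels out of 1,000 examples, because each query roughly halves the region where the boundary can lie. Interactive learning, as described above, would additionally let the teacher suggest examples or features, which this sketch does not model.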
