Generalizing Manipulations using Vision Kernels

In order to perform complex manipulation tasks, a robot must know which actions it can perform with the available objects. In unstructured environments, the potential manipulations afforded by objects will not be pre-specified and must instead be learned. Rather than determining each novel object’s affordances from scratch, the robot can learn more efficiently by generalizing manipulations from similar known objects. Actions can be generalized to new objects by learning direct mappings from the object’s visual features to actions [1]. This approach differs from indirect methods in that it does not require intermediate representations, such as object classes [2]. A robot can autonomously learn the afforded actions of an object by applying the action to the object and observing the resulting effects [3], [4]. If the desired effect is achieved, then the object can be labeled as affording this action. Affordance learning can thus be treated as a binary classification problem for a given action.

Our approach is based on two key insights: 1) the perception of objects and the interactions between objects are based largely on the objects’ surface geometries [1], and 2) the affordances of objects are often related to only subparts of objects and not the whole object [5]. We therefore propose generalizing actions to new objects by finding subparts of objects that have similar shapes and are, therefore, more likely to have the same affordances. The subparts of objects are represented in a nonparametric manner, based directly on the observed point clouds of the subparts. The robot thus does not rely on task-specific visual features and can discriminate between any subparts that are not visually identical. Using this nonparametric representation, we also define a kernel function for computing the similarity between different subparts. Hence, we can use kernel learning methods [6], such as kernel logistic regression, to learn which subparts afford a given action. A minimal sketch of this pipeline is given below.

The proposed method was successfully tested on a real robot, as shown in Fig. 1. Starting with a single human demonstration of the task, the robot was able to learn to generalize this action to novel objects of different shapes and sizes.
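The following sketch illustrates the general idea, not the paper's exact implementation: subparts are represented directly by their point clouds, a kernel scores the similarity between two clouds, and kernel logistic regression produces a binary affordance label. The specific Gaussian cross-correlation kernel, the bandwidth and regularization values, the gradient-descent fitting, and the toy "handle-like" data are all assumptions chosen for illustration.

```python
import numpy as np


def cloud_kernel(A, B, bandwidth=0.02):
    """Similarity between two subpart point clouds A (n x 3) and B (m x 3).

    Averages a Gaussian kernel over all point pairs, so subparts with similar
    surface geometry score highly. The bandwidth (here in metres, an assumed
    value) controls how close points must be to count as matching.
    """
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2)).mean()


def gram_matrix(clouds_a, clouds_b, bandwidth=0.02):
    """Pairwise kernel matrix between two lists of point clouds."""
    return np.array([[cloud_kernel(a, b, bandwidth) for b in clouds_b]
                     for a in clouds_a])


def fit_kernel_logistic_regression(K, y, reg=1e-3, lr=0.5, iters=2000):
    """Fit dual coefficients alpha for binary affordance labels y in {0, 1}.

    Minimizes the regularized negative log-likelihood of p = sigmoid(K @ alpha)
    by plain gradient descent (an assumed optimizer for this sketch).
    """
    alpha = np.zeros(len(y))
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-K @ alpha))
        grad = K @ (p - y) + reg * (K @ alpha)
        alpha -= lr * grad
    return alpha


def predict_affordance(alpha, train_clouds, query_cloud, bandwidth=0.02):
    """Probability that a novel subpart affords the demonstrated action."""
    k = np.array([cloud_kernel(c, query_cloud, bandwidth) for c in train_clouds])
    return 1.0 / (1.0 + np.exp(-k @ alpha))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: elongated "handle-like" subparts afford the action (label 1),
    # roughly spherical "blob-like" subparts do not (label 0).
    handles = [rng.normal(scale=[0.10, 0.01, 0.01], size=(200, 3)) for _ in range(5)]
    blobs = [rng.normal(scale=0.04, size=(200, 3)) for _ in range(5)]
    train_clouds = handles + blobs
    labels = np.array([1] * 5 + [0] * 5, dtype=float)

    K = gram_matrix(train_clouds, train_clouds)
    alpha = fit_kernel_logistic_regression(K, labels)

    novel_handle = rng.normal(scale=[0.12, 0.015, 0.01], size=(200, 3))
    print("P(affords action) for novel handle-like subpart:",
          predict_affordance(alpha, train_clouds, novel_handle))
```

Because the representation is built directly from observed points, generalization to a novel object only requires evaluating the kernel between its subparts and the previously labeled ones; no hand-designed, task-specific features are needed.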