Multi-modal Information Processing in Companion-Systems: A Ticket Purchase System

We demonstrate a successful multimodal, dynamic human-computer interaction (HCI) in which the system adapts to the current situation and to the user’s state, using the scenario of purchasing a train ticket. This scenario illustrates the challenge Companion Systems face in analyzing and interpreting explicit and implicit observations obtained from sensors under changing environmental conditions. In a dedicated experimental setup, a wide range of sensors was used to capture the situative context and the user, including video and audio capturing devices, laser scanners, a touch screen, and a depth sensor. Explicit signals describe a user’s direct interaction with the system, such as interaction gestures, speech, and touch input. Implicit signals are not directly addressed to the system; they comprise the user’s situative context and his or her gestures, speech, body pose, facial expressions, and prosody. Both the multimodally fused explicit signals and the information interpreted from implicit signals steer the application component, which was deliberately kept robust. The application offers stepwise dialogs that gather the most relevant information for purchasing a train ticket; the dialog steps respond to the interpreted signals and data and adapt to them at runtime. We further highlight the system’s potential for a fast-track ticket purchase when several pieces of information indicate that the user is in a hurry.
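The fast-track idea, switching to a shorter dialog only when several independent cues agree that the user is hurried, can be sketched as a simple vote over interpreted implicit signals. This is a minimal illustration, not the system's actual logic: the cue names (`approach_speed_mps`, `speech_rate_sps`, `tense_expression`), the thresholds, and the dialog step lists are all hypothetical assumptions chosen only to show the "several pieces of information" rule.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ImplicitCues:
    """Hypothetical interpreted implicit observations about the user."""
    approach_speed_mps: float  # assumed: walking speed, e.g. from laser scanners
    speech_rate_sps: float     # assumed: syllables per second, from audio/prosody analysis
    tense_expression: bool     # assumed: output of a facial-expression classifier

# Hypothetical thresholds, for illustration only.
FAST_APPROACH_MPS = 1.5
FAST_SPEECH_SPS = 5.0
MIN_AGREEING_CUES = 2  # require several cues to agree before fast-tracking

# Hypothetical dialog steps; the fast track skips the optional ones.
FULL_STEPS = ["destination", "date_and_time", "passengers", "ticket_class", "payment"]
FAST_TRACK_STEPS = ["destination", "payment"]

def hurry_indicators(cues: ImplicitCues) -> List[str]:
    """Return the names of all cues that individually suggest a hurried user."""
    found = []
    if cues.approach_speed_mps >= FAST_APPROACH_MPS:
        found.append("fast_approach")
    if cues.speech_rate_sps >= FAST_SPEECH_SPS:
        found.append("fast_speech")
    if cues.tense_expression:
        found.append("tense_expression")
    return found

def next_dialog(cues: ImplicitCues) -> List[str]:
    """Choose the fast-track dialog only if enough hurry cues agree."""
    if len(hurry_indicators(cues)) >= MIN_AGREEING_CUES:
        return list(FAST_TRACK_STEPS)
    return list(FULL_STEPS)
```

Requiring agreement between cues, rather than reacting to any single one, reflects the robustness goal stated above: one noisy sensor reading should not shorten the dialog on its own.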