MMI: Multimodal Interaction in Virtual Reality using Speech and Gestures

Oct 1, 2018 - Oct 20, 2020

Erik Wolf
Sara Wolf

Summary

The “Multimodal Interaction using Speech and Gestures in VR” project investigated the effectiveness of instruction-based multimodal interfaces (MMIs) that combine speech and gestures as an alternative to conventional unimodal interfaces (UMIs) for creative design tasks in Virtual Reality (VR). The work showed that multimodal interaction offers significant advantages for key characteristics of the VR design process and, in open design tasks, performs on par with conventional methods, suggesting that MMIs have reached technological maturity in this domain.

This research was conducted at the HCI group of the University of Würzburg and has been recognized for its significant contributions to the field, earning Best Paper Awards at the International Conference on Multimodal Interaction (ICMI) in 2019 and 2020. The results provide guidance for system architects, strongly recommending simple synergistic multimodal interfaces for 3D object manipulation in VR design tasks.

Motivation

While VR is considered a promising medium for designers, typical graphical user interfaces often divert attention from the object being manipulated to the interface itself. This context switching hampers the creative process and can break the user’s feeling of flow. The project sought to leverage multimodal interaction, as pioneered by concepts like Bolt’s “Put-That-There”, to let users stay focused on their object of interest, thereby promoting the flow, usability, and presence that foster creativity-enhancing conditions.

Concept and Technological Approach

To address the need for seamless interaction, we developed a Multimodal Toolbox that supports basic object manipulation. The approach synergistically combines speech with pointing and grabbing gestures to change the size, color, and texture of objects. The system architecture combined the game engine Unity with the open-source research platform Simulator X, whose concurrent Augmented Transition Network defines and performs the semantic fusion of the parallel speech and gesture inputs. This design eliminates the visual attention shifts inherent in menu-based systems.
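To illustrate the idea, the following minimal Python sketch shows how an Augmented Transition Network can fuse a spoken instruction with a deictic pointing gesture. It is a hypothetical example, not the Simulator X or Unity implementation; the class names, state names, token vocabulary, and the 1.5-second fusion window are illustrative assumptions.

```python
"""Minimal sketch of ATN-style speech/gesture fusion (illustrative only)."""

import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class GestureEvent:
    """A pointing event already resolved to a scene object id."""
    target: str
    timestamp: float


@dataclass
class FusionState:
    """Registers accumulated while traversing the network."""
    action: Optional[str] = None
    target: Optional[str] = None
    value: Optional[str] = None


class SpeechGestureATN:
    """Tiny transition network: START -> HAVE_ACTION -> HAVE_TARGET -> done.

    Speech tokens drive the transitions; a deictic word ("that", "this")
    is resolved against the most recent pointing gesture within a time window.
    """

    DEICTICS = {"that", "this", "it"}

    def __init__(self, fusion_window_s: float = 1.5):
        self.fusion_window_s = fusion_window_s
        self.last_gesture: Optional[GestureEvent] = None
        self.state = "START"
        self.registers = FusionState()

    def on_gesture(self, event: GestureEvent) -> None:
        """Gestures are buffered; they bind only when speech asks for a referent."""
        self.last_gesture = event

    def on_speech_token(self, token: str, timestamp: float) -> Optional[dict]:
        """Advance the network by one token; return a command when complete."""
        token = token.lower()
        if self.state == "START" and token in {"paint", "scale", "texture"}:
            self.registers.action = token
            self.state = "HAVE_ACTION"
        elif self.state == "HAVE_ACTION" and token in self.DEICTICS:
            # Semantic fusion: resolve the deictic via the buffered gesture.
            if (self.last_gesture is not None and
                    abs(timestamp - self.last_gesture.timestamp) <= self.fusion_window_s):
                self.registers.target = self.last_gesture.target
                self.state = "HAVE_TARGET"
            else:
                self._reset()  # no referent available, abort this parse
        elif self.state == "HAVE_TARGET":
            self.registers.value = token  # e.g. "yellow"
            command = {"action": self.registers.action,
                       "target": self.registers.target,
                       "value": self.registers.value}
            self._reset()
            return command
        return None

    def _reset(self) -> None:
        self.state = "START"
        self.registers = FusionState()


if __name__ == "__main__":
    atn = SpeechGestureATN()
    now = time.time()
    atn.on_gesture(GestureEvent(target="chair_01", timestamp=now))
    for word in ["paint", "that", "yellow"]:
        result = atn.on_speech_token(word, timestamp=now + 0.2)
        if result:
            print(result)  # {'action': 'paint', 'target': 'chair_01', 'value': 'yellow'}
```

In a command such as “paint that yellow”, the deictic “that” binds to the most recently pointed-at object, so the user never has to look away from the scene to operate a menu.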

Outcomes and Impact

The initial evaluation demonstrated significant advantages for the proposed multimodal interaction paradigm: the MMI resulted in a lower perceived task duration, a stronger reported feeling of flow, more intuitive use, and a lower mental workload than the UMI. Subsequent research applying the MMI to an open design task confirmed that it performs on par with the unimodal interface across measures of flow, usability, presence, and the judged creativity of the designed products, despite the MMI’s remaining reliability and flexibility limitations.

2020

(2020). Finally on Par?! Multimodal and Unimodal Interaction for Open Creative Design Tasks in Virtual Reality. 2020 International Conference on Multimodal Interaction.

2019

(2019). “Paint That Object Yellow”: Multimodal Interaction to Enhance Creativity During Design Tasks in VR. 2019 International Conference on Multimodal Interaction.