I Hear, See, Speak & Do: Bringing Multimodal Information Processing to Intelligent Virtual Agents for Natural Human-AI Communication
Ke Li
Fariba Mostajeran
Sebastian Rings
Lucie Kruse
Susanne Schmidt
Michael Arz
Erik Wolf
Frank Steinicke
Abstract
In this demo paper, we present an Extended Reality (XR) framework that provides a streamlined workflow for creating and interacting with intelligent virtual agents (IVAs) with multimodal information processing capabilities, built on commercially available artificial intelligence (AI) tools and cloud services such as large language and vision models. The system supports (i) the integration of high-quality, customizable virtual 3D human models as visual representations of IVAs and (ii) multimodal communication with generative AI-driven IVAs in immersive XR, featuring realistic simulations of human behavior. Our demo showcases the enormous potential and vast design space of embodied IVAs for various XR applications.
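The paper itself does not disclose its implementation details, but the "hear, see, speak" loop described in the abstract can be illustrated with a minimal sketch. The snippet below assumes the OpenAI Python SDK (openai>=1.0) as a stand-in for the commercial cloud services mentioned above; the model names, file paths, and the way the result is handed back to the XR client are illustrative placeholders, not the authors' implementation.

```python
"""Minimal hear-see-speak sketch for an IVA backend (illustrative only)."""
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def hear(audio_path: str) -> str:
    """Transcribe the user's recorded utterance (speech-to-text)."""
    with open(audio_path, "rb") as audio_file:
        transcript = client.audio.transcriptions.create(
            model="whisper-1", file=audio_file
        )
    return transcript.text


def see_and_reason(user_text: str, frame_path: str) -> str:
    """Send the transcript plus a camera frame of the XR scene to a vision-capable LLM."""
    with open(frame_path, "rb") as image_file:
        frame_b64 = base64.b64encode(image_file.read()).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {
                "role": "system",
                "content": "You are an embodied virtual agent in an immersive XR scene.",
            },
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": user_text},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{frame_b64}"},
                    },
                ],
            },
        ],
    )
    return response.choices[0].message.content


def speak(reply_text: str, out_path: str = "reply.mp3") -> str:
    """Synthesize the agent's spoken answer (text-to-speech)."""
    speech = client.audio.speech.create(model="tts-1", voice="alloy", input=reply_text)
    with open(out_path, "wb") as f:
        f.write(speech.read())
    return out_path  # the XR client would play this and drive lip-sync/gestures


if __name__ == "__main__":
    text = hear("user_utterance.wav")      # "hear"
    reply = see_and_reason(text, "headset_frame.jpg")  # "see" + reason
    print(reply)
    speak(reply)                            # "speak"
```

In a full XR framework such as the one demonstrated, the audio capture, frame grabbing, playback, and avatar animation ("do") would run in the engine hosting the 3D human model, with this kind of cloud pipeline invoked asynchronously per conversational turn.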
Type
Conference paper
Publication
2025 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW)