I Hear, See, Speak & Do: Bringing Multimodal Information Processing to Intelligent Virtual Agents for Natural Human-AI Communication
Ke Li
Fariba Mostajeran
Sebastian Rings
Lucie Kruse
Susanne Schmidt
Michael Arz
Erik Wolf
Frank Steinicke
Abstract
In this demo paper, we present an Extended Reality (XR) framework that provides a streamlined workflow for creating and interacting with intelligent virtual agents (IVAs) with multimodal information processing capabilities, built on commercially available artificial intelligence (AI) tools and cloud services such as large language and vision models. The system supports (i) the integration of high-quality, customizable virtual 3D human models as visual representations of IVAs and (ii) multimodal communication with generative AI-driven IVAs in immersive XR, featuring realistic simulations of human behavior. Our demo showcases the enormous potential and vast design space of embodied IVAs for various XR applications.
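The paper itself does not disclose its implementation details, but the "hear, see, speak" loop described in the abstract can be illustrated with a minimal sketch. The snippet below assumes the OpenAI Python SDK (openai>=1.0) as a stand-in for the commercial cloud services mentioned above; the model names, file paths, and the way the result is handed back to the XR client are illustrative placeholders, not the authors' implementation.

```python
"""Minimal hear-see-speak sketch for an IVA backend (illustrative only)."""
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def hear(audio_path: str) -> str:
    """Transcribe the user's recorded utterance (speech-to-text)."""
    with open(audio_path, "rb") as audio_file:
        transcript = client.audio.transcriptions.create(
            model="whisper-1", file=audio_file
        )
    return transcript.text


def see_and_reason(user_text: str, frame_path: str) -> str:
    """Send the transcript plus a camera frame of the XR scene to a vision-capable LLM."""
    with open(frame_path, "rb") as image_file:
        frame_b64 = base64.b64encode(image_file.read()).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {
                "role": "system",
                "content": "You are an embodied virtual agent in an immersive XR scene.",
            },
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": user_text},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{frame_b64}"},
                    },
                ],
            },
        ],
    )
    return response.choices[0].message.content


def speak(reply_text: str, out_path: str = "reply.mp3") -> str:
    """Synthesize the agent's spoken answer (text-to-speech)."""
    speech = client.audio.speech.create(model="tts-1", voice="alloy", input=reply_text)
    with open(out_path, "wb") as f:
        f.write(speech.read())
    return out_path  # the XR client would play this and drive lip-sync/gestures


if __name__ == "__main__":
    text = hear("user_utterance.wav")      # "hear"
    reply = see_and_reason(text, "headset_frame.jpg")  # "see" + reason
    print(reply)
    speak(reply)                            # "speak"
```

In a full XR framework such as the one demonstrated, the audio capture, frame grabbing, playback, and avatar animation ("do") would run in the engine hosting the 3D human model, with this kind of cloud pipeline invoked asynchronously per conversational turn.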
Type
Conference paper
Publication
2025 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW)