PRESENCE: Realizing Intelligent Virtual Humans for eXtended Reality

Apr 15, 2024 - Oct 31, 2024
Erik Wolf
Credit: PRESENCE

Summary

The Intelligent Virtual Human (IVH) pillar of the PRESENCE project is dedicated to developing a robust software development kit (SDK) for creating and interacting with hyper-realistic virtual characters in eXtended Reality (XR) environments. These virtual humans serve dual roles: Smart Avatars (SAs), which represent human users, and Intelligent Virtual Agents (IVAs), which are autonomous, AI-powered characters. The central goal is to facilitate natural and realistic human-to-human and human-to-agent interactions by delivering characters capable of sophisticated, multimodal communication, incorporating speech, accurate facial expressions, gaze tracking, and realistic full-body movements. This research is funded by the European Union’s Horizon Europe research and innovation program.

Motivation

Providing users with compelling representations of themselves and others is essential for achieving a high sense of presence and immersion in XR environments, a key aspect of the Metaverse. However, traditional methods face several challenges. Generating highly realistic 3D human models that accurately reflect a user’s visual appearance must be fast and cost-effective to be practical for avatar creation. Furthermore, IVAs require complex behavior generation so that their actions are natural, plausible, and socially compatible, adhering to real-world physics and social rules to support high levels of embodiment and presence. Addressing these gaps, the IVH toolkit strives to move beyond simple human-to-human communication to enable hybrid interactions involving multiple real and artificial users.

Concept and Technological Approach

The IVH toolkit is developed for the Unity3D game engine and consists of five integrated modules that collectively enable advanced virtual human capabilities.

  • Virtual Human 3D Models: This module provides cost-efficient, real-time photorealistic humanoid 3D models. It leverages an efficient user-driven offline pipeline to generate fully rigged, skinned, and animatable models from simple inputs, such as a frontal facial photo and basic user data (e.g., height and weight). These models are used to create diverse SAs and IVAs for use cases in cultural heritage, healthcare, professional collaboration, and manufacturing training.
  • Motion Tracking and Action Classification: This task focuses on detecting and classifying the actions and intentions of virtual humans (SAs or IVAs) in real time. It employs a novel framework that combines a powerful Python-based Visual Analyzer, utilizing deep learning for object detection, tracking, and 2D pose estimation, with a Unity package that receives the classification results via a REST API (a minimal server-side sketch follows this list).
  • Speech and Facial Interaction: This module enhances communication capabilities using advanced AI services. It integrates Speech-to-Text (STT), Text-to-Speech (TTS), and Large Language Models (LLMs) to provide IVAs with contextually relevant conversational abilities. So far, IVAs can express six basic emotions by modifying facial blend shapes according to the Facial Action Coding System (FACS); an illustrative emotion-to-blend-shape mapping is sketched after this list.
  • Full Body Animation and Interaction: This task develops physics-based character animation methods that often rely on deep reinforcement learning (DRL) techniques. The goal is to generate physics-based controllers that run in real time within the Unity simulation, ensuring the virtual characters’ body movements are realistic, flexible, and capable of handling proximal space.
  • Multimodal Interaction: This module integrates the outcomes of the other modules described above to ensure that IVHs function as a coherent, interactive system. This integration enables complex behaviors, such as an IVA detecting user actions, approaching while respecting social distance, and engaging in a natural, speech-based, emotionally expressive dialogue (see the orchestration sketch after this list).
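
To make the Motion Tracking and Action Classification workflow more concrete, the following is a minimal sketch of how a Python-based Visual Analyzer could expose classification results over a REST API for a Unity client to poll. The endpoint path, payload fields, and the stubbed classifier are illustrative assumptions, not the toolkit's actual interface.

```python
# Minimal sketch of a REST endpoint in the style of the Python-based Visual
# Analyzer described above. Endpoint path, payload fields, and the stubbed
# classifier are illustrative assumptions, not the project's actual API.
from flask import Flask, jsonify

app = Flask(__name__)

def classify_latest_frame():
    """Placeholder for the deep-learning pipeline (detection, tracking,
    2D pose estimation). Here it returns hard-coded example values."""
    return {
        "action": "waving",          # classified action label
        "confidence": 0.93,          # classifier confidence
        "pose_2d": [[0.51, 0.22], [0.48, 0.35]],  # example keypoints (x, y)
    }

@app.route("/classification", methods=["GET"])
def classification():
    # A Unity package could poll this endpoint and map the JSON result
    # onto the corresponding SA/IVA behavior.
    return jsonify(classify_latest_frame())

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```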
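
The six basic emotions mentioned under Speech and Facial Interaction can be described as combinations of FACS action units (AUs) that drive facial blend shapes. The sketch below uses commonly cited EMFACS-style AU prototypes; the blend-shape naming scheme and the uniform intensity weights are assumptions, as the toolkit's facial rig is not detailed here.

```python
# Illustrative mapping from the six basic emotions to FACS action units (AUs).
# AU prototypes follow commonly cited EMFACS-style combinations; disgust in
# particular has several variants (some add AU25/26).
BASIC_EMOTION_AUS = {
    "happiness": [6, 12],
    "sadness":   [1, 4, 15],
    "surprise":  [1, 2, 5, 26],
    "fear":      [1, 2, 4, 5, 7, 20, 26],
    "anger":     [4, 5, 7, 23],
    "disgust":   [9, 15, 16],
}

def emotion_to_blendshapes(emotion: str, intensity: float = 1.0) -> dict[str, float]:
    """Return blend-shape weights (0..1) activating the AUs of an emotion."""
    intensity = max(0.0, min(1.0, intensity))
    return {f"AU{au}": intensity for au in BASIC_EMOTION_AUS[emotion]}

# Example: a moderately angry expression.
print(emotion_to_blendshapes("anger", intensity=0.6))
# {'AU4': 0.6, 'AU5': 0.6, 'AU7': 0.6, 'AU23': 0.6}
```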
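
Finally, a rough sketch of the Multimodal Interaction behavior described above: the IVA reacts to a detected user action, approaches while respecting social distance, and then runs a speech-based, emotionally expressive dialogue. All function names and the social-distance threshold are hypothetical placeholders for the toolkit's STT, LLM, TTS, and animation services, which actually run inside Unity.

```python
# Sketch of one decision step of a hypothetical IVA controller combining the
# modules above. Function names and the 1.2 m threshold are assumptions.
import math

SOCIAL_DISTANCE_M = 1.2  # assumed comfortable conversational distance

def update_iva(iva_pos, user_pos, user_action, listen, think, speak, express):
    """listen/think/speak/express stand in for the STT, LLM, TTS, and
    FACS-based facial expression services of the toolkit."""
    dist = math.dist(iva_pos, user_pos)
    if user_action == "waving" and dist > SOCIAL_DISTANCE_M:
        return "approach_user"                   # full-body controller walks closer
    if dist <= SOCIAL_DISTANCE_M:
        user_utterance = listen()                # STT
        reply, emotion = think(user_utterance)   # LLM with an emotion tag
        express(emotion)                         # facial blend shapes (FACS)
        speak(reply)                             # TTS
        return "in_dialogue"
    return "idle"

# Example run with stubbed services.
state = update_iva(
    iva_pos=(0.0, 0.0, 0.0), user_pos=(0.5, 0.0, 0.5), user_action="waving",
    listen=lambda: "Hello there!",
    think=lambda text: (f"You said: {text}", "happiness"),
    speak=print,
    express=lambda e: print(f"expressing {e}"),
)
print(state)  # in_dialogue
```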

Outcomes and Impact

The initial implementation of the IVH SDK has successfully integrated several core functionalities and is being deployed across various XR use cases, including manufacturing training and cultural heritage. A preliminary evaluation among XR developers showed neutral to positive ratings for user experience and technology acceptance. Although the feedback indicated a need for improved usability and more diverse models and animations, participants generally stated that they would recommend the toolkit and work with it again.

2025

(2025). The Influence of Avatar Visual Fidelity on Embodiment and User Experience in Virtual Reality.
(2025). I Hear, See, Speak & Do: Bringing Multimodal Information Processing to Intelligent Virtual Agents for Natural Human-AI Communication. 2025 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW).
(2025). A Toolkit for Creating Intelligent Virtual Humans in Extended Reality. 2025 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW).