Monday, January 8th 2024

NVIDIA and Developers Pioneer Lifelike Digital Characters for Games and Applications With NVIDIA Avatar Cloud Engine

NVIDIA today introduced production microservices for the NVIDIA Avatar Cloud Engine (ACE) that allow developers of games, tools and middleware to integrate state-of-the-art generative AI models into the digital avatars in their games and applications. The new ACE microservices let developers build interactive avatars using AI models such as NVIDIA Omniverse Audio2Face (A2F), which creates expressive facial animations from audio sources, and NVIDIA Riva automatic speech recognition (ASR), for building customizable multilingual speech and translation applications using generative AI.

Developers embracing ACE include Charisma.AI, Convai, Inworld, miHoYo, NetEase Games, Ourpalm, Tencent, Ubisoft and UneeQ.

"Generative AI technologies are transforming virtually everything we do, and that also includes game creation and gameplay," said Keita Iida, vice president of developer relations at NVIDIA. "NVIDIA ACE opens up new possibilities for game developers by populating their worlds with lifelike digital characters while removing the need for pre-scripted dialogue, delivering greater in-game immersion."

Top Game and Interactive Avatar Developers Embrace NVIDIA ACE
Top game and interactive avatar developers are pioneering ways ACE and generative AI technologies can be used to transform interactions between players and non-playable characters (NPCs) in games and applications.

"For years NVIDIA has been the pied piper of gaming technologies, delivering new and innovative ways to create games," said Zhipeng Hu, senior vice president of NetEase and head of LeiHuo business group. "NVIDIA is making games more intelligent and playable through the adoption of gaming AI technologies, which ultimately creates a more immersive experience."

"This is a milestone moment for AI in games," said Tencent Games. "NVIDIA ACE and Tencent Games will help lay the foundation that will bring digital avatars with individual, lifelike personalities and interactions to video games."

NVIDIA ACE Brings Game Characters to Life
NPCs have historically been designed with predetermined responses and facial animations. This limited player interactions, which tended to be transactional, short-lived and, as a result, skipped by a majority of players.

"Generative AI-powered characters in virtual worlds unlock various use cases and experiences that were previously impossible," said Purnendu Mukherjee, founder and CEO at Convai. "Convai is leveraging Riva ASR and A2F to enable lifelike NPCs with low-latency response times and high-fidelity natural animation."

To showcase how ACE can transform NPC interactions, NVIDIA worked with Convai to expand the NVIDIA Kairos demo, which debuted at Computex, with a number of new features and inclusion of ACE microservices.

In the latest version of Kairos, Riva ASR and A2F are used extensively, improving NPC interactivity. Convai's new framework allows NPCs to converse among themselves and gives them awareness of objects, enabling them to pick up and deliver items to desired areas. Furthermore, NPCs gain the ability to lead players to objectives and traverse worlds.

The Audio2Face and Riva automatic speech recognition microservices are available now. Interactive avatar developers can incorporate the models individually into their development pipelines.

3 Comments on NVIDIA and Developers Pioneer Lifelike Digital Characters for Games and Applications With NVIDIA Avatar Cloud Engine

#1
kurta999
This one looks very interesting. I'm curious how this will work in real games.
#2
evernessince
kurta999 said: "This one looks very interesting. I'm curious how this will work in real games."

They have a video demonstrating it.

Pretty much what it's doing is generating NPC voice lines for NPC-to-NPC and PC-to-NPC conversations.

There are a couple of clear problems with this though:

1) The lip sync and facial animations are not that good, not even Bethesda level.

2) The dynamically generated content is meaningless. The AI is just producing worthless filler. For a conversation to advance the game in any meaningful way, the developer has to specifically program that, just the same as if they weren't using this. The part of the video where the player requests a drink, for example, was 100% specifically programmed in, either by requiring that specific phrase or by requiring a phrase the AI considers close enough. Honestly, that alone could lead to a lot of frustration if the player doesn't know the phrase to advance a quest.
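
To make the "close enough" idea in point 2 concrete, here is a minimal sketch of a phrase-gated trigger. It is not how the Kairos demo or ACE works; the trigger phrase, the threshold and the function names are all hypothetical, and plain string similarity from Python's standard library stands in for whatever matching a real system would use.

```python
from difflib import SequenceMatcher

# Hypothetical values for illustration; neither comes from the demo.
TRIGGER_PHRASE = "can i get a drink"
THRESHOLD = 0.75


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and collapse whitespace."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def similarity(a: str, b: str) -> float:
    """Character-level similarity in the range 0.0 to 1.0."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def triggers_quest(player_line: str) -> bool:
    """True when the player's line is close enough to the scripted phrase."""
    return similarity(player_line, TRIGGER_PHRASE) >= THRESHOLD
```

With these assumed values, "Can I get a drink?" advances the quest, while "I'm thirsty" does not, even though the player means the same thing. That is the kind of frustration described above.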

3) Current AI models cannot learn on the fly, which means that the AI cannot output responses containing anything relating to main story progress or the changing game world. The developer can program in flags that let the AI know that game events changed, but that would be a terrible idea. Why? The problem stems from how models are trained: you can only train the AI on one set of data. You could train one AI for every potential world state in your game, but that would be a programming nightmare and would use way too much space. Your average AI model hits 3.7 GB to 6.2 GB, so that times 100 plus isn't feasible. On the flip side, if you were to train one model on every potential game state, all you'd create is an AI that mixes responses from world states all the time. The way the weights work for AI neurons, even if the devs did specifically include flags to inform the AI of the world state, it would still frequently mix up responses from other potential world states. This is because modern AI networks are complicated multi-level neuron networks where changing the weight of a single input does not guarantee desirable results, which is why most AI is currently used for single tasks.
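
The storage estimate in point 3 can be worked through as a quick sketch. The model sizes and the count of 100 world states are the figures quoted in the comment; the function and variable names are made up for illustration.

```python
# Figures quoted in the comment: model size range and the "100 plus" world states.
MODEL_SIZE_GB = (3.7, 6.2)
WORLD_STATES = 100


def total_storage_gb(size_gb: float, copies: int) -> float:
    """Disk space needed when every world state ships its own model copy."""
    return size_gb * copies


low, high = (total_storage_gb(size, WORLD_STATES) for size in MODEL_SIZE_GB)
print(f"{low:.0f} GB to {high:.0f} GB")  # prints "370 GB to 620 GB"
```

So one model per world state would land somewhere around 370 GB to 620 GB at the low end of "100 plus" states.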

4) AI content that isn't specifically moderated by a human is typically either bad or mediocre at best. I play around with Stable Diffusion a lot and it takes quite a bit of work to get an acceptable result. 99% of the time the output by the AI has multiple issues (floating or extra limbs are common), makes no sense, or is generic af. You also tend to notice that AIs follow certain predictable output patterns. At the end of the day I'm spending a few hours curating a prompt to get 1 good picture, with the 100 or so other AI-generated pictures going into the trash.

There's already a Skyrim mod called Mantella that does exactly what is advertised here: you can speak to an AI and it will respond. It's good for generating generic filler or for those who are so lonely they need a response from an AI they ultimately know is meaningless.

I definitely support the use of AI to assist game developers, as I use AI to assist me. AI without human help cannot generate interesting content. I fear it will be widely used to justify a further reduction in video game quality and more layoffs in an already difficult period for many in the industry. Hopefully gamers are able to see this, because otherwise I fear a glut of generic AI-generated games would only create a bubble in the market that is prone to bursting.
#3
kurta999
I saw that video already; that's what sparked my interest here.

Thank you for the detailed response, this is exactly what I was curious about. So I don't have to wait for that in "real life", you clarified a lot of things.