Tuesday, August 22nd 2023
NVIDIA Paves the Way for Natural Speech Conversations with Game NPCs
Imagine you're in a vast RPG filled with hundreds, if not thousands, of interactive NPCs (non-player characters). Current RPGs conduct your interactions with them through pre-defined dialogue trees: you choose among a handful of text-based options on the screen, each of which elicits a scripted response from the NPC. This feels unnatural and railroaded, but NVIDIA plans to change it. With ACE (Avatar Cloud Engine) and NeMo SteerLM (a steerable natural language model), NVIDIA wants to make voice-based interactions with NPCs possible. This is a necessary stepping stone toward the near future, where NPCs will be backed by large GPTs that let you have lengthy conversations with them.
The way this works is, the player gives an NPC a natural-language voice input. A speech-to-text engine and an LLM process the voice input and generate a natural-language response, and Omniverse Audio2Face is leveraged to animate the NPC's response in real time. Announced at Gamescom, NVIDIA's new NeMo SteerLM adds life to the part of ACE that processes the natural voice input: based on the personality traits the game developer gives an NPC, it generates responses with varying degrees of creativity, humor, and toxicity, among other attributes.
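The pipeline described above can be sketched as a chain of three stages. This is a hypothetical illustration only — none of the function names below are NVIDIA APIs; they are stand-ins for the speech-to-text, SteerLM-style LLM, and Audio2Face stages, and the trait names and 0–9 scale are assumptions:

```python
# Hypothetical sketch of an ACE-style NPC dialogue pipeline.
# Each function is a stand-in for a real component, not an NVIDIA API.

def speech_to_text(audio: bytes) -> str:
    """Stand-in for an ASR engine; here we pretend the audio is pre-transcribed."""
    return audio.decode("utf-8")

def generate_response(text: str, traits: dict) -> str:
    """Stand-in for a SteerLM-style LLM conditioned on attribute values
    (e.g. creativity, humor, toxicity) that the developer assigns per NPC."""
    tone = "witty" if traits.get("humor", 0) > 5 else "plain"
    return f"[{tone}] You said: {text}"

def animate_face(response: str) -> str:
    """Stand-in for Omniverse Audio2Face, which would drive lip-sync and
    facial animation from the generated speech in real time."""
    return f"<npc speaks: {response}>"

def npc_turn(audio: bytes, traits: dict) -> str:
    """One conversational turn: voice in -> text -> LLM reply -> animated speech."""
    transcript = speech_to_text(audio)
    reply = generate_response(transcript, traits)
    return animate_face(reply)

print(npc_turn(b"Where can I buy a sword?", {"humor": 7, "toxicity": 0}))
```

The key design point is the middle stage: the same player utterance yields different responses depending on the per-NPC trait values, which is what SteerLM's attribute conditioning is meant to provide.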
26 Comments on NVIDIA Paves the Way for Natural Speech Conversations with Game NPCs
NPC: Your amazon order has shipped and should arrive by 9PM.
NPC: There are new recommendations for you, would you like to see them?
Smartphones gave everyone a voice but they didn't make the Internet a better place. They just generated more noise.
It's a shame that ML technology won't be used to filter out the garbage and improve the signal-to-noise ratio. There's little monetization in that.
The Internet has become a mental health ghetto. In the real world, there are contextual clues that someone is to be avoided. And you can move away from them so they are out of earshot. On the Web -- especially in a text-based chat -- everyone is equidistant from you.
This Nvidia announcement is mostly about making the conversation a little less stilted. The NPCs still have to be programmed with parameters that narrow the conversation topics to enhance gameplay and the storyline. It's not like you should be able to discuss Einstein's Special Theory of Relativity with a butcher in 16th-century Scotland.
Just like pretty much any mature technology (and not just video games).
I expect to see plenty of memes when this feature first rolls out.
Another example is how gyros in smartphones are underutilized in games: there was a time when it felt like almost every game had motion controls; nowadays almost none do.
There are appropriate and very useful implementations for things like gyroscopic control. It's up to the developers to choose these technologies wisely. With new technologies there are learning curves, both for the developer and the end user.
I've tried VR in various forms over 25 years and I currently own an Oculus Rift S VR HMD. There are pros and cons with this technology and a lot of it has to do with how it is implemented.
For sure there will be some developers who put this ML speech NPC stuff in game titles where it doesn't improve the game one bit. But at some point, someone will put out something where people will say "That's pretty neat, I wish ____ had this."
Kinect failed because it both had no good games and had many technical limitations, not because people didn't want more interactive games. Ditto for the PS Move and Sixaxis, which were downright awful. The PS Move in particular is one of the biggest complaints people had about the PSVR1. The Lighthouse system created by Valve and Oculus's camera-based systems were/are vastly superior. /facepalm
"A gyroscope is a device used for measuring or maintaining orientation and angular velocity."
en.wikipedia.org/wiki/Gyroscope
It's impossible to track an object in 3D space with a gyro alone as you imply, hence why not a single company from Oculus to Valve to Apple does it. A gyroscope can assist a tracking system that's weak at measuring rotational velocity, but the primary tracking will be done through cameras (Oculus), IR (Wii), lasers (Valve), or another technology capable of tracking an object in 3D space.
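The "gyro assists, something else anchors" idea above is commonly implemented as a complementary filter: the gyro is fast but drifts, while a camera/IR reference is drift-free, so blending the two gives the best of both. A minimal sketch, where the 0.98 blend factor, sample rate, and bias values are illustrative assumptions:

```python
# Minimal complementary-filter sketch: fuse a drifting gyro rate (deg/s)
# with an absolute angle reference (deg) from a camera/IR system.

def complementary_filter(gyro_rates, camera_angles, dt=0.01, alpha=0.98):
    """Each step trusts fast gyro integration short-term (weight alpha)
    and the slow, drift-free camera reference long-term (weight 1 - alpha)."""
    angle = camera_angles[0]
    for rate, cam in zip(gyro_rates, camera_angles):
        angle = alpha * (angle + rate * dt) + (1 - alpha) * cam
    return angle

# A stationary object: the true angle is 0, but the gyro reports a constant
# 1 deg/s bias. Pure integration drifts without bound; the fused estimate
# settles near a small bounded error instead.
gyro_only = sum(1.0 * 0.01 for _ in range(1000))          # 10.0 deg of drift
fused = complementary_filter([1.0] * 1000, [0.0] * 1000)  # settles near 0.5 deg
```

This is why the gyro alone can't track position or orientation over time, but still earns its place inside a camera-, IR-, or laser-based system.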
Nintendo's motion controls succeeded because, like most Nintendo consoles, people had fun playing the games. I don't understand why you hate motion controls to the point where you have to try to rewrite history. VR sales hit 38 million this year, an increase from 24 million last year. On top of that, Apple is releasing a mixed-reality headset. Clearly there is potential for the technology.
If you don't understand what motion controls are, you should not be in this argument in the first place.
Inertial guidance uses accelerometers and gyros (often corrected by GPS) to dead-reckon the position of an object. The problem is that it is not accurate enough to be used for motion controls, hence why not a single vendor uses it, nor is it considered a viable motion-control system.
"Even the best accelerometers, with a standard error of 10 micro-g, would accumulate a 50-meter (164-ft) error within 17 minutes."
en.wikipedia.org/wiki/Inertial_navigation_system
It's infeasible even with equipment that is far more costly than is reasonable for consumer products. It should go without saying that 164 ft drift is not acceptable. Not even 1/200th of that figure is acceptable for motion controls.
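The quoted figure can be sanity-checked with basic dead-reckoning arithmetic: a constant accelerometer bias b, integrated twice over time t, produces a position error of ½·b·t². Plugging in the 10 micro-g / 17-minute numbers from the quote:

```python
# Sanity check of the Wikipedia figure: constant-bias double integration
# gives position error = 0.5 * bias * t^2.
g = 9.80665                  # standard gravity, m/s^2
bias = 10e-6 * g             # 10 micro-g standard error, in m/s^2
t = 17 * 60                  # 17 minutes, in seconds
drift_m = 0.5 * bias * t**2
drift_ft = drift_m * 3.28084

print(f"{drift_m:.1f} m ({drift_ft:.0f} ft)")  # ~51 m (~167 ft)
```

That lands at roughly 51 m, consistent with the quoted ~50 m (164 ft) figure up to rounding, and it shows why the error grows quadratically with time rather than linearly.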
Oftentimes inertial guidance systems are supplemented by barometric altimeters and magnetic sensors to offset these inaccuracies. Even still, drift is far too significant a problem for this system to be remotely feasible for workable motion controls, hence why consoles like the Wii and Wii U primarily relied on IR sensors, with other instruments improving the accuracy. And that's considering that the Wii and Wii U's tracking is relatively primitive compared to Lighthouse or Meta's camera array, both of which offer far superior accuracy.

Phone orientation tracking, shake detection, or other basic gesture tracking is not motion controls. Their capability and precision are much lower than what could feasibly be used to control games or apps to a point where it would be a pleasant experience. Phone apps don't utilize motion controls because the phone doesn't support them, at least not to the degree that motion controls are defined on Wikipedia, which is really a baseline for them to be useful to apps / the end user.
Here's Wikipedia's definition: Phones do have accelerometers and other sensors, such as gyros, to track motion and provide input to games.