News Posts matching #Language

Ubisoft Exploring Generative AI, Could Revolutionize NPC Narratives

Have you ever dreamed of having a real conversation with an NPC in a video game? Not just one gated within a dialogue tree of pre-determined answers, but an actual conversation, conducted through spontaneous action and reaction? Lately, a small R&D team at Ubisoft's Paris studio, working with Nvidia's Audio2Face application and Inworld's Large Language Model (LLM), has been experimenting with generative AI in an attempt to turn this dream into reality. Their project, NEO NPC, uses GenAI to probe the limits of how a player can interact with an NPC without breaking the authenticity of the situation they are in, or the character of the NPC itself.

Considering that word, authenticity, the project has had to be a hugely collaborative effort across artistic and scientific disciplines. Generative AI is a hot topic of conversation in the videogame industry, and Senior Vice President of Production Technology Guillemette Picard is keen to stress that the goal behind all genAI projects at Ubisoft is to bring value to the player, which means continuing to focus on human creativity behind the scenes. "The way we worked on this project, is always with our players and our developers in mind," says Picard. "With the player in mind, we know that developers and their creativity must still drive our projects. Generative AI is only of value if it has value for them."

Groq LPU AI Inference Chip is Rivaling Major Players like NVIDIA, AMD, and Intel

AI workloads fall into two categories: training and inference. Training requires large compute and memory capacity, but access speeds are not a significant contributor; inference is another story. With inference, the AI model must run extremely fast to serve the end user as many tokens (roughly, words or word fragments) per second as possible, so that answers to prompts arrive faster. Groq, an AI chip startup that was in stealth mode for a long time, has been making major moves in ultra-fast inference with its Language Processing Unit (LPU), designed for large language models (LLMs) such as GPT, Llama, and Mistral. The Groq LPU is a single-core unit based on the Tensor-Streaming Processor (TSP) architecture, which achieves 750 TOPS at INT8 and 188 TeraFLOPS at FP16, with 320x320 fused dot product matrix multiplication, in addition to 5,120 Vector ALUs.

The Groq LPU offers massive concurrency, with 230 MB of local SRAM and 80 TB/s of bandwidth. Together, these give Groq the performance that has been making waves on the internet over the past few days. Serving the Mixtral 8x7B model at 480 tokens per second, the Groq LPU posts one of the leading inference numbers in the industry. On Llama 2 70B with a 4096-token context length, Groq can serve 300 tokens/s, while on the smaller Llama 2 7B with 2048 tokens of context, the Groq LPU can output 750 tokens/s. According to the LLMPerf Leaderboard, the Groq LPU is beating the GPU-based cloud providers at inference on Llama LLMs in configurations of anywhere from 7 to 70 billion parameters. In token throughput (output) and time to first token (latency), Groq is leading the pack, achieving the highest throughput and second lowest latency.
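To put the quoted throughput figures in perspective, here is a minimal Python sketch that converts tokens per second into the wall-clock time needed to stream a response. The helper name `generation_time_s`, the 500-token response length, and the optional first-token latency parameter are illustrative assumptions; only the tokens/s values come from the text above.

```python
# Throughput figures quoted in the article, in tokens per second.
QUOTED_THROUGHPUT = {
    "Mixtral 8x7B": 480,
    "Llama 2 70B (4096-token context)": 300,
    "Llama 2 7B (2048-token context)": 750,
}


def generation_time_s(num_tokens, tokens_per_s, first_token_latency_s=0.0):
    """Seconds to stream num_tokens at a steady decode rate.

    first_token_latency_s is a hypothetical knob for time to first token;
    the article gives no concrete latency values, so it defaults to zero.
    """
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    return first_token_latency_s + num_tokens / tokens_per_s


if __name__ == "__main__":
    response_tokens = 500  # assumed response length, for illustration only
    for model, tps in QUOTED_THROUGHPUT.items():
        seconds = generation_time_s(response_tokens, tps)
        print(f"{model}: {seconds:.2f} s for {response_tokens} tokens")
```

Because decode time scales linearly with response length, the gap between 300 and 750 tokens/s matters most for long answers.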

Apple Wants to Store LLMs on Flash Memory to Bring AI to Smartphones and Laptops

Apple has been experimenting with the Large Language Models (LLMs) that power most of today's AI applications. The company wants these LLMs to serve users well and run efficiently, which is a difficult task because they require a lot of resources, including compute and memory. Traditionally, LLMs have required AI accelerators in combination with large DRAM capacity to store model weights. However, Apple has published a paper that aims to bring LLMs to devices with limited memory capacity. The method stores the LLM on NAND flash memory (regular storage) and constructs an inference cost model that harmonizes with flash memory behavior, guiding optimization in two critical areas: reducing the volume of data transferred from flash and reading data in larger, more contiguous chunks. Instead of keeping all model weights in DRAM, Apple wants to keep them in flash memory and pull them into DRAM on demand, only when they are needed.

Two principal techniques are introduced within this flash memory-informed framework: "windowing" and "row-column bundling." These methods collectively enable running models up to twice the size of the available DRAM, with a 4-5x and 20-25x increase in inference speed compared to naive loading approaches on CPU and GPU, respectively. Integrating sparsity awareness, context-adaptive loading, and a hardware-oriented design paves the way for practical inference of LLMs on devices with limited memory, such as SoCs with 8/16/32 GB of available DRAM. With DRAM costing more than NAND flash, devices such as smartphones could store and run inference on LLMs with multi-billion parameters, even if the available DRAM isn't sufficient to hold them. For a more technical deep dive, read the paper on arXiv.
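The idea of pulling weights from flash only when needed can be sketched as a sliding-window cache. The Python below is a rough illustration under stated assumptions, not Apple's implementation: it assumes "windowing" means keeping the neurons activated by the last few tokens resident in DRAM and reading from flash only the ones that are missing. The class name `NeuronWindowCache` and the neuron ids are hypothetical.

```python
from collections import deque


class NeuronWindowCache:
    """Tracks which neuron weights are resident in DRAM (illustrative only).

    Assumption: neurons activated by any of the last `window` tokens stay
    in DRAM, so each new token only needs a flash read for the difference.
    """

    def __init__(self, window):
        if window < 1:
            raise ValueError("window must be at least 1")
        self.window = window
        self._recent = deque()      # active-neuron sets for the last tokens
        self.loaded_from_flash = 0  # running count of simulated flash reads

    def resident(self):
        """Union of neuron ids currently held in DRAM."""
        ids = set()
        for active in self._recent:
            ids |= active
        return ids

    def step(self, active):
        """Process one token; return the ids that must be read from flash."""
        missing = set(active) - self.resident()
        self.loaded_from_flash += len(missing)
        self._recent.append(set(active))
        if len(self._recent) > self.window:
            self._recent.popleft()  # neurons of the oldest token may be evicted
        return missing


if __name__ == "__main__":
    cache = NeuronWindowCache(window=2)
    for token_active in ({1, 2, 3}, {2, 3, 4}, {1, 4, 5}):
        print(sorted(cache.step(token_active)))
    print("total flash reads:", cache.loaded_from_flash)
```

When consecutive tokens activate overlapping neurons, most weights are already resident, which is one way the volume of data transferred from flash could be reduced.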

Google Bard Available Across the EU, Updated with 40 Languages & Spoken Response Function

Google has announced a wider release and new features for its AI chatbot, Bard, with a rollout across Europe (27 territories) plus the addition of Brazil: "Today we're announcing Bard's biggest expansion to date. It's now available in most of the world, and in the most widely spoken languages. And we're launching new features to help you better customize your experience, boost your creativity and get more done." The updated system is available now, so users "can collaborate with Bard in over 40 languages." A spoken response function has also been implemented, which is advertised as being very "helpful if you want to hear the correct pronunciation of a word or listen to a poem or script. Simply enter a prompt and select the sound icon to hear Bard's answers."

Jack Krawczyk, Bard Product Lead, and Amarnag Subramanya, Bard's VP of Engineering, made sure to mention that Google is covering its bases, since privacy issues (now mostly in the past) had delayed Bard's ability to reach new places: "As part of our bold and responsible approach to AI, we've proactively engaged with experts, policymakers and privacy regulators on this expansion. And as we bring Bard to more regions and languages over time, we'll continue to use our AI Principles as a guide, incorporate user feedback, and take steps to protect people's privacy and data." When Google launched Bard back in March, the initial "trial" period was restricted to the USA and UK.

Team Xbox Celebrates Disability Pride Month

This July, as part of Disability Pride Month, Team Xbox proudly celebrates players, creators, and community members with disabilities. More than 400 million video game players have disabilities worldwide, and we recognize the incredible contributions the gaming and disability community has made in making Team Xbox, and the broader gaming industry, more inclusive and welcoming for everyone.

Disability Pride holds a special place in my heart, as I am not only a Program Manager on our Gaming Accessibility Team, but also a person with disabilities. Most people wouldn't think of me as having a disability at first glance. In fact, I didn't know I had disabilities until I was in my 20s, when I was diagnosed as being neurodiverse. Now I know that I have had Obsessive Compulsive Disorder and Sensory Processing Disorder since I was a young child. And, as the years have gone by, I've acquired new disabilities due to illness, injury, and trauma. Chronic pain is now part of my life, as is hearing loss, and anxiety and depression related to complex post-traumatic stress disorder.

Linux Foundation Launches New TLA+ Organization

SAN FRANCISCO, April 21, 2023 -- The Linux Foundation, the nonprofit organization enabling mass innovation through open source, today announced the launch of the TLA+ Foundation to promote the adoption and development of the TLA+ programming language and its community of TLA+ practitioners. Inaugural members include Amazon Web Services (AWS), Oracle and Microsoft. TLA+ is a high-level language for modeling programs and systems, especially concurrent and distributed ones. TLA+ has been successfully used by companies to verify complex software systems, reducing errors and improving reliability. The language helps detect design flaws early in the development process, saving time and resources.

TLA+ and its tools are useful for eliminating fundamental design errors, which are hard to find and expensive to correct in code. The language is based on the idea that the best way to describe things precisely is with simple mathematics. The language was invented decades ago by the pioneering computer scientist Leslie Lamport, now a distinguished scientist with Microsoft Research. After years of Lamport's stewardship and Microsoft's support, TLA+ has found a new home at the Linux Foundation.
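To illustrate the kind of design error such tools are meant to catch, here is a small Python sketch. It is not TLA+ and not any TLA+ tool; it only mimics the idea of exhaustively exploring every interleaving of a concurrent design and checking an invariant. The toy system (two processes each doing a non-atomic read-then-write increment of a shared counter) and all names are assumptions chosen for illustration.

```python
from collections import deque

# State layout: (counter, (pc, tmp) for process 0, (pc, tmp) for process 1).
# pc 0: about to read the counter; pc 1: about to write tmp + 1; pc 2: done.
INIT = (0, (0, 0), (0, 0))


def next_states(state):
    """Yield every state reachable by letting one process take one step."""
    counter, *procs = state
    for i, (pc, tmp) in enumerate(procs):
        if pc == 0:      # read the shared counter into a local copy
            new_proc, new_counter = (1, counter), counter
        elif pc == 1:    # write back the stale local copy plus one
            new_proc, new_counter = (2, tmp), tmp + 1
        else:
            continue     # this process has finished
        new_procs = list(procs)
        new_procs[i] = new_proc
        yield (new_counter, *new_procs)


def invariant(state):
    """Once every process is done, the counter must equal their number."""
    counter, *procs = state
    all_done = all(pc == 2 for pc, _ in procs)
    return (not all_done) or counter == len(procs)


def check(init):
    """Breadth-first search; return a counterexample trace or None."""
    parent = {init: None}
    queue = deque([init])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            trace = []
            while state is not None:
                trace.append(state)
                state = parent[state]
            return trace[::-1]
        for nxt in next_states(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None


if __name__ == "__main__":
    for step in check(INIT) or []:
        print(step)
```

The search reports an interleaving in which both processes read the counter before either writes, so one increment is lost. This is a flaw in the design itself, visible before any production code exists.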
May 1st, 2024 02:56 EDT