News Posts matching #LLM

AMD Introduces GAIA - an Open-Source Project That Runs Local LLMs on Ryzen AI NPUs

AMD has launched a new open-source project called GAIA (pronounced /ˈɡaɪ.ə/), an application that leverages the power of the Ryzen AI Neural Processing Unit (NPU) to run private, local large language models (LLMs). In this blog, we'll dive into the features and benefits of GAIA, and introduce how you can take advantage of the open-source project and adopt it into your own applications.

Introduction to GAIA
GAIA is a generative AI application designed to run local, private LLMs on Windows PCs and is optimized for AMD Ryzen AI hardware (AMD Ryzen AI 300 Series Processors). This integration allows for faster, more efficient processing - i.e., lower power draw - while keeping your data local and secure. On Ryzen AI PCs, GAIA interacts with the NPU and iGPU to run models seamlessly by using the open-source Lemonade (LLM-Aid) SDK from ONNX TurnkeyML for LLM inference. GAIA supports a variety of local LLMs optimized to run on Ryzen AI PCs. Popular models like Llama and Phi derivatives can be tailored for different use cases, such as Q&A, summarization, and complex reasoning tasks.
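GAIA's Lemonade/ONNX path has its own APIs; purely as a generic illustration of what "running a local LLM for Q&A" involves, here is a minimal sketch using Hugging Face transformers. The model name and prompt are illustrative, and this is not GAIA's own code path.

```python
# Minimal local-LLM Q&A sketch (generic transformers path, not GAIA's
# Lemonade/ONNX runtime; the model name below is illustrative).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"   # assumption: any local Phi/Llama derivative
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Summarize what an NPU does in one sentence."}]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True,
                                 return_tensors="pt").to(model.device)
out = model.generate(inputs, max_new_tokens=64)
# Decode only the newly generated tokens, skipping the prompt.
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```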

LG Develops Custom "EXAONE Deep" State-of-the-Art Reasoning AI Model

LG AI Research (yes, that LG, the consumer electronics maker) has launched EXAONE Deep, a high-performance reasoning AI that demonstrates exceptional capabilities in mathematical logic, scientific concepts, and programming challenges despite its relatively small parameter count. The flagship 32B model achieves performance metrics rivaling substantially larger models like GPT-4o and DeepSeek R1, while the 7.8B and 2.4B variants establish new benchmarks in the lightweight and on-device AI categories. The EXAONE Deep 32B model registered a 94.5 score on the CSAT 2025 Mathematics section and 90.0 on the AIME 2024, outperforming competing models while requiring only 5% of the computational resources of larger alternatives like DeepSeek-R1 (671B).

In scientific reasoning, it achieved a 66.1 score on the GPQA Diamond test, which evaluates PhD-level problem-solving capabilities across physics, chemistry, and biology. The model's 83.0 score on MMLU establishes it as the highest-performing domestically developed model in South Korea, underscoring LG's approach to creating efficient, high-performance AI systems. Particularly notable is the performance of the smaller variants: the 7.8B model scored 94.8 on MATH-500 and 59.6 on AIME 2025, while the 2.4B model achieved 92.3 on MATH-500 and 47.9 on AIME 2024. These results position EXAONE Deep's smaller models at the top of their respective categories across all major benchmarks, suggesting significant potential for deployment in resource-constrained environments. Topping out at 32 billion parameters, the family is well suited to single-GPU deployments; the models can even run on a range of discrete GPUs, laptop GPUs, and some edge systems without massive computational power.

Tencent Will Launch Hunyuan T1 Inference Model on March 21

Tencent's large language model (LLM) specialist division has announced the imminent launch of its T1 AI inference model. The Chinese technology giant's Hunyuan social media accounts revealed a grand arrival scheduled for Friday (March 21). A friendly reminder was issued to interested parties regarding the upcoming broadcast/showcase: "please set aside your valuable time. Let's step into T1 together." Earlier in the week, the Tencent AI team started to tease its "first ultra-large Mamba-powered reasoning model." Local news reports have highlighted Hunyuan's claim of Mamba architecture being applied losslessly to a super-large Mixture of Experts (MoE) model.

Late last month, the company released its Hunyuan Turbo S AI model—advertised as offering faster replies than DeepSeek's R1 system. Tencent's plucky solution has quickly climbed up the Chatbot Arena LLM Leaderboard. The Hunyuan team was in a boastful mood earlier today and loudly proclaimed that its proprietary Turbo S model had charted in fifteenth place. At the time of writing, DeepSeek R1 is ranked seventh on the leaderboard. As ITHome explains, this community-driven platform relies on user interactions: people chat "with multiple models anonymously, voting to decide which model is better, and then generating a ranking list based on the scores. This kind of evaluation is also seen as an arena for big models to compete directly, which is simple and direct."

Phison Expands aiDAPTIV+ GPU Memory Extension Capabilities

Phison Electronics (8299TT), a leading innovator in NAND flash technologies, today announced an array of expanded capabilities for aiDAPTIV+, the affordable AI training and inferencing solution for on-premises environments. aiDAPTIV+ will be integrated into an ML-series Maingear laptop, the first AI laptop PC capable of LLMOps, utilizing NVIDIA GPUs and available for concept demonstration and registration this week at NVIDIA GTC 2025. Customers will be able to fine-tune Large Language Models (LLMs) of up to 8 billion parameters using their own data.

Phison also expanded aiDAPTIV+ capabilities to run on edge computing devices powered by the NVIDIA Jetson platform, for enhanced generative AI inference at the edge and in robotics deployments. With today's announcement, new and current aiDAPTIV+ users can look forward to the new aiDAPTIVLink 3.0 middleware, which will provide faster Time to First Token (TTFT) recall and extend the token length for greater context, improving inferencing performance and accuracy. These expansions will unlock access for users ranging from university students and AI industry professionals learning to train LLMs, to researchers uncovering deeper insights within their own data on a PC, all the way to manufacturing engineers automating factory-floor enhancements via edge devices.
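For readers new to fine-tuning an ~8B-parameter model with their own data, a common approach on modest hardware is parameter-efficient fine-tuning. The sketch below uses Hugging Face PEFT/LoRA as a generic stand-in, not aiDAPTIV+'s own software stack; the model name and hyperparameters are illustrative.

```python
# Generic LoRA fine-tuning setup sketch (Hugging Face PEFT), not Phison's
# aiDAPTIV+ stack; model name and hyperparameters are illustrative.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")  # ~8B-parameter base
lora = LoraConfig(r=16, lora_alpha=32,
                  target_modules=["q_proj", "v_proj"],   # attach adapters to attention projections
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)       # only the small adapter matrices become trainable
model.print_trainable_parameters()        # typically well under 1% of the full model
# From here, train on your own dataset with transformers' Trainer or TRL's SFTTrainer.
```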

ASUS Introduces New "AI Cache Boost" BIOS Feature - R&D Team Claims Performance Uplift

Large language models (LLMs) love large quantities of memory—so much so, in fact, that AI enthusiasts are turning to multi-GPU setups to make even more VRAM available for their AI apps. But since many current LLMs are extremely large, even this approach has its limits. When a model doesn't fit entirely in VRAM, the GPU hands part of the work back to the CPU, and when it does, the performance of your CPU cache and DRAM comes into play. All this means that when it comes to the performance of AI applications, it's not just the GPU that matters, but the entire pathway that connects the GPU to the CPU to the I/O die to the DRAM modules. It stands to reason, then, that there are opportunities to boost AI performance by optimizing these elements.

That's exactly what we've found as we've spent time in our R&D labs with the latest AMD Ryzen CPUs. AMD just launched two new Ryzen CPUs with AMD 3D V-Cache Technology, the AMD Ryzen 9 9950X3D and Ryzen 9 9900X3D, pushing the series into new performance territory. After testing a wide range of optimizations across a variety of workloads, we uncovered a set of settings that offer tangible benefits for AI enthusiasts. Now, we're ready to share these optimizations with you through a new BIOS feature: AI Cache Boost. Available on ASUS AMD 800 Series motherboards with our most recent firmware update, AI Cache Boost can accelerate performance by up to 12.75% when you're working with massive LLMs.
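To see why the CPU-side pathway matters at all, here is a rough sketch (using llama-cpp-python, not ASUS's test suite) of partially offloading a model that doesn't fit in VRAM: the layers left on the CPU are exactly where cache and DRAM speed show up in tokens per second. The model path and layer count are illustrative.

```python
# Sketch: when a model is too big for VRAM, llama-cpp-python can offload only
# some layers to the GPU; the rest run on the CPU, where cache/DRAM performance
# is reflected directly in tokens/sec. Path and layer count are illustrative.
import time
from llama_cpp import Llama

llm = Llama(model_path="models/llama-70b-q4_k_m.gguf",  # assumption: a large quantized model
            n_gpu_layers=40,                            # what fits in VRAM; the rest hits CPU/DRAM
            n_ctx=4096)

start = time.time()
out = llm("Explain cache locality in two sentences.", max_tokens=128)
elapsed = time.time() - start
n_tokens = out["usage"]["completion_tokens"]
print(f"{n_tokens} tokens in {elapsed:.1f}s -> {n_tokens / elapsed:.1f} tok/s")
```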

AMD's Ryzen AI MAX+ 395 Delivers up to 12x AI LLM Performance Compared to Intel's "Lunar Lake"

AMD's latest flagship APU, the Ryzen AI MAX+ 395 "Strix Halo," demonstrates some impressive performance advantages over Intel's "Lunar Lake" processors in large language model (LLM) inference workloads, according to recent benchmarks on AMD's blog. Featuring 16 Zen 5 CPU cores, 40 RDNA 3.5 compute units, and over 50 AI TOPS via its XDNA 2 NPU, the processor achieves up to 12.2x faster response times than Intel's Core Ultra 258V in specific LLM scenarios. Notably, Intel's Lunar Lake has four E-cores and four P-cores, half the CPU core count of the Ryzen AI MAX+ 395, yet the performance difference is far more pronounced than the 2x core gap would suggest. The performance delta becomes even more notable with model complexity, particularly with 14-billion-parameter models approaching the limit of what standard 32 GB laptops can handle.

In LM Studio benchmarks using an ASUS ROG Flow Z13 with 64 GB unified memory, the integrated Radeon 8060S GPU delivered 2.2x higher token throughput than Intel's Arc 140V across various model architectures. Time-to-first-token metrics revealed a 4x advantage in smaller models like Llama 3.2 3B Instruct, expanding to 9.1x with 7-8B parameter models such as DeepSeek R1 Distill variants. AMD's architecture particularly excels in multimodal vision tasks, where the Ryzen AI MAX+ 395 processed complex visual inputs up to 7x faster in IBM Granite Vision 3.2 3B and 6x faster in Google Gemma 3 12B compared to Intel's offering. The platform's support for AMD Variable Graphics Memory allows allocating up to 96 GB as VRAM from systems equipped with 128 GB unified memory, enabling the deployment of state-of-the-art models like Google Gemma 3 27B Vision. The processor's performance advantages extend to practical AI applications, including medical image analysis and coding assistance via higher-precision 6-bit quantization in the DeepSeek R1 Distill Qwen 32B model.
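For reference, the two headline metrics quoted throughout these results reduce to simple arithmetic over timestamps. The figures below are illustrative, not AMD's measurements.

```python
# How time-to-first-token (TTFT) and token throughput are computed,
# as plain arithmetic (timestamps are illustrative).
request_sent   = 0.00     # seconds
first_token_at = 0.42     # TTFT = 0.42 s
last_token_at  = 6.10
tokens_emitted = 256

ttft = first_token_at - request_sent
throughput = tokens_emitted / (last_token_at - first_token_at)   # steady-state tokens/sec
print(f"TTFT: {ttft:.2f} s, throughput: {throughput:.1f} tok/s")
```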

MSI GeForce RTX 50 Laptops Are Prepped for High-end Gaming & Local AI Applications

MSI's latest high-end gaming laptops, including the Titan 18 HX AI, Raider 18 HX AI, and Vector 16 HX AI, feature Intel Core Ultra 200 HX series CPUs and NVIDIA RTX 50 series GPUs, while the Raider A18 HX and Vector A18 HX run on AMD Ryzen 9000 series CPUs. Carrying NVIDIA's first major GPU upgrade in over two years, these laptops deliver top-tier performance for ultra-high-resolution gaming. Beyond gaming, MSI's Titan, Raider, Vector, and Stealth series excel in AI applications, particularly Small Language Models (SLMs), making them ideal for both gaming and AI-driven tasks.

Next-Gen GPUs: A Breakthrough for AI Applications
NVIDIA's latest RTX 50 series GPUs, built on the cutting-edge Blackwell architecture, introduce 5th-generation Tensor Cores, 4th-generation RT Cores, and Neural Rendering technology for the first time. With expanded memory capacity and GDDR7, these GPUs optimize AI-enhanced neural computations, reducing memory usage while boosting graphics rendering and AI processing efficiency. This results in unmatched performance for both gaming and creative workloads, enabling smoother, more efficient execution of complex tasks.

Global Top 10 IC Design Houses See 49% YoY Growth in 2024, NVIDIA Commands Half the Market

TrendForce reveals that the combined revenue of the world's top 10 IC design houses reached approximately US$249.8 billion in 2024, marking a 49% YoY increase. The booming AI industry has fueled growth across the semiconductor sector, with NVIDIA leading the charge, posting an astonishing 125% revenue growth, widening its lead over competitors, and solidifying its dominance in the IC industry.

Looking ahead to 2025, advancements in semiconductor manufacturing will further enhance AI computing power, with LLMs continuing to emerge. Open-source models like DeepSeek could lower AI adoption costs, accelerating AI penetration from servers to personal devices. This shift positions edge AI devices as the next major growth driver for the semiconductor industry.

Niantic Offloads Games Division to Scopely - Deal Valued at $3.5 Billion

We're announcing changes at Niantic that will set us on a bold new course. Nearly a decade ago, we spun out as a small team from Google with a bold vision: to use technology to overlay the world with rich digital experiences. Our goal: to inspire people to explore their surroundings and foster real-world connections, especially at a time when relationships were becoming increasingly digital. To bring this mission and technology to life, we started building games; today, more than 100 million people play our games annually, with more than a billion friend connections made across the world.

People have discovered their neighborhoods, explored new places, and moved more than 30 billion miles. They've also come together at our live events - where everyone is a participant, not just a spectator - contributing over a billion dollars in economic impact in the cities that host them. As we grew, the company naturally evolved along two complementary paths - one focused on creating games and bringing them to the world, and the other dedicated to advancing augmented reality, artificial intelligence, and geospatial technology. Meanwhile, the rapid progress in AI reinforces our belief in the future of geospatial computing to unlock new possibilities for both consumer experiences and enterprise applications. At the same time, we remain committed to creating "forever games" that will last for generations.

GIGABYTE Showcases Future-Ready AI and HPC Technologies for High-Efficiency Computing at SCA 2025

Giga Computing, a subsidiary of GIGABYTE and a pioneer in AI-driven enterprise computing, is set to make a significant impact at Supercomputing Asia 2025 (SCA25) in Singapore (March 11-13). At booth #D5, GIGABYTE showcases its latest advancements in liquid cooling and its solutions for AI training and high-performance computing (HPC). The booth highlights GIGABYTE's innovative technology and comprehensive direct liquid cooling (DLC) strategies, reinforcing its commitment to energy-efficient, high-performance computing.

Revolutionizing AI Training with DLC
A key highlight of GIGABYTE's showcase is the NVIDIA HGX H200 platform, a next-generation solution for AI workloads. GIGABYTE is presenting both its liquid-cooled G4L3-SD1 server and its air-cooled G893 series, providing businesses with advanced cooling solutions tailored for high-performance demands. The G4L3-SD1 server, equipped with CoolIT Systems' cold plates, effectively cools Intel Xeon CPUs and eight NVIDIA H200 GPUs, ensuring optimal performance with enhanced energy efficiency.

Jio Platforms Limited Along with AMD, Cisco, and Nokia Unveil Plans for Open Telecom AI Platform at MWC 2025

Jio Platforms Limited (JPL), together with AMD, Cisco, and Nokia, announced at Mobile World Congress 2025 plans to form an innovative, new Open Telecom AI Platform. Designed to support today's operators and service providers with real-world, AI-driven solutions, the Telecom AI Platform is set to drive unprecedented efficiency, security, capabilities, and new revenue opportunities for the service provider industry.

End-to-end Network Intelligence
Fueled by the collective expertise of world leaders from across domains including RAN, Routing, AI Data Center, Security and Telecom, the Telecom AI Platform will create a new central intelligence layer for telecom and digital services. This multi-domain intelligence framework will integrate AI and automation into every layer of network operations.

OpenAI Has "Run Out of GPUs" - Sam Altman Mentions Incoming Delivery of "Tens of Thousands"

Yesterday, OpenAI introduced its "strongest" GPT-4.5 model. A research preview build is only available to paying customers—Pro-tier subscribers fork out $200 a month for early access privileges. The non-profit organization's CEO shared an update via a social media post, complete with the "necessary" hyping up of version 4.5: "it is the first model that feels like talking to a thoughtful person to me. I have had several moments where I've sat back in my chair and been astonished at getting actual good advice from an AI." There are apparent performance caveats—Sam Altman proceeded to add a short addendum: "this isn't a reasoning model and won't crush benchmarks. It's a different kind of intelligence, and there's a magic to it (that) I haven't felt before. Really excited for people to try it!" OpenAI had plans to make GPT-4.5 available to its audience of "Plus" subscribers, but major hardware shortages have delayed a roll-out to the $20 per month tier.

Altman disclosed his personal disappointment: "bad news: it is a giant, expensive model. We really wanted to launch it to Plus and Pro (customers) at the same time, but we've been growing a lot and are out of GPUs. We will add tens of thousands of GPUs next week, and roll it out to the plus tier then... Hundreds of thousands coming soon, and I'm pretty sure y'all will use every one we can rack up." Insiders believe that OpenAI is finalizing a proprietary AI-crunching solution, but a rumored mass production phase is not expected to kick off until 2026. In the meantime, Altman & Co. are still reliant on NVIDIA for new shipments of AI GPUs. Despite being a very important customer, OpenAI is reportedly not satisfied with the "slow" flow of Team Green's latest DGX B200 and DGX H200 platforms into server facilities. Several big players are developing in-house designs in an attempt to wean themselves off prevalent NVIDIA technologies.

NVIDIA & Partners Will Discuss Supercharging of AI Development at GTC 2025

Generative AI is redefining computing, unlocking new ways to build, train and optimize AI models on PCs and workstations. From content creation and large and small language models to software development, AI-powered PCs and workstations are transforming workflows and enhancing productivity. At GTC 2025, running March 17-21 in the San Jose Convention Center, experts from across the AI ecosystem will share insights on deploying AI locally, optimizing models and harnessing cutting-edge hardware and software to enhance AI workloads—highlighting key advancements in RTX AI PCs and workstations.

Develop and Deploy on RTX
RTX GPUs are built with specialized AI hardware called Tensor Cores that provide the compute performance needed to run the latest and most demanding AI models. These high-performance GPUs can help build digital humans, chatbots, AI-generated podcasts and more. With more than 100 million GeForce RTX and NVIDIA RTX GPU users, developers have a large audience to target when new AI apps and features are deployed. In the session "Build Digital Humans, Chatbots, and AI-Generated Podcasts for RTX PCs and Workstations," Annamalai Chockalingam, senior product manager at NVIDIA, will showcase the end-to-end suite of tools developers can use to streamline development and deploy incredibly fast AI-enabled applications.

Micron Unveils Its First PCIe Gen5 NVMe High-Performance Client SSD

Micron Technology, Inc., today announced the Micron 4600 PCIe Gen 5 NVMe SSD, an innovative client storage drive for OEMs that is designed to deliver exceptional performance and user experience for gamers, creators and professionals. Leveraging Micron G9 TLC NAND, the 4600 SSD is Micron's first Gen 5 client SSD and doubles the performance of its predecessor.

The Micron 4600 SSD showcases sequential read speeds of 14.5 GB/s and write speeds of 12.0 GB/s. These capabilities allow users to load a large language model (LLM) from the SSD to DRAM in less than one second, enhancing the user experience with AI PCs. For AI model loading, the 4600 SSD reduces load times by up to 62% compared to Gen 4 performance SSDs, ensuring rapid deployment of LLMs and other AI workloads. Additionally, the 4600 SSD provides up to 107% improved energy efficiency (MB/s per watt) compared to Gen 4 performance SSDs, enhancing battery life and overall system efficiency.
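As a rough sanity check of the sub-second claim, the arithmetic works out as follows; the model size and quantization level are assumptions, not figures specified by Micron.

```python
# Rough check of the "load an LLM from SSD to DRAM in under a second" claim,
# assuming an 8B-parameter model quantized to ~4 bits/weight (sizes illustrative).
params = 8e9
bytes_per_param = 0.5                              # ~4-bit quantization
model_size_gb = params * bytes_per_param / 1e9     # ~4 GB on disk
seq_read_gbps = 14.5                               # Micron 4600 sequential read
print(f"~{model_size_gb:.1f} GB / {seq_read_gbps} GB/s = "
      f"{model_size_gb / seq_read_gbps:.2f} s")    # well under one second
```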

AMD & Nexa AI Reveal NexaQuant's Improvement of DeepSeek R1 Distill 4-bit Capabilities

Nexa AI today announced NexaQuants of two DeepSeek R1 Distills: the DeepSeek R1 Distill Qwen 1.5B and the DeepSeek R1 Distill Llama 8B. Popular quantization methods like the llama.cpp-based Q4_K_M allow large language models to significantly reduce their memory footprint, typically trading it for low perplexity loss on dense models. However, even low perplexity loss can result in a reasoning capability hit for (dense or MoE) models that use Chain of Thought traces. Nexa AI has stated that NexaQuants are able to recover this reasoning capability loss (compared to full 16-bit precision) while keeping the quantization at 4 bits and retaining the performance advantage. Benchmarks provided by Nexa AI can be seen below.

We can see that the Q4_K_M-quantized DeepSeek R1 distills score slightly lower (except for the AIME24 bench on the Llama 3 8B distill, which scores significantly lower) in LLM benchmarks like GPQA and AIME24 compared to their full 16-bit counterparts. Moving to a Q6 or Q8 quantization would be one way to fix this problem, but it would make the model slightly slower to run and require more memory. Nexa AI has stated that NexaQuants use a proprietary quantization method to recover the loss while keeping the quantization at 4 bits. This means users can theoretically get the best of both worlds: accuracy and speed.
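For context, the memory cost of stepping up from 4-bit quantization is easy to estimate. The effective bits-per-weight figures below are approximations for llama.cpp-style formats, not Nexa AI's numbers.

```python
# Back-of-envelope memory footprint for an 8B-parameter model at the
# quantization levels discussed (effective bits/weight are approximate).
params = 8e9
levels = {"FP16": 16.0, "Q8_0": 8.5, "Q6_K": 6.6, "Q4_K_M": 4.8}
for name, bits in levels.items():
    print(f"{name:7s} ~{params * bits / 8 / 1e9:5.1f} GB")
```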

Moore Threads Teases Excellent Performance of DeepSeek-R1 Model on MTT GPUs

Moore Threads, a Chinese manufacturer of proprietary GPU designs, is (reportedly) the latest company to jump onto the DeepSeek-R1 bandwagon. Since late January, NVIDIA, Microsoft and AMD have swooped in with their own interpretations/deployments. By global standards, Moore Threads GPUs trail behind Western-developed offerings—early 2024 evaluations showed the firm's MTT S80 dedicated desktop graphics card struggling against an AMD integrated solution: the Radeon 760M. The recent emergence of DeepSeek's open-source models has signalled a shift away from reliance on extremely powerful and expensive AI-crunching hardware (often accessed via the cloud)—widespread excitement has been generated by DeepSeek solutions being relatively frugal in terms of processing requirements. Tom's Hardware has observed cases of open-source AI models running locally on "inexpensive hardware, like the Raspberry Pi."

According to recent Chinese press coverage, Moore Threads has announced a successful deployment of DeepSeek's R1-Distill-Qwen-7B distilled model on the aforementioned MTT S80 GPU. The company also revealed that it had taken similar steps with its MTT S4000 datacenter-oriented graphics hardware. On the subject of adaptation, a Moore Threads spokesperson stated: "based on the Ollama open source framework, Moore Threads completed the deployment of the DeepSeek-R1-Distill-Qwen-7B distillation model and demonstrated excellent performance in a variety of Chinese tasks, verifying the versatility and CUDA compatibility of Moore Threads' self-developed full-featured GPU." Exact performance figures, benchmark results and technical details were not disclosed to the Chinese public, so Moore Threads appears to be teasing the prowess of its MTT GPU designs. ITHome reported that "users can also perform inference deployment of the DeepSeek-R1 distillation model based on MTT S80 and MTT S4000. Some users have previously completed the practice manually on MTT S80." Moore Threads believes that its "self-developed high-performance inference engine, combined with software and hardware co-optimization technology, significantly improves the model's computing efficiency and resource utilization through customized operator acceleration and memory management. This engine not only supports the efficient operation of the DeepSeek distillation model, but also provides technical support for the deployment of more large-scale models in the future."
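The Ollama-based deployment described above can be reproduced in a few lines on any machine running Ollama, using the standard DeepSeek-R1 distill tags; the Moore Threads port itself is vendor-specific, and the tag below is the one published in the Ollama library (adjust if needed).

```python
# Minimal local inference against an Ollama server hosting the 7B R1 distill.
# Endpoint and model tag follow standard Ollama conventions; the prompt is a
# simple Chinese-language task, echoing the article's test focus.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "deepseek-r1:7b",          # R1-Distill-Qwen-7B tag on Ollama
          "prompt": "用一句话解释什么是蒸馏模型。",
          "stream": False},
    timeout=300,
)
print(resp.json()["response"])
```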

ASUS AI POD With NVIDIA GB200 NVL72 Platform Ready to Ramp-Up Production for Scheduled Shipment in March

ASUS is proud to announce that ASUS AI POD, featuring the NVIDIA GB200 NVL72 platform, is ready to ramp up production for a scheduled shipping date of March 2025. ASUS remains dedicated to providing comprehensive end-to-end solutions and software services, encompassing everything from AI supercomputing to cloud services. With a strong focus on fostering AI adoption across industries, ASUS is positioned to empower clients in accelerating their time to market by offering a full spectrum of solutions.

Proof of concept, funded by ASUS
Honoring the commitment to delivering exceptional value to clients, ASUS is set to launch a proof of concept (POC) for the groundbreaking ASUS AI POD, powered by the NVIDIA Blackwell platform. This exclusive opportunity is now open to a select group of innovators who are eager to harness the full potential of AI computing. Innovators and enterprises can experience firsthand the full potential of AI and deep learning solutions at exceptional scale. To take advantage of this limited-time offer, please complete this survey at: forms.office.com/r/FrAbm5BfH2. The expert ASUS team of NVIDIA GB200 specialists will guide users through the next steps.

NVIDIA GeForce RTX 50 Series AI PCs Accelerate DeepSeek Reasoning Models

The recently released DeepSeek-R1 model family has brought a new wave of excitement to the AI community, allowing enthusiasts and developers to run state-of-the-art reasoning models with problem-solving, math and code capabilities, all from the privacy of local PCs. With up to 3,352 trillion operations per second of AI horsepower, NVIDIA GeForce RTX 50 Series GPUs can run the DeepSeek family of distilled models faster than anything on the PC market.

A New Class of Models That Reason
Reasoning models are a new class of large language models (LLMs) that spend more time on "thinking" and "reflecting" to work through complex problems, while describing the steps required to solve a task. The fundamental principle is that any problem can be solved with deep thought, reasoning and time, just like how humans tackle problems. By spending more time—and thus compute—on a problem, the LLM can yield better results. This phenomenon is known as test-time scaling, where a model dynamically allocates compute resources during inference to reason through problems. Reasoning models can enhance user experiences on PCs by deeply understanding a user's needs, taking actions on their behalf and allowing them to provide feedback on the model's thought process—unlocking agentic workflows for solving complex, multi-step tasks such as analyzing market research, performing complicated math problems, debugging code and more.
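A minimal sketch of the test-time-scaling idea described above: give the same question progressively larger "thinking" budgets and extract the answer only after the reasoning trace. The `generate()` function is a placeholder for any local reasoning-model call, and the returned text is simulated.

```python
# Illustrative test-time scaling: larger token budgets buy more reasoning steps.
# `generate()` stands in for a real local model call (llama.cpp, Ollama, etc.).
def generate(prompt: str, max_new_tokens: int) -> str:
    """Placeholder model call; returns a simulated reasoning trace."""
    return f"(reasoning trace of up to {max_new_tokens} tokens)\nFinal answer: 80 km/h"

question = "A train travels 120 km in 1.5 h. What is its average speed?"
for budget in (128, 512, 2048):              # more tokens -> more compute -> better answers
    trace = generate(f"Think step by step, then answer.\n{question}", budget)
    answer = trace.split("Final answer:")[-1].strip()
    print(f"budget={budget}: {answer}")
```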

DeepSeek-R1 Goes Live on NVIDIA NIM

DeepSeek-R1 is an open model with state-of-the-art reasoning capabilities. Instead of offering direct responses, reasoning models like DeepSeek-R1 perform multiple inference passes over a query, conducting chain-of-thought, consensus and search methods to generate the best answer. Performing this sequence of inference passes—using reason to arrive at the best answer—is known as test-time scaling. DeepSeek-R1 is a perfect example of this scaling law, demonstrating why accelerated computing is critical for the demands of agentic AI inference.

As models are allowed to iteratively "think" through the problem, they create more output tokens and longer generation cycles, so model quality continues to scale. Significant test-time compute is critical to enable both real-time inference and higher-quality responses from reasoning models like DeepSeek-R1, requiring larger inference deployments. R1 delivers leading accuracy for tasks demanding logical inference, reasoning, math, coding and language understanding while also delivering high inference efficiency.
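The consensus method mentioned above amounts to sampling several independent chains of thought and majority-voting the final answers. A toy sketch with a simulated model call, purely to illustrate the mechanism:

```python
# Consensus-style test-time scaling in miniature: several independent inference
# passes, then a majority vote. `ask_model()` is a stand-in for a real sampled
# chain-of-thought pass from an R1-class model.
from collections import Counter
import random

def ask_model(question: str) -> str:
    """Placeholder: one sampled reasoning pass per call (answers simulated)."""
    return random.choice(["42", "42", "42", "41"])

question = "What is 6 * 7?"
answers = [ask_model(question) for _ in range(8)]      # 8 independent passes
best, votes = Counter(answers).most_common(1)[0]
print(f"consensus answer: {best} ({votes}/8 votes)")
```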

KIOXIA Releases AiSAQ as Open-Source Software to Reduce DRAM Needs in AI Systems

Kioxia Corporation, a world leader in memory solutions, today announced the open-source release of its new All-in-Storage ANNS with Product Quantization (AiSAQ) technology. A novel "approximate nearest neighbor" search (ANNS) algorithm optimized for SSDs, KIOXIA AiSAQ software delivers scalable performance for retrieval-augmented generation (RAG) without placing index data in DRAM - and instead searching directly on SSDs.

Generative AI systems demand significant compute, memory and storage resources. While they have the potential to drive transformative breakthroughs across various industries, their deployment often comes with high costs. RAG is a critical phase of AI that refines large language models (LLMs) with data specific to the company or application.
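In miniature, the retrieval step of RAG is a nearest-neighbor search over embedding vectors, as sketched below with a toy in-memory NumPy index. AiSAQ's contribution is precisely that this index (with product-quantized vectors) can live on the SSD rather than in DRAM; the sketch only illustrates the search itself.

```python
# Toy RAG retrieval: embed a query, find the nearest stored chunks, and hand
# them to the LLM as context. Index here is in-memory purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
chunk_vecs = rng.normal(size=(10_000, 384)).astype(np.float32)   # toy document embeddings
chunk_vecs /= np.linalg.norm(chunk_vecs, axis=1, keepdims=True)

query_vec = rng.normal(size=384).astype(np.float32)
query_vec /= np.linalg.norm(query_vec)

scores = chunk_vecs @ query_vec                # cosine similarity (unit-norm vectors)
top_k = np.argsort(scores)[-5:][::-1]          # 5 nearest chunks for the prompt
print("retrieved chunk ids:", top_k.tolist())
```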

NVIDIA Outlines Cost Benefits of Inference Platform

Businesses across every industry are rolling out AI services this year. For Microsoft, Oracle, Perplexity, Snap and hundreds of other leading companies, using the NVIDIA AI inference platform—a full stack comprising world-class silicon, systems and software—is the key to delivering high-throughput and low-latency inference and enabling great user experiences while lowering cost. NVIDIA's advancements in inference software optimization and the NVIDIA Hopper platform are helping industries serve the latest generative AI models, delivering excellent user experiences while optimizing total cost of ownership. The Hopper platform also helps deliver up to 15x more energy efficiency for inference workloads compared to previous generations.

AI inference is notoriously difficult, as it requires many steps to strike the right balance between throughput and user experience. But the underlying goal is simple: generate more tokens at a lower cost. Tokens represent words in a large language model (LLM) system—and with AI inference services typically charging for every million tokens generated, this goal offers the most visible return on AI investments and energy used per task. Full-stack software optimization offers the key to improving AI inference performance and achieving this goal.
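Since inference is typically billed per million tokens, the "more tokens at lower cost" goal reduces to simple arithmetic; the price and token counts below are illustrative, not NVIDIA's figures.

```python
# Cost-per-task arithmetic for token-billed inference (numbers illustrative).
price_per_million_tokens = 2.50        # USD per 1M tokens
tokens_per_task = 1_200                # prompt + completion for one request
cost_per_task = tokens_per_task / 1e6 * price_per_million_tokens
print(f"${cost_per_task:.4f} per task -> {1 / cost_per_task:,.0f} tasks per dollar")
```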

NVIDIA AI Helps Fight Against Fraud Across Many Sectors

Companies and organizations are increasingly using AI to protect their customers and thwart the efforts of fraudsters around the world. Voice security company Hiya found that 550 million scam calls were placed per week in 2023, with INTERPOL estimating that scammers stole $1 trillion from victims that same year. In the U.S., one in four non-contact-list calls was flagged as suspected spam, with fraudsters often luring people into Venmo-related or extended warranty scams.

Traditional methods of fraud detection include rules-based systems, statistical modeling and manual reviews. These methods have struggled to scale to the growing volume of fraud in the digital era without sacrificing speed and accuracy. For instance, rules-based systems often have high false-positive rates, statistical modeling can be time-consuming and resource-intensive, and manual reviews can't scale rapidly enough.
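A toy example of why rules-based systems produce high false-positive rates: a single hand-written threshold cannot separate fraud from legitimate outliers. The transactions below are synthetic and the rule is deliberately simplistic.

```python
# Toy rules-based fraud check: a fixed amount threshold flags legitimate
# big-ticket purchases too, which is where false positives come from.
transactions = [
    {"amount": 35.0,   "fraud": False},
    {"amount": 2400.0, "fraud": False},   # legitimate large purchase
    {"amount": 1900.0, "fraud": True},
]
flagged = [t for t in transactions if t["amount"] > 1000]        # the "rule"
false_positives = sum(1 for t in flagged if not t["fraud"])
print(f"flagged {len(flagged)} transactions, {false_positives} false positive(s)")
```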

Seagate Anticipates Cloud Storage Growth due to AI-Driven Data Creation

According to a recent global Recon Analytics survey commissioned by Seagate Technology, business leaders from across 15 industry sectors and 10 countries expect that adoption of artificial intelligence (AI) applications will generate unprecedented volumes of data, driving a boom in demand for data storage, in particular cloud-based storage. With hard drives delivering scalability at favorable terabyte-per-dollar cost efficiencies, cloud service providers rely on them to store mass quantities of data.

Recently, analyst firm IDC estimated that 89% of data stored by leading cloud service providers is stored on hard drives. Now, according to this Recon Analytics study, nearly two-thirds of respondents (61%) from companies that use cloud as their leading storage medium expect their cloud-based storage to grow by more than 100% over the next 3 years. "The survey results generally point to a coming surge in demand for data storage, with hard drives emerging as the clear winner," remarked Roger Entner, founder and lead analyst of Recon Analytics. "When you consider that the business leaders we surveyed intend to store more and more of this AI-driven data in the cloud, it appears that cloud services are well-positioned to ride a second growth wave."

NVIDIA NeMo AI Guardrails Upgraded with Latest NIM Microservices

AI agents are poised to transform productivity for the world's billion knowledge workers with "knowledge robots" that can accomplish a variety of tasks. To develop AI agents, enterprises need to address critical concerns like trust, safety, security and compliance. New NVIDIA NIM microservices for AI guardrails—part of the NVIDIA NeMo Guardrails collection of software tools—are portable, optimized inference microservices that help companies improve the safety, precision and scalability of their generative AI applications.

Central to the orchestration of the microservices is NeMo Guardrails, part of the NVIDIA NeMo platform for curating, customizing and guardrailing AI. NeMo Guardrails helps developers integrate and manage AI guardrails in large language model (LLM) applications. Industry leaders Amdocs, Cerence AI and Lowe's are among those using NeMo Guardrails to safeguard AI applications. Developers can use the NIM microservices to build more secure, trustworthy AI agents that provide safe, appropriate responses within context-specific guidelines and are bolstered against jailbreak attempts. Deployed in customer service across industries like automotive, finance, healthcare, manufacturing and retail, the agents can boost customer satisfaction and trust.
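As a rough sketch of how NeMo Guardrails typically wires into an LLM application: the "./config" directory and its rail definitions are assumptions here, so consult the NeMo Guardrails documentation for the actual configuration format and available rails.

```python
# Minimal NeMo Guardrails wiring sketch; "./config" is assumed to contain the
# rails configuration (models, flows, prompts) described in the project docs.
from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./config")
rails = LLMRails(config)

reply = rails.generate(messages=[
    {"role": "user", "content": "How do I reset my password?"}
])
print(reply["content"])   # the guardrailed assistant response
```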

Aetina & Qualcomm Collaborate on Flagship MegaEdge AIP-FR68 Edge AI Solution

Aetina, a leading provider of edge AI solutions and a subsidiary of Innodisk Group, today announced a collaboration with Qualcomm Technologies, Inc., which unveiled a revolutionary Qualcomm AI On-Prem Appliance Solution and Qualcomm AI Inference Suite for On-Prem. This collaboration combines Qualcomm Technologies' cutting-edge inference accelerators and advanced software with Aetina's edge computing hardware to deliver unprecedented computing power and ready-to-use AI applications for enterprises and industrial organizations.

The flagship offering, the Aetina MegaEdge AIP-FR68, sets a new industry benchmark by integrating the Qualcomm Cloud AI family of accelerator cards. Each Cloud AI 100 Ultra card delivers an impressive 870 TOPS of AI computing power at 8-bit integer (INT8) precision while maintaining remarkable energy efficiency at just 150 W power consumption. The system supports dual Cloud AI 100 Ultra cards in a single desktop workstation. This groundbreaking combination of power and efficiency in a compact form factor revolutionizes on-premises AI processing, making enterprise-grade computing more accessible than ever.
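The efficiency claim works out to a few lines of arithmetic from the figures quoted above.

```python
# Efficiency math from the quoted specs: 870 INT8 TOPS at 150 W per card,
# two cards supported per workstation.
tops_per_card, watts_per_card, cards = 870, 150, 2
print(f"{tops_per_card / watts_per_card:.1f} INT8 TOPS/W per card, "
      f"{tops_per_card * cards} TOPS total at ~{watts_per_card * cards} W for dual cards")
```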