News Posts matching #LLM

AMD Introduces GAIA - an Open-Source Project That Runs Local LLMs on Ryzen AI NPUs

AMD has launched a new open-source project called GAIA (pronounced /ˈɡaɪ.ə/), an application that leverages the power of the Ryzen AI Neural Processing Unit (NPU) to run private, local large language models (LLMs). In this blog, we'll dive into the features and benefits of GAIA, and introduce how you can take advantage of the open-source project and adopt it into your own applications.

Introduction to GAIA
GAIA is a generative AI application designed to run local, private LLMs on Windows PCs and is optimized for AMD Ryzen AI hardware (AMD Ryzen AI 300 Series Processors). This integration allows for faster, more efficient processing - i.e., lower power draw - while keeping your data local and secure. On Ryzen AI PCs, GAIA interacts with the NPU and iGPU to run models seamlessly by using the open-source Lemonade (LLM-Aid) SDK from ONNX TurnkeyML for LLM inference. GAIA supports a variety of local LLMs optimized to run on Ryzen AI PCs. Popular models like Llama and Phi derivatives can be tailored for different use cases, such as Q&A, summarization, and complex reasoning tasks.
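GAIA's Lemonade/ONNX path has its own APIs; purely as a generic illustration of what "running a local LLM for Q&A" involves, here is a minimal sketch using Hugging Face transformers. The model name and prompt are illustrative, and this is not GAIA's own code path.

```python
# Minimal local-LLM Q&A sketch (generic transformers path, not GAIA's
# Lemonade/ONNX runtime; the model name below is illustrative).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"   # assumption: any local Phi/Llama derivative
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Summarize what an NPU does in one sentence."}]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True,
                                 return_tensors="pt").to(model.device)
out = model.generate(inputs, max_new_tokens=64)
# Decode only the newly generated tokens, skipping the prompt.
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```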

LG Develops Custom "EXAONE Deep" State-of-the-Art Reasoning AI Model

LG AI Research (yes, that LG, the consumer electronics maker) has launched EXAONE Deep, a high-performance reasoning AI that demonstrates exceptional capabilities in mathematical logic, scientific concepts, and programming challenges despite its relatively small parameter count. The flagship 32B model achieves performance metrics rivaling substantially larger models like GPT-4o and DeepSeek R1, while the 7.8B and 2.4B variants establish new benchmarks in the lightweight and on-device AI categories. The EXAONE Deep 32B model registered a 94.5 score on the CSAT 2025 Mathematics section and 90.0 on the AIME 2024, outperforming competing models while requiring only 5% of the computational resources of larger alternatives like DeepSeek-R1 (671B).

In scientific reasoning, it achieved a 66.1 score on the GPQA Diamond test, which evaluates PhD-level problem-solving capabilities across physics, chemistry, and biology. The model's 83.0 score on MMLU establishes it as the highest-performing domestically developed model in South Korea, underscoring LG's approach to creating efficient, high-performance AI systems. Particularly notable is the performance of the smaller variants: the 7.8B model scored 94.8 on MATH-500 and 59.6 on AIME 2025, while the 2.4B model achieved 92.3 on MATH-500 and 47.9 on AIME 2024. These results position EXAONE Deep's smaller models at the top of their respective categories across all major benchmarks, suggesting significant potential for deployment in resource-constrained environments. Topping out at 32 billion parameters, the family is well suited to single-GPU deployments; the models can even run on a range of discrete GPUs, laptop GPUs, and some edge systems without massive computational power.

Tencent Will Launch Hunyuan T1 Inference Model on March 21

Tencent's large language model (LLM) specialist division has announced the imminent launch of its T1 AI inference model. The Chinese technology giant's Hunyuan social media accounts revealed a grand arrival scheduled for Friday (March 21). A friendly reminder was issued to interested parties regarding the upcoming broadcast/showcase: "please set aside your valuable time. Let's step into T1 together." Earlier in the week, the Tencent AI team started to tease its "first ultra-large Mamba-powered reasoning model." Local news reports have highlighted Hunyuan's claim of Mamba architecture being applied losslessly to a super-large Mixture of Experts (MoE) model.

Late last month, the company released its Hunyuan Turbo S AI model—advertised as offering faster replies than DeepSeek's R1 system. Tencent's plucky solution has quickly climbed up the Chatbot Arena LLM Leaderboard. The Hunyuan team was in a boastful mood earlier today and loudly proclaimed that its proprietary Turbo S model had charted in fifteenth place. At the time of writing, DeepSeek R1 is ranked seventh on the leaderboard. As ITHome explains, this community-driven platform relies on user interactions: people chat "with multiple models anonymously, voting to decide which model is better, and then generating a ranking list based on the scores. This kind of evaluation is also seen as an arena for big models to compete directly, which is simple and direct."

Phison Expands aiDAPTIV+ GPU Memory Extension Capabilities

Phison Electronics (8299TT), a leading innovator in NAND flash technologies, today announced an array of expanded capabilities for aiDAPTIV+, the affordable AI training and inferencing solution for on-premises environments. aiDAPTIV+ will be integrated into an ML-series Maingear laptop, the first AI laptop PC capable of LLMOps, utilizing NVIDIA GPUs and available for concept demonstration and registration this week at NVIDIA GTC 2025. Customers will be able to fine-tune Large Language Models (LLMs) of up to 8 billion parameters using their own data.

Phison also expanded aiDAPTIV+ capabilities to run on edge computing devices powered by the NVIDIA Jetson platform, for enhanced generative AI inference at the edge and in robotics deployments. With today's announcement, new and current aiDAPTIV+ users can look forward to the new aiDAPTIVLink 3.0 middleware, which will provide faster Time to First Token (TTFT) recall and extend the token length for greater context, improving inferencing performance and accuracy. These expansions will unlock access for users ranging from university students and AI industry professionals learning to train LLMs, to researchers uncovering deeper insights within their own data on a PC, all the way to manufacturing engineers automating factory-floor enhancements via edge devices.
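For readers new to fine-tuning an ~8B-parameter model with their own data, a common approach on modest hardware is parameter-efficient fine-tuning. The sketch below uses Hugging Face PEFT/LoRA as a generic stand-in, not aiDAPTIV+'s own software stack; the model name and hyperparameters are illustrative.

```python
# Generic LoRA fine-tuning setup sketch (Hugging Face PEFT), not Phison's
# aiDAPTIV+ stack; model name and hyperparameters are illustrative.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")  # ~8B-parameter base
lora = LoraConfig(r=16, lora_alpha=32,
                  target_modules=["q_proj", "v_proj"],   # attach adapters to attention projections
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)       # only the small adapter matrices become trainable
model.print_trainable_parameters()        # typically well under 1% of the full model
# From here, train on your own dataset with transformers' Trainer or TRL's SFTTrainer.
```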

ASUS Introduces New "AI Cache Boost" BIOS Feature - R&D Team Claims Performance Uplift

Large language models (LLMs) love large quantities of memory—so much so, in fact, that AI enthusiasts are turning to multi-GPU setups to make even more VRAM available for their AI apps. But since many current LLMs are extremely large, even this approach has its limits. When a model doesn't fit entirely in VRAM, the GPU hands part of the work back to the CPU, and when it does, the performance of your CPU cache and DRAM comes into play. All this means that when it comes to the performance of AI applications, it's not just the GPU that matters, but the entire pathway that connects the GPU to the CPU to the I/O die to the DRAM modules. It stands to reason, then, that there are opportunities to boost AI performance by optimizing these elements.

That's exactly what we've found as we've spent time in our R&D labs with the latest AMD Ryzen CPUs. AMD just launched two new Ryzen CPUs with AMD 3D V-Cache Technology, the AMD Ryzen 9 9950X3D and Ryzen 9 9900X3D, pushing the series into new performance territory. After testing a wide range of optimizations across a variety of workloads, we uncovered a set of settings that offer tangible benefits for AI enthusiasts. Now, we're ready to share these optimizations with you through a new BIOS feature: AI Cache Boost. Available on ASUS AMD 800 Series motherboards with our most recent firmware update, AI Cache Boost can accelerate performance by up to 12.75% when you're working with massive LLMs.
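To see why the CPU-side pathway matters at all, here is a rough sketch (using llama-cpp-python, not ASUS's test suite) of partially offloading a model that doesn't fit in VRAM: the layers left on the CPU are exactly where cache and DRAM speed show up in tokens per second. The model path and layer count are illustrative.

```python
# Sketch: when a model is too big for VRAM, llama-cpp-python can offload only
# some layers to the GPU; the rest run on the CPU, where cache/DRAM performance
# is reflected directly in tokens/sec. Path and layer count are illustrative.
import time
from llama_cpp import Llama

llm = Llama(model_path="models/llama-70b-q4_k_m.gguf",  # assumption: a large quantized model
            n_gpu_layers=40,                            # what fits in VRAM; the rest hits CPU/DRAM
            n_ctx=4096)

start = time.time()
out = llm("Explain cache locality in two sentences.", max_tokens=128)
elapsed = time.time() - start
n_tokens = out["usage"]["completion_tokens"]
print(f"{n_tokens} tokens in {elapsed:.1f}s -> {n_tokens / elapsed:.1f} tok/s")
```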

AMD's Ryzen AI MAX+ 395 Delivers up to 12x AI LLM Performance Compared to Intel's "Lunar Lake"

AMD's latest flagship APU, the Ryzen AI MAX+ 395 "Strix Halo," demonstrates some impressive performance advantages over Intel's "Lunar Lake" processors in large language model (LLM) inference workloads, according to recent benchmarks on AMD's blog. Featuring 16 Zen 5 CPU cores, 40 RDNA 3.5 compute units, and over 50 AI TOPS via its XDNA 2 NPU, the processor achieves up to 12.2x faster response times than Intel's Core Ultra 258V in specific LLM scenarios. Notably, Intel's Lunar Lake has four E-cores and four P-cores, half the CPU core count of the Ryzen AI MAX+ 395, yet the performance difference is far more pronounced than the 2x core gap would suggest. The performance delta becomes even more notable with model complexity, particularly with 14-billion-parameter models approaching the limit of what standard 32 GB laptops can handle.

In LM Studio benchmarks using an ASUS ROG Flow Z13 with 64 GB unified memory, the integrated Radeon 8060S GPU delivered 2.2x higher token throughput than Intel's Arc 140V across various model architectures. Time-to-first-token metrics revealed a 4x advantage in smaller models like Llama 3.2 3B Instruct, expanding to 9.1x with 7-8B parameter models such as DeepSeek R1 Distill variants. AMD's architecture particularly excels in multimodal vision tasks, where the Ryzen AI MAX+ 395 processed complex visual inputs up to 7x faster in IBM Granite Vision 3.2 3B and 6x faster in Google Gemma 3 12B compared to Intel's offering. The platform's support for AMD Variable Graphics Memory allows allocating up to 96 GB as VRAM from systems equipped with 128 GB unified memory, enabling the deployment of state-of-the-art models like Google Gemma 3 27B Vision. The processor's performance advantages extend to practical AI applications, including medical image analysis and coding assistance via higher-precision 6-bit quantization in the DeepSeek R1 Distill Qwen 32B model.
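For reference, the two headline metrics quoted throughout these results reduce to simple arithmetic over timestamps. The figures below are illustrative, not AMD's measurements.

```python
# How time-to-first-token (TTFT) and token throughput are computed,
# as plain arithmetic (timestamps are illustrative).
request_sent   = 0.00     # seconds
first_token_at = 0.42     # TTFT = 0.42 s
last_token_at  = 6.10
tokens_emitted = 256

ttft = first_token_at - request_sent
throughput = tokens_emitted / (last_token_at - first_token_at)   # steady-state tokens/sec
print(f"TTFT: {ttft:.2f} s, throughput: {throughput:.1f} tok/s")
```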

MSI GeForce RTX 50 Laptops Are Prepped for High-end Gaming & Local AI Applications

MSI's latest high-end gaming laptops, including the Titan 18 HX AI, Raider 18 HX AI, and Vector 16 HX AI, feature Intel Core Ultra 200 HX series CPUs and NVIDIA RTX 50 series GPUs, while the Raider A18 HX and Vector A18 HX run on AMD Ryzen 9000 series CPUs. Carrying NVIDIA's first major GPU upgrade in over two years, these laptops deliver top-tier performance for ultra-high-resolution gaming. Beyond gaming, MSI's Titan, Raider, Vector, and Stealth series excel in AI applications, particularly Small Language Models (SLMs), making them ideal for both gaming and AI-driven tasks.

Next-Gen GPUs: A Breakthrough for AI Applications
NVIDIA's latest RTX 50 series GPUs, built on the cutting-edge Blackwell architecture, introduce 5th-generation Tensor Cores, 4th-generation RT Cores, and Neural Rendering technology for the first time. With expanded memory capacity and GDDR7, these GPUs optimize AI-enhanced neural computations, reducing memory usage while boosting graphics rendering and AI processing efficiency. This results in unmatched performance for both gaming and creative workloads, enabling smoother, more efficient execution of complex tasks.

Global Top 10 IC Design Houses See 49% YoY Growth in 2024, NVIDIA Commands Half the Market

TrendForce reveals that the combined revenue of the world's top 10 IC design houses reached approximately US$249.8 billion in 2024, marking a 49% YoY increase. The booming AI industry has fueled growth across the semiconductor sector, with NVIDIA leading the charge, posting an astonishing 125% revenue growth, widening its lead over competitors, and solidifying its dominance in the IC industry.

Looking ahead to 2025, advancements in semiconductor manufacturing will further enhance AI computing power, with LLMs continuing to emerge. Open-source models like DeepSeek could lower AI adoption costs, accelerating AI penetration from servers to personal devices. This shift positions edge AI devices as the next major growth driver for the semiconductor industry.

Niantic Offloads Games Division to Scopely - Deal Valued at $3.5 Billion

We're announcing changes at Niantic that will set us on a bold new course. Nearly a decade ago, we spun out as a small team from Google with a bold vision: to use technology to overlay the world with rich digital experiences. Our goal: to inspire people to explore their surroundings and foster real-world connections, especially at a time when relationships were becoming increasingly digital. To bring this mission and technology to life, we started building games; today, more than 100 million people play our games annually, with more than a billion friend connections made across the world.

People have discovered their neighborhoods, explored new places, and moved more than 30 billion miles. They've also come together at our live events - where everyone is a participant, not just a spectator - contributing over a billion dollars in economic impact in the cities that host them. As we grew, the company naturally evolved along two complementary paths - one focused on creating games and bringing them to the world, and the other dedicated to advancing augmented reality, artificial intelligence, and geospatial technology. Meanwhile, the rapid progress in AI reinforces our belief in the future of geospatial computing to unlock new possibilities for both consumer experiences and enterprise applications. At the same time, we remain committed to creating "forever games" that will last for generations.

GIGABYTE Showcases Future-Ready AI and HPC Technologies for High-Efficiency Computing at SCA 2025

Giga Computing, a subsidiary of GIGABYTE and a pioneer in AI-driven enterprise computing, is set to make a significant impact at Supercomputing Asia 2025 (SCA25) in Singapore (March 11-13). At booth #D5, GIGABYTE showcases its latest advancements in liquid cooling and its solutions for AI training and high-performance computing (HPC). The booth highlights GIGABYTE's innovative technology and comprehensive direct liquid cooling (DLC) strategies, reinforcing its commitment to energy-efficient, high-performance computing.

Revolutionizing AI Training with DLC
A key highlight of GIGABYTE's showcase is the NVIDIA HGX H200 platform, a next-generation solution for AI workloads. GIGABYTE is presenting both its liquid-cooled G4L3-SD1 server and its air-cooled G893 series, providing businesses with advanced cooling solutions tailored for high-performance demands. The G4L3-SD1 server, equipped with CoolIT Systems' cold plates, effectively cools Intel Xeon CPUs and eight NVIDIA H200 GPUs, ensuring optimal performance with enhanced energy efficiency.

Jio Platforms Limited Along with AMD, Cisco, and Nokia Unveil Plans for Open Telecom AI Platform at MWC 2025

Jio Platforms Limited (JPL), together with AMD, Cisco, and Nokia, announced at Mobile World Congress 2025 plans to form an innovative, new Open Telecom AI Platform. Designed to support today's operators and service providers with real-world, AI-driven solutions, the Telecom AI Platform is set to drive unprecedented efficiency, security, capabilities, and new revenue opportunities for the service provider industry.

End-to-end Network Intelligence
Fueled by the collective expertise of world leaders from across domains including RAN, Routing, AI Data Center, Security and Telecom, the Telecom AI Platform will create a new central intelligence layer for telecom and digital services. This multi-domain intelligence framework will integrate AI and automation into every layer of network operations.

OpenAI Has "Run Out of GPUs" - Sam Altman Mentions Incoming Delivery of "Tens of Thousands"

Yesterday, OpenAI introduced its "strongest" GPT-4.5 model. A research preview build is only available to paying customers—Pro-tier subscribers fork out $200 a month for early access privileges. The non-profit organization's CEO shared an update via a social media post, complete with the "necessary" hyping up of version 4.5: "it is the first model that feels like talking to a thoughtful person to me. I have had several moments where I've sat back in my chair and been astonished at getting actual good advice from an AI." There are apparent performance caveats—Sam Altman proceeded to add a short addendum: "this isn't a reasoning model and won't crush benchmarks. It's a different kind of intelligence, and there's a magic to it (that) I haven't felt before. Really excited for people to try it!" OpenAI had plans to make GPT-4.5 available to its audience of "Plus" subscribers, but major hardware shortages have delayed a roll-out to the $20 per month tier.

Altman disclosed his personal disappointment: "bad news: it is a giant, expensive model. We really wanted to launch it to Plus and Pro (customers) at the same time, but we've been growing a lot and are out of GPUs. We will add tens of thousands of GPUs next week, and roll it out to the plus tier then... Hundreds of thousands coming soon, and I'm pretty sure y'all will use every one we can rack up." Insiders believe that OpenAI is finalizing a proprietary AI-crunching solution, but a rumored mass production phase is not expected to kick off until 2026. In the meantime, Altman & Co. are still reliant on NVIDIA for new shipments of AI GPUs. Despite being a very important customer, OpenAI is reportedly not satisfied with the "slow" flow of Team Green's latest DGX B200 and DGX H200 platforms into server facilities. Several big players are developing in-house designs in an attempt to wean themselves off prevalent NVIDIA technologies.

NVIDIA & Partners Will Discuss Supercharging of AI Development at GTC 2025

Generative AI is redefining computing, unlocking new ways to build, train and optimize AI models on PCs and workstations. From content creation and large and small language models to software development, AI-powered PCs and workstations are transforming workflows and enhancing productivity. At GTC 2025, running March 17-21 in the San Jose Convention Center, experts from across the AI ecosystem will share insights on deploying AI locally, optimizing models and harnessing cutting-edge hardware and software to enhance AI workloads—highlighting key advancements in RTX AI PCs and workstations.

Develop and Deploy on RTX
RTX GPUs are built with specialized AI hardware called Tensor Cores that provide the compute performance needed to run the latest and most demanding AI models. These high-performance GPUs can help build digital humans, chatbots, AI-generated podcasts and more. With more than 100 million GeForce RTX and NVIDIA RTX GPU users, developers have a large audience to target when new AI apps and features are deployed. In the session "Build Digital Humans, Chatbots, and AI-Generated Podcasts for RTX PCs and Workstations," Annamalai Chockalingam, senior product manager at NVIDIA, will showcase the end-to-end suite of tools developers can use to streamline development and deploy incredibly fast AI-enabled applications.

Micron Unveils Its First PCIe Gen5 NVMe High-Performance Client SSD

Micron Technology, Inc., today announced the Micron 4600 PCIe Gen 5 NVMe SSD, an innovative client storage drive for OEMs that is designed to deliver exceptional performance and user experience for gamers, creators and professionals. Leveraging Micron G9 TLC NAND, the 4600 SSD is Micron's first Gen 5 client SSD and doubles the performance of its predecessor.

The Micron 4600 SSD showcases sequential read speeds of 14.5 GB/s and write speeds of 12.0 GB/s. These capabilities allow users to load a large language model (LLM) from the SSD to DRAM in less than one second, enhancing the user experience with AI PCs. For AI model loading, the 4600 SSD reduces load times by up to 62% compared to Gen 4 performance SSDs, ensuring rapid deployment of LLMs and other AI workloads. Additionally, the 4600 SSD provides up to 107% improved energy efficiency (MB/s per watt) compared to Gen 4 performance SSDs, enhancing battery life and overall system efficiency.
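As a rough sanity check of the sub-second claim, the arithmetic works out as follows; the model size and quantization level are assumptions, not figures specified by Micron.

```python
# Rough check of the "load an LLM from SSD to DRAM in under a second" claim,
# assuming an 8B-parameter model quantized to ~4 bits/weight (sizes illustrative).
params = 8e9
bytes_per_param = 0.5                              # ~4-bit quantization
model_size_gb = params * bytes_per_param / 1e9     # ~4 GB on disk
seq_read_gbps = 14.5                               # Micron 4600 sequential read
print(f"~{model_size_gb:.1f} GB / {seq_read_gbps} GB/s = "
      f"{model_size_gb / seq_read_gbps:.2f} s")    # well under one second
```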

AMD & Nexa AI Reveal NexaQuant's Improvement of DeepSeek R1 Distill 4-bit Capabilities

Nexa AI today announced NexaQuants of two DeepSeek R1 Distills: the DeepSeek R1 Distill Qwen 1.5B and the DeepSeek R1 Distill Llama 8B. Popular quantization methods like the llama.cpp-based Q4_K_M allow large language models to significantly reduce their memory footprint, typically trading it for low perplexity loss on dense models. However, even low perplexity loss can result in a reasoning capability hit for (dense or MoE) models that use Chain of Thought traces. Nexa AI has stated that NexaQuants are able to recover this reasoning capability loss (compared to full 16-bit precision) while keeping the quantization at 4 bits and retaining the performance advantage. Benchmarks provided by Nexa AI can be seen below.

We can see that the Q4_K_M-quantized DeepSeek R1 distills score slightly lower (except for the AIME24 bench on the Llama 3 8B distill, which scores significantly lower) in LLM benchmarks like GPQA and AIME24 compared to their full 16-bit counterparts. Moving to a Q6 or Q8 quantization would be one way to fix this problem, but it would make the model slightly slower to run and require more memory. Nexa AI has stated that NexaQuants use a proprietary quantization method to recover the loss while keeping the quantization at 4 bits. This means users can theoretically get the best of both worlds: accuracy and speed.
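For context, the memory cost of stepping up from 4-bit quantization is easy to estimate. The effective bits-per-weight figures below are approximations for llama.cpp-style formats, not Nexa AI's numbers.

```python
# Back-of-envelope memory footprint for an 8B-parameter model at the
# quantization levels discussed (effective bits/weight are approximate).
params = 8e9
levels = {"FP16": 16.0, "Q8_0": 8.5, "Q6_K": 6.6, "Q4_K_M": 4.8}
for name, bits in levels.items():
    print(f"{name:7s} ~{params * bits / 8 / 1e9:5.1f} GB")
```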

Moore Threads Teases Excellent Performance of DeepSeek-R1 Model on MTT GPUs

Moore Threads, a Chinese manufacturer of proprietary GPU designs, is (reportedly) the latest company to jump onto the DeepSeek-R1 bandwagon. Since late January, NVIDIA, Microsoft and AMD have swooped in with their own interpretations/deployments. By global standards, Moore Threads GPUs trail behind Western-developed offerings—early 2024 evaluations showed the firm's MTT S80 dedicated desktop graphics card struggling against an AMD integrated solution: the Radeon 760M. The recent emergence of DeepSeek's open-source models has signalled a shift away from reliance on extremely powerful and expensive AI-crunching hardware (often accessed via the cloud)—widespread excitement has been generated by DeepSeek solutions being relatively frugal in terms of processing requirements. Tom's Hardware has observed cases of open-source AI models running locally on "inexpensive hardware, like the Raspberry Pi."

According to recent Chinese press coverage, Moore Threads has announced a successful deployment of DeepSeek's R1-Distill-Qwen-7B distilled model on the aforementioned MTT S80 GPU. The company also revealed that it had taken similar steps with its MTT S4000 datacenter-oriented graphics hardware. On the subject of adaptation, a Moore Threads spokesperson stated: "based on the Ollama open source framework, Moore Threads completed the deployment of the DeepSeek-R1-Distill-Qwen-7B distillation model and demonstrated excellent performance in a variety of Chinese tasks, verifying the versatility and CUDA compatibility of Moore Threads' self-developed full-featured GPU." Exact performance figures, benchmark results and technical details were not disclosed to the Chinese public, so Moore Threads appears to be teasing the prowess of its MTT GPU designs. ITHome reported that "users can also perform inference deployment of the DeepSeek-R1 distillation model based on MTT S80 and MTT S4000. Some users have previously completed the practice manually on MTT S80." Moore Threads believes that its "self-developed high-performance inference engine, combined with software and hardware co-optimization technology, significantly improves the model's computing efficiency and resource utilization through customized operator acceleration and memory management. This engine not only supports the efficient operation of the DeepSeek distillation model, but also provides technical support for the deployment of more large-scale models in the future."
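The Ollama-based deployment described above can be reproduced in a few lines on any machine running Ollama, using the standard DeepSeek-R1 distill tags; the Moore Threads port itself is vendor-specific, and the tag below is the one published in the Ollama library (adjust if needed).

```python
# Minimal local inference against an Ollama server hosting the 7B R1 distill.
# Endpoint and model tag follow standard Ollama conventions; the prompt is a
# simple Chinese-language task, echoing the article's test focus.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "deepseek-r1:7b",          # R1-Distill-Qwen-7B tag on Ollama
          "prompt": "用一句话解释什么是蒸馏模型。",
          "stream": False},
    timeout=300,
)
print(resp.json()["response"])
```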

ASUS AI POD With NVIDIA GB200 NVL72 Platform Ready to Ramp-Up Production for Scheduled Shipment in March

ASUS is proud to announce that ASUS AI POD, featuring the NVIDIA GB200 NVL72 platform, is ready to ramp up production for a scheduled shipping date of March 2025. ASUS remains dedicated to providing comprehensive end-to-end solutions and software services, encompassing everything from AI supercomputing to cloud services. With a strong focus on fostering AI adoption across industries, ASUS is positioned to empower clients in accelerating their time to market by offering a full spectrum of solutions.

Proof of concept, funded by ASUS
Honoring the commitment to delivering exceptional value to clients, ASUS is set to launch a proof of concept (POC) for the groundbreaking ASUS AI POD, powered by the NVIDIA Blackwell platform. This exclusive opportunity is now open to a select group of innovators who are eager to harness the full potential of AI computing. Innovators and enterprises can experience firsthand the full potential of AI and deep learning solutions at exceptional scale. To take advantage of this limited-time offer, please complete this survey at: forms.office.com/r/FrAbm5BfH2. The expert ASUS team of NVIDIA GB200 specialists will guide users through the next steps.

NVIDIA GeForce RTX 50 Series AI PCs Accelerate DeepSeek Reasoning Models

The recently released DeepSeek-R1 model family has brought a new wave of excitement to the AI community, allowing enthusiasts and developers to run state-of-the-art reasoning models with problem-solving, math and code capabilities, all from the privacy of local PCs. With up to 3,352 trillion operations per second of AI horsepower, NVIDIA GeForce RTX 50 Series GPUs can run the DeepSeek family of distilled models faster than anything on the PC market.

A New Class of Models That Reason
Reasoning models are a new class of large language models (LLMs) that spend more time on "thinking" and "reflecting" to work through complex problems, while describing the steps required to solve a task. The fundamental principle is that any problem can be solved with deep thought, reasoning and time, just like how humans tackle problems. By spending more time—and thus compute—on a problem, the LLM can yield better results. This phenomenon is known as test-time scaling, where a model dynamically allocates compute resources during inference to reason through problems. Reasoning models can enhance user experiences on PCs by deeply understanding a user's needs, taking actions on their behalf and allowing them to provide feedback on the model's thought process—unlocking agentic workflows for solving complex, multi-step tasks such as analyzing market research, performing complicated math problems, debugging code and more.
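A minimal sketch of the test-time-scaling idea described above: give the same question progressively larger "thinking" budgets and extract the answer only after the reasoning trace. The `generate()` function is a placeholder for any local reasoning-model call, and the returned text is simulated.

```python
# Illustrative test-time scaling: larger token budgets buy more reasoning steps.
# `generate()` stands in for a real local model call (llama.cpp, Ollama, etc.).
def generate(prompt: str, max_new_tokens: int) -> str:
    """Placeholder model call; returns a simulated reasoning trace."""
    return f"(reasoning trace of up to {max_new_tokens} tokens)\nFinal answer: 80 km/h"

question = "A train travels 120 km in 1.5 h. What is its average speed?"
for budget in (128, 512, 2048):              # more tokens -> more compute -> better answers
    trace = generate(f"Think step by step, then answer.\n{question}", budget)
    answer = trace.split("Final answer:")[-1].strip()
    print(f"budget={budget}: {answer}")
```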

DeepSeek-R1 Goes Live on NVIDIA NIM

DeepSeek-R1 is an open model with state-of-the-art reasoning capabilities. Instead of offering direct responses, reasoning models like DeepSeek-R1 perform multiple inference passes over a query, conducting chain-of-thought, consensus and search methods to generate the best answer. Performing this sequence of inference passes—using reason to arrive at the best answer—is known as test-time scaling. DeepSeek-R1 is a perfect example of this scaling law, demonstrating why accelerated computing is critical for the demands of agentic AI inference.

As models are allowed to iteratively "think" through the problem, they create more output tokens and longer generation cycles, so model quality continues to scale. Significant test-time compute is critical to enable both real-time inference and higher-quality responses from reasoning models like DeepSeek-R1, requiring larger inference deployments. R1 delivers leading accuracy for tasks demanding logical inference, reasoning, math, coding and language understanding while also delivering high inference efficiency.
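The consensus method mentioned above amounts to sampling several independent chains of thought and majority-voting the final answers. A toy sketch with a simulated model call, purely to illustrate the mechanism:

```python
# Consensus-style test-time scaling in miniature: several independent inference
# passes, then a majority vote. `ask_model()` is a stand-in for a real sampled
# chain-of-thought pass from an R1-class model.
from collections import Counter
import random

def ask_model(question: str) -> str:
    """Placeholder: one sampled reasoning pass per call (answers simulated)."""
    return random.choice(["42", "42", "42", "41"])

question = "What is 6 * 7?"
answers = [ask_model(question) for _ in range(8)]      # 8 independent passes
best, votes = Counter(answers).most_common(1)[0]
print(f"consensus answer: {best} ({votes}/8 votes)")
```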

KIOXIA Releases AiSAQ as Open-Source Software to Reduce DRAM Needs in AI Systems

Kioxia Corporation, a world leader in memory solutions, today announced the open-source release of its new All-in-Storage ANNS with Product Quantization (AiSAQ) technology. A novel "approximate nearest neighbor" search (ANNS) algorithm optimized for SSDs, KIOXIA AiSAQ software delivers scalable performance for retrieval-augmented generation (RAG) without placing index data in DRAM - and instead searching directly on SSDs.

Generative AI systems demand significant compute, memory and storage resources. While they have the potential to drive transformative breakthroughs across various industries, their deployment often comes with high costs. RAG is a critical phase of AI that refines large language models (LLMs) with data specific to the company or application.
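In miniature, the retrieval step of RAG is a nearest-neighbor search over embedding vectors, as sketched below with a toy in-memory NumPy index. AiSAQ's contribution is precisely that this index (with product-quantized vectors) can live on the SSD rather than in DRAM; the sketch only illustrates the search itself.

```python
# Toy RAG retrieval: embed a query, find the nearest stored chunks, and hand
# them to the LLM as context. Index here is in-memory purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
chunk_vecs = rng.normal(size=(10_000, 384)).astype(np.float32)   # toy document embeddings
chunk_vecs /= np.linalg.norm(chunk_vecs, axis=1, keepdims=True)

query_vec = rng.normal(size=384).astype(np.float32)
query_vec /= np.linalg.norm(query_vec)

scores = chunk_vecs @ query_vec                # cosine similarity (unit-norm vectors)
top_k = np.argsort(scores)[-5:][::-1]          # 5 nearest chunks for the prompt
print("retrieved chunk ids:", top_k.tolist())
```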

NVIDIA Outlines Cost Benefits of Inference Platform

Businesses across every industry are rolling out AI services this year. For Microsoft, Oracle, Perplexity, Snap and hundreds of other leading companies, using the NVIDIA AI inference platform—a full stack comprising world-class silicon, systems and software—is the key to delivering high-throughput and low-latency inference and enabling great user experiences while lowering cost. NVIDIA's advancements in inference software optimization and the NVIDIA Hopper platform are helping industries serve the latest generative AI models, delivering excellent user experiences while optimizing total cost of ownership. The Hopper platform also helps deliver up to 15x more energy efficiency for inference workloads compared to previous generations.

AI inference is notoriously difficult, as it requires many steps to strike the right balance between throughput and user experience. But the underlying goal is simple: generate more tokens at a lower cost. Tokens represent words in a large language model (LLM) system—and with AI inference services typically charging for every million tokens generated, this goal offers the most visible return on AI investments and energy used per task. Full-stack software optimization offers the key to improving AI inference performance and achieving this goal.
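Since inference is typically billed per million tokens, the "more tokens at lower cost" goal reduces to simple arithmetic; the price and token counts below are illustrative, not NVIDIA's figures.

```python
# Cost-per-task arithmetic for token-billed inference (numbers illustrative).
price_per_million_tokens = 2.50        # USD per 1M tokens
tokens_per_task = 1_200                # prompt + completion for one request
cost_per_task = tokens_per_task / 1e6 * price_per_million_tokens
print(f"${cost_per_task:.4f} per task -> {1 / cost_per_task:,.0f} tasks per dollar")
```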

NVIDIA AI Helps Fight Against Fraud Across Many Sectors

Companies and organizations are increasingly using AI to protect their customers and thwart the efforts of fraudsters around the world. Voice security company Hiya found that 550 million scam calls were placed per week in 2023, with INTERPOL estimating that scammers stole $1 trillion from victims that same year. In the U.S., one in four non-contact-list calls was flagged as suspected spam, with fraudsters often luring people into Venmo-related or extended warranty scams.

Traditional methods of fraud detection include rules-based systems, statistical modeling and manual reviews. These methods have struggled to scale to the growing volume of fraud in the digital era without sacrificing speed and accuracy. For instance, rules-based systems often have high false-positive rates, statistical modeling can be time-consuming and resource-intensive, and manual reviews can't scale rapidly enough.
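A toy example of why rules-based systems produce high false-positive rates: a single hand-written threshold cannot separate fraud from legitimate outliers. The transactions below are synthetic and the rule is deliberately simplistic.

```python
# Toy rules-based fraud check: a fixed amount threshold flags legitimate
# big-ticket purchases too, which is where false positives come from.
transactions = [
    {"amount": 35.0,   "fraud": False},
    {"amount": 2400.0, "fraud": False},   # legitimate large purchase
    {"amount": 1900.0, "fraud": True},
]
flagged = [t for t in transactions if t["amount"] > 1000]        # the "rule"
false_positives = sum(1 for t in flagged if not t["fraud"])
print(f"flagged {len(flagged)} transactions, {false_positives} false positive(s)")
```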

Seagate Anticipates Cloud Storage Growth due to AI-Driven Data Creation

According to a recent global Recon Analytics survey commissioned by Seagate Technology, business leaders from across 15 industry sectors and 10 countries expect that adoption of artificial intelligence (AI) applications will generate unprecedented volumes of data, driving a boom in demand for data storage, in particular cloud-based storage. With hard drives delivering scalability at favorable terabyte-per-dollar cost efficiencies, cloud service providers rely on them to store mass quantities of data.

Recently, analyst firm IDC estimated that 89% of data stored by leading cloud service providers is stored on hard drives. Now, according to this Recon Analytics study, nearly two-thirds of respondents (61%) from companies that use cloud as their leading storage medium expect their cloud-based storage to grow by more than 100% over the next 3 years. "The survey results generally point to a coming surge in demand for data storage, with hard drives emerging as the clear winner," remarked Roger Entner, founder and lead analyst of Recon Analytics. "When you consider that the business leaders we surveyed intend to store more and more of this AI-driven data in the cloud, it appears that cloud services are well-positioned to ride a second growth wave."

NVIDIA NeMo AI Guardrails Upgraded with Latest NIM Microservices

AI agents are poised to transform productivity for the world's billion knowledge workers with "knowledge robots" that can accomplish a variety of tasks. To develop AI agents, enterprises need to address critical concerns like trust, safety, security and compliance. New NVIDIA NIM microservices for AI guardrails—part of the NVIDIA NeMo Guardrails collection of software tools—are portable, optimized inference microservices that help companies improve the safety, precision and scalability of their generative AI applications.

Central to the orchestration of the microservices is NeMo Guardrails, part of the NVIDIA NeMo platform for curating, customizing and guardrailing AI. NeMo Guardrails helps developers integrate and manage AI guardrails in large language model (LLM) applications. Industry leaders Amdocs, Cerence AI and Lowe's are among those using NeMo Guardrails to safeguard AI applications. Developers can use the NIM microservices to build more secure, trustworthy AI agents that provide safe, appropriate responses within context-specific guidelines and are bolstered against jailbreak attempts. Deployed in customer service across industries like automotive, finance, healthcare, manufacturing and retail, the agents can boost customer satisfaction and trust.
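As a rough sketch of how NeMo Guardrails typically wires into an LLM application: the "./config" directory and its rail definitions are assumptions here, so consult the NeMo Guardrails documentation for the actual configuration format and available rails.

```python
# Minimal NeMo Guardrails wiring sketch; "./config" is assumed to contain the
# rails configuration (models, flows, prompts) described in the project docs.
from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./config")
rails = LLMRails(config)

reply = rails.generate(messages=[
    {"role": "user", "content": "How do I reset my password?"}
])
print(reply["content"])   # the guardrailed assistant response
```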

Aetina & Qualcomm Collaborate on Flagship MegaEdge AIP-FR68 Edge AI Solution

Aetina, a leading provider of edge AI solutions and a subsidiary of Innodisk Group, today announced a collaboration with Qualcomm Technologies, Inc., which unveiled a revolutionary Qualcomm AI On-Prem Appliance Solution and Qualcomm AI Inference Suite for On-Prem. This collaboration combines Qualcomm Technologies' cutting-edge inference accelerators and advanced software with Aetina's edge computing hardware to deliver unprecedented computing power and ready-to-use AI applications for enterprises and industrial organizations.

The flagship offering, the Aetina MegaEdge AIP-FR68, sets a new industry benchmark by integrating the Qualcomm Cloud AI family of accelerator cards. Each Cloud AI 100 Ultra card delivers an impressive 870 TOPS of AI computing power at 8-bit integer (INT8) precision while maintaining remarkable energy efficiency at just 150 W power consumption. The system supports dual Cloud AI 100 Ultra cards in a single desktop workstation. This groundbreaking combination of power and efficiency in a compact form factor revolutionizes on-premises AI processing, making enterprise-grade computing more accessible than ever.
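The efficiency claim works out to a few lines of arithmetic from the figures quoted above.

```python
# Efficiency math from the quoted specs: 870 INT8 TOPS at 150 W per card,
# two cards supported per workstation.
tops_per_card, watts_per_card, cards = 870, 150, 2
print(f"{tops_per_card / watts_per_card:.1f} INT8 TOPS/W per card, "
      f"{tops_per_card * cards} TOPS total at ~{watts_per_card * cards} W for dual cards")
```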