
DeepSeek-R1 Goes Live on NVIDIA NIM

T0@st

News Editor
Joined
Mar 7, 2023
Messages
2,358 (3.35/day)
Location
South East, UK
DeepSeek-R1 is an open model with state-of-the-art reasoning capabilities. Instead of offering direct responses, reasoning models like DeepSeek-R1 perform multiple inference passes over a query, applying chain-of-thought, consensus and search techniques to generate the best answer. Performing this sequence of inference passes—using reason to arrive at the best answer—is known as test-time scaling. DeepSeek-R1 is a perfect example of this scaling law, demonstrating why accelerated computing is critical for the demands of agentic AI inference.

As models are allowed to iteratively "think" through the problem, they create more output tokens and longer generation cycles, so model quality continues to scale. Significant test-time compute is critical to enable both real-time inference and higher-quality responses from reasoning models like DeepSeek-R1, requiring larger inference deployments. R1 delivers leading accuracy for tasks demanding logical inference, reasoning, math, coding and language understanding while also delivering high inference efficiency.
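To make the "consensus" idea above concrete, here is a minimal sketch of majority-vote test-time scaling: sample several chain-of-thought completions and keep the most common answer. This is an illustration of the general technique, not NVIDIA's or DeepSeek's actual pipeline; `noisy_model` is a toy stand-in for a real reasoning model.

```python
import random
from collections import Counter

def self_consistency(generate, prompt, n_samples=8):
    """Sample several completions and return the majority-vote answer.
    More inference passes (more test-time compute) raise the odds that
    the consensus answer is correct."""
    answers = [generate(prompt) for _ in range(n_samples)]
    winner, _ = Counter(answers).most_common(1)[0]
    return winner

# Toy stand-in for a reasoning model: a sampler that is right 70% of the time.
random.seed(0)
def noisy_model(prompt):
    return "42" if random.random() < 0.7 else "41"

print(self_consistency(noisy_model, "What is 6 * 7?", n_samples=15))
```

With enough samples, the majority vote is far more reliable than any single pass, which is exactly why reasoning models demand larger inference deployments.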




To help developers securely experiment with these capabilities and build their own specialized agents, the 671-billion-parameter DeepSeek-R1 model is now available as an NVIDIA NIM microservice preview on build.nvidia.com. The DeepSeek-R1 NIM microservice can deliver up to 3,872 tokens per second on a single NVIDIA HGX H200 system. Developers can test and experiment with the application programming interface (API), which is expected to be available soon as a downloadable NIM microservice, part of the NVIDIA AI Enterprise software platform.
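As a sketch of what experimenting with the API might look like: NIM microservices generally expose an OpenAI-compatible chat-completions endpoint, so a request can be built as below. The endpoint URL, model id, and key format here are assumptions based on NVIDIA's usual conventions for build.nvidia.com—check the service's own documentation before use.

```python
import json
import urllib.request

API_URL = "https://integrate.api.nvidia.com/v1/chat/completions"  # assumed endpoint
API_KEY = "nvapi-..."  # your build.nvidia.com API key (placeholder)

def build_request(prompt, max_tokens=1024):
    """Construct an HTTP request for one chat completion against the
    (assumed) OpenAI-compatible NIM endpoint."""
    payload = {
        "model": "deepseek-ai/deepseek-r1",  # assumed model id
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.6,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )

req = build_request("Prove that the square root of 2 is irrational.")
# urllib.request.urlopen(req) would send it; omitted here since it needs a valid key.
```

Because the interface is OpenAI-compatible, existing client libraries can usually be pointed at the NIM endpoint by changing only the base URL and key.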



The DeepSeek-R1 NIM microservice simplifies deployments with support for industry-standard APIs. Enterprises can maximize security and data privacy by running the NIM microservice on their preferred accelerated computing infrastructure. Using NVIDIA AI Foundry with NVIDIA NeMo software, enterprises will also be able to create customized DeepSeek-R1 NIM microservices for specialized AI agents.

DeepSeek-R1—a Perfect Example of Test-Time Scaling
DeepSeek-R1 is a large mixture-of-experts (MoE) model. It incorporates an impressive 671 billion parameters—10x more than many other popular open-source LLMs—supporting a large input context length of 128,000 tokens. The model also uses an extreme number of experts per layer. Each layer of R1 has 256 experts, with each token routed to eight separate experts in parallel for evaluation.
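The routing described above—each token evaluated by 8 of 256 experts per layer—can be sketched with standard top-k MoE gating. This is illustrative only; R1's actual gating network has more machinery, and the router logits here are random stand-ins.

```python
import math
import random

NUM_EXPERTS = 256   # experts per layer in R1
TOP_K = 8           # experts each token is routed to

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route_token(router_logits, k=TOP_K):
    """Pick the top-k experts for one token and renormalize their gate
    weights, as in standard top-k mixture-of-experts routing."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

random.seed(1)
logits = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]
chosen = route_token(logits)
print(len(chosen))  # 8: the token's output is a weighted sum over these experts
```

Since only 8 of 256 experts run per token, most parameters sit idle for any given token—but every expert must be resident and reachable, which is why expert-to-expert communication bandwidth dominates the deployment picture.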

Delivering real-time answers for R1 requires many GPUs with high compute performance, connected with high-bandwidth and low-latency communication to route prompt tokens to all the experts for inference. Combined with the software optimizations available in the NVIDIA NIM microservice, a single server with eight H200 GPUs connected using NVLink and NVLink Switch can run the full, 671-billion-parameter DeepSeek-R1 model at up to 3,872 tokens per second. This throughput is made possible by using the NVIDIA Hopper architecture's FP8 Transformer Engine at every layer—and the 900 GB/s of NVLink bandwidth for MoE expert communication.



Getting every floating point operation per second (FLOPS) of performance out of a GPU is critical for real-time inference. The next-generation NVIDIA Blackwell architecture will give test-time scaling on reasoning models like DeepSeek-R1 a giant boost with fifth-generation Tensor Cores that can deliver up to 20 petaflops of peak FP4 compute performance and a 72-GPU NVLink domain specifically optimized for inference.

Get Started Now With the DeepSeek-R1 NIM Microservice
Developers can experience the DeepSeek-R1 NIM microservice, now available on build.nvidia.com. Watch how it works:


With NVIDIA NIM, enterprises can deploy DeepSeek-R1 with ease and ensure they get the high efficiency needed for agentic AI systems.

See notice regarding software product information.

View at TechPowerUp Main Site | Source
 
Joined
Feb 10, 2006
Messages
32 (0.00/day)
Nvidia in damage control mode. It's already been shown that AMD inference performance in DeepSeek is superior, the 7900 XTX besting both the 4080 and 4090.
 
Joined
May 10, 2023
Messages
592 (0.92/day)
Location
Brazil
Processor 5950x
Motherboard B550 ProArt
Cooling Fuma 2
Memory 4x32GB 3200MHz Corsair LPX
Video Card(s) 2x RTX 3090
Display(s) LG 42" C2 4k OLED
Power Supply XPG Core Reactor 850W
Software I use Arch btw
Nvidia in damage control mode. It's already been shown that AMD inference performance in DeepSeek is superior, the 7900 XTX besting both the 4080 and 4090.
The model that ran on the 7900 XTX has nothing to do with the one being shown in the OP.
The demo AMD showed also used LM Studio, which is a subpar way to run any LLM (albeit a really easy one for the mainstream consumer).
 
Joined
Oct 22, 2014
Messages
14,279 (3.80/day)
Location
Sunshine Coast
System Name H7 Flow 2024
Processor AMD 5800X3D
Motherboard Asus X570 Tough Gaming
Cooling Custom liquid
Memory 32 GB DDR4
Video Card(s) Intel ARC A750
Storage Crucial P5 Plus 2TB.
Display(s) AOC 24" Freesync 1m.s. 75Hz
Mouse Lenovo
Keyboard Eweadn Mechanical
Software W11 Pro 64 bit
No mention of the inherent security risks of using DeepSeek in comparison to other AI models?
Where are the protections?
 
Joined
Jun 22, 2012
Messages
316 (0.07/day)
Processor Intel i7-12700K
Motherboard MSI PRO Z690-A WIFI
Cooling Noctua NH-D15S
Memory Corsair Vengeance 4x16 GB (64GB) DDR4-3600 C18
Video Card(s) MSI GeForce RTX 3090 GAMING X TRIO 24G
Storage Samsung 980 Pro 1TB, SK hynix Platinum P41 2TB
Case Fractal Define C
Power Supply Corsair RM850x
Mouse Logitech G203
Software openSUSE Tumbleweed
If you're mentioning these "security risks": https://archive.is/3Rboh
I'd argue it's a good thing, because it does what you're asking; it's not a security risk.
 
Joined
Jan 11, 2022
Messages
1,079 (0.96/day)
Nvidia in damage control mode. It's already been shown that AMD inference performance in DeepSeek is superior, the 7900 XTX besting both the 4080 and 4090.
Nope, it's not even close.

[attached benchmark chart]
 
Joined
Jan 12, 2023
Messages
262 (0.35/day)
System Name IZALITH (or just "Lith")
Processor AMD Ryzen 7 7800X3D (4.2Ghz base, 5.0Ghz boost, -30 PBO offset)
Motherboard Gigabyte X670E Aorus Master Rev 1.0
Cooling Deepcool Gammaxx AG400 Single Tower
Memory Corsair Vengeance 64GB (2x32GB) 6000MHz CL40 DDR5 XMP (XMP enabled)
Video Card(s) PowerColor Radeon RX 7900 XTX Red Devil OC 24GB (2.39Ghz base, 2.99Ghz boost, -30 core offset)
Storage 2x1TB SSD, 2x2TB SSD, 2x 8TB HDD
Display(s) Samsung Odyssey G51C 27" QHD (1440p 165Hz) + Samsung Odyssey G3 24" FHD (1080p 165Hz)
Case Corsair 7000D Airflow Full Tower
Audio Device(s) Corsair HS55 Surround Wired Headset/LG Z407 Speaker Set
Power Supply Corsair HX1000 Platinum Modular (1000W)
Mouse Logitech G502 X LIGHTSPEED Wireless Gaming Mouse
Keyboard Keychron K4 Wireless Mechanical Keyboard
Software Arch Linux
I'm a bit of a layman, can someone please give a crash course as to what Nvidia NIM is and does?
 
Joined
Sep 15, 2011
Messages
6,867 (1.40/day)
Processor Intel® Core™ i7-13700K
Motherboard Gigabyte Z790 Aorus Elite AX
Cooling Noctua NH-D15
Memory 32GB(2x16) DDR5@6600MHz G-Skill Trident Z5
Video Card(s) ZOTAC GAMING GeForce RTX 3080 AMP Holo
Storage 2TB SK Platinum P41 SSD + 4TB SanDisk Ultra SSD + 500GB Samsung 840 EVO SSD
Display(s) Acer Predator X34 3440x1440@100Hz G-Sync
Case NZXT PHANTOM410-BK
Audio Device(s) Creative X-Fi Titanium PCIe
Power Supply Corsair 850W
Mouse Logitech Hero G502 SE
Software Windows 11 Pro - 64bit
Benchmark Scores 30FPS in NFS:Rivals
Nvidia in damage control mode. It's already been shown that AMD inference performance in DeepSeek is superior, the 7900 XTX besting both the 4080 and 4090.
source?
 
Joined
Jan 11, 2022
Messages
1,079 (0.96/day)

As stated earlier, the details of this feat, much exaggerated in other publications, have since been corrected.
Nvidia vastly outperforms them.
 
Joined
May 10, 2023
Messages
592 (0.92/day)
Location
Brazil
Processor 5950x
Motherboard B550 ProArt
Cooling Fuma 2
Memory 4x32GB 3200MHz Corsair LPX
Video Card(s) 2x RTX 3090
Display(s) LG 42" C2 4k OLED
Power Supply XPG Core Reactor 850W
Software I use Arch btw
I'm a bit of a layman, can someone please give a crash course as to what Nvidia NIM is and does?
It's basically a cloud offering with easy-to-use APIs for many AI tasks, like image generation, text generation and whatnot, without you needing to worry about the underlying infra.
 