
NVIDIA Announces DGX Spark and DGX Station Personal AI Computers

Nomad76

News Editor
Staff member
Joined
May 21, 2024
Messages
1,075 (3.55/day)
NVIDIA today unveiled NVIDIA DGX personal AI supercomputers powered by the NVIDIA Grace Blackwell platform. DGX Spark—formerly Project DIGITS—and DGX Station, a new high-performance NVIDIA Grace Blackwell desktop supercomputer powered by the NVIDIA Blackwell Ultra platform, enable AI developers, researchers, data scientists and students to prototype, fine-tune and run inference on large models on desktops. Users can run these models locally or deploy them on NVIDIA DGX Cloud or any other accelerated cloud or data center infrastructure.

DGX Spark and DGX Station bring the power of the Grace Blackwell architecture, previously only available in the data center, to the desktop. Global system builders developing DGX Spark and DGX Station include ASUS, Dell, HP Inc. and Lenovo.



"AI has transformed every layer of the computing stack. It stands to reason a new class of computers would emerge—designed for AI-native developers and to run AI-native applications," said Jensen Huang, founder and CEO of NVIDIA. "With these new DGX personal AI computers, AI can span from cloud services to desktop and edge applications."

Igniting Innovation With DGX Spark
DGX Spark is the world's smallest AI supercomputer, empowering millions of researchers, data scientists, robotics developers and students to push the boundaries of generative and physical AI with massive performance and capabilities.

At the heart of DGX Spark is the NVIDIA GB10 Grace Blackwell Superchip, optimized for a desktop form factor. GB10 features a powerful NVIDIA Blackwell GPU with fifth-generation Tensor Cores and FP4 support, delivering up to 1,000 trillion operations per second of AI compute for fine-tuning and inference with the latest AI reasoning models, including the NVIDIA Cosmos Reason world foundation model and NVIDIA GR00T N1 robot foundation model.
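For a sense of what fine-tuning and inference "with the latest AI reasoning models" looks like in practice on a box like this, here is a minimal local-inference sketch. It assumes PyTorch plus Hugging Face transformers with bitsandbytes 4-bit quantization (NF4, a software stand-in for the hardware FP4 path, not the same format); the model name is a placeholder, not one of the NVIDIA models named above.

```python
# Minimal sketch: load a causal LM with 4-bit quantization and generate locally.
# Assumptions: PyTorch, transformers and bitsandbytes are installed and a CUDA GPU
# is present; "model_id" is a placeholder, not one of the models mentioned above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder checkpoint

quant_cfg = BitsAndBytesConfig(
    load_in_4bit=True,                      # NF4 software quantization, not hardware FP4
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_cfg,
    device_map="auto",                      # place layers on the available GPU(s)
)

prompt = "Explain in one sentence why memory bandwidth matters for LLM inference."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```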

The GB10 Superchip uses NVIDIA NVLink-C2C interconnect technology to deliver a CPU+GPU-coherent memory model with 5x the bandwidth of fifth-generation PCIe. This lets the GPU and CPU share data coherently, optimizing performance for memory-intensive AI developer workloads.

NVIDIA's full-stack AI platform enables DGX Spark users to seamlessly move their models from their desktops to DGX Cloud or any accelerated cloud or data center infrastructure—with virtually no code changes—making it easier than ever to prototype, fine-tune and iterate on their workflows.
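As an illustration of the "virtually no code changes" claim, the portability mostly comes from writing device-agnostic code in the first place. A minimal sketch, assuming plain PyTorch and nothing NVIDIA-specific:

```python
# Minimal sketch of device-agnostic code that runs unchanged on a desktop GPU,
# a cloud instance, or a CPU-only machine (assumption: plain PyTorch, no NVIDIA-specific API).
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(4096, 4096).to(device)   # stand-in for a real model
batch = torch.randn(8, 4096, device=device)

with torch.no_grad():
    out = model(batch)
print(out.shape, "computed on", device)
```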

Full Speed Ahead With DGX Station
NVIDIA DGX Station brings data-center-level performance to desktops for AI development. The first desktop system to be built with the NVIDIA GB300 Grace Blackwell Ultra Desktop Superchip, DGX Station features a massive 784 GB of coherent memory space to accelerate large-scale training and inference workloads. The GB300 Desktop Superchip features an NVIDIA Blackwell Ultra GPU with latest-generation Tensor Cores and FP4 precision—connected to a high-performance NVIDIA Grace CPU via NVLink-C2C—delivering best-in-class system communication and performance.

DGX Station also features the NVIDIA ConnectX-8 SuperNIC, optimized to supercharge hyperscale AI computing workloads. With support for networking at up to 800 Gb/s, the ConnectX-8 SuperNIC delivers extremely fast, efficient connectivity, enabling multiple DGX Stations to be linked for even larger workloads and providing network-accelerated data transfers for AI.

Combining these state-of-the-art DGX Station capabilities with the NVIDIA CUDA-X AI platform, teams can achieve exceptional desktop AI development performance.

In addition, users gain access to NVIDIA NIM microservices with the NVIDIA AI Enterprise software platform, which offers highly optimized, easy-to-deploy inference microservices backed by enterprise support.
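For context on what "easy-to-deploy inference microservices" means in practice: NIM containers expose an OpenAI-compatible HTTP endpoint, so a locally hosted model can be queried with a stock client. A rough sketch, assuming the standard openai Python package; the URL, port and model name below are placeholders:

```python
# Sketch of querying a locally hosted, OpenAI-compatible inference endpoint.
# Assumptions: the "openai" package is installed; the URL, port and model name are
# placeholders for whatever the local deployment actually exposes.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")

response = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",   # placeholder model identifier
    messages=[{"role": "user", "content": "Give me a one-line summary of NVLink-C2C."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```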

Availability
Reservations for DGX Spark systems open today at nvidia.com.

DGX Station is expected to be available from manufacturing partners like ASUS, BOXX, Dell, HP, Lambda and Supermicro later this year.



View at TechPowerUp Main Site | Source
 
Joined
Sep 21, 2019
Messages
45 (0.02/day)
Location
Delaware
System Name Typhon
Processor AMD Ryzen 9 5900x
Motherboard Asus Crosshair VIII Hero WiFi
Cooling Deepcool Castle 360EX Push Top Mount
Memory G.Skill Trident Z NEO Series 64GB - F4-3600C16Q-64GTZNC
Video Card(s) EVGA GeForce RTX 3080 Ti FTW3 Ultra Gaming
Storage 1TB Samsung 970 PRO, 1TB Addlink S20, 2TB MX500
Display(s) Acer Predator X34 34" Curved + 27" ASUS ROG PG279Q
Case Antec DF700 FLUX
Audio Device(s) ROG SupremeFX8-Channel High Definition Audio CODEC
Power Supply Corsair HX1000i (2023)
Mouse Logitech G604 Hero 25k
Keyboard EVGA Z20 Linear
Software Win10 22H2 / Mint 21.1
Benchmark Scores https://valid.x86.fr/ac31ky https://www.3dmark.com/fs/27840834 https://www.3dmark.com/spy/28533826
Oooo nice, is that 3x 12VHPWR? 3x chance of fire!
 
Joined
Jan 12, 2023
Messages
337 (0.42/day)
System Name IZALITH (or just "Lith")
Processor AMD Ryzen 7 7800X3D (4.2Ghz base, 5.0Ghz boost, -30 PBO offset)
Motherboard Gigabyte X670E Aorus Master Rev 1.0
Cooling Deepcool Gammaxx AG400 Single Tower
Memory Corsair Vengeance 64GB (2x32GB) 6000MHz CL40 DDR5 XMP (XMP enabled)
Video Card(s) PowerColor Radeon RX 7900 XTX Red Devil OC 24GB (2.39Ghz base, 2.99Ghz boost, -30 core offset)
Storage 2x1TB SSD, 2x2TB SSD, 2x 8TB HDD
Display(s) Samsung Odyssey G51C 27" QHD (1440p 165Hz) + Samsung Odyssey G3 24" FHD (1080p 165Hz)
Case Corsair 7000D Airflow Full Tower
Audio Device(s) Corsair HS55 Surround Wired Headset/LG Z407 Speaker Set
Power Supply Corsair HX1000 Platinum Modular (1000W)
Mouse Logitech G502 X LIGHTSPEED Wireless Gaming Mouse
Keyboard Keychron K4 Wireless Mechanical Keyboard
Software Arch Linux
I'm a layman for the Nvidia AI side of things: what's the better OS for AI workloads? Windows or Linux? I know on the AMD side it's Linux because of the ROCm support.
 
Joined
Jun 26, 2023
Messages
96 (0.15/day)
Processor 7800X3D @ Curve Optimizer: All Core: -25
Motherboard TUF Gaming B650-Plus
Memory 2xKSM48E40BD8KM-32HM ECC RAM (ECC enabled in BIOS)
Video Card(s) 4070 @ 110W
Display(s) SAMSUNG S95B 55" QD-OLED TV
Power Supply RM850x
Many say that at $3,000 the DGX Spark (128 GB RAM at 273 GB/s memory bandwidth) is DOA/obsolete already:
  • Arm CPU, not x86, so playing (certain) games is going to be an issue
  • Supposedly requires a specialized NVIDIA Arm OS
  • Can't run serious models like the real DeepSeek-R1 quants, because it can't fit even the smallest quant ("DeepSeek-R1-IQ1_S.gguf" is 133.56 GB)
  • The overpriced Framework Desktop using the AMD Max+ 395 with 128 GB is still $1,000 cheaper for similar bandwidth (256 GB/s) and has all the advantages of x86 compatibility
  • It's a given that it can't be used for training (maybe models in the millions of parameters, not billions), but even fine-tuning isn't going to be great with the 273 GB/s memory bandwidth (maybe small models)
  • Soon there are going to be many AMD Max+ 395 128 GB x86 mini-PC devices at half the price, with all the compatibility advantages x86 brings
What would really be needed are 256 GB RAM devices with, say, at least twice the memory bandwidth (>500 GB/s) to be able to run serious LLMs like the real DeepSeek-R1 (luckily, as an MoE model, it runs faster than dense models of the same size).
Maybe when AMD and NVIDIA designed these 128 GB RAM, 256-bit, quad-channel SoCs (also to catch up to Apple's offerings), DeepSeek-R1 wasn't a thing (though at that time there was Llama-3.1-405B, but as a dense model it would run much slower), but it is now.
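Rough back-of-the-envelope math behind the bandwidth argument above: decode speed of a memory-bound LLM is roughly memory bandwidth divided by the bytes of weights read per token, which is also why an MoE model (only the active experts are read) decodes faster than a dense model of the same total size. The figures below are illustrative, not benchmarks.

```python
# Back-of-the-envelope decode-speed estimate for a memory-bound LLM.
# Assumption: tokens/s ≈ memory bandwidth / bytes of weights read per token.
# All figures below are illustrative, not measured results.

def rough_tokens_per_second(bandwidth_gb_s: float, weights_read_per_token_gb: float) -> float:
    return bandwidth_gb_s / weights_read_per_token_gb

# ~133 GB dense quant on a 273 GB/s box: ~2 tokens/s
print(rough_tokens_per_second(273, 133))

# Same total size but MoE, assuming only ~30 GB of expert weights touched per token: ~9 tokens/s
print(rough_tokens_per_second(273, 30))

# The >500 GB/s, 256 GB device the post asks for, same MoE assumption: ~17 tokens/s
print(rough_tokens_per_second(500, 30))
```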
 
Joined
May 10, 2023
Messages
761 (1.12/day)
Location
Brazil
Processor 5950x
Motherboard B550 ProArt
Cooling Fuma 2
Memory 4x32GB 3200MHz Corsair LPX
Video Card(s) 2x RTX 3090
Display(s) LG 42" C2 4k OLED
Power Supply XPG Core Reactor 850W
Software I use Arch btw
Should've been RISC-V if they really wanted to be wild with it.
I don't think they wanted to be wild, just to have good software support and slap their custom interconnect into a CPU. ARM already has pretty good software support out there, especially in the Python ecosystem.
Vera seems to be a custom core as well, unlike Grace, which was a Neoverse design.
I'm a layman for the Nvidia AI side of things: what's the better OS for AI workloads? Windows or Linux? I know on the AMD side it's Linux because of the ROCm support.
Linux for sure. Even on Windows, NVIDIA recommends using WSL instead of trying to do things natively.
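If you do end up on Windows + WSL (or native Linux), a quick way to confirm the GPU is actually visible to the stack is something like the following; it assumes a CUDA build of PyTorch and isn't NVIDIA- or DGX-specific:

```python
# Quick sanity check that a CUDA-capable GPU is visible from Linux or WSL.
# Assumption: a CUDA-enabled build of PyTorch is installed.
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
    print("Compute capability:", torch.cuda.get_device_capability(0))
    print("Total VRAM (GB):", torch.cuda.get_device_properties(0).total_memory / 1e9)
```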
Many say that at $3,000 the DGX Spark (128 GB RAM at 273 GB/s memory bandwidth) is DOA/obsolete already:
  • Arm CPU, not x86, so playing (certain) games is going to be an issue
  • Supposedly requires a specialized NVIDIA Arm OS
  • Can't run serious models like the real DeepSeek-R1 quants, because it can't fit even the smallest quant ("DeepSeek-R1-IQ1_S.gguf" is 133.56 GB)
  • The overpriced Framework Desktop using the AMD Max+ 395 with 128 GB is still $1,000 cheaper for similar bandwidth (256 GB/s) and has all the advantages of x86 compatibility
  • It's a given that it can't be used for training (maybe models in the millions of parameters, not billions), but even fine-tuning isn't going to be great with the 273 GB/s memory bandwidth (maybe small models)
  • Soon there are going to be many AMD Max+ 395 128 GB x86 mini-PC devices at half the price, with all the compatibility advantages x86 brings
What would really be needed are 256 GB RAM devices with, say, at least twice the memory bandwidth (>500 GB/s) to be able to run serious LLMs like the real DeepSeek-R1 (luckily, as an MoE model, it runs faster than dense models of the same size).
Maybe when AMD and NVIDIA designed these 128 GB RAM, 256-bit, quad-channel SoCs (also to catch up to Apple's offerings), DeepSeek-R1 wasn't a thing (though at that time there was Llama-3.1-405B, but as a dense model it would run much slower), but it is now.
The only saving grace I see for Spark is if you want to cluster multiple of them, given that it has ConnectX. Not sure why one would spend that amount of money on such a setup, but it's an option.
 
Joined
Nov 23, 2023
Messages
269 (0.56/day)
I don't think they wanted to be wild, just to have good software support and slap their custom interconnect into a CPU. ARM already has pretty good software support out there, especially in the Python ecosystem.
Vera seems to be a custom core as well, unlike Grace, which was a Neoverse design.

Linux for sure. Even on Windows, NVIDIA recommends using WSL instead of trying to do things natively.

The only saving grace I see for Spark is if you want to cluster multiple of them, given that it has ConnectX. Not sure why one would spend that amount of money on such a setup, but it's an option.
The networking's totally useless aside from loading/offloading onto a remote machine... which is pretty important when you're loading 100+ GB models, but it's something you only have to do once. For most people it's not worth the price.
 
Joined
May 10, 2023
Messages
761 (1.12/day)
Location
Brazil
Processor 5950x
Motherboard B550 ProArt
Cooling Fuma 2
Memory 4x32GB 3200MHz Corsair LPX
Video Card(s) 2x RTX 3090
Display(s) LG 42" C2 4k OLED
Power Supply XPG Core Reactor 850W
Software I use Arch btw
The networking's totally useless aside from loading/offloading onto a remote machine... which is pretty important when you're loading 100+ GB models, but it's something you only have to do once. For most people it's not worth the price.
I don't disagree with that. That level of networking connectivity (400 Gb/s on CX7) is pretty close to full PCIe 5.0 x16 bandwidth, so it'd be somewhat similar to just slapping another 128 GB GPU into your mix.
But yeah, at this price point it may make more sense to go with a proper GPU setup.
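For the sake of argument, a quick sketch of the numbers being compared here, treating 400 Gb/s CX7 and ~64 GB/s PCIe 5.0 x16 as the reference points (illustrative figures, ignoring protocol overhead):

```python
# Rough comparison of the link speeds being discussed (illustrative, no protocol overhead).
cx7_gbit_per_s = 400            # 400 Gb/s network link
pcie5_x16_gbyte_per_s = 64      # ~64 GB/s raw PCIe 5.0 x16 bandwidth

cx7_gbyte_per_s = cx7_gbit_per_s / 8   # 50 GB/s

model_size_gb = 130             # e.g. a ~130 GB quantized checkpoint

print(f"CX7: {cx7_gbyte_per_s:.0f} GB/s vs PCIe 5.0 x16: {pcie5_x16_gbyte_per_s} GB/s")
print(f"Moving a {model_size_gb} GB model over CX7: ~{model_size_gb / cx7_gbyte_per_s:.0f} s")
print(f"Moving it over a 10 Gb/s LAN instead: ~{model_size_gb / (10 / 8):.0f} s")
```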
 
Joined
Jun 18, 2021
Messages
2,720 (1.98/day)
Should've been RISC-V if they really wanted to be wild with it.

RISC-V still has a LOT to mature, both in support and performance. Using it here would be the wrong kind of wild - as in completely nuts.

The overpriced Framework Desktop using AMD Max+ 395 128GB is still 1000 $ cheaper for similar bandwidth (256 GB/s) and has all the advantages of x86 compatibility

Counterpoint: CUDA and TensorRT. I think you make great points, but I feel NVIDIA is leveraging their software stack and current market dominance to nickel-and-dime their customers. AMD has a real chance to grab some market share, but it still needs to just freaking do it!
 
Joined
Nov 23, 2023
Messages
269 (0.56/day)
RISC-V still has a LOT to mature, both in support and performance. Using it here would be the wrong kind of wild - as in completely nuts.
It's a specialized part anyways and ARM is also pretty nuts for a part as expensive as this.
Counterpoint: CUDA and TensorRT. I think you make great points, but I feel NVIDIA is leveraging their software stack and current market dominance to nickel-and-dime their customers. AMD has a real chance to grab some market share, but it still needs to just freaking do it!
This is made for LLMs; they're not really for imagegen, so CUDA doesn't have much benefit here unless there's some sort of hardware deficiency with RDNA 3.5. I'm sure some renderers will love the huge memory, but otherwise I don't know of any AI application that needs this much memory that ROCm doesn't already cover...
 
Joined
May 10, 2023
Messages
761 (1.12/day)
Location
Brazil
Processor 5950x
Motherboard B550 ProArt
Cooling Fuma 2
Memory 4x32GB 3200MHz Corsair LPX
Video Card(s) 2x RTX 3090
Display(s) LG 42" C2 4k OLED
Power Supply XPG Core Reactor 850W
Software I use Arch btw
It's a specialized part anyways and ARM is also pretty nuts for a part as expensive as this.
I don't think that CPU is that "wild". The problem with RISC-V is that there's simply no proper support for all the packages in the AI ecosystem.
There are no off-the-shelf wheels for PyTorch or TensorFlow, and that applies to most of the packages on PyPI, as an example.
This is made for LLMs; they're not really for imagegen, so CUDA doesn't have much benefit here unless there's some sort of hardware deficiency with RDNA 3.5. I'm sure some renderers will love the huge memory, but otherwise I don't know of any AI application that needs this much memory that ROCm doesn't already cover...
CUDA makes things way easier. ROCm is not that easy to deal with, and it has major annoyances and performance gaps when it comes to the major frameworks/engines.
 
Joined
Nov 23, 2023
Messages
269 (0.56/day)
I don't think that CPU is that "wild". The problem with RISC-V is that there's simply no proper support for all the packages in the AI ecosystem.
There are no off-the-shelf wheels for PyTorch or TensorFlow, and that applies to most of the packages on PyPI, as an example.
Alright. You're right, but I still don't think using ARM was a great decision.
CUDA makes things way easier. ROCm is not that easy to deal with, and it has major annoyances and performance gaps when it comes to the major frameworks/engines.
On the AI side there aren't really any problems with ROCm if the hardware's there. The hardware not being there is the actual problem; I'm not sure if 3.5 even has tensor cores. Other than that the features are all there: ROCm has flash attention, xformers, etc. For LLMs, ROCm and CUDA are at parity now. I'd need some examples.
 