
AMD Announces FirePro S9170 32GB GPU Compute Card

btarunr

Editor & Senior Moderator
Staff member
Joined
Oct 9, 2007
Messages
47,244 (7.54/day)
Location
Hyderabad, India
System Name RBMK-1000
Processor AMD Ryzen 7 5700G
Motherboard ASUS ROG Strix B450-E Gaming
Cooling DeepCool Gammax L240 V2
Memory 2x 8GB G.Skill Sniper X
Video Card(s) Palit GeForce RTX 2080 SUPER GameRock
Storage Western Digital Black NVMe 512GB
Display(s) BenQ 1440p 60 Hz 27-inch
Case Corsair Carbide 100R
Audio Device(s) ASUS SupremeFX S1220A
Power Supply Cooler Master MWE Gold 650W
Mouse ASUS ROG Strix Impact
Keyboard Gamdias Hermes E2
Software Windows 11 Pro
AMD today announced the new AMD FirePro S9170 server GPU, the world's first and fastest 32GB single-GPU server card for DGEMM-heavy double-precision workloads, with support for OpenCL 2.0. Based on the second-generation AMD Graphics Core Next (GCN) GPU architecture, this new addition to the AMD FirePro server GPU family is capable of delivering up to 5.24 TFLOPS of peak single-precision compute performance while enabling full-throughput double-precision performance (half the single-precision rate), providing up to 2.62 TFLOPS of peak double-precision performance. Designed with compute-intensive workflows in mind, the AMD FirePro S9170 server GPU is ideal for data center managers who oversee clusters within academic or government bodies, the oil and gas industries, or deep neural network compute cluster development.

"AMD is recognized as an HPC industry innovator as the graphics provider with the top spot on the November 2014 Green500 List. Today the best GPU for compute just got better with the introduction of the AMD FirePro S9170 server GPU to complement AMD's impressive array of server graphics offerings for high performance compute environments," said Sean Burke, corporate vice president and general manager, AMD Professional Graphics group. "The AMD FirePro S9170 server GPU can accelerate complex workloads in scientific computing, data analytics, or seismic processing, wielding an industry-leading 32GB of memory. We designed the new offering for supercomputers to achieve massive compute performance while maximizing available power budgets."



"There are some HPC workloads which require as much data as possible to stay resident on the device, and so the 32GB of memory provided by AMD FirePro S9170, the largest available on a single GPU, will enable the acceleration of scientific calculations that were previously impossible," said Simon McIntosh-Smith, head of the Microelectronics Research Group at the University of Bristol. "For example, our new OpenCL version of the SNAP transport code from Los Alamos National Laboratory needs to keep as much data resident on the device as possible, and so the 32GB of memory will let us run problems of a much more interesting size faster than ever before. The large memory, combined with the 320GB/s memory bandwidth and double precision floating point performance, will make the AMD FirePro S9170 server GPU a 'killer' solution device for many HPC applications."

The AMD FirePro S9170 offers unparalleled GPU compute performance and power efficiency with the following benefits:
  • With up to 2.62 TFLOPS of peak double precision performance, the AMD FirePro S9170 is the fastest single-GPU server card available for DGEMM-heavy workloads, delivering up to 40% more performance than the competitive solution;
  • Support for 40% better double precision performance, while using 10% less power than the competition;
  • The AMD FirePro S9170 is the industry's first server GPU with 32GB ultra-fast GDDR5 on-board memory and features a 512-bit memory interface for 320 GB/s of memory bandwidth;
  • Equipped with 32GB of GDDR5 memory, the AMD FirePro S9170 GPU can accelerate memory-intensive applications and process larger and more computationally complex workloads with ease; and
  • The AMD FirePro S9170 GPU features 33% more memory than the competitive GPU, helping to improve overall workload speed and system responsiveness, especially when working with large amounts of data.
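The 320 GB/s figure follows from the 512-bit bus width. A minimal sketch of that arithmetic, assuming decimal units (1 GB = 10^9 bytes) and that the quoted number is a theoretical peak; the implied per-pin data rate is derived here and is not stated in the announcement:

```python
def peak_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Theoretical peak = (bus width in bytes) x (per-pin data rate in Gb/s)."""
    return bus_width_bits / 8 * data_rate_gbps


def implied_data_rate_gbps(bus_width_bits: int, bandwidth_gb_s: float) -> float:
    """Invert the formula: the per-pin rate a quoted bandwidth requires."""
    return bandwidth_gb_s / (bus_width_bits / 8)


rate = implied_data_rate_gbps(512, 320.0)
print(f"implied per-pin data rate: {rate} Gb/s")  # 5.0 Gb/s
print(peak_bandwidth_gb_s(512, rate))             # 320.0
```

In other words, a 512-bit bus moves 64 bytes per transfer, so 320 GB/s corresponds to roughly 5 Gb/s per pin.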
"Based on our company's development and integration of leading-edge high-performance solutions to support design optimization, testing operation, simulation and maintenance of complex systems, SILKAN can attest that the AMD FirePro S9170 server GPU is the most powerful on the market," said François Guérinea, chairman and CEO at SILKAN. "What makes the difference is the amount of memory, and accordingly we have chosen the AMD FirePro S9170 GPU with 32GB of memory as a necessity for a client's major project."

"By combining the new AMD FirePro S9170 server GPU with our G250 Series of 2U eight GPU systems, we are able to deliver HPC servers with up to 256GB of GPU memory," said Alex Liu, product marketing executive of the GIGABYTE Network & Comm. Division. "This unprecedented density of GDDR5 memory will help our customers offer very attractive machines to industries processing large datasets."

"ASUS is pleased to see the AMD FirePro S9170 server GPU complement the AMD FirePro S9150 server GPU, the November 2014 Green500 List-winning solution," said Robert Chin, General Manager, ASUS Server Business Unit. "We are seeing customers not only obtain significant results in the HPC arena, but also achieve power efficiency for more green environments. The arrival of the AMD FirePro S9170 server GPU sets a new milestone for the HPC market segment. As a top Green500 List provider and one of the greenest HPC building block providers, we're excited to see the AMD FirePro S9170 GPU featured in our top ASUS ESC product family to enable excellent performance matched with significant savings in total cost of ownership."

"We have been developing a fully-parallel computational tool based on the AMD GPU heterogeneous computing platform and OpenCL," said Omid Mahahadi, co-founder and director, Geomechanica Inc. "This tool accurately captures the complex physics of massive mines plus oil and gas fields rapidly and reliably. Thanks to the impressive 32GB of memory of the new cards, we expect to run computations on massive data structures containing tens of millions of data elements. The combination of rapid double-precision operations with the large memory capacity enables accurate, detailed, and reliable computations. A similar performance using CPUs would likely require much higher capital and maintenance costs. Moving forward, we plan to take advantage of the recent features of the OpenCL 2.0 open API to further enhance the performance of our software."

To address future HPC development needs, the AMD FirePro S9170 server GPU supports the latest version of OpenCL and is ready for the OpenMP and OpenACC developer tools, with planned availability in Q3 2015.

 

btarunr

Editor & Senior Moderator
Staff member
Joined
Oct 9, 2007
Messages
47,244 (7.54/day)
"Grenada" based, 2816 SP, 32 GB 512-bit GDDR5, 2.6 TFLOP/s DPFP, 235W.
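These spec-line numbers are self-consistent. A quick check, assuming the usual convention that each stream processor retires one fused multiply-add (2 FLOPs) per clock; the roughly 930 MHz engine clock below is derived from the announced 5.24 TFLOPS single-precision peak, not taken from the post:

```python
def peak_tflops(stream_processors: int, clock_mhz: float, rate: float = 1.0) -> float:
    """Peak TFLOPS = SPs x 2 FLOPs/clock (one FMA) x clock x precision rate."""
    return stream_processors * 2 * clock_mhz * 1e6 * rate / 1e12


def implied_clock_mhz(stream_processors: int, sp_tflops: float) -> float:
    """Engine clock (MHz) needed to reach a quoted single-precision peak."""
    return sp_tflops * 1e12 / (stream_processors * 2) / 1e6


clock = implied_clock_mhz(2816, 5.24)
print(round(clock))                             # 930
print(round(peak_tflops(2816, clock, 0.5), 2))  # 2.62 (1/2-rate double precision)
```

So 2816 SPs at about 930 MHz give the 5.24 TFLOPS single-precision figure, and half-rate double precision lands on the quoted 2.6 TFLOP/s.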
 
Joined
Nov 8, 2005
Messages
47 (0.01/day)
Processor Haswell i7 4770
Motherboard Asus Z87-PRO
Memory 32GB DDR3-2133 10-10-10-30
Video Card(s) 2x Radeon R9 390X
Storage Samsung SSD M840 Pro 256GB, 4x320GB mechanical RAID 5
Holy balls... 32GB would make life so easy right about now. Currently struggling to fit into 16GB spread across two 390Xs.
 
Joined
Jul 7, 2014
Messages
152 (0.04/day)
Location
Columbia, SC
Processor Intel 2500k OCed at 4.6ghz
Motherboard Intel Z77
Cooling Thermalright Macho Rev.A
Memory 8GB G.Skill 2133
Video Card(s) Gigabyte GTX 670 Windforce 3X OCed at 1050mhz base and 1600mhz vram
Storage Mushkin Enhance 256gb SSD, Western Digital 750gb and 3TB HHDs
Display(s) Asus 24" 1080p
Case Lian-Li Mid Tower
Audio Device(s) Mobo sound
Power Supply SeaSonic 560 watt gold
Mouse Logitec 3 button laser mouse
Keyboard Das Keyboard Model S (the blank key model)
Software Windows 8.1 64 bit
Holy balls... 32GB would make life so easy right about now. Currently struggling to fit into 16GB spread across two 390Xs.

What are you doing, exactly, that struggles with 16GB (which is really only 8GB, since DX11 only addresses the memory of one card in xfire/sli), much less 32GB, of VRAM?
 
Joined
Nov 8, 2005
Messages
47 (0.01/day)
What are you doing, exactly, that struggles with 16GB (which is really only 8GB, since DX11 only addresses the memory of one card in xfire/sli), much less 32GB, of VRAM?

3-D FDTD simulations of over-the-horizon radar propagation under different sea/atmospheric conditions using OpenCL. I actually haven't even run any games on them yet :(
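For readers unfamiliar with FDTD (finite-difference time-domain), the core of the method is a leapfrog update in which the electric and magnetic fields, stored on staggered (Yee) grids, are advanced alternately from each other's spatial differences. A minimal 1-D sketch in normalized units, assuming a Courant number of 0.5, perfectly conducting (PEC) end walls and a Gaussian soft source; the poster's 3-D OpenCL code is not shown and will differ considerably:

```python
import math
from typing import List, Tuple


def fdtd_1d(n_cells: int, n_steps: int, courant: float = 0.5) -> Tuple[List[float], List[float]]:
    """Minimal 1-D Yee FDTD: ez on integer nodes, hy on half-integer nodes."""
    ez = [0.0] * n_cells        # E-field samples; ez[0] and ez[-1] stay 0 (PEC walls)
    hy = [0.0] * (n_cells - 1)  # H-field samples; hy[i] sits between ez[i] and ez[i+1]
    src = n_cells // 2          # inject the pulse at the centre node
    for t in range(n_steps):
        # Advance H (stored at half-integer time steps) from the spatial difference of E.
        for i in range(n_cells - 1):
            hy[i] += courant * (ez[i + 1] - ez[i])
        # Advance E from the spatial difference of H (interior nodes only).
        for i in range(1, n_cells - 1):
            ez[i] += courant * (hy[i] - hy[i - 1])
        # Soft source: add a Gaussian pulse on top of the existing field.
        ez[src] += math.exp(-(((t - 30) / 10) ** 2))
    return ez, hy


ez, hy = fdtd_1d(n_cells=201, n_steps=150)
# Two pulses travel outward from the centre; the scheme is stable for courant <= 1 in 1-D.
print(max(abs(v) for v in ez))
```

A 3-D version stores six field components per cell instead of two, which is why memory, not arithmetic, tends to be the limiting factor for large domains.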
 
Joined
Nov 18, 2010
Messages
7,562 (1.47/day)
Location
Rīga, Latvia
System Name HELLSTAR
Processor AMD RYZEN 9 5950X
Motherboard ASUS Strix X570-E
Cooling 2x 360 + 280 rads. 3x Gentle Typhoons, 3x Phanteks T30, 2x TT T140 . EK-Quantum Momentum Monoblock.
Memory 4x8GB G.SKILL Trident Z RGB F4-4133C19D-16GTZR 14-16-12-30-44
Video Card(s) Sapphire Pulse RX 7900XTX. Water block. Crossflashed.
Storage Optane 900P[Fedora] + WD BLACK SN850X 4TB + 750 EVO 500GB + 1TB 980PRO+SN560 1TB(W11)
Display(s) Philips PHL BDM3270 + Acer XV242Y
Case Lian Li O11 Dynamic EVO
Audio Device(s) SMSL RAW-MDA1 DAC
Power Supply Fractal Design Newton R3 1000W
Mouse Razer Basilisk
Keyboard Razer BlackWidow V3 - Yellow Switch
Software FEDORA 41
3-D FDTD simulations of over-the-horizon radar propagation under different sea/atmospheric conditions using OpenCL. I actually haven't even run any games on them yet :(

Does doing the job faster with such a professional product even justify the usual monster price?
 
Joined
Nov 8, 2005
Messages
47 (0.01/day)
Does doing the job faster with such a professional product even justify the usual monster price?

TLDR: Due to time or complexity constraints, sometimes it makes sense to spend thousands of dollars on cards like this.

As with most things, it's usually situation-dependent. If a problem can't fit into a given amount of RAM, and that's all you have to work with, then the only solution is to get creative with memory management. That takes a long time to implement in large or complicated programs, and it adds complexity to the program. When working with both CPU and (multi-)GPU memory spaces, managing memory is not necessarily a trivial feat. If you have an imminent deadline to meet for a customer, it may be more advantageous to save the more sophisticated programming for a period of downtime and just throw more brute force at it.

I currently have no plans to spend something like $5k-$10k on a GPU any time soon haha, but I can give an example of how it might help with my current project... Elevator pitch time...

I use a domain decomposition method to divide up a HUGE problem space into smaller sections. The whole space would require about 10.8 petabytes of RAM to store, and requires something in the range of 4.6*10^25 total math operations. Since FDTD takes discrete steps in time to propagate waves, each subdomain must be simulated for a certain amount of time, and the time history of the waves exiting the subdomain must be collected for injection into the neighboring subdomains (this requires extra time steps to make sure all exiting waves are collected). While each subdomain requires a large amount of storage relative to the 8GB of RAM on my current cards, each time step does not require a huge number of calculations (but there are millions of time steps). Quadrupling the VRAM would allow a more optimal subdomain size, keeping the GPU busy and reducing the cross-communication required between subdomains; in this particular case, 32GB of VRAM would speed up the total simulation by about 40% over a single 8GB card with the same GPU.
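Two bits of arithmetic behind this reasoning can be sketched, assuming decimal petabytes/gigabytes and, unrealistically, that an entire card's VRAM holds one cubic subdomain (the poster's actual layout, ghost cells and time-history buffers are not described). The first function gives a lower bound on the number of subdomains; the second shows why bigger subdomains cut cross-communication: a cube of side n cells has n^3 cells but only 6n^2 boundary faces, so the fraction of data exchanged with neighbours scales like 6/n. Quadrupling memory grows n by 4^(1/3), which cuts that fraction to about 63% of its previous value whatever the per-cell storage (the 48 bytes per cell below is a hypothetical figure for six double-precision field components):

```python
def min_subdomains(total_bytes: int, vram_bytes: int) -> int:
    """Lower bound on subdomain count if each must fit in one card's VRAM."""
    return -(-total_bytes // vram_bytes)  # integer ceiling division


def boundary_fraction(vram_bytes: float, bytes_per_cell: float) -> float:
    """Boundary-to-volume ratio (6 / n) of a cubic subdomain filling the VRAM."""
    n = (vram_bytes / bytes_per_cell) ** (1 / 3)  # cells along one edge
    return 6 / n


GB = 10**9
PB = 10**15
TOTAL = 108 * PB // 10  # 10.8 PB, as quoted in the post
BYTES_PER_CELL = 48     # hypothetical: 6 field components x 8-byte doubles

for vram in (8 * GB, 32 * GB):
    print(vram // GB, "GB:", min_subdomains(TOTAL, vram), "subdomains,",
          f"boundary fraction {boundary_fraction(vram, BYTES_PER_CELL):.5f}")

ratio = boundary_fraction(32 * GB, BYTES_PER_CELL) / boundary_fraction(8 * GB, BYTES_PER_CELL)
print(f"{ratio:.2f}")  # 0.63, independent of BYTES_PER_CELL
```

Under these assumptions the subdomain count drops from 1,350,000 to 337,500; the real speedup also depends on kernel occupancy and transfer overlap, which this sketch does not model.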

The remainder of this project is slated for 3 years, and there are many test cases to run. However, the only way we'd ever spend tens of thousands of dollars on multiple cards is if we somehow came in under budget and had to spend the money on something or otherwise lose it.
 
Joined
Jul 7, 2014
Messages
152 (0.04/day)
3-D FDTD simulations of over-the-horizon radar propagation under different sea/atmospheric conditions using OpenCL. I actually haven't even run any games on them yet :(

Neat. You're lucky that your applications don't need the driver support of the Fire/Quadro cards. That saves a ton of scratch.
 
Joined
Feb 7, 2006
Messages
739 (0.11/day)
Location
Austin, TX
System Name WAZAAM!
Processor AMD Ryzen 3900x
Motherboard ASRock Fatal1ty X370 Pro Gaming
Cooling Kraken x62
Memory G.Skill 16GB 3200 MHz
Video Card(s) EVGA GeForce GTX 1070 8GB SC
Storage Micron 9200 Max
Display(s) Samsung 49" 5120x1440 120hz
Case Corsair 600D
Audio Device(s) Onboard - Bose Companion 2 Speakers
Power Supply CORSAIR Professional Series HX850
Keyboard Corsair K95 RGB
Software Windows 10 Pro
Does doing the job faster with such a professional product even justify the usual monster price?

Point of interest: The big difference between the S9170 and the R9 390X is going from 1/8 DP to 1/2 DP. To handle the increased DP calculations, AMD (and nVidia for their cards) uses the best-binned chips. A card running 100% DP calcs is very close to what FurMark does and will need to stay within thermal limits.

Intel has the same issue with AVX. When AVX is used heavily, the thermal load goes way up. On the latest Haswell Xeon chips, Intel has set a lower clock speed for heavy AVX workloads than the chip's normal rated speed. This keeps the chip within its advertised thermals when used in this manner.

So, regarding the original question: per clock, the pro version has 4x the DP throughput of the consumer card (1/2 vs. 1/8 rate). If that's your limiting factor, it's easy to see how the S9170 would benefit your workload; the only question is whether you've got budget for more hardware.
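The 4x figure holds per clock (1/2 vs. 1/8 of the single-precision rate). In absolute terms the gap is somewhat smaller because the consumer card clocks higher. A sketch, assuming 2 FLOPs per stream processor per clock (FMA), 2816 SPs on both cards, a 1050 MHz reference engine clock for the R9 390X and about 930 MHz for the S9170 (the clock the announced 5.24 TFLOPS single-precision peak implies); neither clock is stated in this thread:

```python
def dp_gflops(stream_processors: int, clock_mhz: float, dp_rate: float) -> float:
    """Peak double-precision GFLOPS = SPs x 2 FLOPs (FMA) x clock x DP rate."""
    return stream_processors * 2 * clock_mhz * dp_rate / 1000


s9170 = dp_gflops(2816, 930, 1 / 2)     # pro card, 1/2-rate DP (assumed clock)
r9_390x = dp_gflops(2816, 1050, 1 / 8)  # consumer card, 1/8-rate DP (assumed clock)

print(f"S9170:   {s9170:.0f} GFLOPS")     # 2619 GFLOPS
print(f"R9 390X: {r9_390x:.0f} GFLOPS")   # 739 GFLOPS
print(f"ratio:   {s9170 / r9_390x:.2f}x")  # 3.54x
```

So the S9170 offers roughly 3.5x the peak DP throughput of a reference R9 390X, versus exactly 4x at equal clocks.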
 
Joined
Apr 19, 2013
Messages
296 (0.07/day)
System Name Darkside
Processor R7 3700X
Motherboard Aorus Elite X570
Cooling Deepcool Gammaxx l240
Memory Thermaltake Toughram DDR4 3600MHz CL18
Video Card(s) Gigabyte RX Vega 64 Gaming OC
Storage ADATA & WD 500GB NVME PCIe 3.0, many WD Black 1-3TB HD
Display(s) Samsung C27JG5x
Case Thermaltake Level 20 XL
Audio Device(s) iFi xDSD / micro iTube2 / micro iCAN SE
Power Supply EVGA 750W G2
Mouse Corsair M65
Keyboard Corsair K70 LUX RGB
Benchmark Scores Not sure, don't care
But can it run Crysis? :p
 
Joined
Oct 30, 2008
Messages
1,768 (0.30/day)
System Name Lailalo
Processor Ryzen 9 5900X Boosts to 4.95Ghz
Motherboard Asus TUF Gaming X570-Plus (WIFI
Cooling Noctua
Memory 32GB DDR4 3200 Corsair Vengeance
Video Card(s) XFX 7900XT 20GB
Storage Samsung 970 Pro Plus 1TB, Crucial 1TB MX500 SSD, Segate 3TB
Display(s) LG Ultrawide 29in @ 2560x1080
Case Coolermaster Storm Sniper
Power Supply XPG 1000W
Mouse G602
Keyboard G510s
Software Windows 10 Pro / Windows 10 Home
But can it run Crysis? :p

Not as well as a consumer card. Which is how pro cards work. They aren't made for gaming, even though they can game.
 
Joined
Aug 22, 2014
Messages
39 (0.01/day)
System Name Bird's Monolith
Processor Intel i7-4770k 4.6 GHz (liquid metal)
Motherboard Asrock Z87 Extreme3
Cooling Noctua NH-D14, Noctua 140mm Case Fans
Memory 16 GB G-Skill Trident-X DDR3 2400 CAS 9
Video Card(s) EVGA 1080ti SC2 Hybrid
Storage 2 TB Mushkin Triactor 3D (RAID 0)
Display(s) Dell S2716DG / Samsung Q80R QLED TV
Case Fractal Design Define R4
Audio Device(s) Audio Engine D1 DAC, A5+ Speakers, SteelSeries Arctis 7
Power Supply Seasonic Platinum 660 W
Mouse SteelSeries Rival
Keyboard SteelSeries Apex
Software Windows 10 Pro x64
What are you doing, exactly, that struggles with 16GB (which is really only 8GB, since DX11 only addresses the memory of one card in xfire/sli), much less 32GB, of VRAM?

You can use all the RAM in compute scenarios with multiple cards; it's not limited to the same resource pool as with traditional alternate frame rendering.
 
Joined
Feb 18, 2012
Messages
2,715 (0.58/day)
System Name MSI GP76
Processor intel i7 11800h
Cooling 2 laptop fans
Memory 32gb of 3000mhz DDR4
Video Card(s) Nvidia 3070
Storage x2 PNY 8tb cs2130 m.2 SSD--16tb of space
Display(s) 17.3" IPS 1920x1080 240Hz
Power Supply 280w laptop power supply
Mouse Logitech m705
Keyboard laptop keyboard
Software lots of movies and Windows 10 with win 7 shell
Benchmark Scores Good enough for me

btarunr

Editor & Senior Moderator
Staff member
Joined
Oct 9, 2007
Messages
47,244 (7.54/day)
But can it run Crysis? :p

Although it lacks display outputs, you can CrossFire a 390X with this card, and use it like a blind-hood*.

*In railway jargon, a blind-hood is a diesel-electric locomotive that lacks a driver's cab and is built solely for multi-unit configs. It needs to be slaved to a loco with a cab, or it doesn't move.
 
Joined
Dec 16, 2010
Messages
1,668 (0.33/day)
Location
State College, PA, US
System Name My Surround PC
Processor AMD Ryzen 9 7950X3D
Motherboard ASUS STRIX X670E-F
Cooling Swiftech MCP35X / EK Quantum CPU / Alphacool GPU / XSPC 480mm w/ Corsair Fans
Memory 96GB (2 x 48 GB) G.Skill DDR5-6000 CL30
Video Card(s) MSI NVIDIA GeForce RTX 4090 Suprim X 24GB
Storage WD SN850 2TB, Samsung PM981a 1TB, 4 x 4TB + 1 x 10TB HGST NAS HDD for Windows Storage Spaces
Display(s) 2 x Viotek GFI27QXA 27" 4K 120Hz + LG UH850 4K 60Hz + HMD
Case NZXT Source 530
Audio Device(s) Sony MDR-7506 / Logitech Z-5500 5.1
Power Supply Corsair RM1000x 1 kW
Mouse Patriot Viper V560
Keyboard Corsair K100
VR HMD HP Reverb G2
Software Windows 11 Pro x64
Benchmark Scores Mellanox ConnectX-3 10 Gb/s Fiber Network Card
What is the memory configuration? I thought GDDR5 chips had a minimum of 16-bit width (clamshell), and the highest-density chips are 4 Gbit, so on a 512-bit bus you can put at most (512/16)*4 = 128 Gbit = 16 GB. How do they get to 32 GB?

Edit: Never mind, I didn't realize that Samsung was selling 8Gbit chips.
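The poster's capacity reasoning as a quick calculation, assuming (as the post does) x16 clamshell mode, in which two chips share each 32-bit channel so a 512-bit bus can host 512/16 = 32 chips:

```python
def max_capacity_gb(bus_width_bits: int, bits_per_chip: int, chip_gbit: int) -> float:
    """Board capacity in GB = (number of chips on the bus) x (Gbit per chip) / 8."""
    chips = bus_width_bits // bits_per_chip
    return chips * chip_gbit / 8


print(max_capacity_gb(512, 16, 4))  # 16.0 GB with 4 Gbit chips
print(max_capacity_gb(512, 16, 8))  # 32.0 GB with 8 Gbit chips
print(max_capacity_gb(512, 32, 8))  # 16.0 GB with 8 Gbit chips in normal x32 mode
```

Under these assumptions, reaching 32 GB on a 512-bit bus needs both the denser 8 Gbit chips and clamshell mode.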
 
Joined
Apr 3, 2012
Messages
4,373 (0.95/day)
Location
St. Paul, MN
System Name Bay2- Lowerbay/ HP 3770/T3500-2+T3500-3+T3500-4/ Opti-Con/Orange/White/Grey
Processor i3 2120's/ i7 3770/ x5670's/ i5 2400/Ryzen 2700/Ryzen 2700/R7 3700x
Motherboard HP UltraSlim's/ HP mid size/ Dell T3500 workstation's/ Dell 390/B450 AorusM/B450 AorusM/B550 AorusM
Cooling All stock coolers/Grey has an H-60
Memory 2GB/ 4GB/ 12 GB 3 chan/ 4GB sammy/T-Force 16GB 3200/XPG 16GB 3000/Ballistic 3600 16GB
Video Card(s) HD2000's/ HD 2000/ 1 MSI GT710,2x MSI R7 240's/ HD4000/ Red Dragon 580/Sapphire 580/Sapphire 580
Storage ?HDD's/ 500 GB-er's/ 500 GB/2.5 Samsung 500GB HDD+WD Black 1TB/ WD Black 500GB M.2/Corsair MP600 M.2
Display(s) 1920x1080/ ViewSonic VX24568 between the rest/1080p TV-Grey
Case HP 8200 UltraSlim's/ HP 8200 mid tower/Dell T3500's/ Dell 390/SilverStone Kublai KL06/NZXT H510 W x2
Audio Device(s) Sonic Master/ onboard's/ Beeper's!
Power Supply 19.5 volt bricks/ Dell PSU/ 525W sumptin/ same/Seasonic 750 80+Gold/EVGA 500 80+/Antec 650 80+Gold
Mouse cheap GigaWire930, CMStorm Havoc + Logitech M510 wireless/iGear usb x2/MX 900 wireless kit 4 Grey
Keyboard Dynex, 2 no name, SYX and a Logitech. All full sized and USB. MX900 kit for Grey
Software Mint 18 Sylvia/ Opti-Con Mint KDE/ T3500's on Kubuntu/HP 3770 is Win 10/Win 10 Pro/Win 10 Pro/Win10
Benchmark Scores World Community Grid is my benchmark!!
I wonder how well one of these would do at F@H.
:lovetpu:
 

confuzyon

New Member
Joined
Jun 7, 2013
Messages
4 (0.00/day)
Most professional cards can't be used for F@H, or if they can, they aren't optimized for it. The general idea being: why create cores for cards nobody will use?
 