Wednesday, July 8th 2015

AMD Announces FirePro S9170 32GB GPU Compute Card

AMD today announced the new AMD FirePro S9170 server GPU, the world's first and fastest 32GB single-GPU server card for DGEMM heavy double-precision workloads, with support for OpenCL 2.0. Based on the second-generation AMD Graphics Core Next (GCN) GPU architecture, this new addition to the AMD FirePro server GPU family is capable of delivering up to 5.24 TFLOPS of peak single precision compute performance while enabling full throughput double precision performance, providing up to 2.62 TFLOPS of peak double precision performance. Designed with compute-intensive workflows in mind, the AMD FirePro S9170 server GPU is ideal for data center managers who oversee clusters within academic or government bodies, oil and gas industries, or deep neural network compute cluster development.

"AMD is recognized as an HPC industry innovator as the graphics provider with the top spot on the November 2014 Green500 List. Today the best GPU for compute just got better with the introduction of the AMD FirePro S9170 server GPU to complement AMD's impressive array of server graphics offerings for high performance compute environments," said Sean Burke, corporate vice president and general manager, AMD Professional Graphics group. "The AMD FirePro S9170 server GPU can accelerate complex workloads in scientific computing, data analytics, or seismic processing, wielding an industry-leading 32GB of memory. We designed the new offering for supercomputers to achieve massive compute performance while maximizing available power budgets."
"There are some HPC workloads which require as much data as possible to stay resident on the device, and so the 32GB of memory provided by AMD FirePro S9170, the largest available on a single GPU, will enable the acceleration of scientific calculations that were previously impossible," said Simon McIntosh-Smith, head of the Microelectronics Research Group at the University of Bristol. "For example, our new OpenCL version of the SNAP transport code from Los Alamos National Laboratory needs to keep as much data resident on the device as possible, and so the 32GB of memory will let us run problems of a much more interesting size faster than ever before. The large memory, combined with the 320GB/s memory bandwidth and double precision floating point performance, will make the AMD FirePro S9170 server GPU a 'killer' solution device for many HPC applications."

The AMD FirePro S9170 offers unparalleled GPU compute performance and power efficiency with the following benefits:
  • With up to 2.62 TFLOPS of peak double precision performance, the AMD FirePro S9170 is the fastest single-GPU server card available for DGEMM heavy workloads, delivering up to 40% more performance than the competitive solution;
  • Support for 40% better double precision performance, while using 10% less power than the competition;
  • The AMD FirePro S9170 is the industry's first server GPU with 32GB ultra-fast GDDR5 on-board memory and features a 512-bit memory interface for 320 GB/s of memory bandwidth;
  • Equipped with 32GB of GDDR5 memory, the AMD FirePro S9170 GPU can accelerate memory-intensive applications and process larger and more computationally complex workloads with ease; and
  • The AMD FirePro S9170 GPU features 33% more memory than the competitive GPU, helping to improve overall workload speed and system responsiveness, especially when working with large amounts of data.
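The 320 GB/s figure in the list above is consistent with the 512-bit interface if each pin carries 5 Gbit/s. That per-pin data rate is an assumption inferred from the two quoted numbers; the announcement does not state it. A minimal sketch of the arithmetic:

```python
def memory_bandwidth_gbs(bus_width_bits: int, data_rate_gbps_per_pin: float) -> float:
    """Peak memory bandwidth in GB/s: pins * Gbit/s per pin / 8 bits per byte."""
    return bus_width_bits * data_rate_gbps_per_pin / 8


# 512-bit bus is from the announcement; 5.0 Gbit/s per pin is an inferred assumption.
s9170_bandwidth = memory_bandwidth_gbs(512, 5.0)
print(f"{s9170_bandwidth:.0f} GB/s")
```

This is a theoretical peak; sustained bandwidth in real workloads would be lower.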
"Based on our company's development and integration of leading-edge high-performance solutions to support design optimization, testing operation, simulation and maintenance of complex systems, SILKAN can attest that the AMD FirePro S9170 server GPU is the most powerful on the market," said François Guérinea, chairman and CEO at SILKAN. "What makes the difference is the amount of memory, and accordingly we have chosen the AMD FirePro S9170 GPU with 32GB of memory as a necessity for a client's major project."

"By combining the new AMD FirePro S9170 server GPU with our G250 Series of 2U eight GPU systems, we are able to deliver HPC servers with up to 256GB of GPU memory," said Alex Liu, product marketing executive of the GIGABYTE Network & Comm. Division. "This unprecedented density of GDDR5 memory will help our customers offer very attractive machines to industries processing large datasets."

"ASUS is pleased to see the AMD FirePro S9170 server GPU complement the AMD FirePro S9150 server GPU, the November 2014 Green500 List-winning solution," said Robert Chin, General Manager, ASUS Server Business Unit. "We are seeing customers not only obtain significant results in the HPC arena, but also achieve power efficiency for more green environments. The arrival of the AMD FirePro S9170 server GPU sets a new milestone for the HPC market segment. As a top Green500 List provider and one of the greenest HPC building block providers, we're excited to see the AMD FirePro S9170 GPU featured in our top ASUS ESC product family to enable excellent performance matched with significant savings in total cost of ownership."

"We have been developing a fully-parallel computational tool based on the AMD GPU heterogeneous computing platform and OpenCL," said Omid Mahahadi, co-founder and director, Geomechanica Inc. "This tool accurately captures the complex physics of massive mines plus oil and gas fields rapidly and reliably. Thanks to the impressive 32GB of memory of the new cards, we expect to run computations on massive data structures containing tens of millions of data elements. The combination of rapid double-precision operations with the large memory capacity enables accurate, detailed, and reliable computations. A similar performance using CPUs would likely require much higher capital and maintenance costs. Moving forward, we plan to take advantage of the recent features of the OpenCL 2.0 open API to further enhance the performance of our software."

To address future needs for HPC development, the AMD FirePro S9170 server GPU supports the latest version of OpenCL and is ready for the OpenMP and OpenACC developer tools, with planned availability in Q3 2015.

17 Comments on AMD Announces FirePro S9170 32GB GPU Compute Card

#1
btarunr
Editor & Senior Moderator
"Grenada" based, 2816 SP, 32 GB 512-bit GDDR5, 2.6 TFLOP/s DPFP, 235W.
#2
Death Star
Holy balls... 32GB would make life so easy right about now. Currently struggling to fit into 16GB spread across two 390Xs.
#3
Petey Plane
Death Star said: Holy balls... 32GB would make life so easy right about now. Currently struggling to fit into 16GB spread across two 390Xs.
What are you doing, exactly, that struggles with 16GB (which is really only 8GB, since DX11 only addresses the memory of one card in xfire/sli), much less 32GB, of VRAM?
#4
Death Star
Petey Plane said: What are you doing, exactly, that struggles with 16GB (which is really only 8GB, since DX11 only addresses the memory of one card in xfire/sli), much less 32GB, of VRAM?
3-D FDTD simulations of over-the-horizon radar propagation under different sea/atmospheric conditions using OpenCL. I actually haven't even run any games on them yet :(
#5
Ferrum Master
Death Star said: 3-D FDTD simulations of over-the-horizon radar propagation under different sea/atmospheric conditions using OpenCL. I actually haven't even run any games on them yet :(
Does doing the job faster with such a professional product even justify the usual monster price?
#6
Death Star
Ferrum Master said: Does doing the job faster with such a professional product even justify the usual monster price?
TLDR: Due to time or complexity constraints, sometimes it makes sense to spend thousands of dollars on cards like this.

As with most things, it's usually situation-dependent. If a problem can't fit into a given amount of RAM, and that's all you have to work with, then the only solution is to get creative with memory management. Getting creative with memory management takes a long time to implement with large or complicated programs, and also adds additional complexity to the program. When working with both CPU and (multi) GPU memory spaces, it's not necessarily a trivial feat to manage memory. If you have an imminent deadline to meet for a customer, it may be more advantageous to save the more sophisticated programming to do during a period of downtime and just throw more brute force at it.

I currently have no plans to spend something like $5k-$10k on a GPU any time soon haha, but I can give an example of how it might help with my current project... Elevator pitch time...

I use a domain decomposition method to divide up a HUGE problem space into smaller sections. The whole space would require about 10.8 petabytes of RAM to store, and requires something in the range of 4.6*10^25 total math operations. Since FDTD takes discrete steps in time in order to propagate waves, each subdomain must be simulated for a certain amount of time, and the time-history of the waves exiting the subdomain must be collected for injection into the neighboring subdomains (this requires extra time steps to make sure all exiting waves are collected). While each subdomain requires a large amount of storage relative to the 8GB of RAM on my current cards, each time step does not require a huge number of calculations (but there are millions of time steps). Quadrupling the VRAM would allow a more optimal subdomain size in terms of keeping the GPU busy and reducing the cross-communication required between subdomains; in this particular case, 32GB of VRAM would speed up the total simulation by about 40% over a single 8GB card with the same GPU.
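As a rough illustration of why a larger subdomain reduces the relative amount of cross-communication, the sketch below compares the share of cells sitting on the surface of a cubic subdomain that fits in 8 GB versus 32 GB. It is not the poster's code: the cubic shape and the 24 bytes per cell (six field components stored as 4-byte floats) are hypothetical simplifications, and it does not model the time-history collection described above or attempt to reproduce the 40% estimate.

```python
def max_cube_side(memory_bytes: int, bytes_per_cell: int) -> int:
    """Largest n such that an n*n*n grid fits in memory_bytes."""
    cells = memory_bytes // bytes_per_cell
    n = round(cells ** (1 / 3))
    # Correct any floating-point error in the cube root.
    while n ** 3 > cells:
        n -= 1
    while (n + 1) ** 3 <= cells:
        n += 1
    return n


def boundary_fraction(n: int) -> float:
    """Fraction of cells in an n*n*n cube that lie on its outer surface."""
    if n <= 2:
        return 1.0
    return 1 - ((n - 2) / n) ** 3


BYTES_PER_CELL = 24  # hypothetical: six field components in 4-byte floats

for vram_gb in (8, 32):
    side = max_cube_side(vram_gb * 10**9, BYTES_PER_CELL)
    print(f"{vram_gb} GB: side {side} cells, "
          f"boundary fraction {boundary_fraction(side):.4%}")
```

Quadrupling the memory grows the side length by roughly the cube root of 4 (about 1.59x), so the surface share that has to be exchanged with neighbours shrinks by about the same factor.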

The remainder of this project is slated for 3 years, and there are many test cases to run. However, the only way we'd ever spend tens of thousands of dollars on multiple cards is if we somehow came in under budget and had to spend the money on something, or otherwise lose it.
#7
Unregistered
btarunr said: "Grenada" based, 2816 SP, 32 GB 512-bit GDDR5, 2.6 TFLOP/s DPFP, 235W.
And we were hearing GDDR5 has higher power consumption than HBM.... huh!
#8
Petey Plane
Death Star said: 3-D FDTD simulations of over-the-horizon radar propagation under different sea/atmospheric conditions using OpenCL. I actually haven't even run any games on them yet :(
Neat. You're lucky that your applications don't need the driver support of the Fire/Quadro cards. That saves a ton of scratch.
#9
theeldest
Ferrum Master said: Does doing the job faster with such a professional product even justify the usual monster price?
Point of Interest: The big difference between the S9170 and the R9 390X is going from 1/8 DP to 1/2 DP. To handle the increased DP calculations, AMD (and nVidia for their cards) uses the best binned chips. A card running 100% DP calcs is very close to what FurMark does and will need to stay within thermal limits.

Intel has the same issue with AVX. When AVX is being used heavily the thermal load goes way up. On the latest Xeon Haswell chips Intel has limited the AVX speed to something less than the full speed of the rest of the core. This is to keep the chip within its advertised thermals when used in this manner.

So regarding the original question, the pro version has 4x the DP throughput of the consumer card. So if that's your limiting factor it's easy to see how the S9170 would benefit your workload, only question is whether you've got budget for more hardware.
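The ratios above can be checked against AMD's quoted peaks. The sketch below is an illustration only: it assumes each of the 2816 stream processors mentioned earlier in the thread retires one fused multiply-add (2 FLOPs) per clock, and it applies the 1/8 rate to the same single-precision peak purely for comparison rather than using the R9 390X's actual clocks.

```python
from fractions import Fraction

PEAK_SP_TFLOPS = 5.24        # AMD's quoted single-precision peak
STREAM_PROCESSORS = 2816     # figure quoted earlier in the thread
FLOPS_PER_SP_PER_CLOCK = 2   # assumption: one fused multiply-add per clock


def peak_dp_tflops(peak_sp_tflops: float, dp_rate: Fraction) -> float:
    """Peak double-precision throughput for a given DP:SP rate."""
    return peak_sp_tflops * float(dp_rate)


pro_dp = peak_dp_tflops(PEAK_SP_TFLOPS, Fraction(1, 2))       # 1/2-rate part
consumer_dp = peak_dp_tflops(PEAK_SP_TFLOPS, Fraction(1, 8))  # 1/8 rate, same SP peak
speedup = Fraction(1, 2) / Fraction(1, 8)

# Engine clock implied by the quoted SP peak under the FMA assumption.
implied_clock_mhz = (
    PEAK_SP_TFLOPS * 1e12 / (STREAM_PROCESSORS * FLOPS_PER_SP_PER_CLOCK) / 1e6
)

print(f"1/2 rate: {pro_dp:.2f} TFLOPS, 1/8 rate: {consumer_dp:.3f} TFLOPS")
print(f"DP speedup: {speedup}x, implied clock: {implied_clock_mhz:.0f} MHz")
```

Under these assumptions the 1/2-rate figure matches the 2.62 TFLOPS in the announcement, and the 4x gap comes purely from the rate ratio.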
#10
Bansaku
But can it run Crysis? :p
#11
NC37
Bansaku said: But can it run Crysis? :p
Not as well as a consumer card. Which is how pro cards work. They aren't made for gaming, even though they can game.
#12
ShredBird
Petey Plane said: What are you doing, exactly, that struggles with 16GB (which is really only 8GB, since DX11 only addresses the memory of one card in xfire/sli), much less 32GB, of VRAM?
You can use all the RAM in compute scenarios with multiple cards; it's not limited to the same resource pool like traditional alternate frame rendering.
#13
yotano211
Bansaku said: But can it run Crysis? :p
*FIXED

But can it wash my Prius?
#14
btarunr
Editor & Senior Moderator
Bansaku said: But can it run Crysis? :p
Although it lacks display outputs, you can CrossFire a 390X with this card, and use it like a blind-hood*.

*In railway jargon, a blind-hood is a diesel-electric locomotive that lacks a driver's cab and is built solely for multi-unit configs. It needs to be slaved to a loco with a cab, or it doesn't move.
#15
The Von Matrices
What is the memory configuration? I thought GDDR5 chips had a minimum of 16-bit width (clamshell), and the highest density chips are 4 Gbit, so on a 512-bit bus you can put at most (512/16)*4 Gbit = 128 Gbit = 16 GB. How do they get to 32 GB?

Edit: Never mind, I didn't realize that Samsung was selling 8Gbit chips.
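The capacity arithmetic in this comment can be written out as a small sketch. The function name and parameters are illustrative only; the 16-bit per-chip width and the 4 Gbit and 8 Gbit densities are the figures from the comment itself.

```python
def max_capacity_gb(bus_width_bits: int, chip_io_width_bits: int,
                    chip_density_gbit: int) -> float:
    """Maximum memory in GB: number of chips on the bus * Gbit per chip / 8."""
    chips = bus_width_bits // chip_io_width_bits
    return chips * chip_density_gbit / 8


# 512-bit bus with 16-bit (clamshell) chips gives 32 chips.
with_4gbit = max_capacity_gb(512, 16, 4)
with_8gbit = max_capacity_gb(512, 16, 8)
print(f"4 Gbit chips: {with_4gbit:.0f} GB, 8 Gbit chips: {with_8gbit:.0f} GB")
```

With 8 Gbit chips the same 32-chip layout reaches the card's 32 GB.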
#16
Arjai
I wonder how well one of these would do F@H?
:lovetpu:
#17
confuzyon
Most of the professional cards can't be used for F@H or if they can, they aren't optimized. The general idea being why create cores for cards nobody will use?