
AMD Unveils World's First 7 nm GPUs - Radeon Instinct MI60, Instinct MI50

Raevenlord

News Editor
Joined
Aug 12, 2016
Messages
3,755 (1.24/day)
Location
Portugal
System Name The Ryzening
Processor AMD Ryzen 9 5900X
Motherboard MSI X570 MAG TOMAHAWK
Cooling Lian Li Galahad 360mm AIO
Memory 32 GB G.Skill Trident Z F4-3733 (4x 8 GB)
Video Card(s) Gigabyte RTX 3070 Ti
Storage Boot: Transcend MTE220S 2TB, Kingston A2000 1TB, Seagate Firewolf Pro 14 TB
Display(s) Acer Nitro VG270UP (1440p 144 Hz IPS)
Case Lian Li O11DX Dynamic White
Audio Device(s) iFi Audio Zen DAC
Power Supply Seasonic Focus+ 750 W
Mouse Cooler Master Masterkeys Lite L
Keyboard Cooler Master Masterkeys Lite L
Software Windows 10 x64
AMD today announced the AMD Radeon Instinct MI60 and MI50 accelerators, the world's first 7nm datacenter GPUs, designed to deliver the compute performance required for next-generation deep learning, HPC, cloud computing and rendering applications. Researchers, scientists and developers will use AMD Radeon Instinct accelerators to solve tough and interesting challenges, including large-scale simulations, climate change, computational biology, disease prevention and more.

"Legacy GPU architectures limit IT managers from effectively addressing the constantly evolving demands of processing and analyzing huge datasets for modern cloud datacenter workloads," said David Wang, senior vice president of engineering, Radeon Technologies Group at AMD. "Combining world-class performance and a flexible architecture with a robust software platform and the industry's leading-edge ROCm open software ecosystem, the new AMD Radeon Instinct accelerators provide the critical components needed to solve the most difficult cloud computing challenges today and into the future."





The AMD Radeon Instinct MI60 and MI50 accelerators feature flexible mixed-precision capabilities, powered by high-performance compute units that expand the types of workloads these accelerators can address, including a range of HPC and deep learning applications. The new AMD Radeon Instinct MI60 and MI50 accelerators were designed to efficiently process workloads such as rapidly training complex neural networks, delivering higher levels of floating-point performance, greater efficiencies and new features for datacenter and departmental deployments.

The AMD Radeon Instinct MI60 and MI50 accelerators provide ultra-fast floating-point performance and hyper-fast HBM2 (second-generation High-Bandwidth Memory) with up to 1 TB/s memory bandwidth speeds. They are also the first GPUs capable of supporting next-generation PCIe 4.0 interconnect, which is up to 2X faster than other x86 CPU-to-GPU interconnect technologies, and feature AMD Infinity Fabric Link GPU interconnect technology that enables GPU-to-GPU communications that are up to 6X faster than PCIe Gen 3 interconnect speeds.

AMD also announced a new version of the ROCm open software platform for accelerated computing that supports the architectural features of the new accelerators, including optimized deep learning operations (DLOPS) and the AMD Infinity Fabric Link GPU interconnect technology. Designed for scale, ROCm allows customers to deploy high-performance, energy-efficient heterogeneous computing systems in an open environment.

"Google believes that open source is good for everyone," said Rajat Monga, engineering director, TensorFlow, Google. "We've seen how helpful it can be to open source machine learning technology, and we're glad to see AMD embracing it. With the ROCm open software platform, TensorFlow users will benefit from GPU acceleration and a more robust open source machine learning ecosystem."

Key features of the AMD Radeon Instinct MI60 and MI50 accelerators include:
  • Optimized Deep Learning Operations: Provides flexible mixed-precision FP16, FP32 and INT4/INT8 capabilities to meet growing demand for dynamic and ever-changing workloads, from training complex neural networks to running inference against those trained networks.
  • World's Fastest Double Precision PCIe Accelerator: The AMD Radeon Instinct MI60 is the world's fastest double precision PCIe 4.0 capable accelerator, delivering up to 7.4 TFLOPS peak FP64 performance, allowing scientists and researchers to more efficiently process HPC applications across a range of industries including life sciences, energy, finance, automotive, aerospace, academics, government, defense and more. The AMD Radeon Instinct MI50 delivers up to 6.7 TFLOPS FP64 peak performance, while providing an efficient, cost-effective solution for a variety of deep learning workloads, as well as enabling high reuse in Virtual Desktop Infrastructure (VDI), Desktop-as-a-Service (DaaS) and cloud environments.
  • Up to 6X Faster Data Transfer: Two Infinity Fabric Links per GPU deliver up to 200 GB/s of peer-to-peer bandwidth - up to 6X faster than PCIe 3.0 alone - and enable the connection of up to 4 GPUs in a hive ring configuration (2 hives in 8 GPU servers).
  • Ultra-Fast HBM2 Memory: The AMD Radeon Instinct MI60 provides 32GB of HBM2 Error-correcting code (ECC) memory, and the Radeon Instinct MI50 provides 16GB of HBM2 ECC memory. Both GPUs provide full-chip ECC and Reliability, Accessibility and Serviceability (RAS) technologies, which are critical to deliver more accurate compute results for large-scale HPC deployments.
  • Secure Virtualized Workload Support: AMD MxGPU Technology, the industry's only hardware-based GPU virtualization solution, which is based on the industry-standard SR-IOV (Single Root I/O Virtualization) technology, makes it difficult for hackers to attack at the hardware level, helping provide security for virtualized cloud deployments.
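
The "up to 6X" figure in the list above can be sanity-checked with a rough calculation. This is only a sketch: the PCIe 3.0 numbers (8 GT/s per lane, 128b/130b encoding, an x16 link, bandwidth counted in both directions) are assumptions taken from the public PCIe specification rather than from the press release, and AMD's own methodology may differ.

```python
# Rough check of the "up to 6X faster than PCIe 3.0" claim.
# Assumptions (not from the press release): PCIe 3.0 x16 link,
# 8 GT/s per lane, 128b/130b encoding, both directions counted.

PCIE3_GT_PER_S = 8.0          # giga-transfers per second per lane
PCIE3_ENCODING = 128 / 130    # usable fraction after line encoding
LANES = 16


def pcie3_x16_bidirectional_gb_s() -> float:
    """Theoretical PCIe 3.0 x16 bandwidth in GB/s, both directions combined."""
    per_direction = PCIE3_GT_PER_S * PCIE3_ENCODING * LANES / 8  # bits -> bytes
    return 2 * per_direction


INFINITY_FABRIC_GB_S = 200.0  # peer-to-peer figure quoted in the press release

pcie3 = pcie3_x16_bidirectional_gb_s()
ratio = INFINITY_FABRIC_GB_S / pcie3
print(f"PCIe 3.0 x16: {pcie3:.1f} GB/s, Infinity Fabric Link ratio: {ratio:.1f}x")
```

Under these assumptions the ratio comes out a little above 6, which is consistent with the quoted "up to 6X".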

Updated ROCm Open Software Platform
AMD today also announced a new version of its ROCm open software platform designed to speed development of high-performance, energy-efficient heterogeneous computing systems. In addition to support for the new Radeon Instinct accelerators, ROCm software version 2.0 provides updated math libraries for the new DLOPS; support for 64-bit Linux operating systems including CentOS, RHEL and Ubuntu; optimizations of existing components; and support for the latest versions of the most popular deep learning frameworks, including TensorFlow 1.11, PyTorch (Caffe) and others. Learn more about ROCm 2.0 software here.

Availability
The AMD Radeon Instinct MI60 accelerator is expected to ship to datacenter customers by the end of 2018. The AMD Radeon Instinct MI50 accelerator is expected to begin shipping to datacenter customers by the end of Q1 2019. The ROCm 2.0 open software platform is expected to be available by the end of 2018.

 
Joined
Apr 18, 2015
Messages
234 (0.07/day)
7.4 TFlops, FP64, this normally doubles for FP32, so 14.8 Tflops for FP32.
This is pretty good, and pretty similar to 16.3 produced by Quadro RTX and slightly better than 2080Ti which has 13.4 TFlops.

BTW FP64 of the quadro according to TPU DB is 0.5TFlops, so this thing will compete in 32 bit calculations but run in circles around green camp in 64 bit.
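
The doubling rule in the post above can be written out explicitly. A minimal sketch, assuming 4,096 stream processors and an 1,800 MHz peak clock for the MI60 (figures from AMD's public spec listing, not from the press release text), one fused multiply-add counted as two FLOPs, and FP64 running at half the FP32 rate:

```python
def peak_tflops(shaders: int, clock_mhz: float, flops_per_clock: int = 2) -> float:
    """Theoretical peak throughput in TFLOPS (one FMA = 2 FLOPs per clock)."""
    return shaders * flops_per_clock * clock_mhz * 1e6 / 1e12


# Assumed MI60 configuration (not stated in the press release itself).
MI60_SHADERS = 4096
MI60_PEAK_CLOCK_MHZ = 1800

mi60_fp32 = peak_tflops(MI60_SHADERS, MI60_PEAK_CLOCK_MHZ)
mi60_fp64 = mi60_fp32 / 2  # assumes a 1:2 FP64:FP32 rate

print(f"FP32: {mi60_fp32:.1f} TFLOPS, FP64: {mi60_fp64:.1f} TFLOPS")
```

With these inputs the unrounded FP32 figure lands slightly under the 14.8 obtained by doubling the rounded 7.4.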
 
Joined
Jul 10, 2011
Messages
797 (0.16/day)
Processor Intel
Motherboard MSI
Cooling Cooler Master
Memory Corsair
Video Card(s) Nvidia
Storage Western Digital/Kingston
Display(s) Samsung
Case Thermaltake
Audio Device(s) On Board
Power Supply Seasonic
Mouse Glorious
Keyboard UniKey
Software Windows 10 x64
7.4 TFlops, FP64, this normally doubles for FP32, so 14.8 Tflops for FP32.
This is pretty good, and pretty similar to 16.3 produced by Quadro RTX and slightly better than 2080Ti which has 13.4 TFlops.

BTW FP64 of the quadro according to TPU DB is 0.5TFlops, so this thing will compete in 32 bit calculations but run in circles around green camp in 64 bit.

Indeed it will run in circles like blind chicken without display outputs. Let's see how it performs against V100 or T4.
 
Joined
Feb 11, 2009
Messages
5,550 (0.96/day)
System Name Cyberline
Processor Intel Core i7 2600k -> 12600k
Motherboard Asus P8P67 LE Rev 3.0 -> Gigabyte Z690 Aorus Elite DDR4
Cooling Tuniq Tower 120 -> Custom Watercoolingloop
Memory Corsair (4x2) 8gb 1600mhz -> Crucial (8x2) 16gb 3600mhz
Video Card(s) AMD RX480 -> RX7800XT
Storage Samsung 750 Evo 250gb SSD + WD 1tb x 2 + WD 2tb -> 2tb NVMe SSD
Display(s) Philips 32inch LPF5605H (television) -> Dell S3220DGF
Case antec 600 -> Thermaltake Tenor HTCP case
Audio Device(s) Focusrite 2i4 (USB)
Power Supply Seasonic 620watt 80+ Platinum
Mouse Elecom EX-G
Keyboard Rapoo V700
Software Windows 10 Pro 64bit
Indeed it will run in circles like blind chicken without display outputs. Let's see how it performs against V100 or T4.

I'm sorry, what?
 
Joined
Nov 4, 2005
Messages
11,982 (1.72/day)
System Name Compy 386
Processor 7800X3D
Motherboard Asus
Cooling Air for now.....
Memory 64 GB DDR5 6400Mhz
Video Card(s) 7900XTX 310 Merc
Storage Samsung 990 2TB, 2 SP 2TB SSDs, 24TB Enterprise drives
Display(s) 55" Samsung 4K HDR
Audio Device(s) ATI HDMI
Mouse Logitech MX518
Keyboard Razer
Software A lot.
Benchmark Scores Its fast. Enough.
Indeed it will run in circles like blind chicken without display outputs. Let's see how it performs against V100 or T4.


Nvidia V100 = 7.5 Tflop FP64
Nvidia T4 = 242Gflop FP64

So this is roughly equal to a V100, or about 30X faster in FP64 than a T4.
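
The two ratios in the post above are easy to recompute. A minimal sketch, using the 7.4 TFLOPS from the press release and the V100 and T4 numbers exactly as quoted in the post (they are not independently verified here):

```python
# FP64 peak figures in TFLOPS.
mi60_fp64_tflops = 7.4     # from the press release
v100_fp64_tflops = 7.5     # as quoted in the post
t4_fp64_tflops = 0.242     # 242 GFLOPS, as quoted in the post

vs_v100 = mi60_fp64_tflops / v100_fp64_tflops
vs_t4 = mi60_fp64_tflops / t4_fp64_tflops

print(f"MI60 vs V100: {vs_v100:.2f}x")
print(f"MI60 vs T4:   {vs_t4:.1f}x")
```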
 
Joined
Jul 19, 2015
Messages
999 (0.29/day)
Processor Ryzen 5 5600 @ 4.65GHz CO -30
Motherboard AsRock X370 Taichi
Cooling Asus ROG Strix LC 240
Memory 32GB 4x8 G.SKILL Trident Z 3200 CL14 1.35V
Video Card(s) PCWINMAX RTX 3060 6GB Laptop GPU (80W)
Storage 1TB Kingston NV2
Display(s) LG 25UM57-P @ 75Hz OC
Case Fractal Design Arc XL
Audio Device(s) ATH-M20x
Power Supply Evga SuperNova 1300 G2
Mouse Evga Torq X3
Keyboard Thermaltake Challenger
Software Win 11 Pro 64-Bit
Dang, I wonder what kind of PPD these could get crunching Milkyway@Home with 7.5 TFLOPS FP64!
My three 7950's had around 3 TFLOPS put together. :laugh:
 
Joined
Jul 10, 2011
Messages
797 (0.16/day)
Processor Intel
Motherboard MSI
Cooling Cooler Master
Memory Corsair
Video Card(s) Nvidia
Storage Western Digital/Kingston
Display(s) Samsung
Case Thermaltake
Audio Device(s) On Board
Power Supply Seasonic
Mouse Glorious
Keyboard UniKey
Software Windows 10 x64
Joined
Aug 21, 2013
Messages
1,898 (0.46/day)
7.4 TFlops, FP64, this normally doubles for FP32, so 14.8 Tflops for FP32.
This is pretty good, and pretty similar to 16.3 produced by Quadro RTX and slightly better than 2080Ti which has 13.4 TFlops.

BTW FP64 of the quadro according to TPU DB is 0.5TFlops, so this thing will compete in 32 bit calculations but run in circles around green camp in 64 bit.
If it were only so. AMD GCN has always had a pure FP32 throughput advantage over Nvidia but failed to convert it to a meaningful performance advantage in games. For example, RX580 has higher FP32 than GTX 1060, and only after years of driver releases has it become as fast as a GTX 1060. Considering pure FP32 numbers, RX580 should compete with GTX 1070.
 
Joined
Feb 3, 2017
Messages
3,753 (1.32/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos
7.4 TFlops, FP64, this normally doubles for FP32, so 14.8 Tflops for FP32.
This is pretty good, and pretty similar to 16.3 produced by Quadro RTX and slightly better than 2080Ti which has 13.4 TFlops.

BTW FP64 of the quadro according to TPU DB is 0.5TFlops, so this thing will compete in 32 bit calculations but run in circles around green camp in 64 bit.
AMD product/specs page is up: https://www.amd.com/en/products/professional-graphics/instinct-mi60
Other than FP64, seems to be a straightforward dieshrink of Vega 10. Main difference for the GPU itself is 20% higher peak clock - 1800MHz on MI60 instead of 1500MHz on MI25. AMD's quoted performance difference is also 20% which matches specs exactly. Twice the memory on twice as large bus is the other difference.
FP64 being 1:2 FP32 is new, Vega10 did not have that.

Btw, 2080Ti's 13.4 TFLOPs is at specced boost clock 1545 MHz... These usually boost more than that. Vegas so far tend to boost less than peak clock. We will have to wait and see how Vega20 behaves.
Quadro RTX 16.3 is Quadro RTX 6000 number at 1770 MHz which is probably a more realistic clock speed.
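
The clock-dependence of these paper numbers can be sketched as follows. The shader counts (4,352 for the 2080 Ti, 4,608 for the Quadro RTX 6000) are assumptions from public spec listings rather than from this thread; the clocks are the ones quoted in the post.

```python
def fp32_tflops(cores: int, clock_mhz: float) -> float:
    """Peak FP32 TFLOPS, counting one FMA as two FLOPs per core per clock."""
    return cores * 2 * clock_mhz / 1e6


# Core counts are assumed; clocks are the ones quoted in the post.
rtx_2080_ti = fp32_tflops(4352, 1545)
quadro_rtx_6000 = fp32_tflops(4608, 1770)

# MI25 -> MI60 peak clock change quoted in the post.
clock_uplift = 1800 / 1500 - 1

print(f"2080 Ti @ 1545 MHz:         {rtx_2080_ti:.1f} TFLOPS")
print(f"Quadro RTX 6000 @ 1770 MHz: {quadro_rtx_6000:.1f} TFLOPS")
print(f"MI25 -> MI60 clock uplift:  {clock_uplift:.0%}")
```

Since peak TFLOPS scales linearly with clock, the real-world ranking depends on the clocks each card actually sustains, not only on the specced boost.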
 
Joined
Jan 8, 2017
Messages
9,436 (3.28/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
Btw, 2080Ti's 13.4 TFLOPs is at specced boost clock 1545 MHz... These usually boost more than that. Vegas so far tend to boost less than peak clock. We will have to wait and see how Vega20 behaves.
Quadro RTX 16.3 is Quadro RTX 6000 number at 1770 MHz which is probably a more realistic clock speed.

You are all comparing two products that operate within different markets and environments. Nvidia doesn't have a Turing-based Tesla equivalent, but if it did, its clocks would suffer a significant downgrade as well in order to reach an optimum power consumption curve.
 
Joined
Apr 30, 2012
Messages
3,881 (0.85/day)
You are all comparing two products that operate within different markets and environments. Nvidia doesn't have a Turing-based Tesla equivalent, but if it did, its clocks would suffer a significant downgrade as well in order to reach an optimum power consumption curve.

It should currently be compared to the Tesla V100 PCIe, and then to whatever replaces that.
 
Joined
Dec 22, 2011
Messages
3,890 (0.82/day)
Processor AMD Ryzen 7 3700X
Motherboard MSI MAG B550 TOMAHAWK
Cooling AMD Wraith Prism
Memory Team Group Dark Pro 8Pack Edition 3600Mhz CL16
Video Card(s) NVIDIA GeForce RTX 3080 FE
Storage Kingston A2000 1TB + Seagate HDD workhorse
Display(s) Samsung 50" QN94A Neo QLED
Case Antec 1200
Power Supply Seasonic Focus GX-850
Mouse Razer Deathadder Chroma
Keyboard Logitech UltraX
Software Windows 11
7.4 TFlops, FP64, this normally doubles for FP32, so 14.8 Tflops for FP32.
This is pretty good, and pretty similar to 16.3 produced by Quadro RTX and slightly better than 2080Ti which has 13.4 TFlops.

BTW FP64 of the quadro according to TPU DB is 0.5TFlops, so this thing will compete in 32 bit calculations but run in circles around green camp in 64 bit.

As mentioned already, Tesla V100 offered all this already quite some time ago, and from what I've been reading all within the same power envelope too.... no 7nm tech required, so nothing too exciting really, in fact rather disappointing.
 

crazyeyesreaper

Not a Moderator
Staff member
Joined
Mar 25, 2009
Messages
9,816 (1.71/day)
Location
04578
System Name Old reliable
Processor Intel 8700K @ 4.8 GHz
Motherboard MSI Z370 Gaming Pro Carbon AC
Cooling Custom Water
Memory 32 GB Crucial Ballistix 3666 MHz
Video Card(s) MSI RTX 3080 10GB Suprim X
Storage 3x SSDs 2x HDDs
Display(s) ASUS VG27AQL1A x2 2560x1440 8bit IPS
Case Thermaltake Core P3 TG
Audio Device(s) Samson Meteor Mic / Generic 2.1 / KRK KNS 6400 headset
Power Supply Zalman EBT-1000
Mouse Mionix NAOS 7000
Keyboard Mionix
Hmm same board power 300w between the 14nm MI25 and 7nm MI60.

So same power draw with a large clock speed bump going by FP32 and FP16 results in about a 17% uplift in theoretical performance.

Judging how similar the Vega 64 / Frontier edition is to the MI25 you can in theory apply 17% to those GPUs and that would be the perfect 100% scaling best case scenario for a 7nm Vega consumer GPU. In that best case scenario a 7nm VEGA consumer card would likely result in performance on par with a stock 1080 Ti / 2070 maybe a 2080 in AMD centric games.

That performance is not really good enough. So I would expect AMD to skip Vega 7nm for NAVI on the consumer front.
 
Joined
Nov 4, 2005
Messages
11,982 (1.72/day)
System Name Compy 386
Processor 7800X3D
Motherboard Asus
Cooling Air for now.....
Memory 64 GB DDR5 6400Mhz
Video Card(s) 7900XTX 310 Merc
Storage Samsung 990 2TB, 2 SP 2TB SSDs, 24TB Enterprise drives
Display(s) 55" Samsung 4K HDR
Audio Device(s) ATI HDMI
Mouse Logitech MX518
Keyboard Razer
Software A lot.
Benchmark Scores Its fast. Enough.
As mentioned already, Tesla V100 offered all this already quite some time ago, and from what I've been reading all within the same power envelope too.... no 7nm tech required, so nothing too exciting really, in fact rather disappointing.


March 2018 is a long time ago? They talked about it in December of 2017, but it was only soft launched in March 2018 from what I know.
 
Joined
Dec 22, 2011
Messages
3,890 (0.82/day)
Processor AMD Ryzen 7 3700X
Motherboard MSI MAG B550 TOMAHAWK
Cooling AMD Wraith Prism
Memory Team Group Dark Pro 8Pack Edition 3600Mhz CL16
Video Card(s) NVIDIA GeForce RTX 3080 FE
Storage Kingston A2000 1TB + Seagate HDD workhorse
Display(s) Samsung 50" QN94A Neo QLED
Case Antec 1200
Power Supply Seasonic Focus GX-850
Mouse Razer Deathadder Chroma
Keyboard Logitech UltraX
Software Windows 11
March 2018 is a long time ago? They talked about it in December of 2017, but it was only soft launched in March 2018 from what I know.

And? I take it you take issue with this news. :laugh:
 

M2B

Joined
Jun 2, 2017
Messages
284 (0.10/day)
Location
Iran
Processor Intel Core i5-8600K @4.9GHz
Motherboard MSI Z370 Gaming Pro Carbon
Cooling Cooler Master MasterLiquid ML240L RGB
Memory XPG 8GBx2 - 3200MHz CL16
Video Card(s) Asus Strix GTX 1080 OC Edition 8G 11Gbps
Storage 2x Samsung 850 EVO 1TB
Display(s) BenQ PD3200U
Case Thermaltake View 71 Tempered Glass RGB Edition
Power Supply EVGA 650 P2
7.4 TFlops, FP64, this normally doubles for FP32, so 14.8 Tflops for FP32.
This is pretty good, and pretty similar to 16.3 produced by Quadro RTX and slightly better than 2080Ti which has 13.4 TFlops.

BTW FP64 of the quadro according to TPU DB is 0.5TFlops, so this thing will compete in 32 bit calculations but run in circles around green camp in 64 bit.


RTX 2070 with 7.5TFLOPS of FP32 performance and less memory bandwidth manages to beat RX Vega 64 by 10-15 percent which has 12.5TFLOPS of FP32 performance. So yeah...
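
To put a number on the point above, here is a minimal sketch of "delivered performance per theoretical TFLOP", using only the figures quoted in the post (7.5 and 12.5 TFLOPS, and a 10-15 percent lead for the RTX 2070). The metric itself is an illustration, not an established benchmark.

```python
rtx_2070_tflops = 7.5    # as quoted in the post
vega_64_tflops = 12.5    # as quoted in the post

# Relative game performance of the RTX 2070, with Vega 64 normalised to 1.0.
for lead in (0.10, 0.15):
    per_tflop_advantage = (1 + lead) * vega_64_tflops / rtx_2070_tflops
    print(f"{lead:.0%} lead -> {per_tflop_advantage:.2f}x more performance per TFLOP")

low = 1.10 * vega_64_tflops / rtx_2070_tflops
high = 1.15 * vega_64_tflops / rtx_2070_tflops
```

On those numbers the RTX 2070 extracts a little under twice as much game performance per paper TFLOP, which is why peak FP32 alone is a poor predictor across architectures.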
 
Joined
Nov 4, 2005
Messages
11,982 (1.72/day)
System Name Compy 386
Processor 7800X3D
Motherboard Asus
Cooling Air for now.....
Memory 64 GB DDR5 6400Mhz
Video Card(s) 7900XTX 310 Merc
Storage Samsung 990 2TB, 2 SP 2TB SSDs, 24TB Enterprise drives
Display(s) 55" Samsung 4K HDR
Audio Device(s) ATI HDMI
Mouse Logitech MX518
Keyboard Razer
Software A lot.
Benchmark Scores Its fast. Enough.
And? I take it you take issue with this news. :laugh:


Really I hate the "First PCIe 4.0 7Tflop bla bla bla" if it weren't for the PCIE 4.0 they couldn't say that, and PCIe 4.0 is currently unsupported in reality and isn't going to be supported before this card hits the market, so the spin on this proves they very carefully worded it and I would rather just see the numbers, is it going to take less than the 300W the Nvidia uses to get the same 7Tflop performance? Is it going to do some other fancy faster math? Is it going to do something more or the same at a lower cost.

AMD is trying really hard in the server market, but I don't think 2018 will be their year to take any crown, and neither will 2019. Maybe 2020 if they keep up with Zen2 and Navi is impressive. But that will also require thousands of hours to write the tools to make their supposed cards faster, or to make the same speed cards as fast and easy to use, which if Lisa is in the know she will already have people working on, but if not we will know the reason why they fail. Given AMD's vaporware issues, where they build hardware for software that isn't ready, or software that has great implementation of either ease of use, speed, or functionality, and you can only choose one......

I think AMD is playing their cards right for the midsize guys, where a few IT guys run the show and want to save thousands to put into software development for long-life peak performance. They will survive, and their prosumer, gaming and server business will work out in the end, but they will never be as big as Nvidia or Intel. It's the same reason the Ford F-150 sells so many shitty trucks: it's the king, the classic standard of being equal to the neighbors. AMD is the full-featured but still slightly odd Holden, the loud and hot Corvette versus the supercars. It's second dog in the CPU and GPU business mostly due to mismanagement. I'm just glad they are here to keep us from paying the thousands more that Intel and Nvidia would charge if they could.
 
Joined
Mar 10, 2014
Messages
1,793 (0.46/day)
As mentioned already, Tesla V100 offered all this already quite some time ago, and from what I've been reading all within the same power envelope too.... no 7nm tech required, so nothing too exciting really, in fact rather disappointing.

Quadro GV100 has the same TFLops and 250W TDP, quite saddening that amd needs to drive 7nm ~331mm² chip at 300W tdp to equal that. They really need new arch.
 
Joined
Nov 3, 2013
Messages
2,141 (0.53/day)
Location
Serbia
Processor Ryzen 5600
Motherboard X570 I Aorus Pro
Cooling Deepcool AG400
Memory HyperX Fury 2 x 8GB 3200 CL16
Video Card(s) RX 6700 10GB SWFT 309
Storage SX8200 Pro 512 / NV2 512
Display(s) 24G2U
Case NR200P
Power Supply Ion SFX 650
Mouse G703 (TTC Gold 60M)
Keyboard Keychron V1 (Akko Matcha Green) / Apex m500 (Gateron milky yellow)
Software W10
Quadro GV100 has the same TFLops and 250W TDP, quite saddening that amd needs to drive 7nm ~331mm² chip at 300W tdp to equal that. They really need new arch.
That would make sense if GV100 were the same size. But it isn't, it's 2.5 times larger.
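
The size comparison can be checked with the die areas and TDPs quoted in this thread (815mm² and 250W for the Quadro GV100, 331mm² and 300W for the MI60). A minimal sketch; TDP per unit of die area is only a crude proxy for how hard each chip is being driven.

```python
# Figures as quoted in the thread.
gv100_area_mm2, gv100_tdp_w = 815, 250
vega20_area_mm2, vega20_tdp_w = 331, 300

area_ratio = gv100_area_mm2 / vega20_area_mm2
gv100_w_per_mm2 = gv100_tdp_w / gv100_area_mm2
vega20_w_per_mm2 = vega20_tdp_w / vega20_area_mm2

print(f"GV100 die is {area_ratio:.2f}x the size of Vega 20")
print(f"GV100:   {gv100_w_per_mm2:.2f} W/mm^2")
print(f"Vega 20: {vega20_w_per_mm2:.2f} W/mm^2")
```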
 
Joined
Oct 1, 2006
Messages
4,931 (0.74/day)
Location
Hong Kong
Processor Core i7-12700k
Motherboard Z690 Aero G D4
Cooling Custom loop water, 3x 420 Rad
Video Card(s) RX 7900 XTX Phantom Gaming
Storage Plextor M10P 2TB
Display(s) InnoCN 27M2V
Case Thermaltake Level 20 XT
Audio Device(s) Soundblaster AE-5 Plus
Power Supply FSP Aurum PT 1200W
Software Windows 11 Pro 64-bit
Judging how similar the Vega 64 / Frontier edition is to the MI25 you can in theory apply 17% to those GPUs and that would be the perfect 100% scaling best case scenario for a 7nm Vega consumer GPU. In that best case scenario a 7nm VEGA consumer card would likely result in performance on par with a stock 1080 Ti / 2070 maybe a 2080 in AMD centric games.

That performance is not really good enough. So I would expect AMD to skip Vega 7nm for NAVI on the consumer front.
Vega 64 is already around the same performance level of the 2070.
Because the 2070 is around 1080 Non-Ti level.
 
Joined
Feb 3, 2017
Messages
3,753 (1.32/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos
March 2018 is a long time ago? They talked about it in December of 2017, but it was only soft launched in March 2018 from what I know.
Tesla V100 came in summer 2017.
 
Joined
Feb 13, 2012
Messages
523 (0.11/day)
Really I hate the "First PCIe 4.0 7Tflop bla bla bla" if it weren't for the PCIE 4.0 they couldn't say that, and PCIe 4.0 is currently unsupported in reality and isn't going to be supported before this card hits the market, so the spin on this proves they very carefully worded it and I would rather just see the numbers, is it going to take less than the 300W the Nvidia uses to get the same 7Tflop performance? Is it going to do some other fancy faster math? Is it going to do something more or the same at a lower cost.

AMD is trying really hard in the server market, but I don't think 2018 will be their year to take any crown, and neither will 2019. Maybe 2020 if they keep up with Zen2 and Navi is impressive. But that will also require thousands of hours to write the tools to make their supposed cards faster, or to make the same speed cards as fast and easy to use, which if Lisa is in the know she will already have people working on, but if not we will know the reason why they fail. Given AMD's vaporware issues, where they build hardware for software that isn't ready, or software that has great implementation of either ease of use, speed, or functionality, and you can only choose one.......

They don't need to win the crown; they simply need to increase their market share and become relevant in that space. Even with Epyc being far superior to anything Intel has to offer, it's impractical and delusional to expect its market share to overtake Intel's, as business investments in such platforms take a long time to plan. This Instinct card also isn't made to win crowns in the first place; it simply extends their Epyc portfolio, so that whoever invests in Epyc has an extensive choice of solutions if they contract AMD. In order to win crowns and compete in market share, AMD needs to keep this trend of a competitive portfolio consistent.

But one thing that AMD does deserve credit for is that for the past couple of years they have been doing an excellent job executing their moves, and seem to be headed in the right direction.
 
Joined
Jul 9, 2015
Messages
3,413 (1.00/day)
System Name M3401 notebook
Processor 5600H
Motherboard NA
Memory 16GB
Video Card(s) 3050
Storage 500GB SSD
Display(s) 14" OLED screen of the laptop
Software Windows 10
Benchmark Scores 3050 scores good 15-20% lower than average, despite ASUS's claims that it has uber cooling.
Joined
Feb 3, 2017
Messages
3,753 (1.32/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos
So saddening AMD needs 331mm² 7nm 300W chip to take on 300W 815mm² 12nm chip.
AMD's comparisons on the slides are with the PCIe V100 - a 250W TDP card.

Conveniently, their comparisons are marketing-material worthy. For example, RESNET-50 training: V100's 357 vs MI60's 334 (images per second), where MI60 has "comparable performance". I wonder what a GPU could do if it spent some die space on dedicated hardware units for something like that. Let's call these hardware units, say, Tensor Cores? Nvidia's RESNET-50 training numbers for V100 are in the same range on CUDA cores and 1000-ish on Tensor Cores :D
 
Joined
Jul 9, 2015
Messages
3,413 (1.00/day)
System Name M3401 notebook
Processor 5600H
Motherboard NA
Memory 16GB
Video Card(s) 3050
Storage 500GB SSD
Display(s) 14" OLED screen of the laptop
Software Windows 10
Benchmark Scores 3050 scores good 15-20% lower than average, despite ASUS's claims that it has uber cooling.
Conveniently... [mental gymnastics on nVidia greatness]...
Yes, Huang is great, Amen to that.

But back to the point, it's a 331mm² 7nm chip vs an 815mm² 12nm chip, both have similar TDP, there is nothing to be sad about.

die space to add dedicated hardware units for something
Yeah. For something. But apparently, that's not much die space, GV100 has 1.4 times more CUDA cores, with 33% bigger die (and only a tiny bit improved process):

[Attached image: 1541584343831.png]


1000-ish on Tensor Core
Yeah, brought to you by "1060 is muh faster than 480". Actual tests show something like this:

[Attached image: 1541584631251.png]
 