
AMD Corporate Fellow Phil Rogers Jumps Ship to NVIDIA

Hehehe...

8.2 TFLOPs double-precision from 100w is pretty phenomenal. Xeon Phi draws a lot more power for only 1 TFLOP double-precision. Fury X is 8.6 single-precision weighing in at, what, 275w? Granted, these numbers will naturally improve with the move to 14-16nm.
It's like reading about a car running on nuclear energy.
 
I love these threads, they bring out all the strange fanboys...It's like watching pro wrestling, but with a lot less brains.

AMD is in trouble, no doubt about that, but they've been there before. (they've pulled a rabbit out the hat before)
Nvidia has got the better product, at the moment, but that does not make them infallible. (they've screwed up before)

See, this is what a non-fanboy reads from the present situation...but please do keep up the red vs green fight, it's more fun than watching kindergarteners fight :D
 
Hehehe...

8.2 TFLOPs double-precision from 100w is pretty phenomenal. Xeon Phi draws a lot more power for only 1 TFLOP double-precision. Fury X is 8.6 single-precision weighing in at, what, 275w? Granted, these numbers will naturally improve with the move to 14-16nm.

That's a power ceiling 20W higher than their current chip, the PEZY-SC, with 4x the cores (4,096 vs 1,024).

  • Logic cores (PE): 1,024
  • Core frequency: 733MHz
  • Peak floating-point performance: 3.0 TFLOPS single / 1.5 TFLOPS double
  • Host interface: PCI Express Gen3 x8 lane x 4 ports (x16 bifurcation available), JESD204B protocol support
  • DRAM interface: DDR4/DDR3 combo, 64-bit x 8 ports, max bandwidth 1533.6GB/s
  • Plus Ultra Wide IO SDRAM (2,048-bit) x 2 ports, max bandwidth 102.4GB/s

They announced it in Feb 2015 for a release date in 2016.

They also have plans for a PEZY-SC3 & 4

PEZY-SC3
8,192 cores on 10nm technology, 2018

PEZY-SC4
16,384 cores on 7nm technology, 2020
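
The PEZY-SC peak figures in the spec list above can be sanity-checked from core count and clock. This is only a rough sketch: it assumes each core retires one fused multiply-add (2 FLOPs) per cycle in double precision and twice that in single precision. That per-cycle figure is an assumption for illustration, not something stated in the spec sheet, and `peak_tflops` is a made-up helper name.

```python
def peak_tflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in TFLOPS: cores x clock (GHz) x FLOPs per core per cycle / 1000."""
    return cores * clock_ghz * flops_per_cycle / 1000.0


# PEZY-SC figures from the spec list: 1,024 cores at 733MHz.
# ASSUMPTION: 2 DP FLOPs/cycle/core (one FMA) and 4 SP FLOPs/cycle/core.
pezy_sc_dp = peak_tflops(1024, 0.733, 2)
pezy_sc_sp = peak_tflops(1024, 0.733, 4)

print(f"PEZY-SC DP: {pezy_sc_dp:.2f} TFLOPS")
print(f"PEZY-SC SP: {pezy_sc_sp:.2f} TFLOPS")
```

Under that assumption the arithmetic lands on the listed 1.5 TFLOPS double / 3.0 TFLOPS single, so the spec numbers are at least internally consistent.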
 
I love these threads, they bring out all the strange fanboys...It's like watching pro wrestling, but with a lot less brains.

AMD is in trouble, no doubt about that, but they've been there before. (they've pulled a rabbit out the hat before)
Nvidia has got the better product, at the moment, but that does not make them infallible. (they've screwed up before)

See, this is what a non-fanboy reads from the present situation...but please do keep up the red vs green fight, it's more fun than watching kindergarteners fight :D
You are also a fanboy, a fanboy of yourself :p
 
8.2 TFLOPs double-precision from 100w is pretty phenomenal. Xeon Phi draws a lot more power for only 1 TFLOP double-precision. Fury X is 8.6 single-precision weighing in at, what, 275w? Granted, these numbers will naturally improve with the move to 14-16nm.
PEZY-SC isn't really comparable to Fury X or any 3D consumer graphics card. As I've previously noted, PEZY lacks a 3D graphics pipeline (no rasterization, tessellation, geometry, hull, pixel shading etc.). Stripping out 3D functionality allows for a compute-heavy, shorter pipeline. The GK210 evolution of GK110 very likely points to Nvidia bifurcating their future GPU tech: one line pursuing 3D consumer/workstation graphics, one line devoted to math co-processing. As for Xeon Phi, you're still looking at x86 cores, which are pretty damn big and power hungry in comparison to simple shader module blocks, MIPS cores, and ARM.
 
Yeah, which is why I mentioned Xeon Phi, which is a lot more similar. Its performance is much lower though. Intel might have to go back to the drawing board and trim the fat from x86 to make it competitive with ARM. x86 has always really neglected FLOPs and focused on specialized instructions.
 
Yeah, which is why I mentioned Xeon Phi, which is a lot more similar. Its performance is much lower though. Intel might have to go back to the drawing board and trim the fat from x86 to make it competitive with ARM. x86 has always really neglected FLOPs and focused on specialized instructions.
Yep, that x86 overhead tax is a bitch. Still hard to see Intel deviating too far from their well beaten track even with programmers complaining about complexities in regard to Xeon Phi's coding in relation to other GPGPU ecosystems.
 
You are also a fanboy, a fanboy of yourself :p
I'm just trying to add a bit of sanity to the rabid barking of crazed red team/green team jingoism, if you felt it was directed at you...well, it's no fault of mine, is it?

(most posts in these threads are factual and on point, I'm only aiming at the guys who think red vs green is more important than any other aspect of our hobby/living)
 
The rats are leaving the sinking ship??
 
Pascal is coming out and it's more in line with what HSA is working towards. Nvidia wants to see if it can benefit from that going forward.

Nvidia doesn't have x86, so it will have to fight an ARMs race. Qualcomm just announced its intentions and PEZY-SC2 is coming.

The main specifications of the "PEZY-SC2" currently being planned are as follows:
  • Manufacturing process: 14-16nm FinFET
  • Die size: 400-500mm2
  • Operating frequency: 1.0GHz
  • Number of proprietary cores: 4,096
  • Computing performance: 8.2 TFLOPS (double precision) / 16.4 TFLOPS (single precision)
  • Built-in CPU: in addition to debugging and management, newly used for general-purpose computing as well
  • Memory interface 1: 500GB/s (proprietary) x 8 ch (in-package connection)
  • Memory interface 2: HMC or HBM (number of channels undecided)
  • External interface: PCIe Gen3/4 x8 x 6 ports
  • Power consumption: 100W (processor alone, not including in-package stacked DRAM etc.)
The pie is shrinking for everyone.
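
For a rough feel of the efficiency numbers being thrown around, here is a small sketch that divides the quoted peak throughput by the quoted power. The PEZY-SC2 figures come from the planned spec list above (and exclude in-package DRAM); the Fury X figures (8.6 TFLOPS single precision at roughly 275W) are the ones quoted earlier in the thread, not measured values. The helper name `gflops_per_watt` is made up for this example, and the chips differ in precision and in what the power figure covers, so this is not an apples-to-apples comparison.

```python
def gflops_per_watt(tflops: float, watts: float) -> float:
    """Convert a peak TFLOPS figure and a power figure into GFLOPS per watt."""
    return tflops * 1000.0 / watts


# (peak TFLOPS, watts) as quoted in the thread; all are paper numbers.
chips = {
    "PEZY-SC2, double precision (planned)": (8.2, 100.0),
    "PEZY-SC2, single precision (planned)": (16.4, 100.0),
    "Fury X, single precision (~275W per the thread)": (8.6, 275.0),
}

for name, (tflops, watts) in chips.items():
    print(f"{name}: {gflops_per_watt(tflops, watts):.1f} GFLOPS/W")
```

On these paper numbers the planned PEZY-SC2 would sit at 82 GFLOPS/W double precision, against roughly 31 GFLOPS/W single precision for Fury X.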

As interesting as PEZY-SC2 is, to me it looks much more like a Xeon Phi/GPGPU competitor than a general-purpose CPU, given the sheer number of cores and low clock speed. Probably also very nice for server loads, since those tend to be quite parallelized.

nVidia could go back out and sue Intel for x86 rights. Their original plan was for a dual-ISA architecture with Denver, which led to them buying out Transmeta back in the day pretty much solely for the x86 license. They got close to going to court against Intel years ago, but then Intel paid them A Lot of Money to not ship with the x86 ISA enabled. If AMD folded, the FTC would probably push, because let's face it, to really compete with Intel at desktop level, there's basically only nVidia and IBM (if they cut their POWER8 cores down to 4-core and 2-core modules). Most of the ARM vendors (Qualcomm, Samsung, Apple) simply can't get single-core performance to match Sandy Bridge yet, let alone Skylake.


Indeed. Why compete in a tough commodity market (micro-servers) when you can get much, much more lucrative in the automotive market?

Yeah, which is why I mentioned Xeon Phi, which is a lot more similar. Its performance is much lower though. Intel might have to go back to the drawing board and trim the fat from x86 to make it competitive with ARM. x86 has always really neglected FLOPs and focused on specialized instructions.

Yep, that x86 overhead tax is a bitch. Still hard to see Intel deviating too far from their well beaten track even with programmers complaining about complexities in regard to Xeon Phi's coding in relation to other GPGPU ecosystems.

x86 vs ARM vs POWER vs MIPS, RISC vs CISC vs VLIW/EPIC is all academic wankery these days thanks to how pretty much every modern high-performance core using more than about 5W is a superscalar, out-of-order core design: the instructions are decoded into internal microcode anyways, which makes the ISA essentially an irrelevant part of the equation. In effect, for all the fat that x86 has, so do high-performance ARM, POWER and MIPS. If anything, right now x86 is arguably the most scalable architecture to ever be built, ranging from milliwatt (Quark) to hundreds of watts (Xeon Phi).

On the Phi specifically, the current Knights Corner design is very competitive with GK110 and GK210, and Knights Landing looks to be competing head to head with Pascal, though I reserve final judgement on that until after both ship.
 
We all forgot that NVIDIA is one of the big five in the OpenPOWER Foundation. So they have access to POWER CPUs and core designs. In fact, they already have the CPU they need in their hands.
 
On the Phi specifically, the current Knights Corner design is very competitive with GK110 and GK210, and Knights Landing looks to be competing head to head with Pascal, though I reserve final judgement on that until after both ship.
HPC code is tuned (often hand tuned) for specific applications and workload, so the hardware often plays second fiddle to coding and wringing out the best practical performance - which is what my point was - the same point you quoted:
Yep, that x86 overhead tax is a bitch. Still hard to see Intel deviating too far from their well beaten track even with programmers complaining about complexities in regard to Xeon Phi's coding in relation to other GPGPU ecosystems.
If KNC is very competitive with GK110 within the same power envelope (and the results on standard benchmarks are mixed on that, to say the least (#1) (#2); even bringing in a 300W 7120P/X doesn't seem to appreciably swing things in KNC's favour), it makes you wonder why Intel gives away Xeon Phi to grab high-profile contracts, why even MSRP for individuals generally doesn't reflect the actual pricing, and why it needs major input from Intel (cash and incentives) to get clients to use it.
 
AMD fanboys after reading this news. :p
[Image: reaction GIF]


just copying from
http://www.techpowerup.com/forums/t...ery-to-predecessor.216541/page-2#post-3353355
xD

Seriously, it's probably about money. You work to get money, don't you?
 
That is a rad smile.

Sooner than later, we'll find out just how good Zen is.

Isn't it planned for the second half of 2016?
 
Isn't it planned for the second half of 2016?
Delayed. AMD confirmed as much with the Keller announcement. Zen sampling in 2016, shipping for revenue in 2017.
Jim’s departure is not expected to impact our public product or technology roadmaps, and we remain on track for “Zen” sampling in 2016 with first full year of revenue in 2017.
Pretty much everything processor related got put back (or cancelled in SkyBridge's case), as this roadmap from last year shows:
[Image: AMD roadmap]


Seattle has yet to really show up, and K12 is now also looking at a 2017 launch
[Image: AMD K12 slide]
 
We don't know, but by that picture a "smart fellow" at his age (he's no spring chicken) should be looking to retire, not heading off on a new endeavor.

I'd guess that after 21 years, if he held AMD stock, he just might not have the nest egg to retire on. So a quick flip that negotiates a healthy salary increase and stock options lets him hold out for 4-5 more years (ca-ching). Nvidia gets a strong voice to work with the HSA Foundation and develop their program to move it up to the forefront within the organization.

While it appears to be a loss for AMD, they are well entrenched on the HSA front, and at this point their work isn't going to be impacted unless he's able to recruit more talent from AMD.

Honestly, I kind of feel sorry for him...
 
HPC code is tuned (often hand tuned) for specific applications and workload, so the hardware often plays second fiddle to coding and wringing out the best practical performance - which is what my point was - the same point you quoted:

If KNC is very competitive with GK110 within the same power envelope (and the results on standard benchmarks are mixed on that, to say the least (#1) (#2); even bringing in a 300W 7120P/X doesn't seem to appreciably swing things in KNC's favour), it makes you wonder why Intel gives away Xeon Phi to grab high-profile contracts, why even MSRP for individuals generally doesn't reflect the actual pricing, and why it needs major input from Intel (cash and incentives) to get clients to use it.

That's why I separated my comments on HPC vs more "standard" use-cases. Either way, the "x86 fat" was what I was disputing, since all high-performance architectures have about the same design paradigms nowadays. Sure, x86 being CISC has a huge number of instructions, but realistically, only compiler writers need to care about this, and ARM and POWER have both added more instructions over the years, while MIPS has exited the high-performance space pretty much completely. As for SPARC, you don't hear anything about it either outside of mainframe (Fujitsu) and Oracle.

For KNC vs GK110, I was looking at Tianhe-2 vs TITAN over at top500 (which tests using LINPACK), where Tianhe-2 has about twice the performance (33.86 PFLOPS vs 17.59 PFLOPS), at the cost of roughly thrice the chip count (48,000 Phis vs 18,688 K20X) and twice the power draw (17.6MW vs 8.2MW). In HPC, space isn't generally a major concern, but cooling and power are, which is why in general FLOPS/W is the better metric to use. Of course, the far larger number of Phi cores makes it harder to get good performance out of them, but this is HPC; it's hard enough at 18k. The real reason why nVidia still mostly owns the market is that a lot of HPC programmers are quite familiar with CUDA already and/or have an existing CUDA-based codebase, and porting to OpenCL or x86 is a non-trivial exercise.
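
To put the Tianhe-2 vs TITAN figures above on a FLOPS/W footing, here is a quick back-of-the-envelope sketch using only the numbers quoted in this post. The `System` class and its method names are invented for the example. The per-coprocessor figure is deliberately naive: it attributes all LINPACK throughput to the coprocessors and ignores whatever the host CPUs contribute.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class System:
    name: str
    linpack_pflops: float  # LINPACK result as quoted in the post
    power_mw: float        # total power draw as quoted in the post
    coprocessors: int      # Phi or K20X count as quoted in the post

    def gflops_per_watt(self) -> float:
        # 1 PFLOPS per MW = 1e15 / 1e6 FLOPS/W = 1 GFLOPS/W
        return self.linpack_pflops / self.power_mw

    def tflops_per_coprocessor(self) -> float:
        # Naive split: assumes the coprocessors do all the work (host CPUs ignored).
        return self.linpack_pflops * 1000.0 / self.coprocessors


tianhe2 = System("Tianhe-2", 33.86, 17.6, 48_000)
titan = System("TITAN", 17.59, 8.2, 18_688)

for s in (tianhe2, titan):
    print(f"{s.name}: {s.gflops_per_watt():.2f} GFLOPS/W, "
          f"{s.tflops_per_coprocessor():.2f} TFLOPS per coprocessor (naive)")
```

On these figures TITAN comes out slightly ahead per watt (about 2.15 vs 1.92 GFLOPS/W), which fits "twice the performance at twice the power draw" being roughly a wash.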
 
Good god, this is funny.

Green team says that AMD is going to go under next month. Two people leaving, as high up as these people, is a sign of the apocalypse for AMD.

Red team says it's just two people moving on. Zen will fix everything.




Both sides are wrong. Keller leaving was the end of a contractual obligation. His job was done, so he left. Rogers is a problem. He's leaving for a competitor, which is making inroads into new markets. At the same time as it is bad, it's not like AMD is losing everything with one person. Rogers is jumping to where the best money is, and that isn't really bad for a cash strapped AMD.

AMD isn't doing great. Zen is largely a make-it-or-break-it situation for them. If they pull the rabbit out of the hat, they can get back to a single company. If they fail, there's going to have to be some sacrificial offering. Keller and Rogers aren't the bellwether for AMD; they're very small (if well known) cogs in a larger plan.

It is fun to see how insane the fanboys are on each side though. I'm waiting for the truly irrational ones to come forward and defend Bulldozer. I wish I had some booze.
 
For KNC vs GK110, I was looking at Tianhe-2 vs TITAN over at top500 (which tests using LINPACK), where Tianhe-2 has about twice the performance (33.86 PFLOPS vs 17.59 PFLOPS), at the cost of roughly thrice the chip count (48,000 Phis vs 18,688 K20X) and twice the power draw (17.6MW vs 8.2MW). In HPC, space isn't generally a major concern, but cooling and power are, which is why in general FLOPS/W is the better metric to use.
The comparison is flawed. Tianhe-2 uses 32,000 Intel Xeon E5-2692s in a 2:3 ratio with Xeon Phi. Titan uses Opteron 6274s in a 1:1 ratio with Tesla K20Xs. Not only do GPGPUs offer better FLOPS and FLOPS/watt than CPUs, the Xeons in Tianhe-2 themselves offer greater actual FLOPS per core/processor (and FLOPS/watt, even though both the E5-2692 and Opteron 6274 are nominally rated at 115W TDP).

AMD Q3 Earnings....not great, but at least they get a short-term cash injection.
 
The comparison is flawed. Tianhe-2 uses 32,000 Intel Xeon E5-2692s in a 2:3 ratio with Xeon Phi. Titan uses Opteron 6274s in a 1:1 ratio with Tesla K20Xs. Not only do GPGPUs offer better FLOPS and FLOPS/watt than CPUs, the Xeons in Tianhe-2 themselves offer greater actual FLOPS per core/processor (and FLOPS/watt, even though both the E5-2692 and Opteron 6274 are nominally rated at 115W TDP).

AMD Q3 Earnings....not great, but at least they get a short-term cash injection.

Oops, missed that catch. Still, compared to the co-processors, they're not that big a deal, since the CPUs do in the 100s of GFLOPS, while both coprocessors break the TFLOP barrier on their own. It's not an insignificant difference (if anything, the Titan gains the edge of CPU aid at its 1:1 ratio), but in the end, not that big a difference. Either way, it doesn't detract from my original comment of the two being comparable/competitive. Phi being not competitive would mean being unable to hit half the K20X's performance at LINPACK, for example, which is clearly not the case.

AMD must be really cash strapped to have to spin off even more :/
 
Oops, missed that catch.. Still, compared to the co-processors, they're not that big a deal, since the CPUs do in the 100s of GFLOPS, while both coprocessors break the TFLOP barrier on their own.
Just to wrap this up, those are theoretical numbers. In reality, in tuned HPC workloads Xeon Phi hits around 70% of theoretical SGEMM, while Kepler is around 83% (Maxwell is hitting 94-95%, as an aside). The problem is that as the workload increases in size, Phi starts losing efficiency, as this whitepaper from Sandia Labs shows, so comparable/competitive is a moving target. Broadly comparable in small workloads rapidly turns worse in larger workloads (the Density/Pressure/Coordinate calculations in the table are time to completion, so lower is better). There is a reason that Intel subsidizes Xeon Phi. Some of it is to get a footing in the co-processor industry, but most of it comes down to hit-or-miss performance highly dependent on job size, something Nvidia and AMD don't tend to suffer from (although the latter has support issues).

[Image: table from the Sandia Labs whitepaper]


AMD must be really cash strapped to have to spin off even more :/
Inventory write downs and a 23% gross margin will do that.
 
Just to wrap this up, those are theoretical numbers. In reality, in tuned HPC workloads Xeon Phi hits around 70% of theoretical SGEMM, while Kepler is around 83% (Maxwell is hitting 94-95%, as an aside). The problem is that as the workload increases in size, Phi starts losing efficiency, as this whitepaper from Sandia Labs shows, so comparable/competitive is a moving target. Broadly comparable in small workloads rapidly turns worse in larger workloads (the Density/Pressure/Coordinate calculations in the table are time to completion, so lower is better). There is a reason that Intel subsidizes Xeon Phi. Some of it is to get a footing in the co-processor industry, but most of it comes down to hit-or-miss performance highly dependent on job size, something Nvidia and AMD don't tend to suffer from (although the latter has support issues).

[Image: table from the Sandia Labs whitepaper]

Very interesting stuff. Where do you find those? I want in!
 