Sunday, July 6th 2008

Intel Larrabee Capable of 2 TFLOPs

German tech journal Heise caught up with Intel's Pat Gelsinger for an article discussing the company's past and future, as the silicon giant approaches its 40th anniversary this 18th of July.

Among the several topics covered, the most interesting was visual computing and Intel's plans for it. 'Larrabee' is the buzzword: it is the codename of Intel's upcoming graphics processor (GPU) architecture, with which the company plans to take on established players such as NVIDIA and AMD.

What's unique (so far) about Larrabee is that it is made up entirely of x86 processing cores, likely 32 of them. Here's the surprise: these cores are based on the design of the Pentium P54C, a 13+ year old x86 processor. The design will be shrunk to the 45 nm fabrication process, each core will be assisted by a 512-bit SIMD unit, and the cores will support 64-bit addressing. Gelsinger says that 32 of these cores clocked at 2.00 GHz could belt out 2 TFLOPs of raw computational power, close to that of the upcoming AMD R700. Heise also reports that this GPU could have a TDP of as much as 300 W (peak).

With inputs from Heise
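
The headline number is easy to sanity-check. Assuming 16 single-precision lanes per 512-bit SIMD unit and one multiply-add (counted as 2 flops) per lane per cycle — an assumption on our part, not anything Intel has confirmed — the arithmetic works out like this:

```python
# Back-of-the-envelope peak throughput for the rumoured Larrabee configuration.
# The flops-per-lane figure (multiply-add = 2 flops) is an assumption.
cores = 32                                # rumoured x86 core count
simd_width_bits = 512                     # width of each core's SIMD unit
lanes = simd_width_bits // 32             # 16 single-precision lanes per core
clock_hz = 2.0e9                          # 2.00 GHz
flops_per_lane_per_cycle = 2              # assumed multiply-add per lane

peak_flops = cores * lanes * flops_per_lane_per_cycle * clock_hz
print(f"{peak_flops / 1e12:.2f} TFLOPs")  # 2.05 TFLOPs
```

At 2.048 TFLOPs, that lines up with the 2 TFLOPs figure quoted from Gelsinger.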

77 Comments on Intel Larrabee Capable of 2 TFLOPs

#51
AphexDreamer
Does this mean that with Intel joining in on the GPU market, GPUs will become cheaper due to increased competition? Or not?
#52
hat
Enthusiast
Hopefully. Also, hopefully, both ATi and Nvidia won't be able to lazily build minor improvements on the same type of architecture with Intel swimming around in the pool... there's gonna be a lot of dunking heads underwater going on :)
#54
lemonadesoda
Pohl is the German computer science student behind the ray-traced versions of Quake 3 and 4 that have been featured on Digg and Slashdot. For his master's thesis, he built a version of Quake 4 that uses real-time ray tracing to achieve some pretty remarkable effects—shadows are correctly cast and rendered in real-time, water has the proper reflections, indirect lighting looks like it's supposed to, etc. He was later hired by Intel, and now he's working within their graphics unit on real-time ray tracing for games.
Contrast that story with Creative. They SUED the guy who was trying to push Audigy further. Just goes to show there is far better management at Intel than Creative.
#55
W1zzard
You won't be able to run existing programs on it and make them run 8479483 times faster. It's just like going from single core to dual core to quad core: almost no application scales from x1 to x8 or more. Yes, there may be some exceptions (maybe 10 apps on the market right now in total?) but nothing that anyone here regularly uses.
#56
lemonadesoda
Weer: It may have 30 cores, but guess how many ALUs each core has. That's right, 1, just like any other CPU. Considering the 2 TFLOP computational power assessment, it is likely a very powerful ALU, but it would still only amount to the same count as on a GPU, which puts the Larrabee at a huge disadvantage against identically-architectured GPUs such as the G92. It would be a lot more powerful, naturally, but just like the 800 ALUs running under the "R700" core, it will fail at performing gaming-specific operations ...
Not true per se. Why?

1./ Larrabee has a much more powerful ALU than a GPU, meaning that for some tasks Larrabee can do in one instruction what might take a fat loop and lookup tables on a GPU.

2./ The Larrabee ALU handles DP as well as SP; a GPU SPE is SP only. To mimic DP using SP requires a lot of looping and overhead.

3./ SIMD on Larrabee is 512-bit or more. That's the same as 16x 32-bit (SP) calculations at once. With 32 x86 cores in the Larrabee matrix, that is 16 x 32 cores = 512 simultaneous SP calculations, i.e. the same as 512 shader processor units.

The key, and as yet unknown, datum is how many clock cycles it takes to execute a SIMD instruction compared to a GPU's SPE.
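
Point 3's lane arithmetic, spelled out as a quick script (just the post's own numbers, not confirmed Larrabee internals):

```python
# SIMD lane count vs. GPU shader count, per the post's reasoning.
simd_width_bits = 512
sp_width_bits = 32
lanes_per_core = simd_width_bits // sp_width_bits  # 16 SP ops per core per issue
cores = 32
total_lanes = lanes_per_core * cores               # chip-wide simultaneous SP ops
print(total_lanes)  # 512, comparable to 512 shader processor units
```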
#57
Unregistered
It's looking good for this Intel GPU. Remember how much money Intel has: loads for R&D, its own fabs, and the ability to write its own drivers. They also have a hell of a lot of processor manufacturing experience to fall back on.

I hope Intel can sock it to the other two; it will be good for us in the long run, whether their first attempt is good or not.
#58
OnBoard
swaaye: Pentium P54C is the Pentium 75-200 MHz
Oh, I remember when a friend of mine had a Pentium 75MHz and he had it overclocked to 90MHz, and NFS (1) ran full screen! I had some 486 (edit: probably a 486SX 33MHz) back then and could only run it at half screen size :) I was so in awe of the overclock and the performance; remember, not everyone was doing it (OC) in those days.
#59
TheGuruStud
OnBoard: Oh I remember when a friend of mine had a Pentium 75MHz and he had it overclocked to 90MHz and NFS (1) run on full screen! I had something 486 (edit: probably 486SX 33MHz) back then and could only run it half screen big :) I was so in awe of the overclock and the performance, remember everyone was not doing it (OC) those days.
Oh yeah, well I had a 486 then a pentium 233 WITH MMX! Top that sucka! :p
#60
lemonadesoda
Anyone here interested in top500.org supercomputers?

Well, this Larrabee thing will put an end to Beowulf Class I clusters. And put a STOP to the interest in Cell blades.

Why? Much cheaper. And you wouldn't need to learn a new architecture model for programming, e.g. Cell. Just use your regular x86 IDE with a Larrabee add-in.

The average power consumption of a TOP10 system is 1.32 MW and the average power efficiency is 248 Mflops/Watt. With Larrabee we are getting 2000 Gflops / 300 W ≈ 6700 Mflops/Watt, i.e. 10-30 times as power efficient as the best supercomputers.

That has a HUGE implication to power and cooling needed to host a number crunching monster.

It also has a HUGE implication on the cost of installing an HPC given how cheap Larrabee is compared to scaling under regular Beowulf.

With Larrabee, anyone could have an HPC if they wanted to.
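
The efficiency claim above can be checked with the post's own figures (a rough sketch: it compares Larrabee's theoretical peak against sustained supercomputer numbers, so take the ratio with a grain of salt):

```python
# Power-efficiency comparison using the figures quoted in the post.
larrabee_gflops = 2000.0        # claimed peak throughput, Gflops
larrabee_watts = 300.0          # reported peak TDP, W
top10_mflops_per_watt = 248.0   # quoted TOP10 average efficiency

larrabee_mflops_per_watt = larrabee_gflops * 1000 / larrabee_watts
ratio = larrabee_mflops_per_watt / top10_mflops_per_watt
print(f"{larrabee_mflops_per_watt:.0f} Mflops/Watt, about {ratio:.0f}x the TOP10 average")
```

Strictly it comes to about 6667 Mflops/Watt, or roughly 27x the quoted TOP10 average.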
#61
Error 404
Hey, would you be able to get a single one of the cores and then put it on a Pentium board? :D
Hopefully they'll get smart and use Pentium Pro cores instead: 512 kB of L2 cache, MMX arch., and a cooler name; what could go wrong?
Also, imagine if you got a bunch of mobos with 4 PCI-E x16 lanes (I'm pretty sure they exist), stuck these cards onto a whole bunch of them (along with a quad core something), and ran a Beowulf cluster? Say you had 8 motherboards; that's 4 cards per mobo, which is 32 cards, which is 64 TFLOPs!! :eek:

@ TheGuruStud: I went from a Pentium 90 to a Celeron-400! :p
#62
mrhuggles
There is more to it than this; normal CPUs are much more powerful and multipurpose, although they should go with Core 2, duh, heh... P54Cs kinda suck imho... and there is even more to it than just the raw processing power, like the cache interfaces and, omg, the memory interfaces. <3 the 2900XT/Pro and 4870 for having a 512-bit ring bus combined with a direct bus for low latency. Honestly, P54C? They must plan on using DDR 400MHz... you think they might've revamped some things?
TheGuruStud: Since when is a general purpose cpu going to be able to process graphics at a respectable rate?

If that was the case, everyone with a quad core would be getting 50 FPS in 3dmark with the cpu test (I don't care if it has high speed ram and cache attached or not). I'm calling intel retarded, again.

edit: Or it's more fud. Like that 10 GHz pentium 4 they just had laying around :laugh:
OOPS, I mean sims at 60MHz :? wow, I was a whole 2 generations off.
#63
TheGuruStud
Error 404: @ TheGuruStud: I went from a Pentium 90 to a Celeron-400! :p
I've still got you beat :) After the 233 I got a Celeron 366 and OC'ed it to 550. The chip could do over 600, but my MB sucked.

Then I swapped it for a 600 MHz Pentium III, but ran it at stock. Piece of crap CPU just magically died one day. Then I built a new rig :) AMD 1.4 Thunderbird! And I've never looked back (upgraded to an XP 2100, then a long wait until an Athlon 64 3500, X2 4200 and Opteron 170).

Damn, way off topic. Don't hurt me.
#64
lemonadesoda
Error 404: Hey, would you be able to get a single one of the cores and then put it on a Pentium board? :D
No.
Hopefully they'll get smart and use Pentium Pro cores instead; 512 kb of L2 cache, MMX arch., and cooler name; what could go wrong?
Too big, too much heat, too much power and VERY little gain. Remember, these things are for crunching, not for executing long complex and branching code. MMX and SSEx are ditched in favour of specialised SIMD instructions. forums.techpowerup.com/showpost.php?p=872820&postcount=5
Also, imagine if you got a bunch of mobos with 4 PCI-E x16 lanes (I'm pretty sure they exist), stuck these cards onto a whole bunch of them (along with a quad core something), and ran a beowulf cluster? Say you had 8 motherboards, thats 4 cards per mobo, which is 32 cards, which is 64 TFLOPS!! :eek:
You won't need a PCIe x16 slot for these. They will probably sit in PCIe x1 or x4 slots; x16 is not needed. Remember, these things crunch... they don't need super high bandwidth for most applications. Think of gigabit networking: that bandwidth goes quite easily down an x1 slot. So you would have a gigabit's worth of bandwidth, carrying data that took serious crunching to produce.

A Larrabee is a cluster, but strictly speaking it is not a Beowulf cluster.

If you like home-made beowulfs, go here www.calvin.edu/~adams/research/microwulf/
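
The bandwidth point is easy to put numbers on. Using PCIe 1.x figures (2.5 GT/s per lane with 8b/10b encoding, leaving 2 Gbit/s usable per direction), a single lane already exceeds gigabit Ethernet:

```python
# Usable bandwidth of one PCIe 1.x lane vs. gigabit Ethernet.
pcie1_raw_gt_per_s = 2.5      # transfer rate per lane, PCIe 1.x
encoding_efficiency = 0.8     # 8b/10b encoding: 8 data bits per 10 line bits
pcie_x1_gbit_per_s = pcie1_raw_gt_per_s * encoding_efficiency  # 2.0 Gbit/s
gbe_gbit_per_s = 1.0

print(pcie_x1_gbit_per_s / gbe_gbit_per_s)  # 2.0: an x1 lane has twice GbE's bandwidth
```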
#66
lemonadesoda
Larrabee Architecture for Visual Computing -- With plans for the first demonstrations later this year, the Larrabee architecture will be Intel's next step in evolving the visual computing platform. The Larrabee architecture includes a high-performance, wide SIMD vector processing unit (VPU) along with a new set of vector instructions including integer and floating point arithmetic, vector memory operations and conditional instructions. In addition, Larrabee includes a major new hardware coherent cache design enabling the many-core architecture. The architecture and instructions have been designed to deliver performance, energy efficiency and general purpose programmability to meet the demands of visual computing and other workloads that are inherently parallel in nature. Tools are critical to success and key Intel® Software Products will be enhanced to support the Larrabee architecture and enable unparalleled developer freedom. Industry APIs such as DirectX™ and OpenGL will be supported on Larrabee-based products.
Intel AVX: The next step in the Intel instruction set -- Gelsinger also discussed Intel AVX (Advanced Vector Extensions) which, when used by software programmers, will increase performance in floating point, media, and processor intensive software. AVX can also increase energy efficiency, and is backwards compatible to existing Intel processors. Key features include wider vectors, increasing from 128 bit to 256 bit wide, resulting in up to 2x peak FLOPs output. Enhanced data rearrangement, resulting in allowing data to be pulled more efficiently, and three operand, non-destructive syntax for a range of benefits. Intel will make the detailed specification public in early April at the Intel Developer Forum in Shanghai. The instructions will be implemented in the microarchitecture codenamed "Sandy Bridge" in the 2010 timeframe.
www.intel.com/pressroom/archive/reference/IntelMulticore_factsheet.pdf

So, will Larrabee be adopting AVX?
#68
WarEagleAU
Bird of Prey
And even more time for ray tracing, which apparently is made use of in the 4800 series cards. Intel's first shot at GPUs ended miserably roughly 10-15 years ago. I'm sure they've learned from their mistakes back then. I for one am interested in seeing how it performs, but in the time frame given, it won't be new and cutting edge; it's a rehash. From all the information given and linked, it seems a lot more complicated now than I originally thought it was.
#69
Initialised
jyoung75: 2 TFLOPS by Larrabee a year from now is nice, but I can get 2.4 TFLOPS from the Radeon 4870x2 a month from now. And the Radeon cards are already rumored to be ray tracing monsters (used for ray tracing HD scenes in Transformers) www.tgdaily.com/content/view/38145/135/.
Yup, and that was with 1GB 2900XTs; the extra branching logic on R770 should make for big gains.
eidairaman1: Ok, this gives AMD and Nvidia time to send in working pieces for hybrid units.
I foresee nVidia integrating Via Nano or Cell cores and AMD/ATI using Thunderbirds or K6-2s.
#70
bryan_d
I wonder if they will be implementing the old PowerVR tech that the Kyro series used against ATI and nVidia in the past. Hidden Surface Removal was a tech that I wished ATI and nVidia would actually steal! :) Sure ATI had their Z-buffer, and nVidia with their variant... but they simply were not as efficient as PowerVR. My Kyro2 only ran at 175MHz and it held its own fine against what ATI and nVidia had.

If this becomes something big, it will suck for nVidia and AMD... and for us computer tweakers.

bryan d
#71
eidairaman1
The Exiled Airman
PowerVR is NEC/Panasonic; the graphics for the Dreamcast were awesome.
#72
substance90
Whoa, since when is Intel planning on entering the video card industry with something more powerful than built-in GPUs?! And what's with the design?! You can't just stitch 32 Pentiums together and call it a GPU! nVidia and AMD are way ahead in graphics card design!
#73
vojc
That's just LOL of a GPU; the AMD 4870 X2 has ~2.4 TFLOPs and a TDP under 300W (250-270 I guess).
#74
Morgoth
Fueled by Sapphire
substance90: Woah, since when is Intel planning on entering the video card industry with something more powerful then built-in GPUs?! And what`s with the design?! You can`t just stich 32 Pentiums together and call it a GPU! nVidia and AMD are way ahead in graphics card design!
panchoman number 2 :laugh:
yes you can
#75
HTC
Mountain House (CA) - Earlier today we learned that Intel is already heavily pitching its Larrabee technology to partners, but the technology foundation largely remains a mystery. German publication heise.de now provides more clues with a rather interesting note that Larrabee is built on Intel's nearly two decade-old P5 architecture.

According to Heise author Andreas Stiller, possibly the most prominent person to cover computer hardware in Germany, Intel dipped into the bin of obsolete technology (Intel’s phrase for replaced technology) to come up with a technology base for the Larrabee cGPU. While attending Intel’s 40th anniversary briefing (Intel will celebrate its 40th birthday on July 18), Stiller apparently found out that the Larrabee cores will be built on the P54C core — which was the code-name for the second-gen, 600 nm Pentium chip.

The first Pentium core (P5, 800 nm, 60 and 66 MHz) had been in development since 1989 and was introduced in 1993. The P54C was launched in 1994 with speeds up to 120 MHz, while the succeeding 350 nm P54CS reached 200 MHz. The P55C core (280 nm, up to 233 MHz) followed in 1995 and was replaced with the Pentium II in 1997.

Stiller added that Larrabee will debut with 32 cores that "are likely" to be equipped with MMX extensions, which would mean that Larrabee will actually be based on a modified, 45 nm P54CS core. The cores will also support 64-bit addressing. If you count in the fact that the MMX part was replaced with a 512-bit wide AVX (Advanced Vector Extensions) unit, Stiller comes up with a theoretical performance of 32 flops per clock per core, topping the 2 Tflop/s mark at a clock speed of 2 GHz.

If this is true, then Intel may be able to hit about twice the performance in single precision calculations as Nvidia and AMD achieve today. However, both Nvidia and AMD were able to double their floating point performance between 2007 and 2008 and we have reason to believe that once Larrabee will be available, GPUs may be hitting 3 to 4 Tflop/sec. in single GPU configurations. AMD’s dual-GPU ATI Radeon 4870 X2 (clocked at 778 MHz) is estimated to hit 2.49 Tflop/sec. when it debuts within the next few weeks.

It looks like Intel should be aiming for at least 4 Tflop/s for the second half of 2009.
Source: Tom's Hardware