Tuesday, May 9th 2023

432-Core RISC-V Processor with Chiplets Aims to Provide Ultra-Efficient Floating-Point Computation

Researchers at the Integrated Systems Laboratory (IIS) of ETH Zürich and the Energy-efficient Embedded Systems (EEES) group of the University of Bologna teamed up to form the Parallel Ultra Low Power (PULP) Platform project, which aims to provide open hardware designed with efficiency in mind. Today, we learn that the PULP project has made significant progress with "Occamy", its effort to explore ultra-efficient floating-point computation. Occamy is a high-performance AI chip designed for efficiency. Based on the RISC-V ISA, each Occamy chiplet uses 216 open-source 32-bit "Snitch" cores organized in groups of four compute clusters, where each cluster shares tightly-coupled memory among eight compute cores and a high-bandwidth (512-bit) DMA-enhanced core that directs the data flow. For control, the Occamy chiplet features a "CVA6" open-source, Linux-capable, 64-bit RISC-V core.
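The cluster hierarchy can be checked with simple arithmetic. The sketch below is a hypothetical Python model, not PULP Platform code; it assumes that the 216 Snitch cores per chiplet include both the eight compute cores and the one DMA core of every cluster, and that the CVA6 control core is counted separately.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChipletLayout:
    """Hypothetical model of one Occamy compute chiplet's Snitch core hierarchy.

    Assumption: the 216 Snitch cores include every cluster's compute and DMA
    cores; the CVA6 control core is not part of this count.
    """

    snitch_cores: int = 216       # 32-bit Snitch cores per chiplet (article figure)
    compute_per_cluster: int = 8  # compute cores sharing a cluster's memory
    dma_per_cluster: int = 1      # DMA-enhanced core directing the data flow
    clusters_per_group: int = 4   # clusters per group

    @property
    def cores_per_cluster(self) -> int:
        return self.compute_per_cluster + self.dma_per_cluster

    @property
    def clusters(self) -> int:
        # Only meaningful if the core count divides evenly into clusters.
        if self.snitch_cores % self.cores_per_cluster:
            raise ValueError("cores do not divide evenly into clusters")
        return self.snitch_cores // self.cores_per_cluster

    @property
    def groups(self) -> int:
        # Only meaningful if the clusters divide evenly into groups.
        if self.clusters % self.clusters_per_group:
            raise ValueError("clusters do not divide evenly into groups")
        return self.clusters // self.clusters_per_group

    @property
    def compute_cores(self) -> int:
        return self.clusters * self.compute_per_cluster


layout = ChipletLayout()
print(layout.clusters, layout.groups, layout.compute_cores)  # 24 6 192
print("Snitch cores across two compute dies:", 2 * layout.snitch_cores)  # 432
```

Under these assumptions, each chiplet holds 24 clusters arranged in six groups of four, with 192 cores doing arithmetic and 24 moving data; two such dies give the 432 cores quoted for the full system.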

Each compute chiplet is built on GlobalFoundries' 12 nm GF12LPP low-power process and has 16 GB of HBM2E high-bandwidth memory available to it via 2.5D integration, with the dies placed on top of a passive 65 nm interposer. The chiplet packs about one billion transistors into a 72 square millimeter die, and the assembly is placed on a 52.5 x 45 mm carrier PCB for fan-out mounting. The complete Occamy system features two compute dies, for a total of 432 cores, and two 16 GB HBM2E memory dies; each compute chiplet can communicate with its neighbor over a 19.5 GB/s wide, source-synchronous, technology-independent die-to-die DDR link. In terms of performance targets, Occamy is rated at 0.768 TeraFLOPS for FP64, 1.536 TeraFLOPS for FP32, 3.072 TeraFLOPS for FP16/FP16alt, and 6.144 TeraFLOPS for FP8/FP8alt. Power consumption has yet to be determined, but estimates put it in the low tens of watts. More figures will be published once the assembled system arrives.
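The quoted peak figures double each time the number format halves in width. One way to reproduce them, sketched below in Python, is to assume a 64-bit FPU datapath that packs narrower formats as SIMD lanes; that datapath width is an assumption consistent with the figures, not something the article states, and the helper `peak_tflops` is hypothetical.

```python
# Peak FP64 throughput for the full two-die Occamy system (article figure).
FP64_PEAK_TFLOPS = 0.768
# Assumed FPU datapath width: narrower formats are packed into it as SIMD lanes.
DATAPATH_BITS = 64


def peak_tflops(format_bits: int) -> float:
    """Scale the FP64 peak by how many lanes of `format_bits` fit in the datapath."""
    if DATAPATH_BITS % format_bits:
        raise ValueError("format width must divide the datapath width")
    lanes = DATAPATH_BITS // format_bits
    return FP64_PEAK_TFLOPS * lanes


FORMATS = {"FP64": 64, "FP32": 32, "FP16/FP16alt": 16, "FP8/FP8alt": 8}

for name, bits in FORMATS.items():
    print(f"{name}: {peak_tflops(bits):.3f} TFLOPS")
```

This prints 0.768, 1.536, 3.072, and 6.144 TFLOPS, matching the stated targets; the real hardware may reach these rates differently.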
Occamy is a research vehicle, not an actual product, so it is not made for production; only tens of final modules will arrive in the hands of researchers. Potential future applications include automotive, avionics, and space, where efficient, high-performance chips are a must. Our initial article claimed that these modules were headed to space and funded by the European Space Agency; however, that is not the case. The main sponsors of this project are GlobalFoundries, Rambus, Synopsys, Micron, and Avery. On the official Occamy website, the PULP Platform shares further insight into the project and how it came about, quoted below.
PULP Platform: "The Occamy project started as a serendipitous outcome of the Manticore high-performance architecture concept we presented at the Hot Chips conference in 2020 [1,2]. After Hot Chips 2020, the PULP Platform team was approached by GlobalFoundries with an exciting proposal to turn a concept architecture into a real silicon design. The project was made possible by the generous contribution and strong support of GlobalFoundries (technology access, expert advice, ecosystem enablement, and silicon budget), Rambus (HBM2e controller IP and integration support), Micron (HBM2e DRAMs supply and integration support), Synopsys (EDA tool licenses and support) and Avery (HBM2e DRAM verification model). We kick-started the Occamy project on the 20th of April 2021 and taped out the Occamy compute chiplet [3,4] in GlobalFoundries 12nm FinFet technology in July 2022 after less than 15 months of hard work with a team of only <25 people, mostly doctoral students. A few months later, in October, we taped out the passive silicon interposer called Hedwig [5] in GlobalFoundries 65nm technology. Both tape-outs were supported by the Europractice-IC team at Fraunhofer IIS."
Source: PULP Platform

13 Comments on 432-Core RISC-V Processor with Chiplets Aims to Provide Ultra-Efficient Floating-Point Computation

#2
bonehead123
So now we know where all those excess/unused RISC chips have been hiding all this time, hehehe :D
#3
Warigator
0.768 TeraFLOPS at FP64 as a future product seems ridiculously low to me to be honest.
#4
Arkz
Warigator wrote: "0.768 TeraFLOPS at FP64 as a future product seems ridiculously low to me to be honest."
Most space-worthy, ultra-low-power, radiation-resistant chips are weak as shit, so this would be a good step up, assuming that's its intended purpose.
#5
TumbleGeorge
I'm interested in what the FLOPS performance is actually like on modern consumer processors, especially in the full and double precision columns. Processor reviews completely omit this kind of testing and ranking, probably to avoid showing how underperforming consumer CPUs are despite their staggering power consumption.

Edit: I found something, but I didn't understand which level of precision is used.

From gadgetversus
Around 2 TFLOPS for a flagship consumer CPU.
#6
CapitanXeon
Warigator wrote: "0.768 TeraFLOPS at FP64 as a future product seems ridiculously low to me to be honest."
At 0.043 W per core at full tilt, it doesn't seem that bad compared to the 300 W+ figures Intel is currently pushing.
#7
demirael
2 chiplets at 231 cores per chiplet isn't 432 cores, it's 462.
#8
TumbleGeorge
demirael wrote: "2 chiplets at 231 cores per chiplet isn't 432 cores, it's 462."
I suppose you saw a slide? It's 216 32-bit cores per chiplet, so 216*2 = ? The other 16 cores per chiplet are 64-bit. Do the math: (216*2)+(16*2) = ? :)
#9
Geofrancis
The Mars rovers are still running single-core PowerPC 750 chips at 233 MHz; this is the type of hardware it's going up against, not 64-core server chips.
#10
ArdWar
Warigator wrote: "0.768 TeraFLOPS at FP64 as a future product seems ridiculously low to me to be honest."
It is ridiculously high for what I assume is a RadHard processor (or at least RadRes/RadSafe)...

Especially if you consider its energy efficiency.
#11
R-T-B
Geofrancis wrote: "The mars rovers are still running PowerPC 750 single core chips at 233mhz, this is the type of hardware its going up against, not 64 core server chips."
That's actually on the high end of radiation hardened hardware.
#12
TumbleGeorge
R-T-B wrote: "That's actually on the high end of radiation hardened hardware."
Was. :)
The BAE RAD5500 series, which includes a 4-core model with DDR3 RAM support, is roughly 10+ times better.
#13
R-T-B
TumbleGeorge wrote: "Was. :)
BAE RAD5500 series which contains model with 4 cores and support for DDR3 RAM is relatively 10+ times better."
Well, it sure beats a lot of the stuff out there so is still relatively high end. Still, would not recommend for space gamers.

I believe New Horizons had something like a 5 MHz MIPS-based core, but don't quote me on that.

Voyager probably has a literal blob of potato magic running it.