Tuesday, May 9th 2023
432-Core RISC-V Processor with Chiplets Aims to Provide Ultra-Efficient Floating-Point Computation
Researchers at the Integrated Systems Laboratory (IIS) of ETH Zürich and the Energy-efficient Embedded Systems (EEES) group of the University of Bologna teamed up to form the Parallel Ultra Low Power (PULP) Platform project, which aims to provide open hardware designed with efficiency in mind. Today, we learn that the PULP project has made significant progress with "Occamy," its effort to explore ultra-efficient floating-point computation. Occamy is a high-performance AI chip designed for efficiency. Based on the RISC-V ISA, each Occamy chiplet uses 216 32-bit "Snitch" open-source cores organized in groups of four compute clusters, where each cluster shares tightly-coupled memory among eight compute cores and a high-bandwidth (512-bit), DMA-enhanced core that directs the data flow. For control, the Occamy chiplet features a "CVA6" open-source, Linux-capable, 64-bit RISC-V core.
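The core counts quoted above can be cross-checked with a little arithmetic. The Python sketch below is only an illustration: it assumes that the 216-core figure includes the one DMA-enhanced core per cluster, which the article does not state explicitly, and the function and constant names are made up for this example.

```python
# Figures quoted in the article.
CORES_PER_CHIPLET = 216          # Snitch cores per compute chiplet
COMPUTE_CORES_PER_CLUSTER = 8    # compute cores sharing tightly-coupled memory
DMA_CORES_PER_CLUSTER = 1        # DMA-enhanced core directing the data flow
CLUSTERS_PER_GROUP = 4           # clusters are organized in groups of four
CHIPLETS = 2                     # two compute dies give the 432-core headline


def chiplet_topology(total_cores=CORES_PER_CHIPLET):
    """Derive cluster and group counts for one chiplet.

    Assumption (not confirmed by the article): the total core count
    includes the DMA-enhanced core of every cluster.
    """
    cores_per_cluster = COMPUTE_CORES_PER_CLUSTER + DMA_CORES_PER_CLUSTER
    clusters, leftover = divmod(total_cores, cores_per_cluster)
    if leftover:
        raise ValueError("core count does not divide evenly into clusters")
    groups, leftover = divmod(clusters, CLUSTERS_PER_GROUP)
    if leftover:
        raise ValueError("cluster count does not divide evenly into groups")
    return {
        "clusters": clusters,
        "groups": groups,
        "compute_cores": clusters * COMPUTE_CORES_PER_CLUSTER,
        "dma_cores": clusters * DMA_CORES_PER_CLUSTER,
    }


topology = chiplet_topology()
total_cores = CORES_PER_CHIPLET * CHIPLETS
print(topology)
print("cores across both chiplets:", total_cores)
```

Under that assumption the numbers divide evenly, which suggests the reading is at least self-consistent with the figures in the article.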
With 16 GB of high-bandwidth HBM2E memory available to each chiplet via 2.5D integration, the compute chiplet is built on GlobalFoundries' 12 nm GF12LPP low-power process and placed on top of a passive 65 nm interposer. Featuring about one billion transistors packed into 72 square millimeters, it is placed on a 52.5 x 45 mm carrier PCB for fan-out mounting. The complete Occamy assembly features two compute dies for a total of 432 cores and two 16 GB HBM2E memory dies; each compute die can communicate with its neighbor over a 19.5 GB/s, source-synchronous, technology-independent die-to-die DDR link. As for performance targets, Occamy aims for 0.768 TeraFLOPS for FP64, 1.536 TeraFLOPS for FP32, 3.072 TeraFLOPS for FP16/FP16alt, and 6.144 TeraFLOPS for FP8/FP8alt. Exact power usage is yet to be determined, but estimates are in the low tens of watts. Once the assembled system has arrived, more numbers will be published.

Being a research vehicle and not an actual product, Occamy is not made for production, and only tens of final modules will arrive in the hands of researchers. Potential future applications include automotive, avionics, and space, where efficient and high-performance chips are a must. Our initial article claimed that these modules were going into space and were funded by the European Space Agency; however, that is not the case. The main sponsors of this project are GlobalFoundries, Rambus, Synopsys, Micron, and Avery. On the official Occamy website, the PULP Platform shares further insight about the project and how it came about. You can see it below.
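The peak-throughput targets quoted above follow a simple pattern: each halving of the operand width doubles the figure. The Python sketch below checks that pattern against the quoted numbers. The inverse-width scaling model is an assumption made for illustration (the article does not explain how the figures are derived), and the names are hypothetical.

```python
import math

# Peak targets quoted in the article, keyed by operand width in bits.
QUOTED_TFLOPS = {64: 0.768, 32: 1.536, 16: 3.072, 8: 6.144}


def scaled_peak_tflops(width_bits, fp64_peak=QUOTED_TFLOPS[64]):
    """Estimate peak TFLOPS for a given operand width.

    Assumed model: throughput scales inversely with operand width,
    as it would if narrower values were packed into a 64-bit datapath.
    """
    if width_bits <= 0 or 64 % width_bits != 0:
        raise ValueError("width must be a positive divisor of 64")
    return fp64_peak * (64 // width_bits)


# Compare the model against every figure quoted in the article.
matches = {
    width: math.isclose(scaled_peak_tflops(width), quoted)
    for width, quoted in QUOTED_TFLOPS.items()
}
for width, quoted in QUOTED_TFLOPS.items():
    print(f"FP{width}: quoted {quoted} TFLOPS, modeled {scaled_peak_tflops(width):.3f} TFLOPS")
```

All four quoted targets agree with this model, so the lower-precision numbers appear to be straightforward multiples of the FP64 figure rather than independently measured results.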
Source:
PULP Platform
PULP Platform:
The Occamy project started as a serendipitous outcome of the Manticore high-performance architecture concept we presented at the Hot Chips conference in 2020 [1,2]. After Hot Chips 2020, the PULP Platform team was approached by GlobalFoundries with an exciting proposal to turn a concept architecture into a real silicon design. The project was made possible by the generous contribution and strong support of GlobalFoundries (technology access, expert advice, ecosystem enablement, and silicon budget), Rambus (HBM2e controller IP and integration support), Micron (HBM2e DRAMs supply and integration support), Synopsys (EDA tool licenses and support) and Avery (HBM2e DRAM verification model). We kick-started the Occamy project on the 20th of April 2021 and taped out the Occamy compute chiplet [3,4] in GlobalFoundries 12nm FinFet technology in July 2022 after less than 15 months of hard work with a team of only <25 people, mostly doctoral students. A few months later, in October, we taped out the passive silicon interposer called Hedwig [5] in GlobalFoundries 65nm technology. Both tape-outs were supported by the Europractice-IC team at Fraunhofer IIS.
13 Comments on 432-Core RISC-V Processor with Chiplets Aims to Provide Ultra-Efficient Floating-Point Computation
Edit. I found something but didn't understand what level of precision is used.
From gadgetversus
Around 2TFlops for consumer flagship CPU.
Especially if you consider its energy efficiency
The BAE RAD5500 series, which includes a model with 4 cores and support for DDR3 RAM, is relatively 10+ times better.
I believe New Horizons had something like a 5 MHz MIPS-based core, but don't quote me on that.
Voyager probably has a literal blob of potato magic running it.