News Posts matching #HBM2E

European Supercomputer Chip SiPearl Rhea Delayed, But Upgraded with More Cores

The rollout of SiPearl's much-anticipated Rhea processor for European supercomputers has been pushed back by a year to 2025, but the delay comes with a silver lining - a significant upgrade in core count and potential performance. Originally slated to arrive in 2024 with 72 cores, the homegrown high-performance chip will now pack 80 cores when it eventually launches. SiPearl and its partners made this strategic choice to ensure the utmost quality and capabilities for the flagship European processor. The additional 12 months will allow the engineering teams to further refine the chip's architecture, carry out extensive testing, and optimize software stacks to take full advantage of Rhea's computing power. Now called Rhea1, the chip is a crucial component of the European Processor Initiative's mission to develop domestic high-performance computing technologies and reduce reliance on foreign processors. Supercomputer-scale simulations spanning climate science, drug discovery, energy research and more all require astonishing amounts of raw compute grunt.

By scaling up to 80 cores based on the latest Arm Neoverse V1, Rhea1 aims to go toe-to-toe with the world's most powerful processors optimized for supercomputing workloads. SiPearl plans to use TSMC's N6 manufacturing process. The CPU will have 256-bit DDR5 memory connections, 104 PCIe 5.0 lanes, and four stacks of HBM2E memory. The roadmap shift also provides more time for the expansive European supercomputing ecosystem to prepare robust software stacks tailored for the upgraded Rhea silicon. Ensuring a smooth deployment with existing models and enabling future breakthroughs are top priorities. While the delay is a setback for SiPearl's launch schedule, the substantial upgrade could pay significant dividends for Europe's ambitions to join the elite ranks of worldwide supercomputer power. All eyes will be on Rhea1's delivery in 2025, particularly those of the European governments funding the project.
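
Those interface figures allow a rough peak-bandwidth estimate. Below is a minimal back-of-the-envelope sketch; the DDR5-4800 transfer rate and the 3.2 Gbps HBM2E pin speed are illustrative assumptions, not figures SiPearl has published.

```python
# Back-of-the-envelope peak memory bandwidth for Rhea1, based on the
# interfaces quoted above. The DDR5-4800 data rate and the 3.2 Gbps
# HBM2E pin speed are assumptions for illustration, not SiPearl figures.

DDR5_BUS_BITS = 256          # 256-bit DDR5 interface (per the article)
DDR5_MTS = 4800              # assumed DDR5 transfer rate (MT/s)
HBM2E_STACKS = 4             # four HBM2E stacks (per the article)
HBM2E_BUS_BITS = 1024        # standard per-stack HBM interface width
HBM2E_GBPS_PER_PIN = 3.2     # assumed HBM2E per-pin data rate (Gbps)

ddr5_gbs = DDR5_BUS_BITS / 8 * DDR5_MTS / 1000                  # GB/s
hbm_gbs = HBM2E_STACKS * HBM2E_BUS_BITS / 8 * HBM2E_GBPS_PER_PIN

print(f"DDR5 peak:  {ddr5_gbs:.1f} GB/s")   # ~153.6 GB/s
print(f"HBM2E peak: {hbm_gbs:.1f} GB/s")    # ~1638.4 GB/s
```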

Intel Launches Gaudi 3 AI Accelerator: 70% Faster Training, 50% Faster Inference Compared to NVIDIA H100, Promises Better Efficiency Too

During the Vision 2024 event, Intel announced its latest Gaudi 3 AI accelerator, promising significant improvements over its predecessor. Intel claims the Gaudi 3 offers up to 70% better training performance, 50% better inference, and 40% better efficiency than NVIDIA's H100 processors. The new AI accelerator is presented as a PCIe Gen 5 dual-slot add-in card with a 600 W TDP or an OAM module with 900 W. The PCIe card has the same peak 1,835 TeraFLOPS of FP8 performance as the OAM module despite a 300 W lower TDP. The PCIe version works as a group of four per system, while the OAM HL-325L modules can be run in an eight-accelerator configuration per server. This will likely result in lower sustained performance, given the lower TDP, but it confirms that the same silicon is used, just tuned to a lower frequency. Built on TSMC's N5 5 nm node, the AI accelerator features 64 Tensor Cores, delivering double the FP8 and quadruple the FP16 performance of the previous-generation Gaudi 2.

The Gaudi 3 AI chip comes with 128 GB of HBM2E with 3.7 TB/s of bandwidth and twenty-four 200 Gbps Ethernet NICs, with dual 400 Gbps NICs used for scale-out. All of that is laid out on the 10 tiles that make up the Gaudi 3 accelerator. There is 96 MB of SRAM split between two compute tiles, which acts as a low-level cache that bridges data communication between Tensor Cores and HBM memory. Intel also announced support for the new performance-boosting standardized MXFP4 data format and is developing an AI NIC ASIC for Ultra Ethernet Consortium-compliant networking. The Gaudi 3 supports clusters of up to 8,192 cards, built from 1,024 nodes of eight accelerators each. It is on track for volume production in Q3, offering a cost-effective alternative to NVIDIA accelerators with the additional promise of a more open ecosystem. More information and a deeper dive can be found in the Gaudi 3 Whitepaper.
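
The cluster and networking arithmetic above is easy to sanity-check. The short sketch below uses only the figures quoted in the article:

```python
# Quick sanity check of Gaudi 3's scale-out arithmetic.
# All inputs come from the article; nothing here is measured.

NICS_PER_CARD = 24           # 200 Gbps Ethernet NICs per Gaudi 3
NIC_GBPS = 200
CARDS_PER_NODE = 8           # OAM accelerators per server
NODES = 1024

per_card_tbps = NICS_PER_CARD * NIC_GBPS / 1000
total_cards = CARDS_PER_NODE * NODES

print(f"Aggregate Ethernet per card: {per_card_tbps:.1f} Tbps")  # 4.8 Tbps
print(f"Cluster size: {total_cards} cards")                      # 8192
```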

SK Hynix To Invest $1 Billion into Advanced Chip Packaging Facilities

Lee Kang-Wook, Vice President of Research and Development at SK Hynix, has discussed the increased importance of advanced chip packaging with Bloomberg News. In an interview with the media company's business section, Lee referred to a tradition of prioritizing the design and fabrication of chips: "the first 50 years of the semiconductor industry has been about the front-end." He believes that the latter half of production processes will take precedence in the future: "...but the next 50 years is going to be all about the back-end." He outlined a "more than $1 billion" investment into South Korean facilities—his department is hoping to "improve the final steps" of chip manufacturing.

SK Hynix's Head of Packaging Development pioneered a novel method of packaging the third generation of high bandwidth memory (HBM2E)—that innovation secured NVIDIA as a high-profile and long-term customer. Demand for Team Green's AI GPUs has boosted the significance of HBM technologies—Micron and Samsung are attempting to play catch-up with new designs. South Korea's leading memory supplier is hoping to stay ahead in the next-gen HBM contest—12-layer fifth-generation (HBM3E) samples have reportedly been submitted to NVIDIA for approval. SK Hynix's Vice President recently revealed that HBM production volumes for 2024 have sold out—company leadership is now weighing its next steps for market dominance in 2025. The majority of the firm's newly announced $1 billion budget will be spent on the advancement of MR-MUF and TSV technologies, according to their R&D chief.

Intel and Ohio Supercomputer Center Double AI Processing Power with New HPC Cluster

A collaboration between Intel, Dell Technologies, NVIDIA and the Ohio Supercomputer Center (OSC) today introduced Cardinal, a cutting-edge high-performance computing (HPC) cluster. It is purpose-built to meet the increasing demand for HPC resources in Ohio across research, education and industry innovation, particularly in artificial intelligence (AI).

AI and machine learning are integral tools in scientific, engineering and biomedical fields for solving complex research inquiries. As these technologies continue to demonstrate efficacy, academic domains such as agricultural sciences, architecture and social studies are embracing their potential. Cardinal is equipped with the hardware capable of meeting the demands of expanding AI workloads. In both capabilities and capacity, the new cluster will be a substantial upgrade from the system it will replace, the Owens Cluster launched in 2016.

Manufacturers Anticipate Completion of NVIDIA's HBM3e Verification by 1Q24; HBM4 Expected to Launch in 2026

TrendForce's latest research into the HBM market indicates that NVIDIA plans to diversify its HBM suppliers for more robust and efficient supply chain management. Samsung's HBM3 (24 GB) is anticipated to complete verification with NVIDIA by December this year. As for HBM3e, Micron provided its 8hi (24 GB) samples to NVIDIA by the end of July, SK hynix in mid-August, and Samsung in early October.

Given the intricacy of the HBM verification process—estimated to take two quarters—TrendForce expects that some manufacturers might learn preliminary HBM3e results by the end of 2023. However, it's generally anticipated that major manufacturers will have definite results by 1Q24. Notably, the outcomes will influence NVIDIA's procurement decisions for 2024, as final evaluations are still underway.
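
Taking the two-quarter estimate literally, the submission dates above map to the following rough completion dates. A minimal sketch, with approximate calendar dates standing in for "end of July", "mid-August", and "early October":

```python
# Rough projection of when each vendor's HBM3e verification could wrap
# up, taking the article's "two quarters" estimate literally (~182 days).
# Submission dates are approximations, not exact dates from TrendForce.

from datetime import date, timedelta

TWO_QUARTERS = timedelta(days=182)
samples = {
    "Micron":   date(2023, 7, 31),   # "end of July"
    "SK hynix": date(2023, 8, 15),   # "mid-August"
    "Samsung":  date(2023, 10, 5),   # "early October"
}

for vendor, submitted in samples.items():
    print(f"{vendor}: ~{submitted + TWO_QUARTERS}")
# Micron and SK hynix land in 1Q24; Samsung lands right at its edge,
# which is consistent with "definite results by 1Q24".
```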

EK Fluid Works Enhances Portfolio with NVIDIA H100 GPU Integration

EK, the leading PC liquid cooling solutions provider, has expanded its hardware support for the EK Fluid Works systems by integrating the state-of-the-art NVIDIA H100 PCIe Tensor Core GPU. NVIDIA's latest release, acclaimed for its unprecedented performance, scalability, and security across diverse workloads, has found a fitting home in EK Fluid Works servers and workstations.

Notably, EK's commitment to sustainability transforms these systems into eco-friendlier platforms, unlocking the full potential of Large Language Models (LLM), machine learning, and AI model training. EK Fluid Works systems emerge as the top choice for those seeking the unleashed power of NVIDIA H100 Tensor Core GPUs, offering an impressive array of efficiency benefits, including:
  • Unparalleled returns on investment
  • The lowest total cost of operation (TCO/OpEx)
  • Minimal additional capital expenditure (CapEx)

BBCube 3D Could be the Future of Stacked DRAM

Scientists at the Tokyo Institute of Technology have developed a new type of stacked or 3D DRAM that they call Bumpless Build Cube 3D, or BBCube 3D, which relies on Through Silicon Vias (TSVs) to connect the DRAM dies. This is a different approach from HBM, which relies on micro bumps to connect the layers. The Japanese scientists say that their bumpless wafer-on-wafer solution should allow not only for an easier manufacturing process but, more importantly, improved cooling, as the TSVs can channel heat from the DRAM dies down into whatever substrate the BBCube 3D stack is finally mounted onto.

If that wasn't enough, the researchers believe that BBCube 3D will be able to deliver higher speeds than HBM courtesy of a combination of the TSVs being relatively short and "high-density signal parallelism". BBCube 3D is expected to deliver up to a 32-fold increase in bandwidth compared to DDR5 memory and a fourfold increase compared to HBM2E memory, while at the same time drawing less power. The research paper goes into much more detail for those interested in taking a closer look at this potentially revolutionary shift in DRAM assembly. However, the question that remains unanswered is whether this will end up as a real-world product in the near future, which depends entirely on how manufacturable BBCube 3D memory proves to be.
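
To put those multipliers in absolute terms, here is a small sketch. The DDR5 and HBM2E baselines (a DDR5-6400 x64 module and a 1024-bit HBM2E stack at 3.2 Gbps) are illustrative assumptions, not figures from the paper:

```python
# What the claimed bandwidth multipliers would imply in absolute terms.
# Baselines below are assumed for illustration, not taken from the paper.

DDR5_BASELINE_GBS = 51.2      # assumed: one DDR5-6400, 64-bit module
HBM2E_BASELINE_GBS = 409.6    # assumed: one HBM2E stack at 3.2 Gbps

print(f"32 x DDR5 -> {32 * DDR5_BASELINE_GBS:.0f} GB/s")   # ~1638 GB/s
print(f"4 x HBM2E -> {4 * HBM2E_BASELINE_GBS:.0f} GB/s")   # ~1638 GB/s
```

Notably, under these baselines the two claims converge on the same roughly 1.6 TB/s figure, so they are at least mutually consistent.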

NVIDIA Allegedly Preparing H100 GPU with 94 and 64 GB Memory

NVIDIA's compute and AI-oriented H100 GPU is supposedly getting an upgrade. The H100 GPU is NVIDIA's most powerful offering and comes in a few different flavors: H100 PCIe, H100 SXM, and H100 NVL (a duo of two GPUs). Currently, the H100 GPU comes with 80 GB of memory, HBM2E in the PCIe version and HBM3 in the SXM5 version of the card. A notable exception is the H100 NVL, which comes with 188 GB of HBM3, but that is for two cards, making it 94 GB each. However, we could see NVIDIA enable 94 and 64 GB options for the H100 accelerator soon, as the latest PCI ID Repository shows.

According to the PCI ID Repository listing, two messages are posted: "Kindly help to add H100 SXM5 64 GB into 2337." and "Kindly help to add H100 SXM5 94 GB into 2339." These two messages indicate that NVIDIA could prepare its H100 in more variations. In September 2022, we saw NVIDIA prepare an H100 variation with 120 GB of memory, but that still isn't official. These PCIe IDs could just come from engineering samples that NVIDIA is testing in the labs, and these cards could never appear on any market. So, we have to wait and see how it plays out.

AMD Confirms that Instinct MI300X GPU Can Consume 750 W

AMD recently revealed its Instinct MI300X GPU at its Data Center and AI Technology Premiere event on Tuesday (June 13). The keynote presentation did not provide any details about the new accelerator model's power consumption, but that did not stop one tipster - Hoang Anh Phu - from obtaining this information from Team Red's post-event footnotes. A comparative observation was made: "MI300X (192 GB HBM3, OAM Module) TBP is 750 W, compared to last gen, MI250X TBP is only 500-560 W." A leaked Giga Computing roadmap from last month anticipated server-grade GPUs hitting the 700 W mark.

NVIDIA's Hopper H100 held the crown - with its demand for a maximum of 700 W - as the most power-hungry data center enterprise GPU until now. The MI300X's OCP Accelerator Module-based design now surpasses Team Green's flagship with a slightly greater rating. AMD's new "leadership generative AI accelerator" sports 304 CDNA 3 compute units, a clear upgrade over the MI250X's 220 (CDNA 2) CUs. Engineers have also introduced new 24 GB HBM3 stacks, so the MI300X can be specced with up to 192 GB of memory, while the MI250X is limited to a 128 GB memory capacity with its slower HBM2E stacks. We hope to see sample units producing benchmark results very soon, with the MI300X pitted against the H100.
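
Two quick pieces of arithmetic fall out of those figures. Note the stack count is an inference from the quoted capacities, not an AMD statement:

```python
# Arithmetic implied by the MI300X figures above. The HBM3 stack count
# is inferred from capacity, not something AMD has stated directly.

MI300X_CAPACITY_GB = 192
HBM3_STACK_GB = 24
MI300X_CUS, MI250X_CUS = 304, 220

print(f"Implied HBM3 stacks: {MI300X_CAPACITY_GB // HBM3_STACK_GB}")  # 8
print(f"CU uplift vs MI250X: {MI300X_CUS / MI250X_CUS:.2f}x")         # 1.38x
```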

Chinese GPU Maker Biren Technology Loses its Co-Founder, Only Months After Revealing New GPUs

Golf Jiao, a co-founder and general manager of Biren Technology, left the company late last month, according to insider sources in China. No official statement has been issued by the executive team at Biren Tech, and Jiao has not provided any details regarding his departure from the fabless semiconductor design company. The Shanghai-based firm is a relatively new startup - it was founded in 2019 by veterans of NVIDIA, Qualcomm and Alibaba. Biren Tech received $726.6 million in funding for its debut range of general-purpose graphics processing units (GPGPUs), also described as high-performance computing graphics processing units (HPC GPUs).

The company revealed its ambitions to take on NVIDIA's Ampere A100 and Hopper H100 compute platforms, and last August announced two HPC GPUs in the form of the BR100 and BR104. The specifications and performance charts demonstrated impressive figures, but Biren Tech had to roll back its numbers when it was hit by U.S. government-enforced sanctions in October 2022. The fabless company had contracted with TSMC to produce its Biren range, and the new set of rules resulted in shipments from the Taiwanese foundry being halted. Biren Tech cut its workforce by a third soon after losing its supply chain with TSMC, and the engineering team had to reassess how the BR100 and BR104 would perform on a process node larger than the original 7 nm design. It was decided that a downgrade in transfer rates would appease the legal teams and get newly redesigned Biren silicon back onto the assembly line.

Samsung Develops GDDR6W Memory Standard: Double the Bandwidth and Density of GDDR6 Through Packaging Innovations

As advanced graphics and display technologies develop, they are blurring the lines between the metaverse and our everyday experience. Much of this important shift is being made possible by the advancement of memory solutions designed for graphics products. One of the biggest challenges in improving virtual reality is taking the complexities of real-world objects and environments and recreating them in a virtual space. Doing so requires massive memory and increased computing power. At the same time, the benefits of creating a more true-to-life metaverse will be far-reaching, including real-life simulations of complicated scenarios and more, sparking innovation across a number of industries.

This is the central idea behind one of the most popular concepts in virtual reality: the digital twin. A digital twin is a virtual representation of an object or space. Updated in real time in accordance with the actual environment, a digital twin spans the lifecycle of its source and uses simulation, machine learning and reasoning to help decision-making. While until recently this was not a feasible proposition due to limitations on data processing and transfer, digital twins are now gaining traction thanks to the availability of high-bandwidth memory technologies.

NVIDIA Could Launch Hopper H100 PCIe GPU with 120 GB Memory

NVIDIA's high-performance computing hardware stack is now equipped with the top-of-the-line Hopper H100 GPU. It features 16,896 or 14,592 CUDA cores, depending on whether it comes in the SXM5 or PCIe variant, with the former being more powerful. Both variants come with a 5120-bit memory interface, with the SXM5 version using HBM3 memory running at 3.0 Gbps and the PCIe version using HBM2E memory running at 2.0 Gbps. Both versions use the same capacity, capped at 80 GB. However, that could soon change with the latest rumor suggesting that NVIDIA could be preparing a PCIe version of the Hopper H100 GPU with 120 GB of an unknown type of memory installed.
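
The peak bandwidth those numbers imply follows directly from the standard formula, bandwidth = (bus width in bits / 8) x per-pin data rate. A minimal sketch using only the figures quoted above:

```python
# Peak memory bandwidth implied by the bus width and data rates quoted
# above: bandwidth = (bus width in bits / 8) * per-pin data rate.

BUS_BITS = 5120

def peak_gbs(gbps_per_pin: float) -> float:
    """Peak memory bandwidth in GB/s for the 5120-bit interface."""
    return BUS_BITS / 8 * gbps_per_pin

print(f"SXM5 (HBM3 @ 3.0 Gbps):  {peak_gbs(3.0):.0f} GB/s")  # 1920 GB/s
print(f"PCIe (HBM2E @ 2.0 Gbps): {peak_gbs(2.0):.0f} GB/s")  # 1280 GB/s
```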

According to the Chinese website "s-ss.cc", the 120 GB variant of the H100 PCIe card will feature an entire GH100 chip with everything unlocked. As the site suggests, this version will improve memory capacity and performance over the regular H100 PCIe SKU. With HPC workloads increasing in size and complexity, larger memory allocations are needed for better performance. With the recent advances in Large Language Models (LLMs), AI workloads use trillions of parameters for training, most of which runs on GPUs like the NVIDIA H100.

Intel Xeon Scalable "Sapphire Rapids" with HBM2E Beaten by Older AMD EPYC "Milan-X" in Leaked Benchmarks

Intel's Xeon Scalable "Sapphire Rapids" processor may have a tough time getting to market, as leaked benchmarks suggest that even its premium HPC models with on-package HBM2E memory are outperformed by AMD's older-generation "Zen 3" EPYC processors. The 64-core/128-thread EPYC "Milan-X" processor, based on the older "Zen 3" microarchitecture with 3D V-Cache chiplets, allegedly outperforms the 52-core/104-thread Xeon Platinum 8472C and 60-core/120-thread Xeon Platinum 8490H "Sapphire Rapids" engineering samples in CPU-Z Bench and V-Ray tests that scale across cores. These benchmark scores were compared with those of the EPYC "Milan-X" by Tom's Hardware, and the Intel chips fell woefully short of the AMD parts.

AMD Releases its CDNA2 MI250X "Aldebaran" HPC GPU Block Diagram

At its Hot Chips 34 presentation, AMD released a block diagram of its biggest AI-HPC processor, the Instinct MI250X. Based on the CDNA2 compute architecture, at the heart of the MI250X is the "Aldebaran" MCM (multi-chip module). This MCM contains two logic dies (GPU dies) and eight HBM2E stacks, four per GPU die. The two GPU dies are connected by a 400 GB/s Infinity Fabric link. They each have up to 500 GB/s of external Infinity Fabric bandwidth for inter-socket communications, and PCI-Express 4.0 x16 as the host system bus for AIC form-factors. The two GPU dies together make up 58 billion transistors and are fabricated on the TSMC N6 (6 nm) node.

The component hierarchy of each GPU die sees eight Shader Engines share a last-level L2 cache. The eight Shader Engines total 112 Compute Units, or 14 CUs per engine. The CDNA2 compute unit contains 64 stream processors making up the Shader Core, plus four Matrix Core Units, which are specialized hardware for matrix/tensor math operations. There are hence 7,168 stream processors per GPU die, and 14,336 per package. AMD claims a 100% increase in double-precision compute performance over CDNA (MI100), attributing it to increased frequencies, efficient data paths, extensive operand reuse and forwarding, and power optimizations enabling those higher clocks. The MI200 is already powering the Frontier supercomputer and is working toward more design wins in the HPC space. The company also dropped a major hint that the MI300, based on CDNA3, will be an APU. It will incorporate GPU dies, core logic, and CPU CCDs onto a single package, as a rival solution to NVIDIA's Grace Hopper Superchip.
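
The shader-count arithmetic in that hierarchy checks out; here is a short sketch reproducing it from the figures quoted above:

```python
# Reproducing the MI250X shader-count arithmetic described above.

SHADER_ENGINES = 8       # shader engines per GPU die
CUS_PER_ENGINE = 14      # compute units per shader engine
SPS_PER_CU = 64          # stream processors per CDNA2 compute unit
DIES_PER_PACKAGE = 2     # "Aldebaran" is a dual-die MCM

cus_per_die = SHADER_ENGINES * CUS_PER_ENGINE            # 112
sps_per_die = cus_per_die * SPS_PER_CU                   # 7168
print(f"CUs per die: {cus_per_die}")
print(f"Stream processors per die: {sps_per_die}")
print(f"Per package: {sps_per_die * DIES_PER_PACKAGE}")  # 14336
```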

BIREN BR100 Detailed: China's AI-HPC Processor Storms into the HPC GPU Big Leagues

If InnoSilicon's Fenghua gaming GPU hit the scene last November seemingly out of nowhere, then another Chinese GPU developer is making waves at Hot Chips 34, this time in the enterprise space. The BR100 by BIREN is a large AI-HPC GPU-based processor that is China's answer to Hopper, Ponte Vecchio, and CDNA2, intended to ensure that China's growth as an AI/HPC leader is unaffected in the event of a tech embargo.

The BR100 is an MCM of two planar-silicon dies built on the 7 nm DUV node, with a striking 77-billion transistor count between them and a 550 W typical TDP. The chip features 64 GB of on-package HBM2E memory. System bus interfaces include PCI-Express 5.0 x16 with CXL, and eight lanes of a proprietary interconnect called B-Link, which total 2.3 TB/s of bandwidth. The processor supports nearly all popular compute formats except double-precision floating point (FP64). Among the supported ones are single-precision FP32, TF32+, FP16, BF16, INT16, and INT8. BIREN claims up to 256 TFLOP/s FP32, up to 512 TFLOP/s TF32+, up to 1 PFLOP/s BF16, and 2,048 TOPS INT8. This would put it at 2.4 to 2.8 times faster than NVIDIA's "Ampere" A100.

Biren Technology Unveils BR100 7 nm HPC GPU with 77 Billion Transistors

Chinese company Biren Technology has recently unveiled the Biren BR100 HPC GPU during its Biren Explore Summit 2022 event. The Biren BR100 features an in-house chiplet architecture with 77 billion transistors and is manufactured on a 7 nm process using TSMC's 2.5D CoWoS packaging technology. The card is equipped with 300 MB of onboard cache alongside 64 GB of HBM2E memory running at 2.3 TB/s. This combination delivers performance above that of the NVIDIA Ampere A100, achieving 1,024 TFLOPs in 16-bit floating point operations.

The company also announced the BR104 which features a monolithic design and should offer approximately half the performance of the BR100 at a TDP of 300 W. The Biren BR104 will be available as a standard PCIe card while the BR100 will come in the form of an OAM compatible board with a custom tower cooler. The pricing and availability information for these cards is currently unknown.

BittWare Announces PCIe 5.0/CXL FPGA Accelerators Featuring Intel Agilex M-Series and I-Series to Drive Memory and Interconnectivity Improvements

BittWare, a Molex company, a leading supplier of enterprise-class accelerators for edge and cloud-computing applications, today introduced new card and server-level solutions featuring Intel Agilex FPGAs. The new BittWare IA-860m helps customers alleviate memory-bound application workloads by leveraging up to 32 GB of HBM2E in-package memory and 16-lanes of PCIe 5.0 (with CXL upgrade option). BittWare also added new Intel Agilex I-Series FPGA-based products with the introduction of the IA-440i and IA-640i accelerators, which support high-performance interfaces, including 400G Ethernet and PCIe 5.0 (CXL option). These newest models complement BittWare's existing lineup of Intel Agilex F-Series products to comprise one of the broadest portfolios of Intel Agilex FPGA-based offerings on the market. This announcement reinforces BittWare's commitment to addressing ever-increasing demands of high-performance compute, storage, network and sensor processing applications.

"BittWare is excited to apply Intel's advanced technology to solve increasingly difficult application problems, quickly and at low risk," said Craig Petrie, vice president, Sales and Marketing of BittWare. "Our longstanding collaboration with Intel, expertise with the latest development tools, including OneAPI, as well as alignment with Molex's global supply chain and manufacturing capabilities enable BittWare to reduce development time by 12-to-18 months while ensuring smooth transitions from proof-of-concept to volume product deployment."

Intel Announces "Rialto Bridge" Accelerated AI and HPC Processor

During the International Supercomputing Conference on May 31, 2022, in Hamburg, Germany, Jeff McVeigh, vice president and general manager of the Super Compute Group at Intel Corporation, announced Rialto Bridge, Intel's data center graphics processing unit (GPU). Using the same architecture as the Intel data center GPU Ponte Vecchio and combining enhanced tiles with Intel's next process node, Rialto Bridge will offer up to 160 Xe cores, more FLOPs, more I/O bandwidth and higher TDP limits for significantly increased density, performance and efficiency.

"As we embark on the exascale era and sprint towards zettascale, the technology industry's contribution to global carbon emissions is also growing. It has been estimated that by 2030, between 3% and 7% of global energy production will be consumed by data centers, with computing infrastructure being a top driver of new electricity use," said Jeff McVeigh, vice president and general manager of the Super Compute Group at Intel Corporation.

Intel Meteor Lake, HBM2E-enabled Sapphire Rapids, and Ponte Vecchio Pictured

Intel allowed the media a closer look at the next generation of silicon that will power millions of systems in the years to come during its private Vision event. PC Watch, a Japanese tech media outlet, managed to get some shots of the upcoming Meteor Lake, Sapphire Rapids, and Ponte Vecchio processors. Starting with Meteor Lake, Intel displayed two packages for this processor family. The first is the ultra-compact, high-density UP9 package used for highly compact mobile systems, made with minimal packaging to save space. The second is a traditional design with larger packaging, intended for typical laptop/notebook configurations.

Intel Details Ponte Vecchio Accelerator: 63 Tiles, 600 Watt TDP, and Lots of Bandwidth

During the International Solid-State Circuits Conference (ISSCC) 2022, Intel gave us a more detailed look at its upcoming Ponte Vecchio HPC accelerator and how it operates. So far, Intel had told us that Ponte Vecchio is made of 47 tiles glued together in one package. However, the ISSCC presentation shows that the accelerator is structured rather interestingly. There are 63 tiles in total: 16 are reserved for compute, eight are used for RAMBO cache, two are Foveros base tiles, two are Xe-Link tiles, eight are HBM2E tiles, and EMIB connections take up 11 tiles. That accounts for the 47 active tiles; the remaining 16 are thermal tiles that help manage the massive TDP output of this accelerator.

What is interesting is that Intel gave away details of the RAMBO cache. This novel SRAM technology uses four banks of 3.75 MB each, for a total of 15 MB per tile. Each RAMBO tile is connected to the fabric at 1.3 TB/s, while compute tiles are connected to the chip fabric at 2.6 TB/s. With eight RAMBO cache tiles, we get an additional 120 MB of SRAM. The base tile is a 646 mm² die manufactured on the Intel 7 semiconductor process and contains 17 layers. It includes a memory controller, the Fully Integrated Voltage Regulators (FIVR), power management, a 16-lane PCIe 5.0 connection, and a CXL interface. The entire area of Ponte Vecchio is rather impressive: the 47 active tiles take up 2,330 mm², and when we include the thermal dies, the total area jumps to 3,100 mm². And, of course, the entire package is much larger at 4,844 mm², connected to the system with 4,468 pins.
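
The tile and cache bookkeeping above is easy to verify; this sketch uses only the counts quoted in the article:

```python
# Tile and RAMBO-cache bookkeeping for Ponte Vecchio as described above.

tiles = {"compute": 16, "RAMBO": 8, "Foveros base": 2,
         "Xe-Link": 2, "HBM2E": 8, "EMIB": 11}
THERMAL_TILES = 16

active = sum(tiles.values())
print(f"Active tiles: {active}")                  # 47
print(f"Total tiles:  {active + THERMAL_TILES}")  # 63

BANKS, MB_PER_BANK = 4, 3.75
rambo_per_tile = BANKS * MB_PER_BANK              # 15 MB per tile
print(f"RAMBO SRAM: {rambo_per_tile * tiles['RAMBO']:.0f} MB")  # 120 MB
```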

NVIDIA CMP 170HX Mining Card Tested, Based on GA100 GPU SKU

NVIDIA's Crypto Mining (CMP) series of graphics cards is made to work for only one purpose: mining cryptocurrency coins. Hence, their functionality is somewhat limited, and they cannot be used for gaming as regular GPUs can. Today, Linus Tech Tips got ahold of NVIDIA's CMP 170HX mining card, which is not listed on the company website. According to the source, the card runs on NVIDIA's GA100-105F GPU, a version based on the regular GA100 SXM design used in data-center applications. Unlike its bigger brother, the GA100-105F SKU is a cut-down design with 4,480 CUDA cores and 8 GB of HBM2E memory. The complete design has 6,912 cores and 40/80 GB HBM2E memory configurations.

As far as the reason for choosing 8 GB of HBM2E memory goes, we know that the Ethereum DAG file is under 5 GB, so the 8 GB memory buffer is sufficient for mining any coin out there. The card is powered by an 8-pin CPU power connector and draws about 250 watts of power. It can be adjusted down to 200 watts while retaining its 165 MH/s hash rate for Ethereum. This reference design is manufactured by NVIDIA and has no active cooling, as it is meant to be cooled in high-density server racks: only a colossal passive heatsink is attached, so airflow has to be provided by the chassis. As far as pricing is concerned, Linus managed to get this card for $5,000, making it a costly mining option.
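
The power adjustment makes a real difference to efficiency, since the hash rate holds steady while power drops. A quick calculation from the quoted figures:

```python
# Mining efficiency implied by the figures above: the 165 MH/s Ethereum
# hash rate holds while board power drops from 250 W to 200 W.

HASH_MHS = 165
for watts in (250, 200):
    print(f"{watts} W -> {HASH_MHS / watts:.2f} MH/s per watt")
# 250 W -> 0.66 MH/s per watt
# 200 W -> 0.82 MH/s per watt
```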

Samsung Electronics Expands its "Green Chip" Line-Up

Samsung Electronics Co., Ltd., a world leader in advanced semiconductor technology, today announced that five of its memory products achieved global recognition for successfully reducing their carbon emissions, while 20 additional memory products received carbon footprint certification. Samsung's automotive LED packages also received carbon footprint verification, a first in the industry for automotive LED packages, further expanding Samsung's portfolio of eco-conscious "green chips".

"It is exciting to see our environmentally sustainable efforts receiving global acknowledgements," said Seong-dai Jang, Senior Vice President and Head of DS Corporate Sustainability Management Office at Samsung Electronics. "We will continue our path towards a sustainable future with 'greener' chips enabled by Samsung's cutting-edge technology."

Intel's Sapphire Rapids Xeons to Feature up to 64 GB of HBM2e Memory

During the Supercomputing (SC) 21 event, Intel disclosed additional information regarding the company's upcoming Xeon server processor lineup, codenamed Sapphire Rapids. One of the central areas of improvement for the new processor generation is the core architecture based on Golden Cove, the same core found in Alder Lake processors for consumers. The only difference between the Golden Cove variant found in Alder Lake and the one in Sapphire Rapids is the amount of L2 (level two) cache: with Alder Lake, Intel equipped each core with 1.25 MB of L2 cache, while with Sapphire Rapids, each core receives a 2 MB bank.

One of the most exciting things about the processors, confirmed by Intel today, is the inclusion of High-Bandwidth Memory (HBM). These processors operate with eight memory channels carrying DDR5 memory and offer PCIe Gen5 IO expansion. Intel has confirmed that Sapphire Rapids Xeons will feature up to 64 GB of HBM2E memory, with a few operating modes. The first is a simple HBM caching mode, where the HBM memory acts as a buffer for the installed DDR5; this method is transparent to software and allows easy usage. The second is Flat Mode, in which both DDR5 and HBM are used as contiguous address spaces. Finally, there is an HBM-only mode that uses the HBM2E stacks as the only system memory, with applications fitting entirely inside it. This has numerous benefits, primarily drawn from HBM's performance and reduced latency.
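
To make the trade-off between the three modes concrete, here is a toy decision helper. The 64 GB HBM capacity comes from the article, but the heuristic itself is purely illustrative and is not Intel guidance:

```python
# A toy decision helper for the three Sapphire Rapids HBM modes
# described above. The 64 GB capacity is from the article; the
# selection heuristic is an illustrative assumption, not Intel's.

HBM_CAPACITY_GB = 64

def suggest_mode(working_set_gb: float, ddr5_installed: bool) -> str:
    if working_set_gb <= HBM_CAPACITY_GB:
        # Everything fits in HBM: no DDR5 needed at all.
        return "HBM-only mode"
    if ddr5_installed and working_set_gb <= 2 * HBM_CAPACITY_GB:
        # NUMA-aware code can place hot data in HBM explicitly.
        return "Flat mode (HBM and DDR5 as separate address spaces)"
    # Transparent to software; HBM caches the larger DDR5 working set.
    return "HBM caching mode"

print(suggest_mode(48, ddr5_installed=True))    # HBM-only mode
print(suggest_mode(100, ddr5_installed=True))   # Flat mode ...
print(suggest_mode(512, ddr5_installed=True))   # HBM caching mode
```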

SK hynix Receives ISO 26262 FSM Certification

SK hynix announced that it has received an ISO 26262: 2018 FSM (Functional Safety Management) certification, the international standard for functional safety in automotive semiconductors. The global automotive functional safety certification institute, TUV Nord, conducted the assessment. Both companies commemorated the distinction by hosting an online ceremony. In attendance at the ceremony were Daeyong Shim, Head of Automotive Business, and Junho Song, Head of Quality System, from SK hynix and Bianca Pfuff, Profit Center Manager Functional Safety and Deputy Head of Certification Body SEECERT, and Josef Neumann, Senior Project Manager Functional Safety, from TUV Nord.

ISO 26262 is the international standard for automotive functional safety established by the International Organization for Standardization (ISO) in 2011 to prevent accidents caused by failures in automotive electrical and electronic systems. The certification awarded to SK hynix, ISO 26262: 2018, is the latest version, with additional requirements for automotive semiconductors. In the automotive industry, safety, quality, and reliability are paramount. Therefore, it is becoming essential that producers of safety-related automotive electronic devices meet ISO 26262 standards.

AMD Instinct MI200: Dual-GPU Chiplet; CDNA2 Architecture; 128 GB HBM2E

AMD today announced the debut of its 6 nm CDNA2 (Compute-DNA) architecture in the form of the MI200 family. The new, dual-GPU chiplet accelerator aims to lead AMD into a new era of High Performance Computing (HPC) applications, the high margin territory it needs to compete in for continued, sustainable growth. To that end, AMD has further improved on a matured, compute-oriented architecture born with Graphics Core Next (GCN) - and managed to improve performance while reducing total die size compared to its MI100 family.