News Posts matching #HPC

UK Competition Regulator Probes AMD's Buyout of Xilinx

British competition regulator, the Competition and Markets Authority (CMA), on Monday launched an enquiry into the ramifications of AMD's buyout of FPGA maker Xilinx. The agency is soliciting opinions from the public on whether the $35 billion all-stock purchase will make goods and services less competitive in the UK. Unlike NVIDIA's Arm buyout, the Xilinx acquisition is seeing no opposition from tech giants. The Register notes that AMD could combine Xilinx's FPGAs with its x86 CPU and RDNA SIMD to create highly customizable HPC accelerators. AMD president Dr. Lisa Su said, "By combining our world-class engineering team and deep domain expertise, we will create an industry leader with the vision, talent and scale to define the future of high performance computing."

Samsung Unveils Industry-First Memory Module Incorporating New CXL Interconnect

Samsung Electronics Co., Ltd., the world leader in advanced memory technology, today unveiled the industry's first memory module supporting the new Compute Express Link (CXL) interconnect standard. Integrated with Samsung's Double Data Rate 5 (DDR5) technology, this CXL-based module will enable server systems to significantly scale memory capacity and bandwidth, accelerating artificial intelligence (AI) and high-performance computing (HPC) workloads in data centers.

The rise of AI and big data has been fueling the trend toward heterogeneous computing, where multiple processors work in parallel to process massive volumes of data. CXL—an open, industry-supported interconnect based on the PCI Express (PCIe) 5.0 interface—enables high-speed, low latency communication between the host processor and devices such as accelerators, memory buffers and smart I/O devices, while expanding memory capacity and bandwidth well beyond what is possible today. Samsung has been collaborating with several data center, server and chipset manufacturers to develop next-generation interface technology since the CXL consortium was formed in 2019.

Intel Ponte Vecchio GPU Scores Another Win in Leibniz Supercomputing Centre

Today, Lenovo, in partnership with Intel, announced that the Leibniz Supercomputing Centre (LRZ) is building a supercomputer powered by Intel's next-generation technologies. Specifically, the supercomputer will use Intel's Sapphire Rapids CPUs in combination with the much-teased Ponte Vecchio GPUs to power the applications running at LRZ. Along with these processors, LRZ will also deploy Intel Optane persistent memory to process the huge amount of data it holds and is producing. The integration of HPC and AI processing will be enabled by the expansion of LRZ's current supercomputer, SuperMUC-NG, which will receive an upgrade in 2022 featuring both Sapphire Rapids and Ponte Vecchio.

Mr. Raja Koduri, Intel graphics guru, has teased on Twitter that this supercomputer installment will combine Sapphire Rapids, Ponte Vecchio, Optane, and oneAPI in one machine. The system will use over one petabyte of Distributed Asynchronous Object Storage (DAOS) based on Optane technologies. Mr. Koduri also teased some Ponte Vecchio eye candy: a GIF of tiles combining to form a GPU.

Samsung Announces Availability of Its Next Generation 2.5D Integration Solution I-Cube4 for High-Performance Applications

Samsung Electronics Co., Ltd., a world leader in advanced semiconductor technology, today announced the immediate availability of its next-generation 2.5D packaging technology Interposer-Cube4 (I-Cube4), leading the evolution of chip packaging technology once again. Samsung's I-Cube is a heterogeneous integration technology that horizontally places one or more logic dies (CPU, GPU, etc.) and several High Bandwidth Memory (HBM) dies on top of a silicon interposer, making multiple dies operate as a single chip in one package.

Samsung's new I-Cube4, which incorporates four HBMs and one logic die, was developed in March as the successor of I-Cube2. From high-performance computing (HPC) to AI, 5G, cloud and large data center applications, I-Cube4 is expected to bring another level of fast communication and power efficiency between logic and memory through heterogeneous integration.

Arm Announces Neoverse N2 and V1 Server Platforms

The demands of data center workloads and internet traffic are growing exponentially, and new solutions are needed to keep up with these demands while reducing the current and anticipated growth of power consumption. But the variety of workloads and applications being run today means the traditional one-size-fits-all approach to computing is not the answer. The industry demands flexibility: design freedom to achieve the right level of compute for the right application.

As Moore's Law comes to an end, solution providers are seeking specialized processing. Enabling specialized processing has been a focal point since the inception of our Neoverse line of platforms, and we expect these latest additions to accelerate this trend.

Foundry Revenue Projected to Reach Historical High of US$94.6 Billion in 2021 Thanks to High 5G/HPC/End-Device Demand, Says TrendForce

As the global economy enters the post-pandemic era, technologies including 5G, WiFi6/6E, and HPC (high-performance computing) have been advancing rapidly, in turn bringing about a fundamental, structural change in the semiconductor industry as well, according to TrendForce's latest investigations. While the demand for certain devices such as notebook computers and TVs underwent a sharp uptick due to the onset of the stay-at-home economy, this demand will return to pre-pandemic levels once the pandemic has been brought under control as a result of the global vaccination drive. Nevertheless, the worldwide shift to next-gen telecommunication standards has brought about a replacement demand for telecom and networking devices, and this demand will continue to propel the semiconductor industry, resulting in high capacity utilization rates across the major foundries. As certain foundries continue to expand their production capacities this year, TrendForce expects total foundry revenue to reach a historical high of US$94.6 billion this year, an 11% growth YoY.

Intel CEO on NVIDIA CPUs: They Are Responding to Us

NVIDIA has recently announced the company's first standalone CPU, Grace, which will come out as a product in 2023. NVIDIA has designed Grace on the Arm ISA, likely Armv9, to represent a new way that data centers are built and to deliver a whole new level of HPC and AI performance. However, the data center CPU space is considered one of the hardest markets to enter. The market is largely a duopoly between Intel and AMD, which supply x86 processors to server vendors. In the past few years, few Arm CPUs have managed to enter the data center space. NVIDIA, however, is aiming to deliver much more performance and grab a bigger piece of the market.

As a self-proclaimed leader in AI, Intel is facing hard competition from NVIDIA in the coming years. In an interview with Fortune, Intel's new CEO Pat Gelsinger has talked about NVIDIA and how the company sees the competition between the two. Mr. Gelsinger is claiming that Intel is a leader in CPUs that feature AI acceleration built in the chip and that they are not playing defense, but rather offense against NVIDIA. You can check out the whole quote from the interview below.

KIOXIA PCIe 4.0 NVMe SSDs Now Qualified for NVIDIA Magnum IO GPUDirect Storage

KIOXIA today announced that its lineup of CM6 Series PCIe 4.0 enterprise NVMe SSDs has been successfully tested and certified to support NVIDIA's Magnum IO GPUDirect Storage. Modern AI and data science applications are synonymous with massive datasets - as are the storage requirements that go along with them. Part of the NVIDIA Magnum IO subsystem designed for GPU-accelerated compute environments, NVIDIA Magnum IO GPUDirect Storage allows the GPU to bypass the CPU and communicate directly with NVMe SSD storage. This improves overall system performance while reducing the impact on host CPU and memory resources. Through rigorous testing conducted by NVIDIA, KIOXIA's CM6 drives have been confirmed to meet the demanding storage requirements of GPU-intensive applications.

"Large AI/ML, HPC modeling and data analytics datasets need to be moved and processed in real-time, pushing performance requirements through the roof," said Neville Ichhaporia, vice president, SSD marketing and product management for KIOXIA America, Inc. "By delivering speeds up to 16.0 gigatransfers per second throughput per lane, our CM6 Series SSDs enable NVIDIA's Magnum IO GPUDirect Storage to work with increasingly large and distributed datasets, thereby improving overall application performance and providing a path to scaling dataset sizes even further."
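
To put the quoted 16.0 GT/s per lane into more familiar units, the following is a rough sketch of the conversion to bytes per second. It assumes the 128b/130b line encoding used by PCIe 3.0 and later, and a hypothetical x4 link width; neither figure comes from the announcement, and protocol overhead is ignored.

```python
# Rough PCIe link bandwidth estimate from the per-lane transfer rate.
# Assumptions (not from the announcement): 128b/130b line encoding and a
# hypothetical x4 link width. Packet/protocol overhead is ignored.

def pcie_bandwidth_gbytes(gt_per_s: float, lanes: int) -> float:
    """Return approximate one-direction bandwidth in GB/s."""
    encoding_efficiency = 128 / 130  # 128 payload bits per 130 bits on the wire
    gbits_per_s = gt_per_s * encoding_efficiency * lanes
    return gbits_per_s / 8  # bits -> bytes


per_lane = pcie_bandwidth_gbytes(16.0, lanes=1)
x4_link = pcie_bandwidth_gbytes(16.0, lanes=4)
print(f"per lane: {per_lane:.2f} GB/s, x4 link: {x4_link:.2f} GB/s")
```

Under these assumptions, a lane carries a little under 2 GB/s in each direction, so a four-lane drive would top out somewhat below 8 GB/s before any protocol overhead.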

OpenFive Tapes Out SoC for Advanced HPC/AI Solutions on TSMC 5 nm Technology

OpenFive, a leading provider of customizable, silicon-focused solutions with differentiated IP, today announced the successful tape out of a high-performance SoC on TSMC's N5 process, with integrated IP solutions targeted for cutting edge High Performance Computing (HPC)/AI, networking, and storage solutions.

The SoC features an OpenFive High Bandwidth Memory (HBM3) IP subsystem and D2D I/Os, as well as a SiFive E76 32-bit CPU core. The HBM3 interface supports 7.2 Gbps speeds allowing high throughput memories to feed domain-specific accelerators in compute-intensive applications including HPC, AI, Networking, and Storage. OpenFive's low-power, low-latency, and highly scalable D2D interface technology allows for expanding compute performance by connecting multiple dice together using an organic substrate or a silicon interposer in a 2.5D package.
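
As a back-of-the-envelope illustration of what a 7.2 Gbps interface could mean per memory stack, the sketch below multiplies the data rate by the interface width. It assumes that the 7.2 Gbps figure is a per-pin rate and that each stack exposes a 1024-bit interface, as in earlier HBM generations; the announcement states neither, so treat the result as an estimate only.

```python
# Estimated peak bandwidth of one HBM stack.
# Assumptions (not stated in the announcement): 7.2 Gbps is a per-pin data
# rate, and the stack interface is 1024 bits wide.

def hbm_stack_bandwidth_gbytes(pin_rate_gbps: float, bus_width_bits: int = 1024) -> float:
    """Return peak bandwidth in GB/s for a single stack."""
    return pin_rate_gbps * bus_width_bits / 8  # bits -> bytes


stack_bw = hbm_stack_bandwidth_gbytes(7.2)
print(f"approx. {stack_bw:.1f} GB/s per stack")
```

Under those assumptions a single stack would peak at roughly 920 GB/s.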

NVIDIA Announces Grace CPU for Giant AI and High Performance Computing Workloads

NVIDIA today announced its first data center CPU, an Arm-based processor that will deliver 10x the performance of today's fastest servers on the most complex AI and high performance computing workloads.

The result of more than 10,000 engineering years of work, the NVIDIA Grace CPU is designed to address the computing requirements for the world's most advanced applications—including natural language processing, recommender systems and AI supercomputing—that analyze enormous datasets requiring both ultra-fast compute performance and massive memory. It combines energy-efficient Arm CPU cores with an innovative low-power memory subsystem to deliver high performance with great efficiency.

Intel Announces 10 nm Third Gen Xeon Scalable Processors "Ice Lake"

Intel today launched its most advanced, highest performance data center platform optimized to power the industry's broadest range of workloads—from the cloud to the network to the intelligent edge. New 3rd Gen Intel Xeon Scalable processors (code-named "Ice Lake") are the foundation of Intel's data center platform, enabling customers to capitalize on some of the most significant business opportunities today by leveraging the power of AI.

New 3rd Gen Intel Xeon Scalable processors deliver a significant performance increase compared with the prior generation, with an average 46% improvement on popular data center workloads. The processors also add new and enhanced platform capabilities including Intel SGX for built-in security, and Intel Crypto Acceleration and Intel DL Boost for AI acceleration. These new capabilities, combined with Intel's broad portfolio of Intel Select Solutions and Intel Market Ready Solutions, enable customers to accelerate deployments across cloud, AI, enterprise, HPC, networking, security and edge applications.

Raja Koduri Teases "Petaflops in Your Palm" Intel Xe-HPC Ponte Vecchio GPU

Raja Koduri of Intel today posted an interesting video on his Twitter account. Showing one of the greatest engineering marvels Intel has ever created, Mr. Koduri teased what is to come when the company launches the Xe-HPC Ponte Vecchio graphics card designed for high-performance computing workloads. Showcased today was the "petaflops in your palm" chip, designed to run AI workloads with a petaflop of computing power. With over 100 billion transistors, the chip uses as many as 47 tiles combined in the most advanced packaging technology ever created by Intel. Intel calls them "magical tiles", and they bring logic, memory, and I/O controllers, all built using different semiconductor nodes.

Mr. Koduri also pointed out that the chip was born only two years after the concept, which is an impressive achievement given that developing new silicon takes years. The chip will be the heart of many systems that require massive computational power, especially for workloads like AI. Claimed to be capable of a quadrillion floating-point operations per second (one petaflop), the chip will be a true monster. So far we don't know other details, such as the floating-point precision at which it reaches one petaflop or the total power consumption of those 47 tiles, so we have to wait for more information.

Intel to Launch 3rd Gen Intel Xeon Scalable Portfolio on April 6

Intel today revealed that it will launch its 3rd Generation Xeon Scalable processor series at an online event titled "How Wonderful Gets Done 2021," on April 6, 2021. This will be one of the first major media events headed by Intel's new CEO, Pat Gelsinger. Besides the processor launch, Intel is expected to detail many of its advances in the enterprise space, particularly in the areas of 5G infrastructure rollout, edge computing, and AI/HPC. The 3rd Gen Xeon Scalable processors are based on the new 10 nm "Ice Lake-SP" silicon, heralding the company's first CPU core IPC gain in the server space since 2015. The processors also introduce new I/O capabilities, such as PCI-Express 4.0.

TYAN Now Offers AMD EPYC 7003 Processor Powered Systems

TYAN, an industry-leading server platform design manufacturer and a MiTAC Computing Technology Corporation subsidiary, today introduced AMD EPYC 7003 Series Processor-based server platforms featuring efficiency and performance enhancements in hardware, security, and memory density for the modern data center.

"Big data has become capital today. Large amounts of data and faster answers drive better decisions. TYAN's industry-leading server platforms powered by 3rd Gen AMD EPYC processors enable businesses to make more accurate decisions with higher precision," said Danny Hsu, Vice President of MiTAC Computing Technology Corporation's Server Infrastructure BU. "Moving the bar once more for workload performance, EPYC 7003 Series processors provide the performance needed in the heart of the enterprise to help IT professionals drive faster time to results," said Ram Peddibhotla, corporate vice president, EPYC product management, AMD. "Time is the new metric for efficiency and EPYC 7003 Series processors are the perfect choice for the most diverse workloads, helping provide more and better data to drive better business outcomes."

Fujitsu Completes Development of World's Fastest Supercomputer

Fugaku is Japan's supercomputer that has been developed as a core system for the innovative High-Performance Computing Infrastructure (HPCI) promoted by Japan's Ministry of Education, Culture, Sports, Science and Technology (MEXT). In 2014, RIKEN and Fujitsu began joint development of Fugaku and completed delivery of all racks in May 2020. Since then, they have developed and optimized the user environment in preparation for the commencement of shared use.

In the meantime, Fugaku has claimed the world's top spot for two consecutive terms in June and November 2020 in four categories on the major high-performance computer rankings: the TOP500, HPCG, HPL-AI, as well as the Graph 500, and has been used on a trial basis under the "Program for Promoting Research on the Supercomputer Fugaku", "research projects aimed to combat COVID-19", etc. since April 2020. In these trials, two projects, "Study on Large-Scale Numerical Fluid Simulation" and "Largest Ever Meteorological Calculation" have already been selected as the ACM Gordon Bell Prize finalists. In addition, research on "Prediction and Countermeasures for Infection by Virus Contaminated Droplet in Indoor Environment" has led to changes in people's lifestyles, and Fugaku is already making steady progress toward becoming a key technological platform for science and for building Society 5.0.

SiPearl to Manufacture its 72-Core Rhea HPC SoC at TSMC Facilities

SiPearl has this week announced their collaboration with Open-Silicon Research, the India-based entity of OpenFive, to produce the next-generation SoC designed for HPC purposes. SiPearl is a part of the European Processor Initiative (EPI) team and is responsible for designing the SoC itself that is supposed to be a base for the European exascale supercomputer. In the partnership with Open-Silicon Research, SiPearl expects to get a service that will integrate all the IP blocks and help with the tape out of the chip once it is done. There is a deadline set for the year 2023, however, both companies expect the chip to get shipped by Q4 of 2022.

The SoC is called Rhea, and it will be a 72-core Arm ISA based processor with Neoverse Zeus cores interconnected by a mesh. There will be 68 mesh network L3 cache slices in between the cores. All of that will be manufactured using TSMC's 6 nm extreme ultraviolet lithography (EUV) technology. The Rhea SoC design will utilize 2.5D packaging with many IP blocks stitched together and HBM2E memory present on the die. It is unknown exactly what configuration of HBM2E is going to be present. The system will also support DDR5 memory and thus enable two-level system memory by combining HBM and DDR. We are excited to see what the final product looks like, and now we wait for more updates on the project.

Revenue of Top 10 Foundries Expected to Increase by 20% YoY in 1Q21 in Light of Fully Loaded Capacities, Says TrendForce

Demand in the global foundry market remains strong in 1Q21, according to TrendForce's latest investigations. As various end-products continue to generate high demand for chips, clients of foundries in turn stepped up their procurement activities, which subsequently led to a persistent shortage of production capacities across the foundry industry. TrendForce therefore expects foundries to continue posting strong financial performances in 1Q21, with a 20% YoY growth in the combined revenues of the top 10 foundries, while TSMC, Samsung, and UMC rank as the top three in terms of market share. However, the future reallocation of foundry capacities still remains to be seen, since the industry-wide effort to accelerate the production of automotive chips may indirectly impair the production and lead times of chips for consumer electronics and industrial applications.

TSMC has been maintaining a steady volume of wafer inputs at its 5 nm node, and these wafer inputs are projected to account for 20% of the company's revenue. On the other hand, owing to chip orders from AMD, Nvidia, Qualcomm, and MediaTek, demand for TSMC's 7 nm node is likewise strong and likely to account for 30% of TSMC's revenue, a slight increase from the previous quarter. On the whole, TSMC's revenue is expected to undergo a 25% increase YoY in 1Q21 and set a new high on the back of surging demand for 5G, HPC, and automotive applications.

Samsung Develops Industry's First High Bandwidth Memory with AI Processing Power

Samsung Electronics, the world leader in advanced memory technology, today announced that it has developed the industry's first High Bandwidth Memory (HBM) integrated with artificial intelligence (AI) processing power—the HBM-PIM. The new processing-in-memory (PIM) architecture brings powerful AI computing capabilities inside high-performance memory, to accelerate large-scale processing in data centers, high performance computing (HPC) systems and AI-enabled mobile applications.

Kwangil Park, senior vice president of Memory Product Planning at Samsung Electronics stated, "Our groundbreaking HBM-PIM is the industry's first programmable PIM solution tailored for diverse AI-driven workloads such as HPC, training and inference. We plan to build upon this breakthrough by further collaborating with AI solution providers for even more advanced PIM-powered applications."

HPE Develops New Spaceborne Computer-2 Computing System for the International Space Station

Hewlett Packard Enterprise (HPE) today announced it is accelerating space exploration and increasing self-sufficiency for astronauts by enabling real-time data processing with advanced commercial edge computing in space for the first time. Astronauts and space explorers aboard the International Space Station (ISS) will speed time-to-insight from months to minutes on various experiments in space, from processing medical imaging and DNA sequencing to unlocking key insights from volumes of remote sensors and satellites, using HPE's Spaceborne Computer-2 (SBC-2), an edge computing system.

Spaceborne Computer-2 is scheduled to launch into orbit on the 15th Northrop Grumman Resupply Mission to Space Station (NG-15) on February 20 and will be available for use on the International Space Station for the next 2-3 years. The NG-15 spacecraft has been named "SS. Katherine Johnson" in honor of Katherine Johnson, a famed Black, female NASA mathematician who was critical to the early success of the space program.

Intel Xe HPC Multi-Chip Module Pictured

Intel SVP for architecture, graphics, and software, Raja Koduri, tweeted the first picture of the Xe HPC scalar compute processor multi-chip module, with its large IHS off. It reveals two large main logic dies built on the 7 nm silicon fabrication process from a third-party foundry. The Xe HPC processor will be targeted at supercomputing and AI-ML applications, so the main logic dies are expected to be large arrays of execution units, spread across what appear to be eight clusters, surrounded by ancillary components such as memory controllers and interconnect PHYs.

There appear to be two kinds of on-package memory on the Xe HPC. The first kind is HBM stacks (from either the HBM2E or HBM3 generation), serving as the main high-speed memory; while the other is a mystery for now. This could either be another class of DRAM, serving a serial processing component on the main logic die; or a non-volatile memory, such as 3D XPoint or NAND flash (likely the former), providing fast persistent storage close to the main logic dies. There appear to be four HBM-class stacks per logic die (so 4096-bit per die and 8192-bit per package), and one die of this secondary memory per logic die.

Tachyum Prodigy Software Emulation Systems Now Available for Pre-Order

Tachyum Inc. today announced that it is signing early adopter customers for the software emulation system for its Prodigy Universal Processor. Customers may begin the process of native software development (i.e. using the Prodigy Instruction Set Architecture) and porting applications to run on Prodigy. Prodigy software emulation systems will be available at the end of January 2021.

Customers and partners can use Prodigy's software emulation for evaluation, development and debug, and with it, they can begin to transition existing applications that demand high performance and low power to run optimally on Prodigy processors. Pre-built systems include a Prodigy emulator, native Linux, toolchains, compilers, user mode applications, x86, ARM and RISC-V emulators. Software updates will be issued as needed.

NVIDIA is Preparing Co-Packaged Photonics for NVLink

During its GPU Technology Conference (GTC) in China, Mr. Bill Dally, NVIDIA's chief scientist and SVP of research, presented how the company plans to push the future of HPC, AI, graphics, healthcare, and edge computing. Mr. Dally covered NVIDIA's research efforts and its future vision for its products. One of the most interesting things presented was a plan to ditch standard electrical data transfer and use light to scale and advance node communication. The new technology, based on optical data transfer, is supposed to bring down the power required to transfer data by a significant amount.

The company's proposed plan is to use an optical NVLink equivalent. While the current NVLink 2.0 uses eight picojoules per bit (8 pJ/b) and can send signals only 0.3 meters without repeaters, the optical replacement can send data anywhere from 20 to 100 meters while consuming half the energy (4 pJ/b). NVIDIA has conceptualized a system with four GPUs in a tray, all connected by light. To power such a setup, lasers produce 8-10 wavelengths. Data is modulated onto each wavelength at 25 Gbit/s using ring resonators. On the receiving side, ring resonators pick up each wavelength and send it to the photodetector. This technique enables fast data transfer over long distances.
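
The per-bit energy figures translate into link power once a data rate is fixed. The sketch below works through that arithmetic for 8 and 10 wavelengths at 25 Gbit/s each. Comparing the electrical and optical figures at the same aggregate rate is an assumption made here for illustration only, not something stated in the presentation.

```python
# Link power from energy-per-bit and data rate.
# pJ/bit * Gbit/s = 1e-12 J/bit * 1e9 bit/s = 1e-3 W, hence the factor below.
# Assumption for illustration: both link types are compared at the same
# aggregate rate (wavelength count * 25 Gbit/s).

PER_WAVELENGTH_GBIT = 25  # Gbit/s per wavelength, from the text


def link_power_watts(energy_pj_per_bit: float, rate_gbit_per_s: float) -> float:
    """Return the power in watts needed to move data at the given rate."""
    return energy_pj_per_bit * rate_gbit_per_s * 1e-3


for wavelengths in (8, 10):
    rate = wavelengths * PER_WAVELENGTH_GBIT
    electrical = link_power_watts(8, rate)  # 8 pJ/b electrical figure
    optical = link_power_watts(4, rate)     # 4 pJ/b optical figure
    print(f"{wavelengths} wavelengths: {rate} Gbit/s, "
          f"electrical {electrical:.1f} W vs optical {optical:.1f} W")
```

With 8 wavelengths the aggregate rate is 200 Gbit/s, which works out to about 0.8 W at 4 pJ/b versus 1.6 W at 8 pJ/b.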

Intel and Argonne Developers Carve Path Toward Exascale 

Intel and Argonne National Laboratory are collaborating on the co-design and validation of exascale-class applications using graphics processing units (GPUs) based on Intel Xe-HP microarchitecture and Intel oneAPI toolkits. Developers at Argonne are tapping into Intel's latest programming environments for heterogeneous computing to ensure scientific applications are ready for the scale and architecture of the Aurora supercomputer at deployment.

"Our close collaboration with Argonne is enabling us to make tremendous progress on Aurora, as we seek to bring exascale leadership to the United States. Providing developers early access to hardware and software environments will help us jumpstart the path toward exascale so that researchers can quickly start taking advantage of the system's massive computational resources." -Trish Damkroger, Intel vice president and general manager of High Performance Computing.

Arm Based Fugaku Supercomputer Retains #1 Top500 Spot

Fugaku—the Arm technology-based supercomputer jointly developed by RIKEN and Fujitsu—was awarded the number one spot on the Top500 list for the second time in a row. This achievement further highlights the rapidly evolving demands of high-performance computing (HPC) that Arm technology uniquely addresses through the unmatched combination of power efficiency, performance, and scalability.

In addition to the great work RIKEN and Fujitsu have done, we're seeing more adoption of Arm-based solutions across our ecosystem. ETRI, the national computing institute of the Republic of Korea, recently announced plans to adopt the upcoming Neoverse V1 (formerly code-named Zeus) CPU design, which features Arm Scalable Vector Extensions (SVE), for its K-AB21 system. ETRI has set a goal of 16 teraflops per CPU and 1600 teraflops per rack for AB 21 (which stands for 'Artificial Brain 21') while reducing power consumption by 60% compared to its target.

TOP500 Expands Exaflops Capacity Amidst Low Turnover

The 56th edition of the TOP500 saw the Japanese Fugaku supercomputer solidify its number one status in a list that reflects a flattening performance growth curve. Although two new systems managed to make it into the top 10, the full list recorded the smallest number of new entries since the project began in 1993.

The entry level to the list moved up to 1.32 petaflops on the High Performance Linpack (HPL) benchmark, a small increase from 1.23 petaflops recorded in the June 2020 rankings. In a similar vein, the aggregate performance of all 500 systems grew from 2.22 exaflops in June to just 2.43 exaflops on the latest list. Likewise, average concurrency per system barely increased at all, growing from 145,363 cores six months ago to 145,465 cores in the current list.
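
The growth figures above are easier to compare as percentages. The following small sketch computes the relative change for each pair of numbers quoted in the paragraph; the helper name `pct_growth` is just an illustrative choice.

```python
# Relative growth between the June 2020 and November 2020 TOP500 lists,
# using only the figures quoted in the paragraph above.

def pct_growth(old: float, new: float) -> float:
    """Return the percentage change from old to new."""
    return (new - old) / old * 100


entry_level = pct_growth(1.23, 1.32)         # petaflops, HPL entry level
aggregate = pct_growth(2.22, 2.43)           # exaflops, all 500 systems
concurrency = pct_growth(145_363, 145_465)   # average cores per system

print(f"entry level: {entry_level:.1f}%")
print(f"aggregate:   {aggregate:.1f}%")
print(f"concurrency: {concurrency:.2f}%")
```

That is roughly 7% growth in the entry level, about 9.5% in aggregate performance, and well under a tenth of a percent in average core count.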