News Posts matching #accelerator


AMD Instinct MI300X Accelerators Power Microsoft Azure OpenAI Service Workloads and New Azure ND MI300X V5 VMs

Today at Microsoft Build, AMD (NASDAQ: AMD) showcased its latest end-to-end compute and software capabilities for Microsoft customers and developers. By using AMD solutions such as AMD Instinct MI300X accelerators, ROCm open software, Ryzen AI processors and software, and Alveo MA35D media accelerators, Microsoft is able to provide a powerful suite of tools for AI-based deployments across numerous markets. The new Microsoft Azure ND MI300X virtual machines (VMs) are now generally available, giving customers like Hugging Face access to impressive performance and efficiency for their most demanding AI workloads.

"The AMD Instinct MI300X and ROCm software stack is powering the Azure OpenAI Chat GPT 3.5 and 4 services, which are some of the world's most demanding AI workloads," said Victor Peng, president, AMD. "With the general availability of the new VMs from Azure, AI customers have broader access to MI300X to deliver high-performance and efficient solutions for AI applications."

Intel's Next-Gen Falcon Shores GPU to Consume 1500 W, No Air-Cooled Variant Planned

Intel's upcoming Falcon Shores GPU is shaping up to be a powerhouse for AI and high-performance computing (HPC) workloads, but it will also be an extreme power hog. The processor, combining Gaudi and Ponte Vecchio successors into a single GPU, is expected to consume an astonishing 1500 W of power - more than even Nvidia's beefy B200 accelerator, which draws 1000 W. This immense power consumption will require advanced cooling solutions to ensure the Falcon Shores GPU operates efficiently and safely. Intel's partners may turn to liquid cooling or even full immersion liquid cooling, a technology Intel has been promoting for power-hungry data center hardware. The high power draw is the cost of the Falcon Shores GPU's formidable performance promises. Intel claims it will deliver 5x higher performance per watt and 5x more memory capacity and bandwidth compared to its Ponte Vecchio products.

Intel may need to develop proprietary hardware modules or a new Open Accelerator Module (OAM) spec to support such extreme power levels, as the current OAM 2.0 tops out around 1000 W. Slated for release in 2025, the Falcon Shores GPU will be built on Intel's GPU IP from its next-gen Xe graphics architecture. It aims to be a major player in the AI accelerator market, backed by Intel's robust oneAPI software development ecosystem. While the 1500 W power consumption is sure to raise eyebrows, Intel is betting that the Falcon Shores GPU's supposedly impressive performance will make it an enticing option for AI and HPC customers willing to invest in robust cooling infrastructure. The ultra-high-end accelerator market is heating up, and the HPC accelerator market needs a Ponte Vecchio successor.

Intel Ponte Vecchio Waves Goodbye, Company Focuses on Falcon Shores for 2025 Release

According to ServeTheHome, Intel has decided to discontinue its high-performance computing (HPC) product line, Ponte Vecchio, and shift its focus towards developing its next-generation data center GPU, codenamed Falcon Shores. This decision comes as Intel aims to streamline its operations and concentrate its resources on the most promising and competitive offerings. The Ponte Vecchio GPU, released in January of 2023, was intended to be Intel's flagship product for the HPC market, competing against the likes of NVIDIA's H100 and AMD's Instinct MI series. However, despite its impressive specifications and features, Ponte Vecchio faced significant delays and challenges in its development and production cycle. Intel's decision to abandon Ponte Vecchio is pragmatic, recognizing the intense competition and rapidly evolving landscape of the data center GPU market.

By pivoting its attention to Falcon Shores, Intel aims to deliver a more competitive and cutting-edge solution that can effectively challenge the dominance of its rivals. Falcon Shores, slated for release in 2025, is expected to leverage Intel's latest process node and architectural innovations. Currently, Intel has Gaudi 2 and Gaudi 3 accelerators for AI. However, the HPC segment is left without a clear leader in the company's product offerings. Intel's Ponte Vecchio is powering the Aurora exascale supercomputer, the latest submission to the TOP500 supercomputer list. This also follows the cancellation of Rialto Bridge, which was supposed to be an HPC-focused successor card. In the future, the company will focus only on the Falcon Shores accelerator, which will unify HPC and AI needs across high-precision FP64 and lower-precision FP16/INT8.

SpiNNcloud Systems Announces First Commercially Available Neuromorphic Supercomputer

Today, in advance of ISC High Performance 2024, SpiNNcloud Systems announced the commercial availability of its SpiNNaker2 platform, a supercomputer-level hybrid AI high-performance computer system based on principles of the human brain. Pioneered by Steve Furber, designer of the original ARM and SpiNNaker1 architectures, the SpiNNaker2 supercomputing platform uses a large number of low-power processors for efficiently computing AI and other workloads.

First-generation SpiNNaker1 architecture is currently used in dozens of research groups across 23 countries worldwide. Sandia National Laboratories, the Technical University of Munich, and the University of Göttingen are among the first customers placing orders for SpiNNaker2, which was developed around commercialized IP invented in the Human Brain Project, a billion-euro research project funded by the European Union to design intelligent, efficient artificial systems.

Apple Introduces the M4 Chip

Apple today announced M4, the latest chip delivering phenomenal performance to the all-new iPad Pro. Built using second-generation 3-nanometer technology, M4 is a system on a chip (SoC) that advances the industry-leading power efficiency of Apple silicon and enables the incredibly thin design of iPad Pro. It also features an entirely new display engine to drive the stunning precision, color, and brightness of the breakthrough Ultra Retina XDR display on iPad Pro. A new CPU has up to 10 cores, while the new 10-core GPU builds on the next-generation GPU architecture introduced in M3, and brings Dynamic Caching, hardware-accelerated ray tracing, and hardware-accelerated mesh shading to iPad for the first time. M4 has Apple's fastest Neural Engine ever, capable of up to 38 trillion operations per second, which is faster than the neural processing unit of any AI PC today. Combined with faster memory bandwidth, along with next-generation machine learning (ML) accelerators in the CPU, and a high-performance GPU, M4 makes the new iPad Pro an outrageously powerful device for artificial intelligence.

"The new iPad Pro with M4 is a great example of how building best-in-class custom silicon enables breakthrough products," said Johny Srouji, Apple's senior vice president of Hardware Technologies. "The power-efficient performance of M4, along with its new display engine, makes the thin design and game-changing display of iPad Pro possible, while fundamental improvements to the CPU, GPU, Neural Engine, and memory system make M4 extremely well suited for the latest applications leveraging AI. Altogether, this new chip makes iPad Pro the most powerful device of its kind."

Sony PlayStation 5 Pro Specifications Confirmed, Console Arrives Before Holidays

Thanks to detailed information obtained by The Verge, today we can confirm previously leaked details as Sony gears up to unveil the highly anticipated PlayStation 5 Pro, codenamed "Trinity." According to insider reports, Sony is urging developers to optimize their games for the PS5 Pro, with a primary focus on enhancing ray tracing capabilities. The console is expected to feature an RDNA 3 GPU with 30 WGPs running BVH8, capable of 33.5 TeraFLOPS of FP32 single-precision computing power, and a slightly quicker CPU running at 3.85 GHz, enabling it to render games with ray tracing enabled or achieve higher resolutions and frame rates in select titles. Sony anticipates GPU rendering on the PS5 Pro to be approximately 45 percent faster than the standard PlayStation 5. The PS5 Pro GPU will be larger and utilize faster system memory to bolster ray tracing performance, boasting up to three times the speed of the regular PS5.

Additionally, the console will employ a more powerful ray tracing architecture, backed by PlayStation Spectral Super Resolution (PSSR), allowing developers to leverage graphics features like ray tracing more extensively. To support this endeavor, Sony is providing developers with test kits, and all games submitted for certification from August onward must be compatible with the PS5 Pro. Insider Gaming, the first to report the full PS5 Pro specs, suggests a potential release during the 2024 holiday period. The PS5 Pro will also feature modifications for developers regarding system memory, with Sony increasing the memory bandwidth from 448 GB/s to 576 GB/s, enhancing efficiency for an even more immersive gaming experience. For AI processing, there is a custom AI accelerator capable of 300 TOPS of 8-bit INT8 and 67 TeraFLOPS of 16-bit FP16 compute, in addition to the ACV audio codec running up to 35% faster.

Intel Launches Gaudi 3 AI Accelerator: 70% Faster Training, 50% Faster Inference Compared to NVIDIA H100, Promises Better Efficiency Too

During the Vision 2024 event, Intel announced its latest Gaudi 3 AI accelerator, promising significant improvements over its predecessor. Intel claims the Gaudi 3 offers up to 70% improvement in training performance, 50% better inference, and 40% better efficiency than NVIDIA's H100 processors. The new AI accelerator is presented as a PCIe Gen 5 dual-slot add-in card with a 600 W TDP or an OAM module with 900 W. The PCIe card has the same peak 1,835 TeraFLOPS of FP8 performance as the OAM module despite a 300 W lower TDP. The PCIe version works as a group of four per system, while the OAM HL-325L modules can be run in an eight-accelerator configuration per server. This will likely result in lower sustained performance, given the lower TDP, but it confirms that the same silicon is used, just tuned to a lower frequency. Built on TSMC's N5 (5 nm) node, the AI accelerator features 64 Tensor Cores, delivering double the FP8 and quadruple the FP16 performance of the previous-generation Gaudi 2.

The Gaudi 3 AI chip comes with 128 GB of HBM2E with 3.7 TB/s of bandwidth and twenty-four 200 Gbps Ethernet NICs, with dual 400 Gbps NICs used for scale-out. All of that is laid out across the 10 tiles that make up the Gaudi 3 accelerator. There is 96 MB of SRAM split between the two compute tiles, which acts as a low-level cache bridging data communication between the Tensor Cores and HBM memory. Intel also announced support for the new performance-boosting standardized MXFP4 data format and is developing an AI NIC ASIC for Ultra Ethernet Consortium-compliant networking. The Gaudi 3 supports clusters of up to 8,192 cards, built from 1,024 nodes with eight accelerators each. It is on track for volume production in Q3, offering a cost-effective alternative to NVIDIA accelerators with the additional promise of a more open ecosystem. More information and a deeper dive can be found in the Gaudi 3 whitepaper.
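As a quick sanity check on the scale-out figures quoted above, the cluster size and the per-card aggregate Ethernet bandwidth follow directly from the article's numbers; the only step added here is the Gbps-to-GB/s conversion.

```python
# Back-of-envelope check of the Gaudi 3 scale-out figures quoted above.
accelerators_per_node = 8
nodes = 1024
print(nodes * accelerators_per_node)            # 8192 cards per cluster

ethernet_nics = 24
nic_speed_gbps = 200
aggregate_gbytes = ethernet_nics * nic_speed_gbps / 8
print(f"{aggregate_gbytes:.0f} GB/s of Ethernet bandwidth per card")  # 600 GB/s
```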

Intel Unleashes Enterprise AI with Gaudi 3, AI Open Systems Strategy and New Customer Wins

At the Intel Vision 2024 customer and partner conference, Intel introduced the Intel Gaudi 3 accelerator to bring performance, openness and choice to enterprise generative AI (GenAI), and unveiled a suite of new open scalable systems, next-gen products and strategic collaborations to accelerate GenAI adoption. With only 10% of enterprises successfully moving GenAI projects into production last year, Intel's latest offerings address the challenges businesses face in scaling AI initiatives.

"Innovation is advancing at an unprecedented pace, all enabled by silicon - and every company is quickly becoming an AI company," said Intel CEO Pat Gelsinger. "Intel is bringing AI everywhere across the enterprise, from the PC to the data center to the edge. Our latest Gaudi, Xeon and Core Ultra platforms are delivering a cohesive set of flexible solutions tailored to meet the changing needs of our customers and partners and capitalize on the immense opportunities ahead."

Unannounced AMD Instinct MI388X Accelerator Pops Up in SEC Filing

AMD's Instinct family has welcomed a new addition—the MI388X AI accelerator—as discovered in a lengthy 10-K filing submitted to the SEC. The document reveals that the unannounced SKU—along with the MI250, MI300X and MI300A integrated circuits—cannot be sold to Chinese customers due to updated US trade regulations (new requirements were issued around October 2023). Versal VC2802 and VE2802 FPGA products are also mentioned in the same section. Earlier this month, AMD's Chinese market-specific Instinct MI309 package was deemed too powerful to gain export approval from the US Department of Commerce.

AMD has not published anything about the Instinct MI388X's official specification, and technical details have not emerged via leaks. The "X" tag likely implies that it has been designed for AI and HPC applications, akin to the recently launched MI300X accelerator. The designation of a higher model number could (naturally) point to a potentially more potent spec sheet, although Tom's Hardware posits that MI388X is a semi-custom spinoff of an existing model.

Arm China Develops NPU Accelerator for AI, Targeting Domestic CPUs

Arm China is making strides in the AI accelerator market with its new neural processing unit (NPU) called Zhouyi. The company aims to integrate the NPU into low-cost domestic CPUs, potentially giving it an edge over competitors like AMD and Intel. Initially a part of Arm Holdings, which licensed IP in China, Arm China took on a new strategy of developing its own IP specifically for Chinese customers a few years ago. While the company does not develop high-performance general-purpose cores, its Zhouyi NPU could become a fundamental building block for affordable processors. A significant step forward is the upcoming addition of an open-source driver for Zhouyi to the Linux kernel. This will make the IP easy to program for software developers, increasing its appeal to chip designers.

With an open-source driver integrated into the Linux kernel, developers have some assurance that the Zhouyi NPU will be the first of many generations from Arm China. While Zhouyi may not directly compete with offerings from AMD or Intel, its potential for widespread adoption in millions of devices could help Arm China acquire local customers with its IP. The project, which began three years ago with a kernel-only driver, has since evolved into a full driver stack. There is even a development kit board called EAIDK310, powered by a Rockchip SoC and the Zhouyi NPU, which is available on AliExpress and Amazon. The integration of AI accelerator technology into the Linux ecosystem is a significant development, though there is still work to be done. Nonetheless, Arm China's Zhouyi NPU and open-source driver are essential to making AI capabilities more accessible and widely available in the domestic Chinese market.

Taiwan Dominates Global AI Server Supply - Government Reportedly Estimates 90% Share

The Taiwanese Ministry of Economic Affairs (MOEA) managed to herd government representatives and leading Information and Communication Technology (ICT) industry figures together for an important meeting, according to DigiTimes Asia. The report suggests that the main topic of discussion focused on an anticipated growth of Taiwan's ICT industry—current market trends were analyzed, revealing that the nation absolutely dominates in the AI server segment. The MOEA has (allegedly) determined that Taiwan has shipped 90% of global AI server equipment—DigiTimes claims (based on insider info) that: "American brand vendors are expected to source their AI servers from Taiwanese partners." North American customers could be (presently) 100% reliant on supplies of Taiwanese-produced equipment—a scenario that potentially complicates ongoing international tensions.

The report posits that involved parties have formed plans to seize opportunities within an ever-growing global demand for AI hardware—a 90% market dominance is clearly not enough for some very ambitious industry bosses—although manufacturers will need to jump over several (rising) cost hurdles. Key components for AI servers are reported to be priced much higher than vanilla server parts—DigiTimes believes that AI processor/accelerator chips are priced close to ten times higher than general-purpose server CPUs. Similar price hikes have reportedly affected AI-adjacent component supply chains—notably cooling, power supplies and passive parts. Taiwanese manufacturers have spread operations around the world, but industry watchdogs (largely) believe that the best stuff gets produced on home ground—global expansions are underway, perhaps inching closer to better-balanced supply conditions.

NVIDIA Hopper Leaps Ahead in Generative AI at MLPerf

It's official: NVIDIA delivered the world's fastest platform in industry-standard tests for inference on generative AI. In the latest MLPerf benchmarks, NVIDIA TensorRT-LLM—software that speeds and simplifies the complex job of inference on large language models—boosted the performance of NVIDIA Hopper architecture GPUs on the GPT-J LLM nearly 3x over their results just six months ago. The dramatic speedup demonstrates the power of NVIDIA's full-stack platform of chips, systems and software to handle the demanding requirements of running generative AI. Leading companies are using TensorRT-LLM to optimize their models. And NVIDIA NIM—a set of inference microservices that includes inferencing engines like TensorRT-LLM—makes it easier than ever for businesses to deploy NVIDIA's inference platform.

Raising the Bar in Generative AI
TensorRT-LLM running on NVIDIA H200 Tensor Core GPUs—the latest, memory-enhanced Hopper GPUs—delivered the fastest performance running inference in MLPerf's biggest test of generative AI to date. The new benchmark uses the largest version of Llama 2, a state-of-the-art large language model packing 70 billion parameters. The model is more than 10x larger than the GPT-J LLM first used in the September benchmarks. The memory-enhanced H200 GPUs, in their MLPerf debut, used TensorRT-LLM to produce up to 31,000 tokens/second, a record on MLPerf's Llama 2 benchmark. The H200 GPU results include up to 14% gains from a custom thermal solution. It's one example of innovations beyond standard air cooling that systems builders are applying to their NVIDIA MGX designs to take the performance of Hopper GPUs to new heights.
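For context on the "more than 10x larger" claim, GPT-J's roughly 6 billion parameters (a well-known figure from public model cards, not stated in the article itself) can be compared directly against Llama 2 70B:

```python
# Rough size comparison behind the "more than 10x larger" claim above.
# GPT-J's ~6B parameter count is an assumption from public model cards.
gptj_params = 6e9
llama2_params = 70e9
print(f"Llama 2 70B is ~{llama2_params / gptj_params:.1f}x larger than GPT-J")  # ~11.7x
```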

Samsung Prepares Mach-1 Chip to Rival NVIDIA in AI Inference

During its 55th annual shareholders' meeting, Samsung Electronics announced its entry into the AI processor market with the upcoming launch of its Mach-1 AI accelerator chips in early 2025. The South Korean tech giant revealed its plans to compete with established players like NVIDIA in the rapidly growing AI hardware sector. The Mach-1 generation of chips is an application-specific integrated circuit (ASIC) design equipped with LPDDR memory that is envisioned to excel in edge computing applications. While Samsung does not aim to directly rival NVIDIA's ultra-high-end AI solutions like the H100, B100, or B200, the company's strategy focuses on carving out a niche in the market by offering unique features and performance enhancements at the edge, where low power and efficient computing matter most.

According to SeDaily, the Mach-1 chips boast a groundbreaking feature that significantly reduces memory bandwidth requirements for inference to approximately 0.125x compared to existing designs, which is an 87.5% reduction. This innovation could give Samsung a competitive edge in terms of efficiency and cost-effectiveness. As the demand for AI-powered devices and services continues to soar, Samsung's foray into the AI chip market is expected to intensify competition and drive innovation in the industry. While NVIDIA currently holds a dominant position, Samsung's cutting-edge technology and access to advanced semiconductor manufacturing nodes could make it a formidable contender. The Mach-1 has been field-verified on an FPGA, while the final design is currently going through a physical design for SoC, which includes placement, routing, and other layout optimizations.
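The bandwidth claim is easier to parse as a quick calculation: a reduction to 0.125x of the baseline is the same statement as the 87.5% cut SeDaily reports.

```python
# The two figures quoted above describe the same reduction.
baseline_bandwidth = 1.0              # normalized bandwidth of existing designs
mach1_bandwidth = 0.125 * baseline_bandwidth
reduction = (1 - mach1_bandwidth / baseline_bandwidth) * 100
print(f"reduction: {reduction:.1f}%")  # 87.5%
```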

Chinese Research Institute Utilizing "Banned" NVIDIA H100 AI GPUs

NVIDIA's freshly unveiled "Blackwell" B200 and GB200 AI GPUs will be getting plenty of coverage this year, but many organizations will be sticking with current or prior generation hardware. Team Green is in the process of shipping out compromised "Hopper" designs to customers in China, but the region's appetite for powerful AI-crunching hardware is growing. Last year's China-specific H800 design, and the older "Ampere" A800 chip were deemed too potent—new regulations prevented further sales. Recently, AMD's Instinct MI309 AI accelerator was considered "too powerful to gain unconditional approval from the US Department of Commerce." Natively-developed solutions are catching up with Western designs, but some institutions are not prepared to queue up for emerging technologies.

NVIDIA's new H20 AI GPU as well as Ada Lovelace-based L20 PCIe and L2 PCIe models are weakened enough to get a thumbs up from trade regulators, but likely not compelling enough for discerning clients. The Telegraph believes that NVIDIA's uncompromised H100 AI GPU is currently in use at several Chinese establishments—the report cites information presented within four academic papers published on ArXiv, an open access science website. The Telegraph's news piece highlights one of the studies—it was: "co-authored by a researcher at 4paradigm, an AI company that was last year placed on an export control list by the US Commerce Department for attempting to acquire US technology to support China's military." Additionally, the Chinese Academy of Sciences appears to have conducted several AI-accelerated experiments, involving the solving of complex mathematical and logical problems. The article suggests that this research organization has acquired a very small batch of NVIDIA H100 GPUs (up to eight units). A "thriving black market" for high-end NVIDIA processors has emerged in the region—last Autumn, the Center for a New American Security (CNAS) published an in-depth article about ongoing smuggling activities.

AI-Capable PCs Forecast to Make Up 40% of Global PC Shipments in 2025

Canalys' latest forecast predicts that an estimated 48 million AI-capable PCs will ship worldwide in 2024, representing 18% of total PC shipments. But this is just the start of a major market transition, with AI-capable PC shipments projected to surpass 100 million in 2025, 40% of all PC shipments. In 2028, Canalys expects vendors to ship 205 million AI-capable PCs, representing a staggering compound annual growth rate of 44% between 2024 and 2028.
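The quoted growth rate can be reproduced from the shipment forecasts above; the only assumption added is the standard compound-annual-growth-rate formula over the 2024 to 2028 window.

```python
# Reproducing Canalys' ~44% CAGR from the shipment figures quoted above.
shipments_2024 = 48_000_000
shipments_2028 = 205_000_000
years = 2028 - 2024
cagr = (shipments_2028 / shipments_2024) ** (1 / years) - 1
print(f"CAGR 2024-2028: {cagr:.1%}")  # ~43.8%, in line with the quoted 44%
```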

These PCs, integrating dedicated AI accelerators, such as Neural Processing Units (NPUs), will unlock new capabilities for productivity, personalization and power efficiency, disrupting the PC market and delivering significant value gains to vendors and their partners.

Sony PlayStation 5 Pro Details Emerge: Faster CPU, More System Bandwidth, and Better Audio

Sony is preparing to launch its next-generation PlayStation 5 Pro console in the Fall of 2024, right around the holidays. We previously covered a few graphics details about the console. However, today we get more details about the CPU and the overall system, thanks to exclusive information from Insider Gaming. Starting off, the sources indicate that PS5 Pro system memory will get a 28% bump in bandwidth: the standard PS5 console had 448 GB/s, while the upgraded PS5 Pro will get 576 GB/s. The memory system is apparently also more efficient, likely thanks to an upgrade over the GDDR6 SDRAM of the regular PS5. The next upgrade is the CPU, which gains special operating modes. The CPU uArch is likely the same, with clocks pushed to 3.85 GHz, resulting in a 10% frequency increase.

However, this is only achieved in the "High CPU Frequency Mode," which steals the SoC's power from the GPU and downclocks it slightly to allocate more power to the CPU in highly CPU-intense settings. The GPU, as discussed previously, is based on RDNA 3 IP with up to 45% faster graphics rendering. The ray tracing performance can be up to four times higher than the regular PS5, while the entire GPU delivers 33.5 TeraFLOPS of FP32 single-precision computing. This comes from 30 WGPs running BVH8 shaders versus the 18 WGPs running BVH4 shaders on the regular PS5. A PSSR upscaler is present, and the GPU will be able to output at 8K resolution via future software updates. Last but not least, on the AI front there is a custom AI accelerator capable of 300 TOPS of 8-bit INT8 and 67 TeraFLOPS of 16-bit FP16 compute. Audio codecs are getting some love as well, with ACV running up to 35% faster.
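A few of the leaked numbers above can be cross-checked with quick arithmetic. The standard PS5's 3.5 GHz CPU clock and RDNA 3's dual-issue FP32 counting (the way AMD quotes peak TeraFLOPS) are assumptions added here, not figures from the report.

```python
# Quick cross-check of the leaked PS5 Pro figures quoted above.
bw_ps5, bw_pro = 448, 576                     # GB/s, from the report
print(f"bandwidth uplift: {bw_pro / bw_ps5 - 1:.1%}")    # ~28.6%

cpu_ps5, cpu_pro = 3.5, 3.85                  # GHz; 3.5 GHz base clock is an assumption
print(f"CPU clock uplift: {cpu_pro / cpu_ps5 - 1:.0%}")  # 10%

# Implied GPU clock if 33.5 TFLOPS FP32 comes from 30 WGPs (60 CUs x 64 lanes),
# counting 2 FLOPs per FMA and RDNA 3 dual-issue (an assumption).
tflops, cus, lanes = 33.5, 60, 64
clock_ghz = tflops * 1e12 / (cus * lanes * 2 * 2) / 1e9
print(f"implied GPU clock: {clock_ghz:.2f} GHz")          # ~2.18 GHz
```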

Next-Generation NVIDIA DGX Systems Could Launch Soon with Liquid Cooling

During the 2024 SIEPR Economic Summit, NVIDIA CEO Jensen Huang acknowledged that the company's next-generation DGX systems, designed for AI and high-performance computing workloads, will require liquid cooling due to their immense power consumption. Huang also hinted that these new systems are set to be released in the near future. The revelation comes as no surprise, given the increasing power of GPUs needed to satisfy AI and machine learning applications. As computational requirements continue to grow, so does the need for more powerful hardware. However, with great power comes great heat generation, necessitating advanced cooling solutions to maintain optimal performance and system stability. Liquid cooling has long been a staple in high-end computing systems, offering superior thermal management compared to traditional air cooling methods.

By implementing liquid cooling in the upcoming DGX systems, NVIDIA aims to push the boundaries of performance while ensuring the hardware remains reliable and efficient. Although Huang did not provide a specific release date for the new DGX systems, his statement suggests that they are on the horizon. Whether the next generation of DGX systems uses the current NVIDIA H200 or the upcoming Blackwell B100 GPU as its primary accelerator, a substantial performance uplift can be expected. As the AI and high-performance computing landscape continues to evolve, NVIDIA's position continues to strengthen, and liquid-cooled systems will certainly play a crucial role in shaping the future of these industries.

Marvell Announces Industry's First 2 nm Platform for Accelerated Infrastructure Silicon

Marvell Technology, Inc., a leader in data infrastructure semiconductor solutions, is extending its collaboration with TSMC to develop the industry's first technology platform to produce 2 nm semiconductors optimized for accelerated infrastructure.

Behind the Marvell 2 nm platform is the company's industry-leading IP portfolio that covers the full spectrum of infrastructure requirements, including high-speed long-reach SerDes at speeds beyond 200 Gbps, processor subsystems, encryption engines, system-on-chip fabrics, chip-to-chip interconnects, and a variety of high-bandwidth physical layer interfaces for compute, memory, networking and storage architectures. These technologies will serve as the foundation for producing cloud-optimized custom compute accelerators, Ethernet switches, optical and copper interconnect digital signal processors, and other devices for powering AI clusters, cloud data centers and other accelerated infrastructure.

Intel Gaudi 2 AI Accelerator Powers Through Llama 2 Text Generation

Intel's "AI Everywhere" hype campaign has generated the most noise in mainstream and enterprise segments. Team Blue's Gaudi—a family of deep learning accelerators—does not hit the headlines all that often. Their current generation model, Gaudi 2, is overshadowed by Team Green and Red alternatives—according to Intel's official marketing spiel: "it performs competitively on deep learning training and inference, with up to 2.4x faster performance than NVIDIA A100." Habana, an Intel subsidiary, has been working on optimizing Large Language Model (LLM) inference on Gaudi 1 and 2 for a while—their co-operation with Hugging Face has produced impressive results, as of late February. Siddhant Jagtap, an Intel Data Scientist, has demonstrated: "how easy it is to generate text with the Llama 2 family of models (7b, 13b and 70b) using Optimum Habana and a custom pipeline class."

Jagtap reckons that folks will be able to: "run the models with just a few lines of code" on Gaudi 2 accelerators—additionally, Intel's hardware is capable of accepting single and multiple prompts. The custom pipeline class: "has been designed to offer great flexibility and ease of use. Moreover, it provides a high level of abstraction and performs end-to-end text-generation which involves pre-processing and post-processing." His article/blog outlines various prerequisites and methods of getting Llama 2 text generation up and running on Gaudi 2. Jagtap concluded that Habana/Intel has: "presented a custom text-generation pipeline on Intel Gaudi 2 AI accelerator that accepts single or multiple prompts as input. This pipeline offers great flexibility in terms of model size as well as parameters affecting text-generation quality. Furthermore, it is also very easy to use and to plug into your scripts, and is compatible with LangChain." Hugging Face reckons that Gaudi 2 delivers roughly twice the throughput speed of NVIDIA A100 80 GB in both training and inference scenarios. Intel has teased third generation Gaudi accelerators—industry watchdogs believe that next-gen solutions are designed to compete with Team Green H100 AI GPUs.
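The blog's custom Gaudi pipeline class is not reproduced here; as a rough illustration of the "few lines of code" usage pattern it describes, the sketch below uses the standard Hugging Face transformers text-generation pipeline with multiple prompts. The Optimum Habana version swaps in its own pipeline class and Gaudi (HPU) device plumbing, and the Llama 2 checkpoints are gated behind Meta's license on the Hugging Face Hub.

```python
# Illustrative sketch only (standard transformers API, not the custom Gaudi
# pipeline class described above). Requires access to the gated
# meta-llama/Llama-2-7b-hf checkpoint on the Hugging Face Hub.
from transformers import pipeline

generator = pipeline("text-generation", model="meta-llama/Llama-2-7b-hf")

prompts = [
    "Explain what an AI accelerator does.",                  # single or multiple prompts
    "Summarize the benefits of Gaudi 2 for LLM inference.",
]
outputs = generator(prompts, max_new_tokens=100)
for result in outputs:
    print(result[0]["generated_text"])
```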

Tiny Corp. Builds AI Platform with Six AMD Radeon RX 7900 XTX GPUs

Tiny Corp., a neural network framework specialist, has revealed intimate details about the ongoing development and building of its "tinybox" system: "I don't think there's much value in secrecy. We have the parts to build 12 boxes and a case that's pretty close to final. Beating back all the PCI-E AER errors was hard, as anyone knows who has tried to build a system like this. Our BOM cost is around $10k, and we are selling them for $15k. We've put a year of engineering into this, it's a lot harder than it first seemed. You are welcome to believe me or not, but unless you are building in huge quantity, you are getting a great deal for $15k." The startup has taken the unusual step of integrating Team Red's current flagship gaming GPU into its AI-crunching platform. Tiny Corp. founder—George Hotz—has documented his past rejections of NVIDIA AI hardware on social media, but TinyBox will not be running AMD's latest Instinct MI300X accelerators. RDNA 3.0 is seemingly favored over CDNA 3.0—perhaps due to growing industry demand for enterprise-grade GPUs.

The rack-mounted 12U TinyBox build houses an AMD EPYC 7532 processor with 128 GB of system memory. Five 1 TB SN850X SSDs take care of storage duties (four in RAID, one for boot), and an unoccupied x16 OCP 3.0 slot is designated for networking tasks. Two 1600 W PSUs provide the necessary electrical juice. The Tiny Corp. social media picture feed indicates that they have acquired a pile of XFX Speedster MERC310 RX 7900 XTX graphics cards—six units are hooked up inside each TinyBox system. Hotz's young startup has ambitious plans: "The system image shipping with the box will be Ubuntu 22.04. It will only include tinygrad out of the box, but PyTorch and JAX support on AMD have come a long way, and your hardware is your hardware. We make money either way, you are welcome to buy it for any purpose. The goal of the tiny corp is to commoditize the petaflop, and we believe tinygrad is the best way to do it. Solving problems in software is cheaper than in hardware. tinygrad will elucidate the deep structure of what neural networks are. We have 583 preorders, and next week we'll place an order for 100 sets of parts. This is $1M in outlay. We will also ship five of the 12 boxes we have to a few early people who I've communicated with. For everyone else, they start shipping in April. The production line started running yesterday."
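For readers unfamiliar with tinygrad, a minimal sketch of what user code looks like is shown below; it is a generic two-layer forward pass, not Tiny Corp.'s shipping software, and the shapes and names are illustrative.

```python
# Minimal tinygrad sketch (illustrative, not Tiny Corp.'s shipping code):
# a tiny two-layer forward pass of the kind tinygrad compiles down to the GPUs.
from tinygrad import Tensor

x  = Tensor.randn(4, 128)            # small batch of inputs
w1 = Tensor.randn(128, 256)
w2 = Tensor.randn(256, 10)

logits = x.matmul(w1).relu().matmul(w2)
print(logits.numpy().shape)          # (4, 10)
```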

AMD CTO Teases Memory Upgrades for Revised Instinct MI300-series Accelerators

Brett Simpson, Partner and Co-Founder of Arete Research, sat down with AMD CTO Mark Papermaster during the former's "Investor Webinar Conference." A transcript of the Arete + AMD question and answer session appeared online last week—the documented fireside chat concentrated mostly on "AI compute market" topics. Papermaster was asked about his company's competitive approach when taking on NVIDIA's very popular range of A100 and H100 AI GPUs, as well as the recently launched GH200 chip. The CTO did not reveal any specific pricing strategies—a "big picture" was painted instead: "I think what's important when you just step back is to look at total cost of ownership, not just one GPU, one accelerator, but total cost of ownership. But now when you also look at the macro, if there's not competition in the market, you're going to see not only a growth of the price of these devices due to the added content that they have, but you're -- without a check and balance, you're going to see very, very high margins, more than that could be sustained without a competitive environment."

Papermaster continued: "And what I think is very key with -- as AMD has brought competition market for these most powerful AI training and inference devices is you will see that check and balance. And we have a very innovative approach. We've been a leader in chiplet design. And so we have the right technology for the right purpose of the AI build-out that we do. We have, of course, a GPU accelerator. But there's many other circuitry associated with being able to scale and build out these large clusters, and we're very, very efficient in our design." Team Red started to ship its flagship accelerator, Instinct MI300X, to important customers at the start of 2024—Arete Research's Simpson asked about the possibility of follow-up models. In response, AMD's CTO referenced some recent history: "Well, I think the first thing that I'll highlight is what we did to arrive at this point, where we are a competitive force. We've been investing for years in building up our GPU road map to compete in both HPC and AI. We had a very, very strong harbor train that we've been on, but we had to build our muscle in the software enablement."

NVIDIA Prepared to Offer Custom Chip Designs to AI Clients

NVIDIA is reported to be setting up an AI-focused semi-custom chip design business unit, according to inside sources known to Reuters—it is believed that Team Green leadership is adapting to demands leveraged by key data-center customers. Many companies are seeking cheaper alternatives, or have devised their own designs (budget/war chest permitting)—NVIDIA's current range of AI GPUs are simply off-the-shelf solutions. OpenAI has generated the most industry noise—their alleged early 2024 fund-raising pursuits have attracted plenty of speculative/kind-of-serious interest from notable semiconductor personalities.

Team Green is seemingly reacting to emerging market trends—Jensen Huang (CEO, president and co-founder) has hinted that NVIDIA custom chip design services are on the cusp of being offered. Stephen Nellis—a Reuters reporter specializing in tech industry developments—has highlighted select NVIDIA boss quotes from an upcoming interview piece: "We're always open to do that. Usually, the customization, after some discussion, could fall into system reconfigurations or recompositions of systems." The Team Green chief teased that his engineering team is prepared to take on the challenge of meeting exact requests: "But if it's not possible to do that, we're more than happy to do a custom chip. And the benefit to the customer, as you can imagine, is really quite terrific. It allows them to extend our architecture with their know-how and their proprietary information." The rumored NVIDIA semi-custom chip design business unit could be introduced in an official capacity at next month's GTC 2024 conference.

NVIDIA Expects Upcoming Blackwell GPU Generation to be Capacity-Constrained

NVIDIA is anticipating supply issues for its upcoming Blackwell GPUs, which are expected to significantly improve artificial intelligence compute performance. "We expect our next-generation products to be supply constrained as demand far exceeds supply," said Colette Kress, NVIDIA's chief financial officer, during a recent earnings call. This prediction of scarcity comes just days after an analyst noted much shorter lead times for NVIDIA's current flagship Hopper-based H100 GPUs tailored to AI and high-performance computing. The eagerly anticipated Blackwell architecture and B100 GPUs built on it promise major leaps in capability—likely spurring NVIDIA's existing customers to place pre-orders already. With skyrocketing demand in the red-hot AI compute market, NVIDIA appears poised to capitalize on the insatiable appetite for ever-greater processing power.

However, the scarcity of NVIDIA's products may present an excellent opportunity for significant rivals like AMD and Intel. If both companies can offer a product that beats NVIDIA's current H100 and provide a suitable software stack, customers would be willing to jump to their offerings rather than wait many months due to high lead times. Intel is preparing the next-generation Gaudi 3 and working on the Falcon Shores accelerator for AI and HPC. AMD is shipping its Instinct MI300 accelerator, a highly competitive product, while already working on the MI400 generation. It remains to be seen whether AI companies will begin adopting non-NVIDIA hardware or remain loyal customers and accept the longer lead times of the new Blackwell generation. However, capacity constraints should only be a problem at launch, and availability should improve from quarter to quarter. As TSMC improves CoWoS packaging capacity and 3 nm production, NVIDIA's allocation of 3 nm wafers will likely improve over time as the company shifts its priority from H100 to B100.

Acer Launches Swift Series Laptops Powered by AMD Ryzen 8040 Series

Acer today announced new models of the Acer Swift Edge 16 and Acer Swift Go 14 laptops, blending AI power and innovative features in stylish thin and light devices. The latest additions to the Swift lineup feature AMD Ryzen 8040 Series processors with up to AMD Radeon 780M Graphics and equipped with Ryzen AI for versatile performance and support for Acer's AI-powered capabilities such as Acer PurifiedVoice, Acer PurifiedView, and the new Acer LiveArt photo-editing feature. Intuitive control and seamless navigation on the AI PCs are made possible thanks to the AcerSense application and Copilot in Windows with instant access through dedicated keys. Users can also appreciate clear images and rich colors when working or streaming through the OLED laptops' displays, as well as Microsoft Pluton technology, enabled by default, to help secure devices, personal data, and encryption keys.

Harnessing the Power of AI with AMD Ryzen 8040 Series Processors
Designed to deliver premium AI experiences and reliable performance for everyday productivity, the Swift Edge 16 and Swift Go 14 laptops are powered by AMD Ryzen 8040 Series processors with Ryzen AI technology built in. AMD's latest processors enable the efficient distribution of AI workloads between accelerators on the NPU, GPU, and CPU, advancing user experiences with AI technology on the devices. The processors leverage AMD's "Zen 4" architecture with up to eight cores and 16 threads of processing power, so creative professionals and mainstream users can expect fast, power-efficient computing and longer battery life on the ultrathin Swift laptops.
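As a concrete illustration of how such NPU/CPU workload placement is typically exposed to developers, the sketch below uses ONNX Runtime execution providers; the provider name is the one AMD's Ryzen AI software stack builds on, but the model file and the exact setup are assumptions, not Acer's or AMD's documented flow.

```python
# Hedged sketch: routing an ONNX model to the Ryzen AI NPU via ONNX Runtime's
# Vitis AI execution provider, with CPU fallback. "model.onnx" is a placeholder.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",
    providers=["VitisAIExecutionProvider",   # NPU offload where layers are supported
               "CPUExecutionProvider"],      # everything else falls back to the CPU
)

input_name = session.get_inputs()[0].name
dummy_input = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {input_name: dummy_input})
print(outputs[0].shape)
```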

NVIDIA Accelerates Quantum Computing Exploration at Australia's Pawsey Supercomputing Centre

NVIDIA today announced that Australia's Pawsey Supercomputing Research Centre will add the NVIDIA CUDA Quantum platform accelerated by NVIDIA Grace Hopper Superchips to its National Supercomputing and Quantum Computing Innovation Hub, furthering its work driving breakthroughs in quantum computing.

Researchers at the Perth-based center will leverage CUDA Quantum - an open-source hybrid quantum computing platform that features powerful simulation tools and capabilities to program hybrid CPU, GPU and QPU systems - as well as the NVIDIA cuQuantum software development kit of optimized libraries and tools for accelerating quantum computing workflows. The NVIDIA Grace Hopper Superchip - which combines the NVIDIA Grace CPU and Hopper GPU architectures - provides extreme performance to run high-fidelity and scalable quantum simulations on accelerators and seamlessly interface with future quantum hardware infrastructure.
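For a sense of what programming against CUDA Quantum looks like, here is a minimal Bell-state sketch using the CUDA-Q (cudaq) Python API; the GPU-simulator target and shot count are ordinary CUDA-Q usage, and the example is illustrative rather than drawn from Pawsey's workloads.

```python
# Minimal CUDA-Q sketch: prepare and sample a two-qubit Bell state on a
# GPU-accelerated state-vector simulator (illustrative example only).
import cudaq

cudaq.set_target("nvidia")            # route simulation to an NVIDIA GPU backend

kernel = cudaq.make_kernel()
qubits = kernel.qalloc(2)
kernel.h(qubits[0])                   # superposition on the first qubit
kernel.cx(qubits[0], qubits[1])       # entangle the pair
kernel.mz(qubits)                     # measure both qubits

counts = cudaq.sample(kernel, shots_count=1000)
print(counts)                         # expect roughly even '00' and '11' counts
```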