News Posts matching #accelerator


Financial Analyst Outs AMD Instinct MI300X "Projected" Pricing

AMD's December 2023 launch of new Instinct series accelerators has generated a lot of tech news buzz and excitement within the financial world, but not many folks are privy to Team Red's MSRP for the CDNA 3.0-powered MI300X and MI300A models. A Citi report has pulled back the curtain, albeit with "projected" figures: an inside source claims that Microsoft has purchased the Instinct MI300X 192 GB model for ~$10,000 apiece. North American enterprise customers appear to have taken delivery of the latest MI300 products around mid-January, and inevitably, top secret information has leaked out to news investigators. SeekingAlpha's article (based on Citi's findings) alleges that the Microsoft data center division is AMD's top buyer of MI300X hardware, and GPT-4 is reportedly up and running on these brand new accelerators.

The leakers claim that businesses further down the (AI and HPC) food chain are having to shell out $15,000 per MI300X unit, but this is a bargain when compared to NVIDIA's closest competing package—the venerable H100 SXM5 80 GB professional card. Team Green, similarly, does not reveal its enterprise pricing to the wider public—Tom's Hardware has kept tabs on H100 insider info and market leaks: "over the recent quarters, we have seen NVIDIA's H100 80 GB HBM2E add-in-card available for $30,000, $40,000, and even much more at eBay. Meanwhile, the more powerful H100 80 GB SXM with 80 GB of HBM3 memory tends to cost more than an H100 80 GB AIB." Citi's projection has Team Green charging up to four times more for its H100 product, when compared to Team Red MI300X pricing. NVIDIA's dominant AI GPU market position could be challenged by cheaper yet still very performant alternatives—additionally chip shortages have caused Jensen & Co. to step outside their comfort zone. Tom's Hardware reached out to AMD for comment on the Citi pricing claims—a company representative declined this invitation.

AMD Instinct MI300X Released at Opportune Moment. NVIDIA AI GPUs in Short Supply

LaminiAI appeared to be one of the first customers to receive an initial shipment of AMD's Instinct MI300X accelerators, as disclosed by their CEO posting about functioning hardware on social media late last week. A recent Taiwan Economic Daily article states that the "MI300X is rumored to have begun supply"—we are not sure about why they have adopted a semi-secretive tone in their news piece, but a couple of anonymous sources are cited. A person familiar with supply chains in Taiwan divulged that: "(they have) been receiving AMD MI300X chips one after another...due to the huge shortage of NVIDIA AI chips, the arrival of new AMD products is really a timely rainfall." Favorable industry analysis (from earlier this month) has placed Team Red in a position of strength, due to growing interest in their very performant flagship AI accelerator.

The secrecy seems to lie in Team Red's negotiation strategies in Taiwan—the news piece alleges that big manufacturers in the region have been courted. AMD has been aggressive in a push to: "cooperate and seize AI business opportunities, with GIGABYTE taking the lead and attracting the most attention. Not only was GIGABYTE the first to obtain a partnership with AMD's MI300A chip, which had previously been mass-produced, but GIGABYTE was also one of the few Taiwanese manufacturers included in AMD's first batch of MI300X partners." GIGABYTE is expected to release two new "G593" product lines of server hardware later this year, based on combinations of AMD's Instinct MI300X accelerator and EPYC 9004 series processors.

Google Faces Potential Billion-Dollar Damages in TPU Patent Dispute

Tech giant Google is embroiled in a high-stakes legal battle over the alleged infringement of patents related to its Tensor Processing Units (TPUs), custom AI accelerator chips used to power machine learning applications. Massachusetts-based startup Singular Computing has accused Google of incorporating architectures described in several of its patents into the design of the TPU without permission. The disputed patents, first filed in 2009, outline computer architectures optimized for executing a high volume of low-precision calculations per cycle - an approach well-suited for neural network-based AI. In a 2019 lawsuit, Singular argues that Google knowingly infringed on these patents in developing its TPU v2 and TPU v3 chips introduced in 2017 and 2018. Singular Computing is seeking between $1.6 billion and $5.19 billion in damages from Google.

Google denies these claims, stating that its TPUs were independently developed over many years. The company is currently appealing to have Singular's patents invalidated, which would undermine the infringement allegations. The high-profile case highlights mounting legal tensions as tech giants race to dominate the burgeoning field of AI hardware. With billions in potential damages at stake, the outcome could have major implications for the competitive landscape in cloud-based machine learning services. As both sides prepare for court, the dispute underscores the massive investments tech leaders like Google make to integrate specialized AI accelerators into their cloud infrastructures. Dominance in this sphere is a crucial strategic advantage as more industries embrace data-hungry neural network applications.

Update 17:25 UTC: According to Reuters, Google and Singular Computing have settled the case with details remaining private for the time being.

HBM Industry Revenue Could Double by 2025 - Growth Driven by Next-gen AI GPUs Cited

Samsung, SK hynix, and Micron are considered to be the top manufacturing sources of High Bandwidth Memory (HBM)—the HBM3 and HBM3E standards are becoming increasingly in demand, due to a widespread deployment of GPUs and accelerators by generative AI companies. Taiwan's Commercial Times proposes that there is an ongoing shortage of HBM components—but this presents a growth opportunity for smaller manufacturers in the region. Naturally, the big name producers are expected to dive in head first with the development of next generation models. The aforementioned financial news article cites research conducted by the Gartner group—they predict that the HBM market will hit an all-time high of $4.976 billion (USD) by 2025.

This estimate is more than double the projected revenue (just over $2 billion) generated by the HBM market in 2023; the explosive growth of generative AI applications has "boosted" demand for the most performant memory standards. The Commercial Times report states that SK hynix is the current HBM3E leader, with Micron and Samsung trailing behind, and industry experts believe that stragglers will need to "expand HBM production capacity" in order to stay competitive. SK hynix has partnered with NVIDIA: the GH200 Grace Hopper platform was unveiled last summer, outfitted with the South Korean firm's HBM3E parts. In a similar timeframe, Samsung was named as AMD's preferred supplier of HBM3 packages, as featured within the recently launched Instinct MI300X accelerator. NVIDIA's HBM3E deal with SK hynix is believed to extend to the internal makeup of Blackwell GB100 data-center GPUs. The HBM4 memory standard is expected to be the next major battleground for the industry's hardest hitters.

OpenAI CEO Reportedly Seeking Funds for Purpose-built Chip Foundries

OpenAI CEO, Sam Altman, had a turbulent winter 2023 career moment, but appears to be going all in with his company's future interests. A Bloomberg report suggests that the tech visionary has initiated a major fundraising initiative for the construction of OpenAI-specific semiconductor production plants. The AI evangelist reckons that his industry will become prevalent enough to demand a dedicated network of manufacturing facilities—the U.S. based artificial intelligence (AI) research organization is (reportedly) exploring custom artificial intelligence chip designs. Proprietary AI-focused GPUs and accelerators are not novelties at this stage in time—many top tech companies rely on NVIDIA solutions, but are keen to deploy custom-built hardware in the near future.

OpenAI's popular ChatGPT system is reliant on NVIDIA H100 and A100 GPUs, but tailor-made alternatives seem to be the desired route for Altman & Co. The "on their own terms" pathway seemingly skips an expected/traditional chip manufacturing process; the big foundries could struggle to keep up with demand for AI-oriented silicon. G42 (an Abu Dhabi-based AI development holding company) and SoftBank Group are mentioned as prime investment partners in OpenAI's fledgling scheme; Bloomberg proposes that Altman's team is negotiating an $8 billion to $10 billion deal with top brass at G42. OpenAI's planned creation of its own foundry network is certainly a lofty and costly goal. The report does not specify whether existing facilities will be purchased and overhauled, or whether new plants will be constructed entirely from scratch.

AMD Ryzen 7 8840U APU Benched in GPD Win Max 2 Handheld

GPD has disclosed to ITHome that a specification refresh of its Win Max 2 handheld/mini-laptop gaming PC is incoming; this model debuted last year with Ryzen 7040 "Phoenix" APUs sitting in the driver's seat. A company representative provided a sneak peek of an upgraded device that sports a Team Red Ryzen 8040 series "Hawk Point" mobile processor, and a larger pool of system memory (32 GB versus the 2023 model's 16 GB). The refreshed GPD Win Max 2's Ryzen 7 8840U APU was compared to the predecessor's Ryzen 7 7840U in CPU-Z benchmarks (standard and AVX-512), and the results demonstrate a very slight difference in performance between generations.

The 8040 and 7040 APUs share the same "Phoenix" basic CPU design (8-cores + 16-threads) based on the prevalent "Zen 4" microarchitecture, plus an integration of AMD's Radeon 780M GPU. The former's main upgrade lies in its AI-crunching capabilities—a deployment of Team Red's XDNA AI engine. Ryzen 8040's: "NPU performance has been increased to 16 TOPS, compared to 10 TOPS of the NPU on the 'Phoenix' silicon. AMD is taking a whole-of-silicon approach to AI acceleration, which includes not just the NPU, but also the 'Zen 4' CPU cores that support the AVX-512 VNNI instruction set that's relevant to AI; and the iGPU based on the RDNA 3 graphics architecture, with each of its compute unit featuring two AI accelerators, components that make the SIMD cores crunch matrix math. The whole-of-silicon performance figures for "Phoenix" is 33 TOPS; while 'Hawk Point' boasts of 39 TOPS. In benchmarks by AMD, 'Hawk Point' is shown delivering a 40% improvement in vision models, and Llama 2, over the Ryzen 7040 "Phoenix" series."

DEEPX's DX-M1 Chip Recognized at CES 2024 as Leading AI of Things Solution

DEEPX (CEO, Lokwon Kim), an original AI semiconductor technology company, is announcing that it has surpassed 40 customers for its flagship chip solution, DX-M1—the only AI accelerator on the market to combine low power consumption, high efficiency and performance, and cost-effectiveness. The groundbreaking solution has been deployed for a hands-on trial to this customer pool, which spans global companies and domestic Korean enterprises across various sectors.

DEEPX is currently running an Early Engagement Customer Program (EECP) to provide customers with early access to its small camera module, a one-chip solution featuring DX-V1; M.2 module featuring DX-M1; and DXNN, the company's developer environment. This allows customers to receive pre-production validation of DEEPX's hardware and software, integrate them into mass-produced products, and realize AI technology innovations with the brand's technical support.

Neuchips to Showcase Industry-Leading Gen AI Inferencing Accelerators at CES 2024

Neuchips, a leading AI Application-Specific Integrated Circuits (ASIC) solutions provider, will demo its revolutionary Raptor Gen AI accelerator chip (previously named N3000) and Evo PCIe accelerator card LLM solutions at CES 2024. Raptor, the new chip solution, enables enterprises to deploy large language models (LLMs) inference at a fraction of the cost of existing solutions.

"We are thrilled to unveil our Raptor chip and Evo card to the industry at CES 2024," said Ken Lau, CEO of Neuchips. "Neuchips' solutions represent a massive leap in price to performance for natural language processing. With Neuchips, any organisation can now access the power of LLMs for a wide range of AI applications."

MemryX Demos Production Ready AI Accelerator (MX3) During 2024 CES Show

MemryX Inc. is announcing the availability of production level silicon of its cutting-edge AI Accelerator (MX3). MemryX is a pioneering startup specializing in accelerating artificial intelligence (AI) processing for edge devices. In less than 30 days after receiving production silicon from TSMC, MemryX will publicly showcase the ability to efficiently run hundreds of unaltered AI models at the 2024 Consumer Electronics Show (CES) in Las Vegas from Jan 9 through Jan 12.

Intel Preparing Habana "Gaudi2C" SKU for the Chinese AI Market

Intel's software team has added support in its open-source Linux drivers for an unannounced Habana "Gaudi2C" AI accelerator variant. Little is documented about the mystery Gaudi2C, which shares a core identity with Intel's flagship Gaudi2 data center training and inference chip, otherwise broadly available. The new revision is distinguished only by a PCI ID of "3" in the latest patch set for Linux 6.8. Speculations circulate that Gaudi2C may be a version tailored to meet China-specific demands, similar to Intel's Gaudi2 HL-225B SKU launched in July with reduced interconnect links. With US export bans restricting sales of advanced hardware to China, including Intel's leading Gaudi2 products, creating reduced-capability spinoffs that meet export regulations lets Intel maintain crucial Chinese revenue.

Meanwhile, Intel's upstream Linux contributions remain focused on hardening Gaudi/Gaudi2 support, now considered "very stable" by lead driver developer Oded Gabbay. Minor new additions reflect maturity, not instability. The open-sourced foundations contrast NVIDIA's proprietary driver model, a key Intel competitive argument for service developers using Habana Labs hardware. With the SynapseAI software suite reaching stability, some enterprises could consider Gaudi accelerators as an alternative to NVIDIA. And with Gaudi3 arriving next year, the ecosystem will get a better competitive advantage with increased performance targets.

Moore Threads Launches MTT S4000 48 GB GPU for AI Training/Inference and Presents 1000-GPU Cluster

Chinese chipmaker Moore Threads has launched its first domestically-produced 1000-card AI training cluster, dubbed the KUAE Intelligent Computing Center. A central part of the KUAE cluster is Moore Threads' new MTT S4000 accelerator card with 48 GB VRAM, utilizing the company's third-generation MUSA GPU architecture and 768 GB/s memory bandwidth. In FP32, the card can output 25 TeraFLOPS; in TF32, it can achieve 50 TeraFLOPS; and in FP16/BF16, up to 200 TeraFLOPS. Also supported is INT8 at 200 TOPS. The MTT S4000 focuses on both training and inference, leveraging Moore Threads' high-speed MTLink 1.0 intra-system interconnect to scale cards for distributed model parallel training of models with hundreds of billions of parameters. The card also provides graphics, video encoding/decoding, and 8K display capabilities for graphics workloads. Moore Threads' KUAE cluster combines the S4000 GPU hardware with RDMA networking, distributed storage, and integrated cluster management software. The KUAE Platform oversees multi-datacenter resource allocation and monitoring. KUAE ModelStudio hosts training frameworks and model repositories to streamline development.

With integrated solutions now proven at thousands of GPUs, Moore Threads is positioned to power ubiquitous intelligent applications, from scientific computing to the metaverse. The KUAE cluster reportedly achieves near-linear 91% scaling. Taking a training data volume of 200 billion as an example, Zhiyuan Research Institute's 70 billion parameter Aquila2 can complete training in 33 days; a model with 130 billion parameters can complete training in 56 days on the KUAE cluster. In addition, the Moore Threads KUAE kilocard cluster supports long-term continuous and stable operation, supports resuming training from a breakpoint, and offers asynchronous checkpointing that takes less than 2 minutes. For software, Moore Threads also boasts full compatibility with NVIDIA's CUDA framework, where its MUSIFY tool translates CUDA code to the MUSA GPU architecture at supposedly zero cost of migration, i.e., no performance penalty.

Intel's New 5th Gen "Emerald Rapids" Xeon Processors are Built with AI Acceleration in Every Core

Today at the "AI Everywhere" event, Intel launched its 5th Gen Intel Xeon processors (code-named Emerald Rapids) that deliver increased performance per watt and lower total cost of ownership (TCO) across critical workloads for artificial intelligence, high performance computing (HPC), networking, storage, database and security. This launch marks the second Xeon family upgrade in less than a year, offering customers more compute and faster memory at the same power envelope as the previous generation. The processors are software- and platform-compatible with 4th Gen Intel Xeon processors, allowing customers to upgrade and maximize the longevity of infrastructure investments while reducing costs and carbon emissions.

"Designed for AI, our 5th Gen Intel Xeon processors provide greater performance to customers deploying AI capabilities across cloud, network and edge use cases. As a result of our long-standing work with customers, partners and the developer ecosystem, we're launching 5th Gen Intel Xeon on a proven foundation that will enable rapid adoption and scale at lower TCO." -Sandra Rivera, Intel executive vice president and general manager of Data Center and AI Group.

Two New Marvell OCTEON 10 Processors Bring Server-Class Performance to Networking Devices

Marvell Technology, a leader in data infrastructure semiconductor solutions, is enabling networking equipment and firewall manufacturers to achieve breakthrough levels of performance and efficiency with two new OCTEON 10 data processing units (DPUs), the OCTEON 10 CN102 and OCTEON 10 CN103. The 5 nm OCTEON CN102 and CN103, broadly available to OEMs for product design and pilot production, are optimized for data and control plane applications in routers, firewalls, 5G small cells, SD-WAN appliances, and control plane applications in top-of-rack switches and line card controllers. Several of the world's largest networking equipment manufacturers have already incorporated the OCTEON 10 CN102 into a number of product designs.

Containing up to eight Arm Neoverse N2 cores, OCTEON 10 CN102 and CN103 deliver 3x the performance of Marvell's current DPU solutions for devices while reducing power consumption by 50% to 25 W. Achieving SPEC CPU (2017) integer rate (SPECint) scores of 36.5, OCTEON 10 CN102 and CN103 are able to deliver nearly 1.5 SPECint points per Watt. The chips can serve as an offload DPU for host processors or as the primary processor in devices; advanced performance per watt also enables OEMs to design fanless systems to simplify systems and further reduce cost, maintenance and power consumption.
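The "nearly 1.5 SPECint points per Watt" figure can be checked with simple division. The snippet below is a minimal sketch using only the two numbers quoted above (a score of 36.5 and 25 W); the function name `points_per_watt` is a hypothetical helper introduced for illustration, not anything from Marvell's material.

```python
# Figures quoted in the article: SPECint rate score and power draw.
SPECINT_RATE = 36.5   # SPEC CPU 2017 integer rate score
POWER_WATTS = 25.0    # stated power consumption in watts


def points_per_watt(score: float, watts: float) -> float:
    """Return benchmark points delivered per watt (hypothetical helper)."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return score / watts


efficiency = points_per_watt(SPECINT_RATE, POWER_WATTS)
print(f"{efficiency:.2f} SPECint points per watt")
```

The result, 1.46, rounds to the "nearly 1.5" quoted in the announcement.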

AMD Delivers Leadership Portfolio of Data Center AI Solutions with AMD Instinct MI300 Series

Today, AMD announced the availability of the AMD Instinct MI300X accelerators - with industry leading memory bandwidth for generative AI and leadership performance for large language model (LLM) training and inferencing - as well as the AMD Instinct MI300A accelerated processing unit (APU) - combining the latest AMD CDNA 3 architecture and "Zen 4" CPUs to deliver breakthrough performance for HPC and AI workloads.

"AMD Instinct MI300 Series accelerators are designed with our most advanced technologies, delivering leadership performance, and will be in large scale cloud and enterprise deployments," said Victor Peng, president, AMD. "By leveraging our leadership hardware, software and open ecosystem approach, cloud providers, OEMs and ODMs are bringing to market technologies that empower enterprises to adopt and deploy AI-powered solutions."

Intel "Emerald Rapids" Die Configuration Leaks, More Details Appear

Thanks to the leaked slides obtained by @InstLatX64, we have more details and some performance estimates about Intel's upcoming 5th Generation Xeon "Emerald Rapids" CPUs, boasting a significant performance leap over its predecessors. Leading the Emerald Rapids family is the top-end SKU, the Xeon 8592+, which features 64 cores and 128 threads, backed by a massive 480 MB L3 cache pool. The upcoming lineup shifts from a 4-tile to a 2-tile design to minimize latency and improve performance. The design utilizes P-cores based on the Raptor Cove microarchitecture and promises up to 40% faster performance than the current 4th Generation "Sapphire Rapids" CPUs in AI applications utilizing Intel's AMX engine. Each chiplet has 35 cores, three of which are disabled, and each tile has two DDR5-5600 memory controllers, each operating two memory channels, which translates into an eight-channel design. There are three PCIe controllers per die, making it six in total.
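The per-tile figures in the leak add up to the package totals quoted for the Xeon 8592+. The following is a small sketch of that arithmetic; the constant names are illustrative, and every value is taken from the paragraph above rather than from Intel documentation.

```python
# Per-tile figures as described in the leaked slides (values from the article).
TILES = 2
CORES_PER_TILE = 35
DISABLED_CORES_PER_TILE = 3
THREADS_PER_CORE = 2               # implied by 64 cores / 128 threads
MEM_CONTROLLERS_PER_TILE = 2
CHANNELS_PER_CONTROLLER = 2
PCIE_CONTROLLERS_PER_TILE = 3

# Package-level totals derived from the per-tile numbers.
active_cores = TILES * (CORES_PER_TILE - DISABLED_CORES_PER_TILE)
threads = active_cores * THREADS_PER_CORE
memory_channels = TILES * MEM_CONTROLLERS_PER_TILE * CHANNELS_PER_CONTROLLER
pcie_controllers = TILES * PCIE_CONTROLLERS_PER_TILE

print(f"cores={active_cores}, threads={threads}, "
      f"memory channels={memory_channels}, PCIe controllers={pcie_controllers}")
```

Two tiles with 32 active cores each give the 64 cores and 128 threads of the top SKU, along with eight memory channels and six PCIe controllers.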

Newer protocols and AI accelerators also back the upcoming lineup. Now, the Emerald Rapids family supports the Compute Express Link (CXL) Types 1/2/3 in addition to up to 80 PCIe Gen 5 lanes and enhanced Intel Ultra Path Interconnect (UPI). There are four UPI controllers spread over two dies. Moreover, features like the four on-die Intel Accelerator Engines, optimized power mode, and up to 17% improvement in general-purpose workloads make it seem like a big step up from the current generation. Much of this technology is found on the existing Sapphire Rapids SKUs, with the new generation enhancing the AI processing capability further. You can see the die configuration below. The 5th Generation Emerald Rapids designs are supposed to be official on December 14th, just a few days away.

Dell Allegedly Prohibits Sales of High-End Radeon and Instinct MI GPUs in China

AMD's lineup of Radeon and Instinct GPUs, including the flagship RX 7900 XTX/XT, the professional-grade PRO W7900, and the upcoming Instinct MI300, are facing sales prohibitions in China, according to an alleged sales advisory guide from Dell. This restriction mirrors the earlier ban on NVIDIA's RTX 4090, underscoring the increasing export limitations U.S.-based companies face for high-end semiconductor products that could be repurposed for military and strategic applications. Notably, Dell's report lists several AMD Instinct accelerators, which are integral to data center infrastructure, and Radeon GPUs, which are widely used in PCs, indicating the broad impact of the advisory.

The ban includes discrete GPUs like AMD's Radeon RX 7900 XTX and 7900 XT, which, despite their data-center potential, may still be sold under specific "NEC" eligibility. This status allows for continued sales in restricted regions, as is the case with NVIDIA's RTX 4090. However, the process to secure NEC eligibility is lengthy, potentially leading to supply shortages and increased GPU prices. This trend has already been observed with the RX 7900 XTX in China, where it has become a high-end alternative in light of the RTX 4090's scarcity and inflated pricing. The Dell sales advisory also lists that sales of the aforementioned products are banned in 22 countries, including Russia, Iran, Iraq, and others listed below.

Microsoft Introduces 128-Core Arm CPU for Cloud and Custom AI Accelerator

During its Ignite conference, Microsoft introduced a duo of custom-designed silicon made to accelerate AI and excel in cloud workloads. First of the two is Microsoft's Azure Cobalt 100 CPU, a 128-core design that features a 64-bit Armv9 instruction set, implemented in a cloud-native design that is set to become a part of Microsoft's offerings. While there aren't many details regarding the configuration, the company claims that the performance target is up to 40% higher than the current generation of Arm servers running on Azure cloud. The SoC uses Arm's Neoverse CSS platform customized for Microsoft, presumably with Arm Neoverse N2 cores.

The next and hottest topic in the server space is AI acceleration, which is needed for running today's large language models. Microsoft hosts OpenAI's ChatGPT, Microsoft's Copilot, and many other AI services. To help make them run as fast as possible, Microsoft's project Athena is now named the Maia 100 AI accelerator, which is manufactured on TSMC's 5 nm process. It features 105 billion transistors and supports various MX data formats, even those smaller than 8-bit, for maximum performance. It is currently being tested on GPT 3.5 Turbo, and we have yet to see performance figures and comparisons with competing hardware such as NVIDIA's H100/H200 and AMD's MI300X. The Maia 100 has an aggregate bandwidth of 4.8 Terabits per accelerator, which uses a custom Ethernet-based networking protocol for scaling. These chips are expected to appear in Microsoft data centers early next year, and we hope to get some performance numbers soon.

Qualcomm Snapdragon Elite X SoC for Laptop Leaks: 12 Cores, LPDDR5X Memory, and WiFi7

Thanks to the information from Windows Report, we have received numerous details regarding Qualcomm's upcoming Snapdragon Elite X chip for laptops. The Snapdragon Elite X SoC is built on top of Nuvia-derived Oryon cores, 12 of which Qualcomm put in the SoC. While we don't know their base frequencies, the all-core boost reaches 3.8 GHz. The SoC can reach up to 4.3 GHz on single and dual-core boosting. However, the slide notes that this is a pure "big" core configuration, so there is no big.LITTLE design. The GPU part of Snapdragon Elite X is still based on Qualcomm's Adreno IP; however, the performance figures are up significantly to reach 4.6 TeraFLOPS of supposedly FP32 single-precision power. Accompanying the CPU and GPU, there are dedicated AI and image processing accelerators, like the Hexagon Neural Processing Unit (NPU), which can process 45 trillion operations per second (TOPS). For the camera, the Spectra Image Signal Processor (ISP) is there to support up to 4K HDR video capture on a dual 36 MP or a single 64 MP camera setup.

The SoC supports LPDDR5X memory running at 8533 MT/s and a maximum capacity of 64 GB. Apparently, the memory controller is an 8-channel one with a 16-bit width and a maximum bandwidth of 136 GB/s. Snapdragon Elite X has PCIe 4.0 and supports UFS 4.0 for outside connection. All of this is packed on a die manufactured by TSMC on a 4 nm node. In addition to marketing excellent performance compared to x86 solutions, Qualcomm also advertises the SoC as power efficient. The slide notes that it uses 1/3 of the power at the same peak PC performance of x86 offerings. It is also interesting to note that the package will support WiFi7 and Bluetooth 5.4. Officially coming in 2024, the Snapdragon Elite X will have to compete with Intel's Meteor Lake and/or Arrow Lake, in addition to AMD Strix Point.
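The quoted 136 GB/s follows from the other memory figures if the 16-bit width is read as per channel. Below is a minimal sketch of the standard peak-bandwidth calculation (bus width in bytes multiplied by transfer rate); the helper name `peak_bandwidth_gb_s` is hypothetical, and the per-channel reading of the width is an assumption.

```python
def peak_bandwidth_gb_s(channels: int, channel_width_bits: int,
                        transfer_rate_mt_s: float) -> float:
    """Theoretical peak memory bandwidth in GB/s (hypothetical helper).

    Assumes every channel is channel_width_bits wide and moves one word
    per transfer; real-world throughput will be lower.
    """
    bus_width_bytes = channels * channel_width_bits / 8
    return bus_width_bytes * transfer_rate_mt_s * 1e6 / 1e9


# Figures from the leak: 8 channels, 16 bits each, LPDDR5X at 8533 MT/s.
bandwidth = peak_bandwidth_gb_s(channels=8, channel_width_bits=16,
                                transfer_rate_mt_s=8533)
print(f"{bandwidth:.1f} GB/s")
```

Eight 16-bit channels form a 128-bit (16-byte) bus, and 16 bytes at 8533 MT/s gives roughly 136.5 GB/s, in line with the figure on the slide.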

Intel Launches Industry's First AI PC Acceleration Program

Building on the AI PC use cases shared at Innovation 2023, Intel today launched the AI PC Acceleration Program, a global innovation initiative designed to accelerate the pace of AI development across the PC industry.

The program aims to connect independent hardware vendors (IHVs) and independent software vendors (ISVs) with Intel resources that include AI toolchains, co-engineering, hardware, design resources, technical expertise and co-marketing opportunities. These resources will help the ecosystem take full advantage of Intel Core Ultra processor technologies and corresponding hardware to maximize AI and machine learning (ML) application performance, accelerate new use cases and connect the wider PC industry to the solutions emerging in the AI PC ecosystem. More information is available on the AI PC Acceleration Program website.

Fujitsu Details Monaka: 150-core Armv9 CPU for AI and Data Center

Ever since the creation of A64FX for the Fugaku supercomputer, Fujitsu has been plotting the development of next-generation CPU design for accelerating AI and general-purpose HPC workloads in the data center. Codenamed Monaka, the CPU is the latest creation for TSMC's 2 nm semiconductor manufacturing node. Based on Armv9-A ISA, the CPU will feature up to 150 cores with Scalable Vector Extensions 2 (SVE2), so it can process a wide variety of vector data sets in parallel. Using a 3D chiplet design, the 150 cores will be split into different dies and placed alongside SRAM and I/O controller. The current width of the SVE2 implementation is unknown.

The CPU is designed to support DDR5 memory and PCIe 6.0 connectivity for attaching storage and other accelerators. To bring cache coherency among application-specific accelerators, CXL 3.0 is present as well. Interestingly, Monaka is planned to arrive in FY2027, which starts in 2026 on January 1st. The CPU will supposedly use air cooling, meaning the design aims for power efficiency. Additionally, it is essential to note that Monaka is not a processor that will power the post-Fugaku supercomputer. The post-Fugaku supercomputer will use a post-Monaka design, likely iterating on the design principles that Monaka uses and refining them for the launch of the post-Fugaku supercomputer scheduled for 2030. Below are the slides from Fujitsu's presentation, in Japanese, which highlight the design goals of the CPU.

AMD to Acquire Open-Source AI Software Expert Nod.ai

AMD today announced the signing of a definitive agreement to acquire Nod.ai to expand the company's open AI software capabilities. The addition of Nod.ai will bring an experienced team that has developed an industry-leading software technology that accelerates the deployment of AI solutions optimized for AMD Instinct data center accelerators, Ryzen AI processors, EPYC processors, Versal SoCs and Radeon GPUs to AMD. The agreement strongly aligns with the AMD AI growth strategy centered on an open software ecosystem that lowers the barriers of entry for customers through developer tools, libraries and models.

"The acquisition of Nod.ai is expected to significantly enhance our ability to provide AI customers with open software that allows them to easily deploy highly performant AI models tuned for AMD hardware," said Vamsi Boppana, senior vice president, Artificial Intelligence Group at AMD. "The addition of the talented Nod.ai team accelerates our ability to advance open-source compiler technology and enable portable, high-performance AI solutions across the AMD product portfolio. Nod.ai's technologies are already widely deployed in the cloud, at the edge and across a broad range of end point devices today."

Microsoft to Unveil Custom AI Chips to Fight NVIDIA's Monopoly

According to sources close to The Information, Microsoft is supposed to unveil details about its upcoming custom silicon design for accelerating AI workloads. Allegedly, the incoming chip announcement is scheduled for November during Microsoft's annual Ignite conference. Held in Seattle from November 14 to 17, the conference is supposed to show all of the work that the company has been doing in the field of AI. The alleged launch of an AI chip will undoubtedly take center stage in the announcement, as the demand for AI accelerators has been so great that companies can't get their hands on GPUs. The sector is mainly dominated by NVIDIA, with its H100 and A100 GPUs powering most of the AI infrastructure worldwide.

With the launch of a custom AI chip codenamed Athena, Microsoft hopes to match or beat the performance of NVIDIA's offerings and reduce the cost of AI infrastructure. As the price of an H100 GPU can reach 30,000 US Dollars, building a data center filled with H100s can cost hundreds of millions. The cost could be brought down using in-house chips, and Microsoft could be less dependent on NVIDIA to provide the backbone of AI servers needed in the coming years. Nevertheless, we are excited to see what the company has prepared, and we will report on the Microsoft Ignite announcement in November.

Dell Technologies Expands Generative AI Portfolio

Dell Technologies expands its Dell Generative AI Solutions portfolio, helping businesses transform how they work along every step of their generative AI (GenAI) journeys. "To maximize AI efforts and support workloads across public clouds, on-premises environments and at the edge, companies need a robust data foundation with the right infrastructure, software and services," said Jeff Boudreau, chief AI officer, Dell Technologies. "That's what we are building with our expanded validated designs, professional services, modern data lakehouse and the world's broadest GenAI solutions portfolio."

Customizing GenAI models to maximize proprietary data
The Dell Validated Design for Generative AI with NVIDIA for Model Customization offers pre-trained models that extract intelligence from data without building models from scratch. This solution provides best practices for customizing and fine-tuning GenAI models based on desired outcomes while helping keep information secure and on-premises. With a scalable blueprint for customization, organizations now have multiple ways to tailor GenAI models to accomplish specific tasks with their proprietary data. Its modular and flexible design supports a wide range of computational requirements and use cases, spanning training diffusion, transfer learning and prompt tuning.

AMD Unveils Alveo UL3524 Purpose-Built, FPGA-Based Accelerator

AMD today announced the AMD Alveo UL3524 accelerator card, a new fintech accelerator designed for ultra-low latency electronic trading applications. Already deployed by leading trading firms and enabling multiple solution partner offerings, the Alveo UL3524 provides proprietary traders, market makers, hedge funds, brokerages, and exchanges with a state-of-the-art FPGA platform for electronic trading at nanosecond (ns) speed.

The Alveo UL3524 delivers a 7X latency improvement over prior generation FPGA technology, achieving less than 3 ns FPGA transceiver latency for accelerated trade execution. Powered by a custom 16 nm Virtex UltraScale+ FPGA, it features a novel transceiver architecture with hardened, optimized network connectivity cores to achieve breakthrough performance. By combining hardware flexibility with ultra-low latency networking on a production platform, the Alveo UL3524 enables faster design closure and deployment compared to traditional FPGA alternatives.