News Posts matching #training


NVIDIA B200 "Blackwell" Records 2.2x Performance Improvement Over its "Hopper" Predecessor

We know that NVIDIA's latest "Blackwell" GPUs are fast, but how much faster are they than the previous generation "Hopper"? According to the latest MLPerf Training v4.1 results, NVIDIA's HGX B200 Blackwell platform has demonstrated massive performance gains of up to 2.2x per GPU compared to the Hopper-based HGX H200. The results, verified by MLCommons, reveal impressive achievements in large language model (LLM) training. The Blackwell architecture, featuring HBM3e high-bandwidth memory and fifth-generation NVLink interconnect technology, achieved double the performance per GPU for GPT-3 pre-training and a 2.2x boost for Llama 2 70B fine-tuning compared to the previous Hopper generation. Each benchmark system incorporated eight Blackwell GPUs operating at a 1,000 W TDP, connected via NVLink Switch for scale-up.

The network infrastructure utilized NVIDIA ConnectX-7 SuperNICs and Quantum-2 InfiniBand switches, enabling high-speed node-to-node communication for distributed training workloads. While previous Hopper-based systems required 256 GPUs to reach optimal performance on the GPT-3 175B benchmark, Blackwell accomplished the same task with just 64 GPUs, leveraging its larger HBM3e memory capacity and bandwidth. One thing to look out for is the upcoming GB200 NVL72 system, which promises gains well beyond 2.2x. It features expanded NVLink domains, higher memory bandwidth, and tight integration with NVIDIA Grace CPUs, complemented by ConnectX-8 SuperNIC and Quantum-X800 switch technologies. With faster switching and improved Grace-Blackwell data movement, further software optimization from NVIDIA could push the performance envelope even higher.
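
A rough, hedged back-of-the-envelope sketch of why the larger HBM3e capacity matters: with mixed-precision Adam training, weights, gradients, and optimizer states alone consume on the order of 16 bytes per parameter (an assumption here, before activations and parallelism overheads), so a 175B-parameter model needs several terabytes of aggregate GPU memory. The per-GPU HBM figure below is likewise an assumption for illustration, not an MLPerf-published value.

```python
# Hedged estimate: memory for model state only (weights + gradients + Adam moments).
params = 175e9                 # GPT-3 175B
bytes_per_param = 16           # assumed mixed-precision Adam footprint
model_state_tb = params * bytes_per_param / 1e12          # ~2.8 TB of model state
hbm_per_gpu_gb = 180           # assumed usable HBM3e per Blackwell GPU
gpus_to_hold_state = model_state_tb * 1e12 / (hbm_per_gpu_gb * 1e9)
print(f"~{model_state_tb:.1f} TB of state, ~{gpus_to_hold_state:.0f} GPUs just to hold it")
```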

Meta Shows Open-Architecture NVIDIA "Blackwell" GB200 System for Data Center

During the Open Compute Project (OCP) Summit 2024, Meta, one of the prime members of the OCP, showed off its NVIDIA "Blackwell" GB200 systems for its massive data centers. We previously covered Microsoft's Azure server rack with GB200 GPUs, which dedicates one-third of the rack space to computing and two-thirds to cooling. A few days later, Google showed off its smaller GB200 system, and today, Meta is showing its GB200 system, the smallest of the bunch. To train a dense transformer large language model with 405B parameters and a context window of up to 128k tokens, like Llama 3.1 405B, Meta had to redesign its data center infrastructure to run a distributed training job across two 24,000-GPU clusters. That is 48,000 GPUs used for training a single AI model.

Called "Catalina," it is built on the NVIDIA Blackwell platform, emphasizing modularity and adaptability while incorporating the latest NVIDIA GB200 Grace Blackwell Superchip. To address the escalating power requirements of GPUs, Catalina introduces the Orv3, a high-power rack capable of delivering up to 140kW. The comprehensive liquid-cooled setup encompasses a power shelf supporting various components, including a compute tray, switch tray, the Orv3 HPR, Wedge 400 fabric switch with 12.8 Tbps switching capacity, management switch, battery backup, and a rack management controller. Interestingly, Meta also upgraded its "Grand Teton" system for internal usage, such as deep learning recommendation models (DLRMs) and content understanding with AMD Instinct MI300X. Those are used to inference internal models, and MI300X appears to provide the best performance per Dollar for inference. According to Meta, the computational demand stemming from AI will continue to increase exponentially, so more NVIDIA and AMD GPUs is needed, and we can't wait to see what the company builds.

Accenture to Train 30,000 of Its Employees on NVIDIA AI Full Stack

Accenture and NVIDIA today announced an expanded partnership, including Accenture's formation of a new NVIDIA Business Group, to help the world's enterprises rapidly scale their AI adoption. With generative AI demand driving $3 billion in Accenture bookings in its recently closed fiscal year, the new group will help clients lay the foundation for agentic AI functionality using Accenture's AI Refinery, which uses the full NVIDIA AI stack (including NVIDIA AI Foundry, NVIDIA AI Enterprise and NVIDIA Omniverse) to advance areas such as process reinvention, AI-powered simulation and sovereign AI.

Accenture AI Refinery will be available on all public and private cloud platforms and will integrate seamlessly with other Accenture Business Groups to accelerate AI across the SaaS and Cloud AI ecosystem.

NVIDIA Cancels Dual-Rack NVL36x2 in Favor of Single-Rack NVL72 Compute Monster

NVIDIA has reportedly discontinued its dual-rack GB200 NVL36x2 GPU model, opting to focus on the single-rack GB200 NVL72 and NVL36 models. This shift, revealed by industry analyst Ming-Chi Kuo, aims to simplify NVIDIA's offerings in the AI and HPC markets. The decision was influenced by major clients like Microsoft, who prefer the NVL72's improved space efficiency and potential for enhanced inference performance. While both models perform similarly in AI large language model (LLM) training, the NVL72 is expected to excel in non-parallelizable inference tasks. As a reminder, the NVL72 features 36 Grace CPUs, delivering 2,592 Arm Neoverse V2 cores alongside 17 TB of LPDDR5X memory with 18.4 TB/s of aggregate bandwidth. Additionally, it includes 72 Blackwell GB200 SXM GPUs with a massive 13.5 TB of combined HBM3e running at 576 TB/s of aggregate bandwidth.

However, this shift presents significant challenges. The NVL72's power consumption of around 120 kW far exceeds typical data center capabilities, potentially limiting its immediate widespread adoption. The discontinuation of the NVL36x2 has also sparked concerns about NVIDIA's execution capabilities and may disrupt the supply chain for assembly and cooling solutions. Despite these hurdles, industry experts view this as a pragmatic approach to product planning in the dynamic AI landscape. While some customers may be disappointed by the dual-rack model's cancellation, NVIDIA's long-term outlook in the AI technology market remains strong. The company continues to work with clients and listen to their needs as it positions itself as a leader in high-performance computing solutions.

Huawei Starts Shipping "Ascend 910C" AI Accelerator Samples to Large NVIDIA Customers

Huawei has reportedly started shipping its Ascend 910C accelerator, the company's domestic alternative to NVIDIA's H100 for AI training and inference. As the South China Morning Post reports, Huawei is shipping samples of the accelerator to large NVIDIA customers, including companies like Alibaba, Baidu, and Tencent, which have ordered massive quantities of NVIDIA accelerators. Huawei is reportedly on track to deliver 70,000 chips, potentially worth $2 billion. With NVIDIA working on a B20 accelerator SKU that complies with US government export regulations, some analysts expect the Ascend 910C to outperform NVIDIA's B20 processor.

If the Ascend 910C receives positive results from Chinese tech giants, it could mark the start of Huawei's expansion into data center accelerators, an effort once hindered by the company's inability to manufacture advanced chips. Now, with foundries like SMIC producing 7 nm designs and possibly 5 nm coming soon, Huawei can leverage this technology to satisfy the domestic demand for more AI processing power. Competing on a global scale, though, remains a challenge: companies like NVIDIA, AMD, and Intel have access to more advanced nodes, which gives their AI accelerators an edge in efficiency and performance.

Intel Targets 35% Cost Reduction in Sales and Marketing Group, Bracing for Tough Times Ahead

Intel's Sales and Marketing Group (SMG) has announced a 35% reduction in costs as the company looks to streamline operations and adapt to challenging market conditions. The cuts, revealed during an all-hands meeting on August 5th, will impact both jobs and marketing expenses within the SMG. Intel has directed the group to "simplify programs end-to-end" by the end of the year, a directive that comes on the heels of the company's announcement that it would lay off 15% of its global workforce to save $10 billion in operating expenses. "We are becoming a simpler, leaner, and more agile company that's easier for partners and customers to work with while ensuring we focus our investments on areas where we see the greatest opportunities for innovation and growth," Intel said in a statement to CRN. The company emphasized that this restructuring is about "building a stronger Intel for the future," with partners integral to its plans.

The job cuts within the SMG are expected to target overlapping responsibilities, such as account managers and industry-focused teams, which can confuse customers navigating Intel's complex organization. Additionally, the company plans to significantly reduce its marketing budget and simplify programs, aiming to save at least $100 million in the latter half of 2024 and an additional $300 million in the first half of 2025. The impact will also be felt in Intel's market development fund (MDF), a crucial tool for supporting OEMs and other partners through events, training, and more. An ex-Intel executive warned that the MDF had become vital as the company's product leadership waned, allowing it to maintain valuable relationships with partners. As Intel navigates these changes, its partners are bracing for the impact, with one CEO describing the situation as everyone "hunkering down and just waiting to hear something." Another partner executive expressed concerns about Intel's ability to maintain the level of service and support its customers have come to expect.

Microsoft Prepares MAI-1 In-House AI Model with 500B Parameters

According to The Information, Microsoft is developing a new AI model, internally named MAI-1, designed to compete with the leading models from Google, Anthropic, and OpenAI. This significant step forward in the tech giant's AI capabilities is boosted by Mustafa Suleyman, the former Google AI leader who previously served as CEO of Inflection AI before Microsoft acquired the majority of its staff and intellectual property for $650 million in March. MAI-1 is a custom Microsoft creation that utilizes training data and technology from Inflection but is not a transferred model. It is also distinct from Inflection's previously released Pi models, as confirmed by two Microsoft insiders familiar with the project. With approximately 500 billion parameters, MAI-1 will be significantly larger than its predecessors, surpassing the capabilities of Microsoft's smaller, open-source models.

For comparison, OpenAI's GPT-4 reportedly boasts 1.8 trillion parameters in a sparse Mixture of Experts design, while open-source models from Meta and Mistral feature dense designs with around 70 billion parameters. Microsoft's investment in MAI-1 highlights its commitment to staying competitive in the rapidly evolving AI landscape. The development of this large-scale model represents a significant step forward for the tech giant as it seeks to challenge industry leaders in the field. The increased computing power, training data, and financial resources required for MAI-1 demonstrate Microsoft's dedication to pushing the boundaries of AI capabilities and its intention to compete on its own. With the involvement of Mustafa Suleyman, a renowned expert in AI, the company is well-positioned to make significant strides in this field.
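
To put those parameter counts in perspective, the short sketch below estimates the memory needed just to store the weights at FP16 (2 bytes per parameter); it is an illustrative calculation, not a figure disclosed by Microsoft or OpenAI.

```python
# Illustrative only: FP16 weight footprint = parameters x 2 bytes.
for name, params in [("MAI-1 (reported)", 500e9),
                     ("GPT-4 (reported MoE total)", 1.8e12),
                     ("Llama-class dense model", 70e9)]:
    print(f"{name}: ~{params * 2 / 1e12:.2f} TB of FP16 weights")
```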

US Government Wants Nuclear Plants to Offload AI Data Center Expansion

The expansion of AI technology affects not only the production and demand for graphics cards but also the electricity grid that powers them. Data centers hosting thousands of GPUs are becoming more common, and the industry has been building new facilities for GPU-enhanced servers to serve the need for more AI. However, these powerful GPUs often consume over 500 watts per card, and NVIDIA's latest Blackwell B200 GPU has a TGP of 1,000 watts, a full kilowatt. These kilowatt-class GPUs will be deployed in data centers housing tens of thousands of cards, resulting in multi-megawatt facilities. To ease the load on the national electricity grid, US President Joe Biden's administration has been discussing with big tech a re-evaluation of their power sources, possibly using smaller nuclear plants. In an Axios interview, Energy Secretary Jennifer Granholm noted that "AI itself isn't a problem because AI could help to solve the problem." The problem, however, is the load on the national electricity grid, which can't sustain the rapid expansion of AI data centers.
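
To put the grid-load concern in numbers, here is a minimal sketch of the arithmetic implied above; the GPU count and the overhead factor for cooling, networking, and host CPUs are assumptions for illustration.

```python
# Rough illustration of facility power draw (assumed values).
gpus = 20_000                # "tens of thousands" of accelerators per facility
gpu_power_w = 1_000          # ~1 kW TGP per Blackwell B200 card
overhead = 1.5               # assumed PUE-style overhead (cooling, networking, CPUs)
facility_mw = gpus * gpu_power_w * overhead / 1e6
print(f"~{facility_mw:.0f} MW per facility")   # -> ~30 MW
```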

The Department of Energy (DOE) has reportedly been talking with firms, most notably hyperscalers like Microsoft, Google, and Amazon, about considering nuclear fission and fusion power plants to satisfy the need for AI expansion. We have already discussed Microsoft's plan to embed a nuclear reactor near one of its data center facilities to help manage the load of thousands of GPUs running AI training and inference. This time, however, it is not just Microsoft: other tech giants are reportedly considering nuclear as well, as they all need to offload their AI expansion from the US national power grid. Nuclear power currently accounts for a mere 20% of US electricity generation, and the DOE is financing the restoration and return to service of the 800-MW Holtec Palisades nuclear generating station with $1.52 billion in funds. Microsoft is also investing in a small modular reactor (SMR) energy strategy, which could serve as an example for other big tech companies to follow.

Altair SimSolid Transforms Simulation for Electronics Industry

Altair, a global leader in computational intelligence, announced the upcoming release of Altair SimSolid for electronics, bringing game-changing fast, easy, and precise multi-physics scenario exploration for electronics, from chips, PCBs, and ICs to full system design. "As the electronics industry pushes the boundaries of complexity and miniaturization, engineers have struggled with simulations that often compromise on detail for expediency. Altair SimSolid will empower engineers to capture the intricate complexities of PCBs and ICs without simplification," said James R. Scapa, founder and chief executive officer, Altair. "Traditional simulation methods often require approximations when analyzing PCB structures due to their complexity. Altair SimSolid eliminates these approximations to run more accurate simulations for complex problems with vast dimensional disparities."

Altair SimSolid has revolutionized conventional analysis with its ability to accurately predict complex structural behavior at blazing-fast speed while eliminating laborious hours of modeling. It removes geometry simplification and meshing, the two most time-consuming and expertise-intensive tasks in traditional finite element analysis. As a result, it delivers results in seconds to minutes (up to 25x faster than traditional finite element solvers) and effortlessly handles complex assemblies. Having seen fast adoption in the aerospace and automotive industries, two sectors that routinely deal with massive structures, Altair SimSolid is poised to play a significant role in the electronics market. The initial release, expected in Q2 2024, will support structural and thermal analysis for PCBs and ICs, with full electromagnetics analysis coming in a future release.

Dell Expands Generative AI Solutions Portfolio, Selects NVIDIA Blackwell GPUs

Dell Technologies is strengthening its collaboration with NVIDIA to help enterprises adopt AI technologies. By expanding the Dell Generative AI Solutions portfolio, including with the new Dell AI Factory with NVIDIA, organizations can accelerate integration of their data, AI tools and on-premises infrastructure to maximize their generative AI (GenAI) investments. "Our enterprise customers are looking for an easy way to implement AI solutions—that is exactly what Dell Technologies and NVIDIA are delivering," said Michael Dell, founder and CEO, Dell Technologies. "Through our combined efforts, organizations can seamlessly integrate data with their own use cases and streamline the development of customized GenAI models."

"AI factories are central to creating intelligence on an industrial scale," said Jensen Huang, founder and CEO, NVIDIA. "Together, NVIDIA and Dell are helping enterprises create AI factories to turn their proprietary data into powerful insights."

MAINGEAR Introduces PRO AI Workstations Featuring aiDAPTIV+ For Cost-Effective Large Language Model Training

MAINGEAR, a leading provider of high-performance custom PC systems, and Phison, a global leader in NAND controllers and storage solutions, today unveiled groundbreaking MAINGEAR PRO AI workstations with Phison's aiDAPTIV+ technology. Specifically engineered to democratize Large Language Model (LLM) development and training for small and medium-sized businesses (SMBs), these ultra-powerful workstations incorporate aiDAPTIV+ technology to deliver supercomputer LLM training capabilities at a fraction of the cost of traditional AI training servers.

As the demand for large-scale generative AI models continues to surge and their complexity increases, the potential for LLMs also expands. However, this rapid advancement in LLM AI technology has led to a notable boost in hardware requirements, making model training cost-prohibitive and inaccessible for many small to medium businesses.

Cerebras & G42 Break Ground on Condor Galaxy 3 - an 8 exaFLOPs AI Supercomputer

Cerebras Systems, the pioneer in accelerating generative AI, and G42, the Abu Dhabi-based leading technology holding group, today announced the construction of Condor Galaxy 3 (CG-3), the third cluster in their constellation of AI supercomputers, the Condor Galaxy. Featuring 64 of Cerebras' newly announced CS-3 systems, all powered by the industry's fastest AI chip, the Wafer-Scale Engine 3 (WSE-3), Condor Galaxy 3 will deliver 8 exaFLOPs of AI compute across 58 million AI-optimized cores. The Cerebras and G42 strategic partnership has already delivered 8 exaFLOPs of AI supercomputing performance via Condor Galaxy 1 and Condor Galaxy 2, each among the largest AI supercomputers in the world. Located in Dallas, Texas, Condor Galaxy 3 brings the current total of the Condor Galaxy network to 16 exaFLOPs.

"With Condor Galaxy 3, we continue to achieve our joint vision of transforming the worldwide inventory of AI compute through the development of the world's largest and fastest AI supercomputers," said Kiril Evtimov, Group CTO of G42. "The existing Condor Galaxy network has trained some of the leading open-source models in the industry, with tens of thousands of downloads. By doubling the capacity to 16exaFLOPs, we look forward to seeing the next wave of innovation Condor Galaxy supercomputers can enable." At the heart of Condor Galaxy 3 are 64 Cerebras CS-3 Systems. Each CS-3 is powered by the new 4 trillion transistor, 900,000 AI core WSE-3. Manufactured at TSMC at the 5-nanometer node, the WSE-3 delivers twice the performance at the same power and for the same price as the previous generation part. Purpose built for training the industry's largest AI models, WSE-3 delivers an astounding 125 petaflops of peak AI performance per chip.

Tiny Corp. CEO Expresses "70% Confidence" in AMD Open-Sourcing Certain GPU Firmware

Lately Tiny Corp. CEO—George Hotz—has used his company's social media account to publicly criticize AMD Radeon RX 7900 XTX GPU firmware. The creator of Tinybox, a pre-orderable $15,000 AI compute cluster, has not selected "traditional" hardware for his systems—it is possible that AMD's Instinct MI300X accelerator is quite difficult to acquire, especially for a young startup operation. The decision to utilize gaming-oriented XFX-branded RDNA 3.0 GPUs instead of purpose-built CDNA 3.0 platforms—for local model training and AI inference—is certainly a peculiar one. Hotz and his colleagues have encountered roadblocks in the development of their Tinybox system—recently, public attention was drawn to an "LLVM spilling bug." AMD President/CEO/Chair, Dr. Lisa Su, swiftly stepped in and promised a "good solution." Earlier in the week, Tiny Corp. reported satisfaction with a delivery of fixes—courtesy of Team Red's software engineering department. They also disclosed that they would be discussing matters with AMD directly, regarding the possibility of open-sourcing Radeon GPU MES firmware.

Subsequently, Hotz documented his interactions with Team Red representatives, expressing 70% confidence that AMD will approve open-sourcing certain bits of firmware within a week: "Call went pretty well. We are gating the commitment to 6x Radeon RX 7900 XTX on a public release of a roadmap to get the firmware open source. (and obviously the MLPerf training bug being fixed). We aren't open source purists, it doesn't matter to us if the HDCP stuff is open for example. But we need the scheduler and the memory hierarchy management to be open. This is what it takes to push the performance of neural networks. The Groq 500 T/s mixtral demo should be possible on a tinybox, but it requires god tier software and deep integration with the scheduler. We also advised that the build process for amdgpu-dkms should be more open. While the driver itself is open, we haven't found it easy to rebuild and install. Easy REPL cycle is a key driver for community open source. We want the firmware to be easy to rebuild and install also." Prior to this week's cooperation, Tiny Corp. hinted that it could move on from the Radeon RX 7900 XTX in favor of Intel Alchemist graphics hardware; if AMD's decision-making does not go their way, Hotz & Co. could pivot to builds based on Acer Predator BiFrost Arc A770 16 GB OC cards.

Jensen Huang Celebrates Rise of Portable AI Workstations

2024 will be the year generative AI gets personal, the CEOs of NVIDIA and HP said today in a fireside chat, unveiling new laptops that can build, test and run large language models. "This is a renaissance of the personal computer," said NVIDIA founder and CEO Jensen Huang at HP Amplify, a gathering in Las Vegas of about 1,500 resellers and distributors. "The work of creators, designers and data scientists is going to be revolutionized by these new workstations."

Greater Speed and Security
"AI is the biggest thing to come to the PC in decades," said HP's Enrique Lores, in the runup to the announcement of what his company billed as "the industry's largest portfolio of AI PCs and workstations." Compared to running their AI work in the cloud, the new systems will provide increased speed and security while reducing costs and energy, Lores said in a keynote at the event. New HP ZBooks provide a portfolio of mobile AI workstations powered by a full range of NVIDIA RTX Ada Generation GPUs. Entry-level systems with the NVIDIA RTX 500 Ada Generation Laptop GPU let users run generative AI apps and tools wherever they go. High-end models pack the RTX 5000 to deliver up to 682 TOPS, so they can create and run LLMs locally, using retrieval-augmented generation (RAG) to connect to their content for results that are both personalized and private.

NVIDIA Introduces Generative AI Professional Certification

NVIDIA is offering a new professional certification in generative AI to enable developers to establish technical credibility in this important domain. Generative AI is revolutionizing industries worldwide, yet there's a critical skills gap and a need to uplevel employees so they can more fully harness the technology. Available for the first time from NVIDIA, this new professional certification enables developers, career professionals, and others to validate and showcase their generative AI skills and expertise. Our new professional certification program introduces two associate-level generative AI certifications, focusing on proficiency in large language models and multimodal workflow skills.

"Generative AI has moved to center stage as governments, industries and organizations everywhere look to harness its transformative capabilities," NVIDIA founder and CEO Jensen Huang recently said. The certification will become available starting at GTC, where in-person attendees can also access recommended training to prepare for a certification exam. "Organizations in every industry need to increase their expertise in this transformative technology," said Greg Estes, VP of developer programs at NVIDIA. "Our goals are to assist in upskilling workforces, sharpen the skills of qualified professionals, and enable individuals to demonstrate their proficiency in order to gain a competitive advantage in the job market."

Google: CPUs are Leading AI Inference Workloads, Not GPUs

The AI infrastructure of today is mostly fueled by an expansion that relies on GPU-accelerated servers. Yet Google, one of the world's largest hyperscalers, has noted that CPUs still carry a leading share of AI/ML workloads, according to internal analysis of its Google Cloud services. During the TechFieldDay event, a talk by Brandon Royal, product manager at Google Cloud, explained the position of CPUs in today's AI landscape. The AI lifecycle is divided into two parts: training and inference. Training demands massive compute capacity along with enormous memory capacity to fit ever-expanding AI models into memory. The latest models, like GPT-4 and Gemini, contain billions of parameters and require thousands of GPUs or other accelerators working in parallel to train efficiently.

Inference, on the other hand, requires less compute intensity but still benefits from acceleration. During inference, the pre-trained model is optimized and deployed to make predictions on new data. While less compute is needed than for training, latency and throughput are essential for real-time inference. Google found that, while GPUs are ideal for the training phase, models are often optimized for and served on CPUs at inference time, meaning many customers choose CPUs as their AI inference platform for a wide variety of reasons.
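
A minimal sketch of that split in PyTorch, training on an accelerator when one is available and deploying the optimized model for CPU inference; the tiny model here is a stand-in, not a Google workload.

```python
# Train on GPU if present, serve on CPU: the pattern described above.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))

train_device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(train_device)                      # compute-heavy training goes to the GPU
# ... training loop omitted ...

model.to("cpu").eval()                      # deployment target: CPU
scripted = torch.jit.script(model)          # one common optimization step for serving
with torch.inference_mode():
    preds = scripted(torch.randn(32, 128))  # low-latency CPU inference
```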

AMD CTO Teases Memory Upgrades for Revised Instinct MI300-series Accelerators

Brett Simpson, Partner and Co-Founder of Arete Research, sat down with AMD CTO Mark Papermaster during the former's "Investor Webinar Conference." A transcript of the Arete + AMD question and answer session appeared online last week—the documented fireside chat concentrated mostly on "AI compute market" topics. Papermaster was asked about his company's competitive approach when taking on NVIDIA's very popular range of A100 and H100 AI GPUs, as well as the recently launched GH200 chip. The CTO did not reveal any specific pricing strategies—a "big picture" was painted instead: "I think what's important when you just step back is to look at total cost of ownership, not just one GPU, one accelerator, but total cost of ownership. But now when you also look at the macro, if there's not competition in the market, you're going to see not only a growth of the price of these devices due to the added content that they have, but you're -- without a check and balance, you're going to see very, very high margins, more than that could be sustained without a competitive environment."

Papermaster continued: "And what I think is very key with -- as AMD has brought competition market for these most powerful AI training and inference devices is you will see that check and balance. And we have a very innovative approach. We've been a leader in chiplet design. And so we have the right technology for the right purpose of the AI build-out that we do. We have, of course, a GPU accelerator. But there's many other circuitry associated with being able to scale and build out these large clusters, and we're very, very efficient in our design." Team Red started to ship its flagship accelerator, Instinct MI300X, to important customers at the start of 2024—Arete Research's Simpson asked about the possibility of follow-up models. In response, AMD's CTO referenced some recent history: "Well, I think the first thing that I'll highlight is what we did to arrive at this point, where we are a competitive force. We've been investing for years in building up our GPU road map to compete in both HPC and AI. We had a very, very strong harbor train that we've been on, but we had to build our muscle in the software enablement."

Microsoft Auto-updating Eligible Windows 11 PCs to Version 23H2

Windows 11 version 23H2 started rolling out last October, but many users of Microsoft's flagship operating system opted out of an upgrade, thanks to a handy "optional" toggle. News outlets have latched onto a freshly published (February 20) Windows 11 "Release Health" notice: the official Microsoft dashboard alert states that the Windows 11 2023 Update "is now entering a new rollout phase." Fastidious users will not be happy to discover that "eligible Windows 11 devices" are now subject to an automatic bump up to version 23H2. Microsoft has employed rather passive-aggressive tactics in the past and is seemingly eager to get its audience upgraded onto its latest and greatest feature-rich experience.

According to NeoWin, an official announcement from last week alerted users to an "impending end of optional preview updates on Windows 11 22H2." Yesterday's "23H2" dashboard confessional provided a little bit more context—unsurprisingly involving artificial intelligence: "This automatic update targets Windows 11 devices that have reached or are approaching end of servicing, and it follows the machine learning-based (ML) training we have utilized so far. We will continue to train our intelligent ML model to safely roll out this new Windows version in phases to deliver a smooth update experience."

Groq LPU AI Inference Chip is Rivaling Major Players like NVIDIA, AMD, and Intel

AI workloads are split into two different categories: training and inference. While training requires large compute and memory capacity and is less sensitive to access speeds, inference is another story. With inference, the AI model must run extremely fast to serve the end-user with as many tokens (words) as possible, giving the user answers to their prompts faster. An AI chip startup, Groq, which was in stealth mode for a long time, has been making major moves in providing ultra-fast inference speeds using its Language Processing Unit (LPU), designed for large language models (LLMs) like GPT, Llama, and Mistral. The Groq LPU is a single-core unit based on the Tensor-Streaming Processor (TSP) architecture, which achieves 750 TOPS at INT8 and 188 TeraFLOPS at FP16, with 320x320 fused dot product matrix multiplication, in addition to 5,120 vector ALUs.

With massive concurrency and 80 TB/s of bandwidth, the Groq LPU has 230 MB of local SRAM. Together, these elements deliver outstanding performance that has been making waves on the internet over the past few days. Serving the Mixtral 8x7B model at 480 tokens per second, the Groq LPU provides some of the leading inference numbers in the industry. In models like Llama 2 70B with a 4,096-token context length, Groq can serve 300 tokens/s, while in the smaller Llama 2 7B with 2,048 tokens of context, the Groq LPU can output 750 tokens/s. According to the LLMPerf Leaderboard, the Groq LPU is beating GPU-based cloud providers at serving Llama models in configurations ranging from 7 to 70 billion parameters. In token throughput (output) and time to first token (latency), Groq leads the pack, achieving the highest throughput and the second-lowest latency.
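
For context, throughput and time to first token are typically measured by timing a streaming completion, roughly as in the hedged sketch below; the endpoint, model name, and API key are placeholders, and streamed chunks are counted as an approximation of tokens rather than an exact tally.

```python
# Measure time-to-first-token (TTFT) and token throughput against any
# OpenAI-compatible streaming endpoint (placeholder URL/model/key).
import time
from openai import OpenAI

client = OpenAI(base_url="https://api.example.com/v1", api_key="sk-placeholder")
start = time.perf_counter()
first_token_at, tokens = None, 0
stream = client.chat.completions.create(
    model="mixtral-8x7b", stream=True,
    messages=[{"role": "user", "content": "Explain LPUs in one paragraph."}],
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.perf_counter()
        tokens += 1                          # chunk count used as a token proxy
elapsed = time.perf_counter() - start
print(f"TTFT: {first_token_at - start:.3f} s, throughput: {tokens / elapsed:.0f} tokens/s")
```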

SoftBank Founder Wants $100 Billion to Compete with NVIDIA's AI

Japanese tech billionaire and SoftBank Group founder Masayoshi Son is embarking on a hugely ambitious new project to build an AI chip company that aims to rival NVIDIA, the current leader in AI semiconductor solutions. Codenamed "Izanagi" after the Japanese god of creation, the venture aims to raise up to $100 billion in funding. With his company SoftBank having recently scaled back investments in startups, Son is now setting his sights on the red-hot AI chip sector. Izanagi would leverage SoftBank's existing chip design firm, Arm, to develop advanced semiconductors tailored for artificial intelligence computing, using Arm's instruction set for the chip's processing elements. This could pit Izanagi directly against NVIDIA's leadership position in AI chips. Son has a war chest of $41 billion in cash at SoftBank that he can deploy for Izanagi.

Additionally, he is courting sovereign wealth funds in the Middle East to contribute up to $70 billion in additional capital. In total, Son may be seeking up to $100 billion to bankroll Izanagi into a chip powerhouse. AI chips are seeing surging demand as machine learning and neural networks require specialized semiconductors that can process massive datasets. NVIDIA and other names like Intel, AMD, and select startups have capitalized on this trend. However, Son believes the market has room for another major player. Izanagi would focus squarely on developing bleeding-edge AI chip architectures to power the next generation of artificial intelligence applications. It is still unclear if this would be an AI training or AI inference project, but given that the training market is currently bigger as we are in the early buildout phase of AI infrastructure, the consensus might settle on training. With his track record of bold bets, Son is aiming very high with Izanagi. It's a hugely ambitious goal, but Son has defied expectations before. Project Izanagi will test the limits of even his vision and financial firepower.

AMD ROCm 6.0 Adds Support for Radeon PRO W7800 & RX 7900 GRE GPUs

Building on our previously announced support of the AMD Radeon RX 7900 XT, XTX and Radeon PRO W7900 GPUs with AMD ROCm 5.7 and PyTorch, we are now expanding our client-based ML Development offering, both from the hardware and software side with AMD ROCm 6.0. Firstly, AI researchers and ML engineers can now also develop on Radeon PRO W7800 and on Radeon RX 7900 GRE GPUs. With support for such a broad product portfolio, AMD is helping the AI community to get access to desktop graphics cards at even more price points and at different performance levels.

Furthermore, we are complementing our solution stack with support for ONNX Runtime. ONNX, short for Open Neural Network Exchange, is an intermediary Machine Learning framework used to convert AI models between different ML frameworks. As a result, users can now perform inference on a wider range of source data on local AMD hardware. This also adds INT8 via MIGraphX—AMD's own graph inference engine—to the available data types (including FP32 and FP16). With AMD ROCm 6.0, we are continuing our support for the PyTorch framework bringing mixed precision with FP32/FP16 to Machine Learning training workflows.
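
As an illustration of the FP32/FP16 mixed-precision training pattern mentioned above, here is a minimal PyTorch sketch using autocast and gradient scaling; it is written against the stock PyTorch API (on ROCm builds the "cuda" device maps to AMD GPUs), and the toy model and loss are placeholders rather than an AMD-provided recipe.

```python
# Minimal mixed-precision (FP32/FP16) training loop with PyTorch autocast.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(1024, 1024).to(device)
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

for _ in range(10):
    x = torch.randn(64, 1024, device=device)
    with torch.autocast(device_type=device, dtype=torch.float16,
                        enabled=(device == "cuda")):
        loss = model(x).pow(2).mean()        # toy objective, runs in FP16 where supported
    opt.zero_grad(set_to_none=True)
    scaler.scale(loss).backward()            # scale to avoid FP16 gradient underflow
    scaler.step(opt)
    scaler.update()
```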

Meta Will Acquire 350,000 H100 GPUs Worth More Than 10 Billion US Dollars

Mark Zuckerberg has shared some interesting insights about Meta's AI infrastructure buildout, which is on track to include an astonishing number of NVIDIA H100 Tensor Core GPUs. In a post on Instagram, Meta's CEO noted the following: "We're currently training our next-gen model Llama 3, and we're building massive compute infrastructure to support our future roadmap, including 350k H100s by the end of this year -- and overall almost 600k H100s equivalents of compute if you include other GPUs." That means the company will add 350,000 H100 GPUs on top of its existing fleet of other GPUs, which together are equivalent to roughly 250,000 H100s in computing power, for a total of about 600,000 H100-equivalent GPUs.

The raw number of GPUs installed comes at a steep price. With the average selling price of an H100 GPU nearing $30,000, Meta's investment will set the company back around $10.5 billion. Other GPUs will also be part of the infrastructure, but most of it will comprise the NVIDIA Hopper family. Meta is currently training the Llama 3 AI model, which will be much more capable than the existing Llama 2 family, with better reasoning, coding, and math-solving capabilities, and these models will be open-source. Further down the pipeline, as artificial general intelligence (AGI) comes into play, Zuckerberg has noted that "Our long term vision is to build general intelligence, open source it responsibly, and make it widely available so everyone can benefit." So, expect to see these models in GitHub repositories in the future.
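
A quick check of the figures above, assuming the quoted ~$30,000 average selling price:

```python
# Back-of-the-envelope check of the cost and fleet-size figures.
h100_units = 350_000
avg_price_usd = 30_000                              # assumed average H100 price
print(f"${h100_units * avg_price_usd / 1e9:.1f} billion")      # -> $10.5 billion
print(f"{h100_units + 250_000:,} H100-equivalents in total")   # -> 600,000
```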

Dell Partners with Imbue on New AI Compute Cluster Using Nearly 10,000 NVIDIA H100 GPUs

Dell Technologies and Imbue, an independent AI research company, have entered into a $150 million agreement to build a new high-performance computing cluster for training foundation models optimized for reasoning. Imbue is one of the few independent AI labs that develops its own foundation models, and trains them to have more advanced reasoning capabilities—like knowing when to ask for more information, analyzing and critiquing their own outputs, or breaking down a difficult goal into a plan and then executing on it. Imbue trains AI agents on top of those models that can do work for people across diverse fields in ways that are robust, safe, and useful. Imbue's goal is to create practical tools for building agents that could enable workers across a broad set of domains, including helping engineers write new code, analysts understand and draft complex policy proposals, and much more.

AWS Unveils Next Generation AWS-Designed Graviton4 and Trainium2 Chips

At AWS re:Invent, Amazon Web Services, Inc. (AWS), an Amazon.com, Inc. company (NASDAQ: AMZN), today announced the next generation of two AWS-designed chip families—AWS Graviton4 and AWS Trainium2—delivering advancements in price performance and energy efficiency for a broad range of customer workloads, including machine learning (ML) training and generative artificial intelligence (AI) applications. Graviton4 and Trainium2 mark the latest innovations in chip design from AWS. With each successive generation of chip, AWS delivers better price performance and energy efficiency, giving customers even more options—in addition to chip/instance combinations featuring the latest chips from third parties like AMD, Intel, and NVIDIA—to run virtually any application or workload on Amazon Elastic Compute Cloud (Amazon EC2).

NVIDIA NeMo: Designers Tap Generative AI for a Chip Assist

A research paper released this week describes ways generative AI can assist one of the most complex engineering efforts: designing semiconductors. The work demonstrates how companies in highly specialized fields can train large language models (LLMs) on their internal data to build assistants that increase productivity.
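
As a loose illustration of that idea (not NVIDIA's actual pipeline), the hedged sketch below continues pre-training a small public causal language model on a placeholder internal-documents file using Hugging Face Transformers; the base model name and data file are assumptions made for the example.

```python
# Minimal domain-adaptation sketch: continue pre-training a small LLM on internal text.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "gpt2"                                  # small stand-in base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Placeholder corpus standing in for proprietary design documents.
dataset = load_dataset("text", data_files={"train": "internal_docs.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="chip-assistant", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```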

Few pursuits are as challenging as semiconductor design. Under a microscope, a state-of-the-art chip like an NVIDIA H100 Tensor Core GPU (above) looks like a well-planned metropolis, built with tens of billions of transistors, connected on streets 10,000x thinner than a human hair. Multiple engineering teams coordinate for as long as two years to construct one of these digital mega cities. Some groups define the chip's overall architecture, some craft and place a variety of ultra-small circuits, and others test their work. Each job requires specialized methods, software programs and computer languages.