News Posts matching #ML


New Performance Optimizations Supercharge NVIDIA RTX AI PCs for Gamers, Creators and Developers

At Microsoft Build, NVIDIA today announced new AI performance optimizations and integrations for Windows that help deliver maximum performance on NVIDIA GeForce RTX AI PCs and NVIDIA RTX workstations. Large language models (LLMs) power some of the most exciting new use cases in generative AI, and they now run up to 3x faster with ONNX Runtime (ORT) and DirectML using the new NVIDIA R555 Game Ready Driver. ORT and DirectML are high-performance tools used to run AI models locally on Windows PCs.
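For context, selecting the DirectML execution provider from Python is a one-line change in ONNX Runtime; here is a minimal sketch, with the model path and input shape as placeholders:

```python
import numpy as np
import onnxruntime as ort  # pip install onnxruntime-directml

# Request the DirectML execution provider so inference runs on the GPU
# (e.g., a GeForce RTX card), with CPU as the fallback. "model.onnx" and
# the input shape are placeholders for a real exported model.
session = ort.InferenceSession(
    "model.onnx",
    providers=["DmlExecutionProvider", "CPUExecutionProvider"],
)
input_name = session.get_inputs()[0].name
outputs = session.run(None, {input_name: np.zeros((1, 3, 224, 224), np.float32)})
print([o.shape for o in outputs])
```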

WebNN, an application programming interface that lets web developers deploy AI models in the browser, is now accelerated on RTX via DirectML, enabling web apps to incorporate fast, AI-powered capabilities. And PyTorch will support DirectML execution backends, enabling Windows developers to train and run complex AI models natively on Windows. NVIDIA and Microsoft are collaborating to scale performance on RTX GPUs. These advancements build on NVIDIA's world-leading AI platform, which accelerates more than 500 applications and games on over 100 million RTX AI PCs and workstations worldwide.
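The announced native PyTorch backend may differ in its details, but today's torch-directml package already illustrates the idea of routing tensor work through DirectML:

```python
import torch
import torch_directml  # pip install torch-directml

# torch_directml exposes DirectML-capable GPUs as PyTorch devices; tensors
# and modules moved there execute through DirectML rather than CUDA.
dml = torch_directml.device()
x = torch.randn(2048, 2048, device=dml)
y = x @ x  # matrix multiply dispatched to the GPU via DirectML
print(y.device, y.shape)
```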

ChatGPT Comes to Desktop with OpenAI's Latest GPT-4o Model That Talks With Users

At OpenAI's spring update, all eyes were on the company that spurred the AI boom with the ChatGPT application. Now almost a must-have app for consumers and prosumers alike, ChatGPT is the de facto showcase for OpenAI's latest AI innovations, backed by the company's researchers and scientists. Today, OpenAI announced a new model called GPT-4o (Omni), which promises more advanced intelligence, improved overall capabilities, and real-time voice interaction with users. With it, the ChatGPT application aims to become a personal assistant that actively communicates with users and offers much broader capabilities. OpenAI claims the model can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, similar to human response times in conversation.

OpenAI says the new GPT-4o model will be available to free users as well as Plus and Team subscribers, with paid subscribers getting 5x higher usage limits and early access. Interestingly, GPT-4o is much improved across a variety of standard benchmarks such as MMLU, MATH, HumanEval, and GPQA, and it now surpasses almost all models except Claude 3 Opus on MGSM. It understands more than 50 languages and can perform real-time translation. In addition to the new model, OpenAI announced a desktop ChatGPT app that can act as a personal assistant and see what is happening on the screen, but only when invoked by user command. This should deliver a much more refined experience, letting users treat the AI as a second pair of eyes that helps interpret on-screen content. The app is initially available only on macOS; we are waiting for OpenAI to launch a Windows version so everyone can experience the new technology.
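For developers, calling the new model through OpenAI's Python SDK is a one-line model-id change; a minimal text-only sketch (the same "gpt-4o" id also backs the multimodal features):

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A plain text request; audio and vision inputs use the same model id.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize GPT-4o in one sentence."}],
)
print(response.choices[0].message.content)
```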

SpiNNcloud Systems Announces First Commercially Available Neuromorphic Supercomputer

Today, in advance of ISC High Performance 2024, SpiNNcloud Systems announced the commercial availability of its SpiNNaker2 platform, a supercomputer-level hybrid AI high-performance computer system based on principles of the human brain. Pioneered by Steve Furber, designer of the original ARM and SpiNNaker1 architectures, the SpiNNaker2 supercomputing platform uses a large number of low-power processors for efficiently computing AI and other workloads.

The first-generation SpiNNaker1 architecture is currently used by dozens of research groups across 23 countries. Sandia National Laboratories, the Technical University of Munich, and the University of Göttingen are among the first customers placing orders for SpiNNaker2, which was developed around commercialized IP invented in the Human Brain Project, a billion-euro research project funded by the European Union to design intelligent, efficient artificial systems.

Apple Introduces the M4 Chip

Apple today announced M4, the latest chip delivering phenomenal performance to the all-new iPad Pro. Built using second-generation 3-nanometer technology, M4 is a system on a chip (SoC) that advances the industry-leading power efficiency of Apple silicon and enables the incredibly thin design of iPad Pro. It also features an entirely new display engine to drive the stunning precision, color, and brightness of the breakthrough Ultra Retina XDR display on iPad Pro. The new CPU has up to 10 cores, while the new 10-core GPU builds on the next-generation GPU architecture introduced in M3, and brings Dynamic Caching, hardware-accelerated ray tracing, and hardware-accelerated mesh shading to iPad for the first time. M4 has Apple's fastest Neural Engine ever, capable of up to 38 trillion operations per second, which is faster than the neural processing unit of any AI PC today. Combined with faster memory bandwidth, along with next-generation machine learning (ML) accelerators in the CPU, and a high-performance GPU, M4 makes the new iPad Pro an outrageously powerful device for artificial intelligence.
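Developers do not program the Neural Engine directly: Core ML decides at runtime whether layers run on the CPU, GPU, or Neural Engine. A hedged sketch of the usual conversion path with coremltools (the model choice here is arbitrary):

```python
import torch
import torchvision
import coremltools as ct  # pip install coremltools

# Trace a small PyTorch model and convert it to a Core ML program.
# ComputeUnit.ALL lets Core ML schedule work across the CPU, GPU, and
# Neural Engine as it sees fit.
model = torchvision.models.mobilenet_v3_small(weights=None).eval()
example = torch.rand(1, 3, 224, 224)
traced = torch.jit.trace(model, example)

mlmodel = ct.convert(
    traced,
    convert_to="mlprogram",
    inputs=[ct.TensorType(name="image", shape=example.shape)],
    compute_units=ct.ComputeUnit.ALL,
)
mlmodel.save("MobileNetV3Small.mlpackage")
```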

"The new iPad Pro with M4 is a great example of how building best-in-class custom silicon enables breakthrough products," said Johny Srouji, Apple's senior vice president of Hardware Technologies. "The power-efficient performance of M4, along with its new display engine, makes the thin design and game-changing display of iPad Pro possible, while fundamental improvements to the CPU, GPU, Neural Engine, and memory system make M4 extremely well suited for the latest applications leveraging AI. Altogether, this new chip makes iPad Pro the most powerful device of its kind."

Intel Builds World's Largest Neuromorphic System to Enable More Sustainable AI

Today, Intel announced that it has built the world's largest neuromorphic system. Code-named Hala Point, the large-scale neuromorphic system is initially deployed at Sandia National Laboratories and built on Intel's Loihi 2 processor; it aims to support research into future brain-inspired artificial intelligence (AI) and to tackle challenges related to the efficiency and sustainability of today's AI. Hala Point advances Intel's first-generation large-scale research system, Pohoiki Springs, with architectural improvements that achieve over 10 times more neuron capacity and up to 12 times higher performance.
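Neuromorphic systems compute with sparse, event-driven spiking neurons rather than dense matrix math. A toy leaky integrate-and-fire layer in NumPy conveys the principle; the parameters are illustrative, not Loihi 2's actual neuron model:

```python
import numpy as np

# Toy leaky integrate-and-fire (LIF) layer: each step decays the membrane
# potential, integrates weighted input spikes, and emits output spikes for
# neurons that cross threshold. Illustrative only, not Loihi 2's model.
def lif_step(v, spikes_in, weights, leak=0.9, threshold=1.0):
    v = leak * v + weights @ spikes_in   # decay, then integrate input spikes
    fired = v >= threshold               # neurons over threshold spike
    v = np.where(fired, 0.0, v)          # reset potential after firing
    return v, fired

rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.5, size=(4, 8))  # 8 inputs feeding 4 neurons
v = np.zeros(4)
for t in range(5):
    spikes_in = (rng.random(8) < 0.3).astype(float)  # sparse input spikes
    v, fired = lif_step(v, spikes_in, weights)
    print(f"t={t} spikes={fired.astype(int)}")
```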

"The computing cost of today's AI models is rising at unsustainable rates. The industry needs fundamentally new approaches capable of scaling. For that reason, we developed Hala Point, which combines deep learning efficiency with novel brain-inspired learning and optimization capabilities. We hope that research with Hala Point will advance the efficiency and adaptability of large-scale AI technology." -Mike Davies, director of the Neuromorphic Computing Lab at Intel Labs

Intel Announces New Program for AI PC Software Developers and Hardware Vendors

Intel Corporation today announced the creation of two new artificial intelligence (AI) initiatives as part of the AI PC Acceleration Program: the AI PC Developer Program and the addition of independent hardware vendors to the program. These are critical milestones in Intel's pursuit of enabling the software and hardware ecosystem to optimize and maximize AI on more than 100 million Intel-based AI PCs through 2025.

"We have made great strides with our AI PC Acceleration Program by working with the ecosystem. Today, with the addition of the AI PC Developer Program, we are expanding our reach to go beyond large ISVs and engage with small- and medium-sized players and aspiring developers. Our goal is to drive a frictionless experience by offering a broad set of tools including the new AI-ready Developer Kit," said Carla Rodriguez, Intel vice president and general manager of Client Software Ecosystem Enabling.

UL Announces the Procyon AI Image Generation Benchmark Based on Stable Diffusion

We're excited to announce that we're expanding our AI inference benchmark offerings with the UL Procyon AI Image Generation Benchmark, coming Monday, 25th March. AI has the potential to be one of the most significant new technologies hitting the mainstream this decade, and many industry leaders are competing to deliver the best AI inference performance through their hardware. Last year, we launched the first of our Procyon AI Inference Benchmarks for Windows, which measured AI inference performance with a computer-vision workload.

The upcoming UL Procyon AI Image Generation Benchmark provides a consistent, accurate and understandable workload for measuring the AI performance of high-end hardware, built with input from members of the industry to ensure fair and comparable results across all supported hardware.
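The class of workload being measured looks like a standard Stable Diffusion inference loop. A minimal sketch with Hugging Face diffusers follows; the benchmark's exact models, engines, and settings may differ:

```python
import torch
from diffusers import StableDiffusionPipeline  # pip install diffusers transformers

# Load Stable Diffusion 1.5 in FP16 and generate one image; image-generation
# benchmarks typically time this denoising loop end to end.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    "a photo of an astronaut riding a horse on mars",
    num_inference_steps=50,
).images[0]
image.save("astronaut.png")
```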

Lenovo and Anaconda Announce Agreement to Accelerate AI Development and Deployment

Today, Lenovo announced a strategic partnership with Anaconda Inc., the leading provider of the world's most popular artificial intelligence (AI), machine learning (ML) and data science platform, to empower Lenovo's high performance data science workstations. The partnership will couple Lenovo's trusted ThinkStation and ThinkPad workstation product portfolio heritage and leadership with Anaconda's enterprise strengths for open-source leadership, security, and reliability.

The rapidly evolving world of artificial intelligence, deep learning, and generative AI is opening up new opportunities for businesses and data scientists. Much of the AI innovation taking place today is driven by open-source software and cloud-based solutions, with Python being a leading software language for AI applications. However, the data security risks of utilizing open-source software at an enterprise level, privacy concerns, and the often prohibitive cost of cloud-based AI solutions are causing many organizations to rethink their approach to investing in AI development. With Intel-powered Lenovo workstations architected with the latest generations of professional NVIDIA GPUs built for large-language-model fine-tuning, and with Anaconda Navigator enabling businesses to leverage open source and AI with enhanced security, scale, and governance mechanisms in place, the partnership lets data scientists create and deploy AI solutions with first-class hardware and enterprise-grade AI software support within a more manageable investment framework.

Ethernet Switch Chips are Now Infected with AI: Broadcom Announces Trident 5-X12

Artificial intelligence has been a hot topic this year, and everything is now an AI processor, from CPUs to GPUs, NPUs, and many others. However, it was only a matter of time before we saw an integration of AI processing elements into the networking chips. Today, Broadcom announced its new Ethernet switching silicon called Trident 5-X12. The Trident 5-X12 delivers 16 Tb/s of bandwidth, double that of the previous Trident generation while adding support for fast 800G ports for connection to Tomahawk 5 spine switch chips. The 5-X12 is software-upgradable and optimized for dense 1RU top-of-rack designs, enabling configurations with up to 48x200G downstream server ports and 8x800G upstream fabric ports. The 800G support is added using 100G-PAM4 SerDes, which enables up to 4 m DAC and linear optics.
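As a quick sanity check, the quoted 1RU port mix accounts exactly for the headline bandwidth:

```python
# 48x 200G downstream plus 8x 800G upstream equals the chip's 16 Tb/s.
downstream_gbps = 48 * 200   # server-facing ports -> 9,600 Gb/s
upstream_gbps = 8 * 800      # fabric-facing ports -> 6,400 Gb/s
print((downstream_gbps + upstream_gbps) / 1000, "Tb/s")  # 16.0
```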

However, this is not just a standalone switch chip. Broadcom has added AI processing elements in an inference engine called NetGNT (Networking General-purpose Neural-network Traffic-analyzer), which can detect common traffic patterns and optimize data movement across the chip. The company cites AI/ML workloads as an example: there, NetGNT performs intelligent traffic analysis to head off network congestion. It can, for instance, detect so-called "incast" patterns in real time, where many flows converge simultaneously on the same port. By recognizing the start of an incast early, NetGNT can invoke hardware-based congestion-control techniques to prevent performance degradation without adding latency.
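To make the incast idea concrete, here is a hypothetical software sketch of the detection logic; NetGNT does this with a hardware neural-network engine, and the function, event format, and thresholds below are invented for illustration:

```python
from collections import defaultdict

# Flag an egress port when many distinct flows converge on it within a short
# window - the "incast" pattern. Purely illustrative; NetGNT's actual
# mechanism is a trained model running in switch hardware.
def detect_incast(events, window_us=100, flow_threshold=32):
    """events: iterable of (timestamp_us, flow_id, egress_port) tuples."""
    window_start = {}                 # port -> start of current window
    flows = defaultdict(set)          # port -> flows seen in current window
    alerts = []
    for ts, flow, port in events:
        if port not in window_start or ts - window_start[port] > window_us:
            window_start[port] = ts
            flows[port] = set()
        flows[port].add(flow)
        if len(flows[port]) >= flow_threshold:
            alerts.append((ts, port))  # early trigger for congestion control
            flows[port] = set()
    return alerts
```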

Intel Advances Scientific Research and Performance for New Wave of Supercomputers

At SC23, Intel showcased AI-accelerated high performance computing (HPC) with leadership performance for HPC and AI workloads across Intel Data Center GPU Max Series, Intel Gaudi 2 AI accelerators, and Intel Xeon processors. In partnership with Argonne National Laboratory, Intel shared progress on the Aurora generative AI (genAI) project, including an update on the 1-trillion-parameter GPT-3 LLM made possible by the unique architecture of the Max Series GPU and the system capabilities of the Aurora supercomputer. Intel and Argonne demonstrated the acceleration of science with applications from the Aurora Early Science Program (ESP) and the Exascale Computing Project. The company also showed the path to Intel Gaudi 3 AI accelerators and Falcon Shores.

"Intel has always been committed to delivering innovative technology solutions to meet the needs of the HPC and AI community. The great performance of our Xeon CPUs along with our Max GPUs and CPUs help propel research and science. That coupled with our Gaudi accelerators demonstrate our full breadth of technology to provide our customers with compelling choices to suit their diverse workloads," said Deepak Patil, Intel corporate vice president and general manager of Data Center AI Solutions.

Ayar Labs Showcases 4 Tbps Optically-enabled Intel FPGA at Supercomputing 2023

Ayar Labs, a leader in silicon photonics for chip-to-chip connectivity, will showcase its in-package optical I/O solution integrated with Intel's industry-leading Agilex Field-Programmable Gate Array (FPGA) technology. Demonstrating 5x current industry bandwidth at 5x lower power and 20x lower latency, the optical FPGA - packaged in a common PCIe card form factor - has the potential to transform the high performance computing (HPC) landscape for data-intensive workloads such as generative artificial intelligence (AI) and machine learning, and to support novel disaggregated compute and memory architectures.

"We're on the cusp of a new era in high performance computing as optical I/O becomes a 'must have' building block for meeting the exponentially growing, data-intensive demands of emerging technologies like generative AI," said Charles Wuischpard, CEO of Ayar Labs. "Showcasing the integration of Ayar Labs' silicon photonics and Intel's cutting-edge FPGA technology at Supercomputing is a concrete demonstration that optical I/O has the maturity and manufacturability needed to meet these critical demands."

NVIDIA Turbocharges Generative AI Training in MLPerf Benchmarks

NVIDIA's AI platform raised the bar for AI training and high performance computing in the latest MLPerf industry benchmarks. Among many new records and milestones, one in generative AI stands out: NVIDIA Eos - an AI supercomputer powered by a whopping 10,752 NVIDIA H100 Tensor Core GPUs and NVIDIA Quantum-2 InfiniBand networking - completed a training benchmark based on a GPT-3 model with 175 billion parameters trained on one billion tokens in just 3.9 minutes. That's a nearly 3x gain from 10.9 minutes, the record NVIDIA set when the test was introduced less than six months ago.
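The arithmetic behind the "nearly 3x" claim is straightforward:

```python
# Round-over-round speedup on the GPT-3 175B training benchmark.
prev_minutes, new_minutes = 10.9, 3.9
print(f"{prev_minutes / new_minutes:.2f}x")  # ~2.79x, i.e. "nearly 3x"
```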

The benchmark uses a portion of the full GPT-3 data set behind the popular ChatGPT service; by extrapolation, Eos could now train on the full data set in just eight days, 73x faster than a prior state-of-the-art system using 512 A100 GPUs. The acceleration in training time reduces costs, saves energy, and speeds time-to-market. It's the heavy lifting that makes large language models widely available, so every business can adopt them with tools like NVIDIA NeMo, a framework for customizing LLMs. In a new generative AI test this round, 1,024 NVIDIA Hopper architecture GPUs completed a training benchmark based on the Stable Diffusion text-to-image model in 2.5 minutes, setting a high bar on this new workload. By adopting these two tests, MLPerf reinforces its leadership as the industry standard for measuring AI performance, since generative AI is the most transformative technology of our time.

AMD, Arm, Intel, Meta, Microsoft, NVIDIA, and Qualcomm Standardize Next-Generation Narrow Precision Data Formats for AI

Realizing the full potential of next-generation deep learning requires highly efficient AI infrastructure. For a computing platform to be scalable and cost-efficient, optimizing every layer of the AI stack, from algorithms to hardware, is essential. Advances in narrow-precision AI data formats and associated optimized algorithms have been pivotal to this journey, allowing the industry to transition from traditional 32-bit floating-point precision to formats with as few as 8 bits of precision (i.e., OCP FP8).

Narrower formats allow silicon to execute more efficient AI calculations per clock cycle, which accelerates model training and inference times. AI models take up less space, which means they require fewer data fetches from memory and can run with better performance and efficiency. Additionally, fewer bit transfers reduce data movement over the interconnect, which can enhance application performance or cut network costs.
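A simple quantization sketch shows why narrower formats cut memory and bandwidth. This uses symmetric int8 scaling as a stand-in; the OCP FP8 and newer narrow-precision formats are more sophisticated, but the storage arithmetic is the same:

```python
import numpy as np

# Symmetric per-tensor int8 quantization: store 1 byte per value plus one
# FP32 scale, instead of 4 bytes per value.
def quantize_int8(x):
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

w = np.random.default_rng(0).normal(size=(1024, 1024)).astype(np.float32)
q, scale = quantize_int8(w)
print(w.nbytes // q.nbytes, "x fewer bytes")  # 4x vs. FP32
print("max abs error:", np.abs(w - q.astype(np.float32) * scale).max())
```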

Comcast and Broadcom to Develop the World's First AI-Powered Access Network With Pioneering New Chipset

Comcast and Broadcom today announced joint efforts to develop the world's first AI-powered access network with a new chipset that embeds artificial intelligence (AI) and machine learning (ML) within the nodes, amps, and modems that comprise the last few miles of Comcast's network. With these new capabilities broadly deployed throughout the network, Comcast will be able to transform its operations by automating more network functions and to deliver an improved customer experience through better and more actionable intelligence.

Additionally, the new chipset will be the first in the world to incorporate DOCSIS 4.0 Full Duplex (FDX), Extended Spectrum DOCSIS (ESD) and the ability to run both simultaneously, enabling Internet service providers across the globe to deliver DOCSIS 4.0 services using a toolkit with technology options to meet their business needs. DOCSIS 4.0 is the next-generation network technology that will introduce symmetrical multi-gigabit Internet speeds, lower latency, and even better security and reliability to hundreds of millions of people and businesses over their existing connections without the need for major construction of new network infrastructure.

AMD to Acquire Open-Source AI Software Expert Nod.ai

AMD today announced the signing of a definitive agreement to acquire Nod.ai to expand the company's open AI software capabilities. The addition of Nod.ai will bring to AMD an experienced team that has developed industry-leading software technology for accelerating the deployment of AI solutions optimized for AMD Instinct data center accelerators, Ryzen AI processors, EPYC processors, Versal SoCs, and Radeon GPUs. The agreement strongly aligns with the AMD AI growth strategy centered on an open software ecosystem that lowers the barriers of entry for customers through developer tools, libraries, and models.

"The acquisition of Nod.ai is expected to significantly enhance our ability to provide AI customers with open software that allows them to easily deploy highly performant AI models tuned for AMD hardware," said Vamsi Boppana, senior vice president, Artificial Intelligence Group at AMD. "The addition of the talented Nod.ai team accelerates our ability to advance open-source compiler technology and enable portable, high-performance AI solutions across the AMD product portfolio. Nod.ai's technologies are already widely deployed in the cloud, at the edge and across a broad range of end point devices today."

Broadcom Partners with Google Cloud to Strengthen Gen AI-Powered Cybersecurity

Symantec, a division of Broadcom Inc., is partnering with Google Cloud to embed generative AI (gen AI) into the Symantec Security platform in a phased rollout that will give customers a significant technical edge for detecting, understanding, and remediating sophisticated cyber attacks.

Symantec is leveraging the Google Cloud Security AI Workbench and a security-specific large language model (LLM), Sec-PaLM 2, across its portfolio to enable natural language interfaces and generate more comprehensive and easy-to-understand threat analyses. With Security AI Workbench-powered summarization of complex incidents and alignment to MITRE ATT&CK context, security operations center (SOC) analysts of all levels can better understand threats and respond faster. That, in turn, translates into greater security and higher SOC productivity.

Supermicro Introduces a Number of Density and Power Optimized Edge Platforms for Telco Providers, Based on the New AMD EPYC 8004 Series Processor

Supermicro, Inc., a Total IT Solution Provider for Cloud, AI/ML, Storage, and 5G/Edge, is announcing the AMD-based Supermicro H13 generation of WIO servers, optimized to deliver strong performance and energy efficiency for edge and telco data centers powered by the new AMD EPYC 8004 Series processors. The new Supermicro H13 WIO and short-depth front-I/O systems deliver energy-efficient single-socket servers that lower operating costs for enterprise, telco, and edge applications. These systems are designed with a dense form factor and flexible I/O options for storage and networking, making the new servers ideal for deployment in edge networks.

"We are excited to expand our AMD EPYC-based server offerings optimized to deliver excellent TCO and energy efficiency for data center networking and edge computing," said Charles Liang, president and CEO of Supermicro. "Adding to our already industry leading edge-to-cloud rack scale IT solutions, the new Supermicro H13 WIO systems with PCIe 5.0 and DDR5-4800 MHz memory show tremendous performance for edge applications."

Andes Announces General Availability of the New AndesCore RISC-V Multicore Vector Processor AX45MPV

Andes Technology, a leading supplier of high-efficiency, low-power 32/64-bit RISC-V processor cores and a Founding Premier member of RISC-V International, today proudly announces general availability of the high-performance AndesCore AX45MPV multicore vector processor IP. The AX45MPV is the third generation of the award-winning AndesCore vector processor series. Equipped with powerful RISC-V vector processing and parallel-execution capability, it targets applications with large volumes of data such as ADAS, AI inference and training, AR/VR, multimedia, robotics, and signal processing.

Andes and Meta began collaborating on datacenter AI with RISC-V vector cores in early 2019. At the end of that year, Andes unveiled the AndesCore NX27V, a significant milestone as the industry's first commercial RISC-V vector processor core, capable of generating up to four 512-bit vector (VLEN) results per cycle. It immediately attracted the attention of SoC design teams worldwide working on AI accelerators and has landed over a dozen datacenter AI projects. Since then, RISC-V vector processor cores have become the choice for ML and AI chip vendors.

Google Cloud and NVIDIA Expand Partnership to Advance AI Computing, Software and Services

Google Cloud Next - Google Cloud and NVIDIA today announced new AI infrastructure and software for customers to build and deploy massive models for generative AI and speed data science workloads.

In a fireside chat at Google Cloud Next, Google Cloud CEO Thomas Kurian and NVIDIA founder and CEO Jensen Huang discussed how the partnership is bringing end-to-end machine learning services to some of the largest AI customers in the world—including by making it easy to run AI supercomputers with Google Cloud offerings built on NVIDIA technologies. The new hardware and software integrations utilize the same NVIDIA technologies employed over the past two years by Google DeepMind and Google research teams.

Tachyum Achieves 192-Core Chip After Switch to New EDA Tools

Tachyum today announced that new EDA tools, utilized during the physical design phase of the Prodigy Universal Processor, have allowed the company to achieve significantly better chip specifications than previously anticipated, including an increase in the number of Prodigy cores to 192.

After RTL design coding, Tachyum began work on completing the physical design (the actual placement of transistors and wires) for Prodigy. After the Prodigy design team had to replace IPs, it also had to replace its RTL simulation and physical design tools. Armed with a new set of EDA tools, Tachyum was able to optimize settings and options in ways that increased the number of cores by 50 percent and the number of SerDes lanes from 64 to 96 per chip. Die size grew minimally, from 500 mm² to 600 mm², to accommodate the improved physical capabilities. While Tachyum could add more of its very efficient cores and still fit within the 858 mm² reticle limit, those cores would be memory-bandwidth limited, even with 16 DDR5 controllers running in excess of 7200 MT/s. Tachyum says its cores deliver much higher performance than other processor cores.
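A back-of-envelope calculation, assuming standard 64-bit (8-byte) DDR5 channels, shows why additional cores would be starved for bandwidth:

```python
# Aggregate bandwidth of 16 DDR5 controllers at 7200 MT/s, one 64-bit
# channel each (an assumption; Tachyum has not detailed the configuration).
controllers, transfers_per_s, bytes_per_transfer = 16, 7200e6, 8
total_gb_s = controllers * transfers_per_s * bytes_per_transfer / 1e9
print(total_gb_s, "GB/s")                           # 921.6 GB/s
print(round(total_gb_s / 192, 1), "GB/s per core")  # ~4.8 GB/s across 192 cores
```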

Lightelligence Introduces Optical Interconnect for Composable Data Center Architectures

Lightelligence, the global leader in photonic computing and connectivity systems, today announced Photowave, the first optical communications hardware designed for PCIe and Compute Express Link (CXL) connectivity, unleashing next-generation workload efficiency.

Photowave, an Optical Networking (oNET) transceiver leveraging the significant latency and energy-efficiency advantages of photonics technology, empowers data center managers to scale resources within or across server racks. The first public demonstration of Photowave will be at Flash Memory Summit, today through Thursday, August 10, in Santa Clara, Calif.

Supermicro Expands AMD Product Lines with New Servers and New Processors Optimized for Cloud Native Infrastructure

Supermicro, Inc., a Total IT Solution Provider for Cloud, AI/ML, Storage, and 5G/Edge, is announcing that its entire line of H13 AMD-based systems is now available with support for 4th Gen AMD EPYC processors based on the "Zen 4c" architecture and 4th Gen AMD EPYC processors with AMD 3D V-Cache technology. Supermicro servers powered by 4th Gen AMD EPYC processors for cloud-native computing, with leading thread density and 128 cores per socket, deliver impressive rack density and scalable performance with the energy efficiency to deploy cloud-native workloads on more consolidated infrastructure. These systems are targeted at cloud operators seeking to meet the ever-growing demands of user sessions and to deliver new AI-enabled services. Servers featuring AMD 3D V-Cache technology excel at running technical applications in FEA, CFD, and EDA; the large Level 3 cache enables these types of applications to run faster than ever before. Over 50 world-record benchmarks have been set with AMD EPYC processors over the past few years.

"Supermicro continues to push the boundary of our product lines to meet customers' requirements. We design and deliver resource-saving, application-optimized servers with rack scale integration for rapid deployments," said Charles Liang, president, and CEO of Supermicro. "With our growing broad portfolio of systems fully optimized for the latest 4th Gen AMD EPYC processors, cloud operators can now achieve extreme density and efficiency for numerous users and cloud-native services even in space-constrained data centers. In addition, our enhanced high performance, multi-socket, multi-node systems address a wide range of technical computing workloads and dramatically reduce time-to-market for manufacturing companies to design, develop, and validate new products leveraging the accelerated performance of memory intensive applications."

Arm Launches the Cortex-X4, A720 and A520, Immortalis-G720 GPU

Mobile devices touch every aspect of our digital lives. In the palm of your hand is the ability to both create and consume increasingly immersive, AI-accelerated experiences that continue to drive the need for more compute. Arm is at the heart of many of these, bringing unlimited delight, productivity and success to more people than ever. Every year we build foundational platforms designed to meet these increasing compute demands, with a relentless focus on high performance and efficiency. Working closely with our broader ecosystem, we're delivering the performance, efficiency and intelligence needed on every generation of consumer device to expand our digital lifestyles.

Today we are announcing Arm Total Compute Solutions 2023 (TCS23), which will be the platform for mobile computing, offering our best ever premium solution for smartphones. TCS23 delivers a complete package of the latest IP designed and optimized for specific workloads to work seamlessly together as a complete system. This includes a new world-class Arm Immortalis GPU based on our brand-new 5th Generation GPU architecture for ultimate visual experiences, a new cluster of Armv9 CPUs that continue our performance leadership for next-gen artificial intelligence (AI), and new enhancements to deliver more accessible software for the millions of Arm developers.

Google Merges its AI Subsidiaries into Google DeepMind

Google has announced that the company is officially merging its subsidiaries focused on artificial intelligence to form a single group. More specifically, Google Brain and DeepMind companies are now joining forces to become a single unit called Google DeepMind. As Google CEO Sundar Pichai notes: "This group, called Google DeepMind, will bring together two leading research groups in the AI field: the Brain team from Google Research, and DeepMind. Their collective accomplishments in AI over the last decade span AlphaGo, Transformers, word2vec, WaveNet, AlphaFold, sequence to sequence models, distillation, deep reinforcement learning, and distributed systems and software frameworks like TensorFlow and JAX for expressing, training and deploying large scale ML models."

Demis Hassabis, previously CEO of DeepMind, will lead the group as its CEO and will work together with Jeff Dean, who has been promoted to Google's Chief Scientist and reports to Sundar Pichai. In his new role, Jeff Dean will serve as Chief Scientist for both Google Research and Google DeepMind, setting the direction of AI research at both units. This corporate restructuring should help the two previously separate teams work to a single plan and advance AI capabilities faster. We are eager to see the upcoming developments these teams accomplish.

NVIDIA H100 AI Performance Receives up to 54% Uplift with Optimizations

On Wednesday, the MLCommons team released the MLPerf 3.0 Inference numbers, and there was an exciting submission from NVIDIA. Reportedly, NVIDIA has used software optimization to improve the already staggering performance of its latest H100 GPU by up to 54%. For reference, NVIDIA's H100 GPU first appeared in MLPerf 2.1 back in September 2022. In just six months, NVIDIA engineers worked on AI optimizations for the MLPerf 3.0 release and found that software optimization alone can yield performance increases anywhere from 7% to 54%. The workloads in the inference suite included RNN-T speech recognition, 3D U-Net medical imaging, RetinaNet object detection, ResNet-50 image classification, DLRM recommendation, and BERT 99/99.9% natural language processing.

What is interesting is the context of NVIDIA's submission. Vendors compete in open and closed categories: the closed category requires running a mathematically equivalent neural network and aims to provide an "apples-to-apples" hardware comparison, while the open category is flexible and allows vendors to submit results based on optimizations for their hardware. Given that NVIDIA opted for the closed category, performance optimizations from other vendors such as Intel and Qualcomm are not accounted for here. Still, it is remarkable that software optimization alone can lift performance by up to 54% in NVIDIA's case with its H100 GPU. Another interesting takeaway is that some comparable hardware, like the Qualcomm Cloud AI 100, Intel Xeon Platinum 8480+, and NeuChips' ReccAccel N3000, failed to finish all the workloads. This is shown as an "X" on NVIDIA's slides, stressing the need for proper ML system software support, which is NVIDIA's strength and a major marketing point.