News Posts matching #LLM

NVIDIA Turbocharges Generative AI Training in MLPerf Benchmarks

NVIDIA's AI platform raised the bar for AI training and high performance computing in the latest MLPerf industry benchmarks. Among many new records and milestones, one in generative AI stands out: NVIDIA Eos - an AI supercomputer powered by a whopping 10,752 NVIDIA H100 Tensor Core GPUs and NVIDIA Quantum-2 InfiniBand networking - completed a training benchmark based on a GPT-3 model with 175 billion parameters trained on one billion tokens in just 3.9 minutes. That's a nearly 3x gain from 10.9 minutes, the record NVIDIA set when the test was introduced less than six months ago.

The benchmark uses a portion of the full GPT-3 data set behind the popular ChatGPT service; by extrapolation, Eos could now train on the full data set in just eight days, 73x faster than a prior state-of-the-art system that used 512 A100 GPUs. The acceleration in training time reduces costs, saves energy and speeds time-to-market. It's heavy lifting that makes large language models widely available, so every business can adopt them with tools like NVIDIA NeMo, a framework for customizing LLMs. In a new generative AI test this round, 1,024 NVIDIA Hopper architecture GPUs completed a training benchmark based on the Stable Diffusion text-to-image model in 2.5 minutes, setting a high bar on this new workload. By adopting these two tests, MLPerf reinforces its position as the industry standard for measuring AI performance on generative AI, the most transformative technology of our time.
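As a quick sanity check, the quoted speedup and extrapolation follow from simple arithmetic; the short Python sketch below is illustrative only and uses just the figures stated in the announcement.

# Sanity-check the quoted MLPerf GPT-3 training figures (illustrative only).
old_record_min = 10.9   # previous record on the same benchmark, in minutes
new_record_min = 3.9    # Eos result with 10,752 H100 GPUs, in minutes

speedup = old_record_min / new_record_min
print(f"Benchmark speedup: {speedup:.2f}x")   # ~2.8x, i.e. "nearly 3x"

# Extrapolated full-dataset training time quoted for Eos, and the implied
# time for the prior 512x A100 system said to be 73x slower.
eos_full_train_days = 8
prior_system_days = eos_full_train_days * 73
print(f"Prior 512x A100 system (implied): ~{prior_system_days} days")  # ~584 days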

MediaTek Announces the Dimensity 9300 Flagship SoC, with Big Cores Only

MediaTek today announced the Dimensity 9300, its newest flagship mobile chip with a one-of-a-kind All Big Core design. The unique configuration combines extreme performance with MediaTek's industry-leading power efficiency to deliver unmatched user experiences in gaming, video capture and on-device generative AI processing.

"The Dimensity 9300 is MediaTek's most powerful flagship chip yet, bringing a huge boost in raw computing power to flagship smartphones with our groundbreaking All Big Core design," said Joe Chen, President at MediaTek. "This unique architecture, combined with our upgraded on-chip AI Processing Unit, will usher in a new era of generative AI applications as developers push the limits with edge AI and hybrid AI computing capabilities."

AMD Reports Third Quarter 2023 Financial Results, Revenue Up 4% YoY

AMD (NASDAQ:AMD) today announced revenue for the third quarter of 2023 of $5.8 billion, gross margin of 47%, operating income of $224 million, net income of $299 million and diluted earnings per share of $0.18. On a non-GAAP basis, gross margin was 51%, operating income was $1.3 billion, net income was $1.1 billion and diluted earnings per share was $0.70.

"We delivered strong revenue and earnings growth driven by demand for our Ryzen 7000 series PC processors and record server processor sales," said AMD Chair and CEO Dr. Lisa Su. "Our data center business is on a significant growth trajectory based on the strength of our EPYC CPU portfolio and the ramp of Instinct MI300 accelerator shipments to support multiple deployments with hyperscale, enterprise and AI customers."

NVIDIA NeMo: Designers Tap Generative AI for a Chip Assist

A research paper released this week describes ways generative AI can assist one of the most complex engineering efforts: designing semiconductors. The work demonstrates how companies in highly specialized fields can train large language models (LLMs) on their internal data to build assistants that increase productivity.

Few pursuits are as challenging as semiconductor design. Under a microscope, a state-of-the-art chip like an NVIDIA H100 Tensor Core GPU (above) looks like a well-planned metropolis, built with tens of billions of transistors, connected on streets 10,000x thinner than a human hair. Multiple engineering teams coordinate for as long as two years to construct one of these digital mega cities. Some groups define the chip's overall architecture, some craft and place a variety of ultra-small circuits, and others test their work. Each job requires specialized methods, software programs and computer languages.

Intel Joins the MLCommons AI Safety Working Group

Today, Intel announced it is joining the new MLCommons AI Safety (AIS) working group alongside artificial intelligence experts from industry and academia. As a founding member, Intel will contribute its expertise and knowledge to help create a flexible platform for benchmarks that measure the safety and risk factors of AI tools and models. As testing matures, the standard AI safety benchmarks developed by the working group will become a vital element of our society's approach to AI deployment and safety.

"Intel is committed to advancing AI responsibly and making it accessible to everyone. We approach safety concerns holistically and develop innovations across hardware and software to enable the ecosystem to build trustworthy AI. Due to the ubiquity and pervasiveness of large language models, it is crucial to work across the ecosystem to address safety concerns in the development and deployment of AI. To this end, we're pleased to join the industry in defining the new processes, methods and benchmarks to improve AI everywhere," said Deepak Patil, Intel corporate vice president and general manager, Data Center AI Solutions.

SK Hynix's LPDDR5T, World's Fastest Mobile DRAM, Completes Compatibility Validation with Qualcomm

SK hynix Inc. announced today that it has started commercialization of the LPDDR5T (Low Power Double Data Rate 5 Turbo), the world's fastest mobile DRAM at 9.6 Gbps. The company said it has obtained validation that the LPDDR5T is compatible with Qualcomm Technologies' new Snapdragon 8 Gen 3 Mobile Platform, marking the industry's first such product to be verified by the U.S. company.

SK hynix carried out the compatibility validation of the LPDDR5T with support from Qualcomm Technologies, following the completion of the memory's development in January; completion of the process confirms compatibility with Snapdragon 8 Gen 3. With validation successfully completed with Qualcomm Technologies, a leader in wireless telecommunication products and services, as well as other major mobile AP (Application Processor) providers, SK hynix expects LPDDR5T adoption to grow rapidly.

Qualcomm Snapdragon Elite X SoC for Laptops Leaks: 12 Cores, LPDDR5X Memory, and Wi-Fi 7

Thanks to information from Windows Report, we have received numerous details regarding Qualcomm's upcoming Snapdragon Elite X chip for laptops. The Snapdragon Elite X SoC is built on Nuvia-derived Oryon cores, twelve of which Qualcomm has placed in the SoC. While we don't know their base frequencies, the all-core boost reaches 3.8 GHz, and the SoC can reach up to 4.3 GHz on single- and dual-core boosting. The slide notes that this is a pure "big"-core configuration, with no big.LITTLE design. The GPU part of Snapdragon Elite X is still based on Qualcomm's Adreno IP; however, the performance figures are up significantly, reaching a claimed 4.6 TeraFLOPS of FP32 single-precision compute. Accompanying the CPU and GPU are dedicated AI and image-processing accelerators, such as the Hexagon Neural Processing Unit (NPU), which can process 45 trillion operations per second (TOPS). For the camera, the Spectra Image Signal Processor (ISP) supports up to 4K HDR video capture on a dual 36 MP or a single 64 MP camera setup.

The SoC supports LPDDR5X memory running at 8533 MT/s with a maximum capacity of 64 GB. Apparently, the memory controller is an 8-channel design with 16-bit channels, for a maximum bandwidth of 136 GB/s. Snapdragon Elite X has PCIe 4.0 and supports UFS 4.0 for external storage. All of this is packed on a die manufactured by TSMC on a 4 nm node. In addition to marketing excellent performance compared to x86 solutions, Qualcomm also advertises the SoC as power efficient: the slide notes that it matches the peak PC performance of x86 offerings at one-third the power. It is also interesting to note that the package supports Wi-Fi 7 and Bluetooth 5.4. Officially coming in 2024, the Snapdragon Elite X will have to compete with Intel's Meteor Lake and/or Arrow Lake, in addition to AMD Strix Point.
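The quoted 136 GB/s figure is consistent with the listed memory configuration; a quick back-of-the-envelope check in Python (illustrative only) is shown below.

# Back-of-the-envelope check of the quoted LPDDR5X bandwidth (illustrative).
channels = 8              # 8-channel memory controller
channel_width_bits = 16   # each channel is 16 bits wide
transfer_rate_mts = 8533  # LPDDR5X at 8533 MT/s

total_bus_width_bits = channels * channel_width_bits               # 128 bits
bandwidth_gbs = transfer_rate_mts * total_bus_width_bits / 8 / 1000

print(f"Bus width: {total_bus_width_bits} bits")
print(f"Peak bandwidth: {bandwidth_gbs:.1f} GB/s")  # ~136.5 GB/s, matching the slide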

AMD, Arm, Intel, Meta, Microsoft, NVIDIA, and Qualcomm Standardize Next-Generation Narrow Precision Data Formats for AI

Realizing the full potential of next-generation deep learning requires highly efficient AI infrastructure. For a computing platform to be scalable and cost-efficient, optimizing every layer of the AI stack, from algorithms to hardware, is essential. Advances in narrow-precision AI data formats and associated optimized algorithms have been pivotal to this journey, allowing the industry to transition from traditional 32-bit floating-point precision to as little as 8 bits of precision today (i.e., OCP FP8).

Narrower formats allow silicon to execute more efficient AI calculations per clock cycle, which accelerates model training and inference times. AI models take up less space, which means they require fewer data fetches from memory and can run with better performance and efficiency. Additionally, fewer bit transfers reduce data movement over the interconnect, which can enhance application performance or cut network costs.
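To illustrate why narrower formats matter, the Python sketch below compares the weight-memory footprint of the same model stored in FP32, FP16, and FP8; the 175-billion-parameter figure is just an example, and activations, optimizer state and overheads are ignored.

# Rough weight-memory footprint for one model at different precisions (illustrative).
BYTES_PER_ELEMENT = {"FP32": 4, "FP16/BF16": 2, "FP8": 1}

params_billion = 175  # e.g. a GPT-3-class model with 175B parameters

for fmt, nbytes in BYTES_PER_ELEMENT.items():
    footprint_gb = params_billion * 1e9 * nbytes / 1e9
    print(f"{fmt:>10}: ~{footprint_gb:,.0f} GB of weights")
# FP32 ~700 GB, FP16 ~350 GB, FP8 ~175 GB: fewer bytes per parameter means
# fewer memory fetches and less data moved over the interconnect.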

Striking Performance: LLMs up to 4x Faster on GeForce RTX With TensorRT-LLM

Generative AI is one of the most important trends in the history of personal computing, bringing advancements to gaming, creativity, video, productivity, development and more. And GeForce RTX and NVIDIA RTX GPUs, which are packed with dedicated AI processors called Tensor Cores, are bringing the power of generative AI natively to more than 100 million Windows PCs and workstations.

Today, generative AI on PC is getting up to 4x faster via TensorRT-LLM for Windows, an open-source library that accelerates inference performance for the latest AI large language models, like Llama 2 and Code Llama. This follows the announcement of TensorRT-LLM for data centers last month. NVIDIA has also released tools to help developers accelerate their LLMs, including scripts that optimize custom models with TensorRT-LLM, TensorRT-optimized open-source models and a developer reference project that showcases both the speed and quality of LLM responses.

Baidu Launches ERNIE 4.0 Foundation Model, Leading a New Wave of AI-Native Applications

Baidu, Inc., a leading AI company with a strong Internet foundation, today hosted its annual flagship technology conference, Baidu World 2023, in Beijing, marking the conference's return to an offline format after four years. With the theme "Prompt the World," this year's Baidu World conference saw Baidu launch ERNIE 4.0, Baidu's next-generation and most powerful foundation model, offering drastically enhanced core AI capabilities. Baidu also showcased some of its most popular applications, solutions, and products re-built around the company's state-of-the-art generative AI.

"ERNIE 4.0 has achieved a full upgrade with drastically improved performance in understanding, generation, reasoning, and memory," Robin Li, Co-founder, Chairman and CEO of Baidu, said at the event. "These four core capabilities form the foundation of AI-native applications and have now unleashed unlimited opportunities for new innovations."

ASUS Showcases Cutting-Edge Cloud Solutions at OCP Global Summit 2023

ASUS, a global infrastructure solution provider, is excited to announce its participation in the 2023 OCP Global Summit, which is taking place from October 17-19, 2023, at the San Jose McEnery Convention Center. The prestigious annual event brings together industry leaders, innovators and decision-makers from around the world to explore and discuss the latest advancements in open infrastructure and cloud technologies, providing a perfect stage for ASUS to unveil its latest cutting-edge products.

The ASUS theme for the OCP Global Summit is Solutions beyond limits—ASUS empowers AI, cloud, telco and more. We will showcase an array of products:

Broadcom Partners with Google Cloud to Strengthen Gen AI-Powered Cybersecurity

Symantec, a division of Broadcom Inc., is partnering with Google Cloud to embed generative AI (gen AI) into the Symantec Security platform in a phased rollout that will give customers a significant technical edge for detecting, understanding, and remediating sophisticated cyber attacks.

Symantec is leveraging the Google Cloud Security AI Workbench and a security-specific large language model (LLM), Sec-PaLM 2, across its portfolio to enable natural language interfaces and generate more comprehensive and easy-to-understand threat analyses. With Security AI Workbench-powered summarization of complex incidents and alignment to MITRE ATT&CK context, security operations center (SOC) analysts of all levels can better understand threats and respond faster. That, in turn, translates into greater security and higher SOC productivity.

Useful Sensors Launches AI-In-A-Box Module, a Low Cost Offline Solution

Useful Sensors, an AI-focused start-up, today launched the world's first low-cost, off-the-shelf AI module to enable intuitive, natural language interaction with electronic devices, locally and privately, with no need for an account or internet connection. The new AI-In-A-Box module can answer queries and solve problems in a way similar to well-known AI tools based on a large language model (LLM). But thanks to compression and acceleration technologies developed by Useful Sensors, the module hosts its LLM file locally, enabling its low-cost microprocessor to understand and respond instantly to spoken natural language queries or commands without reference to a data center.

Disconnected from the internet, the AI-In-A-Box module definitively eliminates user concerns about privacy, snooping, or dependence on third-party cloud services that are prevalent with conventional LLM-based AI products and services marketed by large technology companies. The AI-In-A-Box module is available to buy now at CrowdSupply, priced at $299.

UPMEM Raises €7M to Revolutionize AI and Analytics Processing

UPMEM, a fabless semiconductor startup, has raised €4.1M in equity from the European Innovation Council (EIC) Fund and venture capitalists (Partech, Western Digital Capital, C4 Ventures…), plus a €2.5M grant from the EIC. Founded by Fabrice Devaux and Gilles Hamou, the company is pioneering ultra-efficient Processing In Memory (PIM) accelerators to tackle the significant challenge of compute efficiency for AI and big data applications.

UPMEM's PIM solution, built around UPMEM's first commercial-grade PIM chip on the market, is now available to cloud markets across the globe (US, Asia...) to provide the most cost-effective and energy-efficient solutions for AI and analytics applications in data centers and at the edge, such as large language models (LLMs, e.g. GPT), genomics, and large-scale analytics.

d-Matrix Announces $110 Million in Funding for Corsair Inference Compute Platform

d-Matrix, the leader in high-efficiency generative AI compute for data centers, has closed $110 million in a Series B funding round led by Singapore-based global investment firm Temasek. The goal of the fundraise is to enable d-Matrix to begin commercializing Corsair, the world's first Digital In-Memory Compute (DIMC), chiplet-based inference compute platform, after the successful launches of its prior Nighthawk, Jayhawk I and Jayhawk II chiplets.

d-Matrix's recent silicon announcement, Jayhawk II, is the latest example of how the company is working to fundamentally change the physics of memory-bound compute workloads common in generative AI and large language model (LLM) applications. With the explosion of this revolutionary technology over the past nine months, there has never been a greater need to overcome the memory bottleneck and current technology approaches that limit performance and drive up AI compute costs.

NVIDIA Paves the Way for Natural Speech Conversations with Game NPCs

Imagine you're in a vast RPG filled with hundreds, if not thousands, of interactive NPCs (non-playable characters). Current RPGs conduct your interactions with them through pre-defined dialog selections, where you choose among a handful of text-based options on the screen, each of which elicits a certain response from the NPC. This feels unnatural and railroaded, but NVIDIA plans to change that. With ACE (Avatar Cloud Engine) and NeMo SteerLM (a natural language model), NVIDIA wants to make voice-based interactions with NPCs possible. This is a necessary stepping stone toward the near future, where NPCs will be backed by large GPTs that let you have lengthy conversations with them.

The way this works is that the player gives an NPC a natural-language voice input. A speech-to-text engine and an LLM process the voice input and generate a natural-language response, while Omniverse Audio2Face is leveraged to animate the NPC's response in real time. Announced at this Gamescom, NVIDIA's new NeMo SteerLM adds life to the part of ACE that processes the natural voice input and, based on the personality traits the game developer gives an NPC, generates responses with varying degrees of creativity, humor, and toxicity, among other attributes.
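The described flow (speech-to-text, persona-conditioned LLM response, then facial animation) could be sketched roughly as below. The function names, the NPCPersona fields and the canned strings are hypothetical stand-ins, not the actual ACE or NeMo SteerLM APIs.

# Hypothetical sketch of the described NPC pipeline: speech-to-text -> LLM -> animation.
# None of these functions are real ACE/NeMo SteerLM APIs; they are placeholders.
from dataclasses import dataclass

@dataclass
class NPCPersona:
    name: str
    creativity: float  # 0.0-1.0 steering attributes, as described for SteerLM
    humor: float
    toxicity: float

def speech_to_text(audio: bytes) -> str:
    # Placeholder: a real system would run an ASR model here.
    return "Do you know where the blacksmith is?"

def generate_reply(persona: NPCPersona, player_text: str) -> str:
    # Placeholder: a real system would prompt an LLM with the persona's
    # steering attributes and the game context, then return its reply.
    return f"({persona.name}) Aye, past the old mill, traveler."

def animate_face(reply_text: str) -> None:
    # Placeholder: Audio2Face-style systems drive facial animation from the response.
    print(f"[facial animation for]: {reply_text}")

if __name__ == "__main__":
    npc = NPCPersona(name="Innkeeper", creativity=0.7, humor=0.4, toxicity=0.0)
    text = speech_to_text(b"...")       # player's spoken input
    reply = generate_reply(npc, text)   # persona-conditioned response
    animate_face(reply)                 # lip-sync / facial animation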

OpenAI Degrades GPT-4 Performance While GPT-3.5 Gets Better

When OpenAI announced its GPT-4 model, it first became part of ChatGPT, behind the paywall for premium users. GPT-4 is the latest installment in the Generative Pretrained Transformer (GPT) family of Large Language Models (LLMs), and it aims to be more capable than the GPT-3.5 model that initially powered ChatGPT. However, it seems the performance of GPT-4 has been steadily dropping since its introduction. Many users noted the regression, and now researchers from Stanford University and UC Berkeley have benchmarked GPT-4's performance in March 2023 against its performance in June 2023 on tasks like solving math problems, visual reasoning, code generation, and answering sensitive questions.

The results? The paper shows that GPT-4's performance degraded significantly across all the tasks. This could be attributed to efforts to improve stability, lower the massive compute demand, and more. Unexpectedly, GPT-3.5 experienced a significant uplift over the same period. Below, you can see the examples benchmarked by the researchers, which compare GPT-4 and GPT-3.5 performance in all cases.

NVIDIA Espouses Generative AI for Improved Productivity Across Industries

A watershed moment on Nov. 30, 2022, was mostly virtual, yet it shook the foundations of nearly every industry on the planet. On that day, OpenAI released ChatGPT, the most advanced artificial intelligence chatbot ever developed. This set off demand for generative AI applications that help businesses become more efficient, from providing consumers with answers to their questions to accelerating the work of researchers as they seek scientific breakthroughs, and much, much more.

Businesses that previously dabbled in AI are now rushing to adopt and deploy the latest applications. Generative AI—the ability of algorithms to create new text, images, sounds, animations, 3D models and even computer code—is moving at warp speed, transforming the way people work and play. By employing large language models (LLMs) to handle queries, the technology can dramatically reduce the time people devote to manual tasks like searching for and compiling information.

AMD CEO Lisa Su Notes: AI to Dominate Chip Design

Artificial intelligence (AI) has emerged as a transformative force in chip design, with recent examples from China and the United States showcasing its potential. Jensen Huang, CEO of NVIDIA, believes that AI can empower individuals to become programmers, while Lisa Su, CEO of AMD, predicts an era where AI dominates chip design. During the 2023 World Artificial Intelligence Conference (WAIC) in Shanghai, Su emphasized the importance of interdisciplinary collaboration for the next generation of chip designers. To excel in this field, engineers must possess a holistic understanding of hardware, software, and algorithms, enabling them to create superior chip designs that meet system usage, customer deployment, and application requirements.

The integration of AI into chip design processes has gained momentum, fueled by the AI revolution catalyzed by large language models (LLMs). Both Huang and Mark Papermaster, CTO of AMD, acknowledge the benefits of AI in accelerating computation and facilitating chip design. AMD has already started leveraging AI in semiconductor design, testing, and verification, with plans to expand its use of generative AI in chip design applications. Companies are now actively exploring the fusion of AI technology with Electronic Design Automation (EDA) tools to streamline complex tasks and minimize manual intervention in chip design. Despite limited data and accuracy challenges, the "EDA+AI" approach holds great promise. For instance, Synopsys has invested significantly in AI tool research and recently launched Synopsys.ai, the industry's first end-to-end AI-driven EDA solution. This comprehensive solution empowers developers to harness AI at every stage of chip development, from system architecture and design to manufacturing, marking a significant leap forward in AI's integration into chip design workflows.

Oracle Fusion Cloud HCM Enhanced with Generative AI, Projected to Boost HR Productivity

Oracle today announced the addition of generative AI-powered capabilities within Oracle Fusion Cloud Human Capital Management (HCM). Supported by the Oracle Cloud Infrastructure (OCI) generative AI service, the new capabilities are embedded in existing HR processes to drive faster business value, improve productivity, enhance the candidate and employee experience, and streamline HR processes.

"Generative AI is boosting productivity and unlocking a new world of skills, ideas, and creativity that can have an immediate impact in the workplace," said Chris Leone, executive vice president, applications development, Oracle Cloud HCM. "With the ability to summarize, author, and recommend content, generative AI helps to reduce friction as employees complete important HR functions. For example, with the new embedded generative AI capabilities in Oracle Cloud HCM, our customers will be able to take advantage of large language models to drastically reduce the time required to complete tasks, improve the employee experience, enhance the accuracy of workforce insights, and ultimately increase business value."

NVIDIA Cambridge-1 AI Supercomputer Hooked up to DGX Cloud Platform

Scientific researchers need massive computational resources that can support exploration wherever it happens. Whether they're conducting groundbreaking pharmaceutical research, exploring alternative energy sources or discovering new ways to prevent financial fraud, accessible state-of-the-art AI computing resources are key to driving innovation. This new model of computing can solve the challenges of generative AI and power the next wave of innovation. Cambridge-1, a supercomputer NVIDIA launched in the U.K. during the pandemic, has powered discoveries from some of the country's top healthcare researchers. The system is now becoming part of NVIDIA DGX Cloud to accelerate the pace of scientific innovation and discovery - across almost every industry.

As a cloud-based resource, it will broaden access to AI supercomputing for researchers in climate science, autonomous machines, worker safety and other areas, delivered with the simplicity and speed of the cloud and ideally located for U.K. and European access. DGX Cloud is a multinode AI training service that makes it possible for any enterprise to access leading-edge supercomputing resources from a browser. The original Cambridge-1 infrastructure included 80 NVIDIA DGX systems; it will now join DGX Cloud, giving customers access to world-class infrastructure.

ASUS Demonstrates Liquid Cooling and AI Solutions at ISC High Performance 2023

ASUS today announced a showcase of the latest HPC solutions to empower innovation and push the boundaries of supercomputing, at ISC High Performance 2023 in Hamburg, Germany on May 21-25, 2023. The ASUS exhibition, at booth H813, will reveal the latest supercomputing advances, including liquid-cooling and AI solutions, as well as outlining a slew of sustainability breakthroughs - plus a whole lot more besides.

Comprehensive Liquid-Cooling Solutions
ASUS is working with Submer, the industry-leading liquid-cooling provider, to demonstrate immersion-cooling solutions at ISC High Performance 2023, focused on the ASUS RS720-E11-IM - the Intel-based 2U4N server that leverages our trusted legacy server architecture and popular features to create a compact new design. This fresh outlook improves accessibility to I/O ports, storage and cable routing, and strengthens the structure to allow the server to be placed vertically in the tank, with durability assured.

NVIDIA A800 China-Tailored GPU Performance within 70% of A100

The recent growth in demand for training Large Language Models (LLMs) like Generative Pre-trained Transformer (GPT) has sparked the interest of many companies in investing in the GPU solutions used to train these models. However, countries like China have struggled with US sanctions, and NVIDIA has had to create custom models that meet US export regulations. The two resulting GPUs, H800 and A800, represent cut-down versions of the original H100 and A100, respectively. We previously reported on the H800; however, it remained as mysterious as the A800 we are talking about today. Thanks to MyDrivers, we have information that the A800's GPU performance is within 70% of the regular A100.

The regular A100 GPU manages 9.7 TeraFLOPS of FP64, 19.5 TeraFLOPS of FP64 Tensor, and up to 624 BF16/FP16 TeraFLOPS with sparsity. Rough napkin math suggests that 70% of the original performance (a 30% cut) would equal 6.8 TeraFLOPS of FP64, 13.7 TeraFLOPS of FP64 Tensor, and 437 BF16/FP16 TeraFLOPS with sparsity. MyDrivers notes that the A800 can be had for 100,000 Yuan, translating to about 14,462 USD at the time of writing. This is not the most capable GPU that Chinese companies can acquire, as the H800 exists; however, we don't have any information about its performance for now.
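For reference, the scaled figures above follow directly from applying 70% to the A100's quoted specs; a minimal Python check is shown below (estimates only, not measured A800 numbers).

# Scale the A100's quoted throughput figures by 70%, as the napkin math above does.
a100_tflops = {
    "FP64": 9.7,
    "FP64 Tensor": 19.5,
    "BF16/FP16 (sparse)": 624.0,
}

scale = 0.70  # "within 70% of the regular A100"

for metric, value in a100_tflops.items():
    print(f"{metric:>20}: {value * scale:.2f} TFLOPS")
# FP64 ~6.79, FP64 Tensor ~13.65, BF16/FP16 sparse ~436.8 TFLOPS,
# which the article rounds to 6.8, 13.7, and 437.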

NVIDIA H100 Compared to A100 for Training GPT Large Language Models

NVIDIA's H100 has recently become available via Cloud Service Providers (CSPs), and it was only a matter of time before someone decided to benchmark its performance and compare it to the previous generation's A100 GPU. Today, thanks to benchmarks from MosaicML, a startup led by Naveen Rao, the ex-CEO of Nervana and former GM of Artificial Intelligence (AI) at Intel, we have a comparison between the two GPUs, along with a fascinating insight into the cost factor. MosaicML took Generative Pre-trained Transformer (GPT) models of various sizes and trained them using bfloat16 and FP8 floating-point precision formats. All training occurred on CoreWeave cloud GPU instances.

Regarding performance, the NVIDIA H100 GPU achieved anywhere from a 2.2x to 3.3x speedup. However, an interesting finding emerges when comparing the cost of running these GPUs in the cloud. CoreWeave prices the H100 SXM GPUs at $4.76/hr/GPU, while the A100 80 GB SXM is priced at $2.21/hr/GPU. While the H100 is roughly 2.2x more expensive per hour, the performance makes up for it, resulting in less time to train a model and a lower overall price for the training run. This makes the H100 more attractive for researchers and companies wanting to train Large Language Models (LLMs), and makes choosing the newer GPU more viable despite the increased cost. Below, you can see tables comparing the two GPUs in training time, speedup, and cost of training.
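The cost argument can be made concrete with the quoted CoreWeave prices and speedups; the Python sketch below (illustrative only) computes the relative cost of an H100 training run versus an A100 one.

# Relative cost of a training run: price-per-GPU-hour ratio divided by speedup.
h100_price = 4.76   # $/hr/GPU, H100 SXM (CoreWeave pricing quoted above)
a100_price = 2.21   # $/hr/GPU, A100 80 GB SXM

price_ratio = h100_price / a100_price          # ~2.15x more expensive per hour
for speedup in (2.2, 3.3):                     # measured training speedups
    relative_cost = price_ratio / speedup      # <1.0 means the H100 run is cheaper
    print(f"speedup {speedup}x -> H100 run costs "
          f"{relative_cost:.2f}x the A100 run")
# At 2.2x speedup the cost is roughly break-even; at 3.3x it is ~35% cheaper.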

NVIDIA Wants to Set Guardrails for Large Language Models Such as ChatGPT

ChatGPT has surged in popularity over just a few months, and its usage has been regarded as one of the fastest-growing of any app ever. Based on the Large Language Models (LLMs) GPT-3.5 and GPT-4, ChatGPT forms answers to user input using the extensive data set it was trained on. With billions of parameters, the GPT models behind ChatGPT can give precise answers; however, these models sometimes hallucinate: given a question about a non-existent topic or subject, ChatGPT can make up information. To prevent these hallucinations, NVIDIA, the maker of GPUs used for training and inferencing LLMs, has released a software library to keep AI in check, called NeMo Guardrails.

As the NVIDIA repository states: "NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems. Guardrails (or "rails" for short) are specific ways of controlling the output of a large language model, such as not talking about politics, responding in a particular way to specific user requests, following a predefined dialog path, using a particular language style, extracting structured data, and more." These guardrails are easily programmable and can stop LLMs from outputting unwanted content. For a company that invests heavily in the hardware and software landscape, this launch is a logical decision to keep the lead in setting the infrastructure for future LLM-based applications.
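For context, the toolkit is driven by small configuration files (Colang flows plus a YAML model config) and a Python entry point. The snippet below is a minimal sketch along the lines of the project's documented quick-start; the Colang flow and model settings are purely illustrative, not a production configuration.

# Minimal NeMo Guardrails sketch (illustrative; requires an LLM backend such as an OpenAI key).
from nemoguardrails import LLMRails, RailsConfig

colang_content = """
define user ask politics
  "what do you think about the president?"
  "which party should I vote for?"

define bot refuse politics
  "I'm sorry, I can't discuss political topics."

define flow politics
  user ask politics
  bot refuse politics
"""

yaml_content = """
models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo
"""

config = RailsConfig.from_content(colang_content=colang_content,
                                  yaml_content=yaml_content)
rails = LLMRails(config)

response = rails.generate(messages=[
    {"role": "user", "content": "What do you think about the president?"}
])
print(response["content"])  # the rail steers the reply to the predefined refusal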