News Posts matching #Llama 3


Meta Shows Open-Architecture NVIDIA "Blackwell" GB200 System for Data Center

During the Open Compute Project (OCP) Summit 2024, Meta, one of the prime members of the OCP project, showed its NVIDIA "Blackwell" GB200 systems for its massive data centers. We previously covered Microsoft's Azure server rack with GB200 GPUs, which devotes one-third of the rack space to computing and two-thirds to cooling. A few days later, Google showed off its smaller GB200 system, and today, Meta is showing off its GB200 system—the smallest of the bunch. To train a dense transformer large language model with 405B parameters and a context window of up to 128k tokens, like Llama 3.1 405B, Meta had to redesign its data center infrastructure to run a distributed training job across two 24,000-GPU clusters. That is 48,000 GPUs used for training a single AI model.
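The scale involved follows from simple arithmetic. Below is a rough back-of-the-envelope sketch, using common rules of thumb for per-parameter memory rather than any figures from Meta, showing why a 405B-parameter model cannot come close to fitting on a single GPU:

```python
# Back-of-the-envelope memory estimate for a 405B-parameter model.
# The bytes-per-parameter figures are common rules of thumb, not Meta's numbers.
PARAMS = 405e9

def gib(nbytes):
    return nbytes / 2**30

weights_bf16 = PARAMS * 2    # 2 bytes per parameter in BF16 (inference weights)
# Mixed-precision training state: BF16 weights + FP32 master weights,
# FP32 Adam optimizer moments, and gradients ~ 16 bytes per parameter.
train_state = PARAMS * 16

print(f"inference weights: {gib(weights_bf16):,.0f} GiB")   # ~754 GiB
print(f"training state:    {gib(train_state):,.0f} GiB")    # ~6,035 GiB
print(f"80 GB GPUs just to hold the training state: "
      f"{train_state / (80 * 2**30):.0f}")
```

Activations, KV caches for long contexts, and data/pipeline parallelism push the real GPU count far beyond this floor, which is why clusters of tens of thousands of GPUs are used.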

Called "Catalina," it is built on the NVIDIA Blackwell platform, emphasizing modularity and adaptability while incorporating the latest NVIDIA GB200 Grace Blackwell Superchip. To address the escalating power requirements of GPUs, Catalina introduces the Orv3, a high-power rack capable of delivering up to 140 kW. The comprehensive liquid-cooled setup encompasses a power shelf supporting various components, including a compute tray, switch tray, the Orv3 HPR, a Wedge 400 fabric switch with 12.8 Tbps switching capacity, a management switch, battery backup, and a rack management controller. Interestingly, Meta also upgraded its "Grand Teton" system for internal usage, such as deep learning recommendation models (DLRMs) and content understanding, with AMD Instinct MI300X accelerators. Those systems run inference on internal models, and the MI300X appears to provide the best performance per dollar for inference. According to Meta, the computational demand stemming from AI will continue to increase exponentially, so more NVIDIA and AMD GPUs are needed, and we can't wait to see what the company builds.

NVIDIA Fine-Tunes Llama3.1 Model to Beat GPT-4o and Claude 3.5 Sonnet with Only 70 Billion Parameters

NVIDIA has officially released its Llama-3.1-Nemotron-70B-Instruct model. Based on Meta's Llama 3.1 70B, the Nemotron model is a large language model customized by NVIDIA to improve the helpfulness of LLM-generated responses. NVIDIA fine-tunes the model on structured data to steer it toward more helpful responses. With only 70 billion parameters, the model is punching far above its weight class. The company claims that the model beats the current top models from leading labs, OpenAI's GPT-4o and Anthropic's Claude 3.5 Sonnet, which lead across AI benchmarks. In evaluations such as Arena Hard, NVIDIA's Llama-3.1-Nemotron-70B scores 85 points, while GPT-4o and Claude 3.5 Sonnet score 79.3 and 79.2, respectively. In other benchmarks, such as AlpacaEval and MT-Bench, NVIDIA also holds the top spot, with scores of 57.6 and 8.98, respectively; Claude and GPT-4o reach 52.4/8.81 and 57.5/8.74, just below Nemotron.

This language model underwent training using reinforcement learning from human feedback (RLHF), specifically employing the REINFORCE algorithm. The process involved a reward model based on a large language model architecture and custom preference prompts designed to guide the model's behavior. The training began with a pre-existing instruction-tuned language model as the starting point: Llama-3.1-70B-Instruct served as the initial policy and was trained using the Llama-3.1-Nemotron-70B-Reward model and HelpSteer2-Preference prompts. Running the model locally requires either four 40 GB or two 80 GB VRAM GPUs and 150 GB of free disk space. We managed to take it for a spin on NVIDIA's website to say hello to TechPowerUp readers. The model also passes the infamous "strawberry" test, where it has to count the occurrences of a specific letter in a word. However, that test appears to have been part of the fine-tuning data, as the model fails the next test, shown in the image below.
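The REINFORCE estimator named above can be illustrated with a toy example. The following is only a minimal sketch of the policy-gradient idea, not NVIDIA's training code: a softmax policy over four hypothetical candidate responses, with made-up "reward model" scores, is nudged toward the highest-scoring response.

```python
import numpy as np

# Toy REINFORCE sketch: a softmax policy over four candidate "responses",
# updated toward higher reward-model scores. Rewards and the setup are
# invented for illustration; real RLHF operates on token sequences.
rng = np.random.default_rng(0)
rewards = np.array([0.1, 0.9, 0.3, 0.5])  # hypothetical RM scores per response
logits = np.zeros(4)
lr = 0.5

for _ in range(200):
    probs = np.exp(logits) / np.exp(logits).sum()
    a = rng.choice(4, p=probs)             # sample a response from the policy
    baseline = probs @ rewards             # variance-reducing baseline
    advantage = rewards[a] - baseline
    # For a softmax policy, grad of log pi(a) is one_hot(a) - probs.
    grad_logp = -probs
    grad_logp[a] += 1.0
    logits += lr * advantage * grad_logp   # REINFORCE update

probs = np.exp(logits) / np.exp(logits).sum()
print("final policy:", probs.round(3))     # mass concentrates on index 1
```

After a few hundred updates the policy puts most of its probability on the response the reward model scores highest, which is the core mechanism a reward model plus REINFORCE provides at LLM scale.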

ASUS ROG Updates Virtual Assistant With New AI Module

ASUS Republic of Gamers (ROG) today released a significant update to its bundled Virtual Assistant software (formerly known as Virtual Pet). This new software package comes preinstalled on the ROG Zephyrus G16 gaming laptop and leverages the incredible power of AI to significantly level up the capabilities of the Virtual Assistant, including an intelligent chat and Q&A interface, written document summarization, and voice transcription tools. This update is available on laptop models with AMD Ryzen AI 300 Series processors as a free download via ASUS Live Update.

Intelligent chat support
The Virtual Assistant gives users a leg up when they're using an unfamiliar program or system tool. With a local chat and Q&A feature, even when disconnected from the internet, the Virtual Assistant can help users navigate complicated menus and activate the features and settings they need. For example, if a new user is looking to adjust fan settings, they can request that from the Virtual Assistant, and it will direct them to the appropriate settings menu within the Armoury Crate app. Applications like MyASUS, GlideX, and ProArt Creator Hub are supported, and the chat functionality adds a new layer of support for end users.

SK Hynix Begins Mass-production of 12-layer HBM3E Memory

SK hynix Inc. announced today that it has begun mass production of the world's first 12-layer HBM3E product with 36 GB, the largest capacity of any HBM product to date. The company plans to supply mass-produced products to customers within the year, demonstrating its technological leadership once again just six months after delivering the industry's first HBM3E 8-layer product to customers in March this year.
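The quoted capacity is consistent with stacking twelve 24 Gb DRAM dies. A quick sanity check of the arithmetic (assuming 24 Gb core dies, which matches SK hynix's reported HBM3E configuration):

```python
# Stack-capacity arithmetic for the announced 12-layer HBM3E product.
# Assumes 24 Gb (3 GB) DRAM core dies, per SK hynix's HBM3E lineup.
layers = 12
die_gbit = 24                      # capacity of one DRAM die, in gigabits
stack_gbyte = layers * die_gbit / 8
print(stack_gbyte)                 # 36.0 GB per stack, as announced
```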

SK hynix is the only company in the world that has developed and supplied the entire HBM lineup from the first generation (HBM1) to the fifth generation (HBM3E), since releasing the world's first HBM in 2013. The company plans to continue its leadership in the AI memory market, addressing the growing needs of AI companies by being the first in the industry to mass-produce the 12-layer HBM3E.

SK hynix Presents Upgraded AiMX Solution at AI Hardware and Edge AI Summit 2024

SK hynix unveiled an enhanced Accelerator-in-Memory based Accelerator (AiMX) card at the AI Hardware & Edge AI Summit 2024 held September 9-12 in San Jose, California. Organized annually by Kisaco Research, the summit brings together representatives from the AI and machine learning ecosystem to share industry breakthroughs and developments. This year's event focused on exploring cost and energy efficiency across the entire technology stack. Marking its fourth appearance at the summit, SK hynix highlighted how its AiM products can boost AI performance across data centers and edge devices.

Booth Highlights: Meet the Upgraded AiMX
In the AI era, high-performance memory products are vital for the smooth operation of LLMs. However, as these LLMs are trained on increasingly larger datasets and continue to expand, there is a growing need for more efficient solutions. SK hynix addresses this demand with its PIM product AiMX, an AI accelerator card that combines multiple GDDR6-AiMs to provide high bandwidth and outstanding energy efficiency. At the AI Hardware & Edge AI Summit 2024, SK hynix presented its updated 32 GB AiMX prototype, which offers double the capacity of the original card featured at last year's event. To highlight the new AiMX's advanced processing capabilities in a multi-batch environment, SK hynix held a demonstration of the prototype card with the Llama 3 70B model, an open-source LLM. In particular, the demonstration underlined AiMX's ability to serve as a highly effective attention accelerator in data centers.