News Posts matching #Llama 3

IBM Introduces New Multi-Modal and Reasoning AI "Granite" Models Built for the Enterprise

Press Release by

Wednesday, 13:13 Discuss (0 Comments)

IBM today debuted the next generation of its Granite large language model (LLM) family, Granite 3.2, in a continued effort to deliver small, efficient, practical enterprise AI for real-world impact. All Granite 3.2 models are available under the permissive Apache 2.0 license on Hugging Face. Select models are available today on IBM watsonx.ai, Ollama, Replicate, and LM Studio, and expected soon in RHEL AI 1.5 - bringing advanced capabilities to businesses and the open-source community.

Read full story

AMD & Nexa AI Reveal NexaQuant's Improvement of DeepSeek R1 Distill 4-bit Capabilities

Press Release by

T0@st

Feb 18th, 2025 10:01 Discuss (3 Comments)

Nexa AI today announced NexaQuants of two DeepSeek R1 Distills: the DeepSeek R1 Distill Qwen 1.5B and the DeepSeek R1 Distill Llama 8B. Popular quantization methods like the llama.cpp-based Q4_K_M allow large language models to significantly reduce their memory footprint, and they typically cost dense models only a small perplexity loss as a tradeoff. However, even a small perplexity loss can hurt the reasoning capability of (dense or MoE) models that use Chain of Thought traces. Nexa AI has stated that NexaQuants are able to recover this reasoning capability loss (compared to full 16-bit precision) while keeping the 4-bit quantization and retaining its performance advantage. Benchmarks provided by Nexa AI can be seen below.

We can see that the Q4_K_M quantized DeepSeek R1 distills score slightly lower in LLM benchmarks like GPQA and AIME24 than their full 16-bit counterparts (except on the AIME24 benchmark for the Llama 3 8B distill, where the score is significantly lower). Moving to a Q6 or Q8 quantization would be one way to fix this problem, but the model would become slightly slower to run and require more memory. Nexa AI has stated that NexaQuants use a proprietary quantization method to recover the loss while keeping the quantization at 4 bits. This means users can theoretically get the best of both worlds: accuracy and speed.
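To put the memory tradeoff in rough numbers, here is a minimal sketch that estimates weight storage for the two distills at different precisions. It assumes nominal bits per weight (real llama.cpp formats such as Q4_K_M carry some extra per-block metadata, so actual files are somewhat larger) and ignores the KV cache and activations. The function name and the rounded parameter counts are illustrative, not taken from Nexa AI's material.

```python
# Rough weight-only memory estimate. Assumptions: nominal bits per weight,
# parameter counts rounded to 1.5B and 8B, no KV cache or activations.

def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


MODELS = {
    "DeepSeek R1 Distill Qwen 1.5B": 1.5e9,
    "DeepSeek R1 Distill Llama 8B": 8e9,
}

if __name__ == "__main__":
    for name, params in MODELS.items():
        for bits in (16, 8, 6, 4):
            size = weight_memory_gb(params, bits)
            print(f"{name}: {bits}-bit ~ {size:.2f} GB")
```

Under these assumptions the 8B distill drops from about 16 GB at 16-bit to about 4 GB at 4-bit, which is why staying at 4 bits matters for consumer hardware.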

Read full story

US Investigates Possible "Singapore" Loophole in China's Access to NVIDIA GPUs

AleksandarK

Jan 31st, 2025 13:50 Discuss (36 Comments)

Today, Bloomberg reported that the US government under the Trump administration is probing whether Chinese AI company DeepSeek circumvented export restrictions to acquire advanced NVIDIA GPUs through Singaporean intermediaries. The investigation follows concerns that DeepSeek's AI model, R1, which reportedly rivals leading systems from OpenAI and Google, may have been trained using restricted hardware that is barred from export to China. Singapore's role in NVIDIA's global sales has surged, with the nation accounting for 22% of the chipmaker's revenue in Q3 FY2025, up from 9% in Q3 FY2023. This spike coincides with tightened US export controls on AI chips to China, prompting speculation that Singapore serves as a conduit for Chinese firms to access high-end GPUs like the H100, which cannot be sold directly to China.

DeepSeek has not disclosed hardware details for R1 but revealed that its earlier V3 model was trained using 2,048 H800 GPUs (2.8 million GPU hours), far fewer GPU hours than Meta's Llama 3, which required 30.8 million. Analysts suggest R1's performance implies even more powerful infrastructure, potentially involving restricted chips. US authorities, including the White House and FBI, are examining whether third parties in Singapore facilitated the transfer of controlled GPUs to DeepSeek. A well-known semiconductor analyst firm, SemiAnalysis, believes that DeepSeek acquired around 50,000 NVIDIA Hopper GPUs, including a mix of H100, H800, and H20. NVIDIA clarified that its reported Singapore revenue reflects "bill to" customer locations, not final destinations, stating most products are routed to the US or Western markets.
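The GPU-hour figures above can be put in perspective with some quick arithmetic. The sketch below uses only the numbers quoted in the article. The wall-clock estimate additionally assumes that all 2,048 GPUs ran concurrently for the whole job, which the source does not state. The function names are illustrative.

```python
# Back-of-the-envelope comparison of the quoted training budgets.
V3_GPU_HOURS = 2.8e6       # DeepSeek V3, as quoted
LLAMA3_GPU_HOURS = 30.8e6  # Meta Llama 3, as quoted
V3_GPU_COUNT = 2048        # H800 GPUs, as quoted


def gpu_hour_ratio(larger: float, smaller: float) -> float:
    """How many times more GPU hours the larger budget is."""
    return larger / smaller


def wall_clock_days(gpu_hours: float, gpu_count: int) -> float:
    """Wall-clock days, assuming every GPU runs for the entire job."""
    return gpu_hours / gpu_count / 24


if __name__ == "__main__":
    ratio = gpu_hour_ratio(LLAMA3_GPU_HOURS, V3_GPU_HOURS)
    days = wall_clock_days(V3_GPU_HOURS, V3_GPU_COUNT)
    print(f"Llama 3 used about {ratio:.0f}x the GPU hours of V3")
    print(f"V3 run: roughly {days:.0f} days on {V3_GPU_COUNT} GPUs")
```

By these figures, Llama 3 consumed about 11 times the GPU hours of V3, and the V3 run works out to roughly 57 days of wall-clock time under the concurrency assumption.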

Read full story

Meta Shows Open-Architecture NVIDIA "Blackwell" GB200 System for Data Center

AleksandarK

Oct 18th, 2024 10:52 Discuss (11 Comments)

During the Open Compute Project (OCP) Summit 2024, Meta, one of the prime members of the OCP, showed its NVIDIA "Blackwell" GB200 systems for its massive data centers. We previously covered Microsoft's Azure server rack with GB200 GPUs featuring one-third of the rack space for computing and two-thirds for cooling. A few days later, Google showed off its smaller GB200 system, and today, Meta is showing off its GB200 system, the smallest of the bunch. To train a dense transformer large language model with 405B parameters and a context window of up to 128k tokens, like the Llama 3.1 405B, Meta must redesign its data center infrastructure to run a distributed training job on two 24,000 GPU clusters. That is 48,000 GPUs used for training a single AI model.

Called "Catalina," it is built on the NVIDIA Blackwell platform, emphasizing modularity and adaptability while incorporating the latest NVIDIA GB200 Grace Blackwell Superchip. To address the escalating power requirements of GPUs, Catalina introduces the Orv3, a high-power rack capable of delivering up to 140 kW. The comprehensive liquid-cooled setup encompasses a power shelf supporting various components, including a compute tray, switch tray, the Orv3 HPR, Wedge 400 fabric switch with 12.8 Tbps switching capacity, management switch, battery backup, and a rack management controller. Interestingly, Meta also upgraded its "Grand Teton" system for internal usage, such as deep learning recommendation models (DLRMs) and content understanding, with AMD Instinct MI300X. Those are used to run inference on internal models, and MI300X appears to provide the best performance per dollar for inference. According to Meta, the computational demand stemming from AI will continue to increase exponentially, so more NVIDIA and AMD GPUs are needed, and we can't wait to see what the company builds.

NVIDIA Fine-Tunes Llama3.1 Model to Beat GPT-4o and Claude 3.5 Sonnet with Only 70 Billion Parameters

AleksandarK

Oct 17th, 2024 04:21 Discuss (31 Comments)

NVIDIA has officially released its Llama-3.1-Nemotron-70B-Instruct model. Based on Meta's Llama3.1 70B, the Nemotron model is a large language model customized by NVIDIA in order to improve the helpfulness of LLM-generated responses. NVIDIA uses structured fine-tuning data to steer the model and allow it to generate more helpful responses. With only 70 billion parameters, the model is punching far above its weight class. The company claims that the model is beating the current top models from leading labs like OpenAI's GPT-4o and Anthropic's Claude 3.5 Sonnet, which are the current leaders across AI benchmarks. In evaluations such as Arena Hard, the NVIDIA Llama3.1 Nemotron 70B scores 85 points, while GPT-4o and Sonnet 3.5 score 79.3 and 79.2, respectively. In other benchmarks like AlpacaEval and MT-Bench, NVIDIA also holds the top spot, with scores of 57.6 and 8.98. Claude and GPT reach 52.4 / 8.81 and 57.5 / 8.74, just below Nemotron.

This language model underwent training using reinforcement learning from human feedback (RLHF), specifically employing the REINFORCE algorithm. The process involved a reward model based on a large language model architecture and custom preference prompts designed to guide the model's behavior. The training began with a pre-existing instruction-tuned language model as the starting point. It was trained using Llama-3.1-Nemotron-70B-Reward and HelpSteer2-Preference prompts, with a Llama-3.1-70B-Instruct model as the initial policy. Running the model locally requires either four 40 GB or two 80 GB VRAM GPUs and 150 GB of free disk space. We managed to take it for a spin on NVIDIA's website to say hello to TechPowerUp readers. The model also passes the infamous "strawberry" test, where it has to count how many times a specific letter appears in a word. However, it appears that the test was part of the fine-tuning data, as the model fails the next test, shown in the image below.
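The hardware requirement quoted above is consistent with a simple sizing estimate. The sketch below assumes the weights are stored at 16-bit precision (2 bytes per parameter), which the article does not state, and it ignores the KV cache and activations. The function name is illustrative.

```python
# Weight-only sizing for a 70B-parameter model.
# Assumption: 16-bit weights (2 bytes per parameter), not stated in the source.
PARAMS = 70e9
BYTES_PER_PARAM = 2

GPU_CONFIGS = {
    "4 x 40 GB": 4 * 40,
    "2 x 80 GB": 2 * 80,
}


def weights_gb(params: float, bytes_per_param: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    needed = weights_gb(PARAMS, BYTES_PER_PARAM)
    print(f"Weights at 16-bit: ~{needed:.0f} GB")
    for label, total_vram in GPU_CONFIGS.items():
        headroom = total_vram - needed
        print(f"{label}: {total_vram} GB total, ~{headroom:.0f} GB headroom")
```

Under that assumption the weights alone take about 140 GB. That fits within the 160 GB of combined VRAM in either configuration and within the 150 GB of disk space the article mentions.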

ASUS ROG Updates Virtual Assistant With New AI Module

Press Release by

GFreeman

Sep 30th, 2024 12:56 Discuss (0 Comments)

ASUS Republic of Gamers (ROG) today released a significant update to its bundled Virtual Assistant software (formerly known as Virtual Pet). This new software package comes preinstalled on the ROG Zephyrus G16 gaming laptop and leverages the incredible power of AI to significantly level up the capabilities of the Virtual Assistant, including an intelligent chat and Q&A interface, written document summarization, and voice transcription tools. This update is available on laptop models with AMD Ryzen AI 300 Series processors as a free download via ASUS Live Update.

Intelligent chat support
The Virtual Assistant gives users a leg up when they're using an unfamiliar program or system tool. With a local chat and Q&A feature, even when disconnected from the internet, the Virtual Assistant can help users navigate complicated menus and activate the features and settings they need. For example, if a new user is looking to adjust fan settings, they can request that from the Virtual Assistant, and it will direct them to the appropriate settings menu within the Armoury Crate app. Applications like MyASUS, GlideX, and ProArt Creator Hub are supported, and the chat functionality adds a new layer of support for end users.

Read full story

SK Hynix Begins Mass-production of 12-layer HBM3E Memory

Press Release by

btarunr

Sep 25th, 2024 22:43 Discuss (0 Comments)

SK hynix Inc. announced today that it has begun mass production of the world's first 12-layer HBM3E product with 36 GB, the largest capacity of existing HBM to date. The company plans to supply mass-produced products to customers within the year, proving its overwhelming technology once again six months after delivering the HBM3E 8-layer product to customers for the first time in the industry in March this year.

SK hynix is the only company in the world that has developed and supplied the entire HBM lineup from the first generation (HBM1) to the fifth generation (HBM3E), since releasing the world's first HBM in 2013. The company plans to continue its leadership in the AI memory market, addressing the growing needs of AI companies by being the first in the industry to mass-produce the 12-layer HBM3E.

Read full story

SK hynix Presents Upgraded AiMX Solution at AI Hardware and Edge AI Summit 2024

Press Release by

GFreeman

Sep 13th, 2024 02:31 Discuss (6 Comments)

SK hynix unveiled an enhanced Accelerator-in-Memory based Accelerator (AiMX) card at the AI Hardware & Edge AI Summit 2024 held September 9-12 in San Jose, California. Organized annually by Kisaco Research, the summit brings together representatives from the AI and machine learning ecosystem to share industry breakthroughs and developments. This year's event focused on exploring cost and energy efficiency across the entire technology stack. Marking its fourth appearance at the summit, SK hynix highlighted how its AiM products can boost AI performance across data centers and edge devices.

Booth Highlights: Meet the Upgraded AiMX
In the AI era, high-performance memory products are vital for the smooth operation of LLMs. However, as these LLMs are trained on increasingly larger datasets and continue to expand, there is a growing need for more efficient solutions. SK hynix addresses this demand with its PIM product AiMX, an AI accelerator card that combines multiple GDDR6-AiMs to provide high bandwidth and outstanding energy efficiency. At the AI Hardware & Edge AI Summit 2024, SK hynix presented its updated 32 GB AiMX prototype which offers double the capacity of the original card featured at last year's event. To highlight the new AiMX's advanced processing capabilities in a multi-batch environment, SK hynix held a demonstration of the prototype card with the Llama 3 70B model, an open source LLM. In particular, the demonstration underlined AiMX's ability to serve as a highly effective attention accelerator in data centers.

Read full story
