News Posts matching #LLM


Seagate Anticipates Cloud Storage Growth due to AI-Driven Data Creation

According to a recent global Recon Analytics survey commissioned by Seagate Technology, business leaders across 15 industry sectors and 10 countries expect that adoption of artificial intelligence (AI) applications will generate unprecedented volumes of data, driving a boom in demand for data storage, in particular cloud-based storage. With hard drives offering scalable capacity at a favorable cost per terabyte, cloud service providers rely on them to store mass quantities of data.

Recently, analyst firm IDC estimated that 89% of data stored by leading cloud service providers resides on hard drives. Now, according to this Recon Analytics study, 61% of respondents from companies that use the cloud as their leading storage medium expect their cloud-based storage to grow by more than 100% over the next three years. "The survey results generally point to a coming surge in demand for data storage, with hard drives emerging as the clear winner," remarked Roger Entner, founder and lead analyst of Recon Analytics. "When you consider that the business leaders we surveyed intend to store more and more of this AI-driven data in the cloud, it appears that cloud services are well-positioned to ride a second growth wave."

NVIDIA NeMo AI Guardrails Upgraded with Latest NIM Microservices

AI agents are poised to transform productivity for the world's billion knowledge workers with "knowledge robots" that can accomplish a variety of tasks. To develop AI agents, enterprises need to address critical concerns like trust, safety, security and compliance. New NVIDIA NIM microservices for AI guardrails—part of the NVIDIA NeMo Guardrails collection of software tools—are portable, optimized inference microservices that help companies improve the safety, precision and scalability of their generative AI applications.

Central to the orchestration of the microservices is NeMo Guardrails, part of the NVIDIA NeMo platform for curating, customizing and guardrailing AI. NeMo Guardrails helps developers integrate and manage AI guardrails in large language model (LLM) applications. Industry leaders Amdocs, Cerence AI and Lowe's are among those using NeMo Guardrails to safeguard AI applications. Developers can use the NIM microservices to build more secure, trustworthy AI agents that provide safe, appropriate responses within context-specific guidelines and are bolstered against jailbreak attempts. Deployed in customer service across industries like automotive, finance, healthcare, manufacturing and retail, the agents can boost customer satisfaction and trust.
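To make the workflow more concrete, here is a minimal, hedged sketch of wiring guardrails into an LLM application with the open-source nemoguardrails Python package that underpins NeMo Guardrails; the configuration directory and the example prompt are placeholders rather than anything from NVIDIA's announcement.

```python
# Minimal sketch: adding guardrails to an LLM app with the open-source
# nemoguardrails package. The config directory ("./guardrails_config") and
# the sample prompt are hypothetical placeholders for illustration.
from nemoguardrails import LLMRails, RailsConfig

# Load rail definitions (YAML config plus Colang flows) from a local folder.
config = RailsConfig.from_path("./guardrails_config")
rails = LLMRails(config)

# The request is checked against input rails, routed to the underlying LLM,
# and the reply is filtered by output rails before being returned.
response = rails.generate(messages=[
    {"role": "user", "content": "How do I reset my account password?"}
])
print(response["content"])
```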

Aetina & Qualcomm Collaborate on Flagship MegaEdge AIP-FR68 Edge AI Solution

Aetina, a leading provider of edge AI solutions and a subsidiary of Innodisk Group, today announced a collaboration with Qualcomm Technologies, Inc., which has unveiled its Qualcomm AI On-Prem Appliance Solution and Qualcomm AI Inference Suite for On-Prem. This collaboration combines Qualcomm Technologies' cutting-edge inference accelerators and advanced software with Aetina's edge computing hardware to deliver unprecedented computing power and ready-to-use AI applications for enterprises and industrial organizations.

The flagship offering, the Aetina MegaEdge AIP-FR68, sets a new industry benchmark by integrating accelerator cards from the Qualcomm Cloud AI family. Each Cloud AI 100 Ultra card delivers an impressive 870 TOPS of 8-bit integer (INT8) AI computing power while maintaining remarkable energy efficiency at just 150 W of power consumption. The system supports dual Cloud AI 100 Ultra cards in a single desktop workstation. This groundbreaking combination of power and efficiency in a compact form factor revolutionizes on-premises AI processing, making enterprise-grade computing more accessible than ever.

Supermicro Begins Volume Shipments of Max-Performance Servers Optimized for AI, HPC, Virtualization, and Edge Workloads

Supermicro, Inc., a Total IT Solution Provider for AI/ML, HPC, Cloud, Storage, and 5G/Edge, is commencing shipments of max-performance servers featuring Intel Xeon 6900 series processors with P-cores. The new systems feature a range of new and upgraded technologies with new architectures optimized for the most demanding high-performance workloads, including large-scale AI, cluster-scale HPC, and environments where a maximum number of GPUs is needed, such as collaborative design and media distribution.

"The systems now shipping in volume promise to unlock new capabilities and levels of performance for our customers around the world, featuring low latency, maximum I/O expansion providing high throughput with 256 performance cores per system, 12 memory channels per CPU with MRDIMM support, and high performance EDSFF storage options," said Charles Liang, president and CEO of Supermicro. "We are able to ship our complete range of servers with these new application-optimized technologies thanks to our Server Building Block Solutions design methodology. With our global capacity to ship solutions at any scale, and in-house developed liquid cooling solutions providing unrivaled cooling efficiency, Supermicro is leading the industry into a new era of maximum performance computing."

UGREEN Shows Off High-End, AI Capable NAS Devices at CES 2025

At CES this year, UGREEN was showing off two new NAS models, the NASync iDX6011 and the iDX6011 Pro, with the i in the model name seemingly denoting that both models use Intel Core Ultra processors. The basic design builds on last year's NASync models and the UGOS Pro operating system, but with several added features that may or may not appeal to the target audience. The common feature set between the two models is six 3.5-inch drive bays and a pair of M.2 slots that can be used either for storage or as a cache for the mechanical drives. Both models are expected to ship with 32 GB of RAM as standard, expandable to 64 GB, and both also support a PCIe 4.0 x8 expansion card slot, although at this point it's not clear what that slot can be used for. As with the already launched NASync models, the two new SKUs will come with the OS installed on a 128 GB SSD.

Where things get interesting is on the connectivity side, as both models sport dual 10 Gbps Ethernet ports, what is said to be an 8K-capable HDMI port, a pair of USB 3.2 (10 Gbps) ports, two USB 2.0 ports, and an SD 4.0 card slot. The NASync iDX6011 additionally comes with a pair of Thunderbolt 4 ports around the front, although it's not clear whether the unit can be used as a DAS via these ports or whether they simply act as virtual network ports. The iDX6011 Pro, on the other hand, sports two USB Type-C ports around the front (as well as a small status LCD) in favour of an OCuLink port around the back. The OCuLink port is capable of up to 64 Gbps of bandwidth, compared to 40 Gbps for the Thunderbolt 4 ports. It's currently not known what the OCuLink port can be used for, but it's more or less an external PCIe interface. It's also unknown what type of AI or LLM features the two new NASync devices will support, but it's clear they'll rely on the capabilities of the Intel processors they're built around. No pricing was announced at CES, and the NASync iDX6011 is expected to launch sometime in the second quarter of this year, with the NASync iDX6011 Pro following in the third quarter. We should also note that the NASync iDX6011 Pro wasn't on display at CES, hence the renders below.

UnifyDrive is Redefining AI-Driven Data Storage at CES 2025

UnifyDrive's participation at CES 2025 marked a pivotal moment in the evolution of portable data storage, as the company unveiled the UP6, the world's first AI-equipped portable tablet NAS. The UP6 was met with overwhelming acclaim, with attendees praising it as a leap forward in portable data storage, combining intuitive file organization, performance, and artificial intelligence to meet the demands of creators, businesses, and modern consumers.

Powered by an Intel Core Ultra processor and integrated with a large language model (LLM), the UP6 reshapes how users interact with their data. The device's ability to enable natural language searches, retrieve local data, and restore and enhance images resonated strongly with attendees. Many praised its potential to revolutionize productivity workflows, with one industry analyst describing it as "a fundamental change in smart storage solutions."

UGREEN Showcases Pioneering NASync AI NAS Lineup and More at CES 2025

UGREEN, a leading innovator in consumer electronics, is due to showcase its latest innovations at CES 2025 under the theme "Activate the Possibility of AI." The highlight of the event will be the unveiling of the highly anticipated NASync iDX6011 and NASync iDX6011 Pro devices, which form the cutting-edge AI NAS lineup of the NASync series. Alongside these groundbreaking products, the Nexode 500 W 6-Port GaN Desktop Fast Charger and the Revodok Max 2131 Thunderbolt 5 Docking Station will also take center stage.

The NASync series AI NAS models are set to redefine expectations with integrated large language models (LLMs) for advanced natural language processing and AI-driven interactive capabilities. Powered by cutting-edge Intel Core Ultra Processors, the iDX6011 and iDX6011 Pro deliver unmatched performance, enabling seamless functionality and exceptional AI applications. These models build on the success of earlier NASync series products, such as the NASync DXP models, which garnered widespread attention and raised over $6.6 million during a Kickstarter campaign in March 2024.

Gigabyte Unveils a Diverse Lineup of AI PCs With Groundbreaking GiMATE AI Agent at CES 2025

GIGABYTE, the world's leading computer brand, unveiled its next-gen AI PCs at CES 2025. GiMATE, a groundbreaking AI agent for seamless hardware and software control, takes center stage in the all-new lineup, redefining gaming, creation, and productivity in the AI era. Powered by NVIDIA GeForce RTX 50 Series Laptop GPUs with NVIDIA NIM microservices and RTX AI, AMD Ryzen AI, and Intel NPU AI, and enhanced by Microsoft Copilot, the AORUS MASTER, GIGABYTE AERO, and GIGABYTE GAMING series deliver cutting-edge performance with upgraded WINDFORCE cooling in sleek, portable designs.

GiMATE, GIGABYTE's exclusive AI agent, integrates with an advanced large language model (LLM) and the "Press and Speak" feature, making laptop control more natural and intuitive. From AI Power Gear II for optimal energy efficiency to AI Boost II's precision overclocking, GiMATE ensures optimal settings for every scenario. AI Cooling delivers 0 dB ambiance, perfect for work environments, while AI Audio and AI Voice optimize sound for any setting. AI Privacy safeguards the screen by detecting prying eyes and activating protection instantly. GiMATE aims to be users' smart AI mate, redefining how laptops fit into their daily lives.

Gigabyte Demonstrates Omni-AI Capabilities at CES 2025

GIGABYTE Technology, internationally renowned for its R&D capabilities and a leading innovator in server and data center solutions, continues to lead technological innovation during this critical period of AI and computing advancement. With its comprehensive AI product portfolio, GIGABYTE will showcase its complete range of AI computing solutions at CES 2025, from data center infrastructure to IoT applications and personal computing, demonstrating how its extensive product line enables digital transformation across all sectors in this AI-driven era.

Powering AI from the Cloud
With AI Large Language Models (LLMs) now routinely featuring parameters in the hundreds of billions to trillions, robust training environments (data centers) have become a critical requirement in the AI race. GIGABYTE offers three distinctive solutions for AI infrastructure.

Qualcomm Launches On-Prem AI Appliance Solution and Inference Suite at CES 2025

At CES 2025, Qualcomm Technologies, Inc. today announced the Qualcomm AI On-Prem Appliance Solution, an on-premises desktop or wall-mounted hardware solution, and the Qualcomm AI Inference Suite, a set of software and services for AI inferencing spanning from near-edge to cloud. The combination of these new offerings allows small and medium businesses, enterprises, and industrial organizations to run custom and off-the-shelf AI applications on their premises, including generative AI workloads. Running AI inference on premises can deliver significant savings in operational costs and overall total cost of ownership (TCO) compared to renting third-party AI infrastructure.

Using the AI On-Prem Appliance Solution in concert with the AI Inference Suite, customers can now use generative AI leveraging their proprietary data, fine-tuned models, and technology infrastructure to automate human and machine processes and applications in virtually any end environment, such as retail stores, quick service restaurants, shopping outlets, dealerships, hospitals, factories, and shop floors, where the workflow is well established, repeatable, and ready for automation.

Axelera AI Partners with Arduino for Edge AI Solutions

Axelera AI - a leading edge-inference company - and Arduino, the global leader in open-source hardware and software, today announced a strategic partnership to make high-performance AI at the edge more accessible than ever, building advanced technology solutions based on inference and an open ecosystem. This furthers Axelera AI's strategy to democratize artificial intelligence everywhere.

The collaboration will combine the strengths of Axelera AI's Metis AI Platform with the powerful SOMs from the Arduino Pro range to provide customers with easy-to-use hardware and software to innovate around AI. Users will enjoy the freedom to dictate their own AI journey, thanks to tools that provide unique digital in-memory computing and RISC-V controlled dataflow technology, delivering high performance and usability at a fraction of the cost and power of other solutions available today.

NVIDIA Unveils New Jetson Orin Nano Super Developer Kit

NVIDIA is taking the wraps off a new compact generative AI supercomputer, offering increased performance at a lower price with a software upgrade. The new NVIDIA Jetson Orin Nano Super Developer Kit, which fits in the palm of a hand, provides everyone from commercial AI developers to hobbyists and students with gains in generative AI capabilities and performance. And the price is now $249, down from $499.

Available today, it delivers as much as a 1.7x leap in generative AI inference performance, a 70% increase in compute performance to 67 INT8 TOPS, and a 50% increase in memory bandwidth to 102 GB/s compared with its predecessor. Whether creating LLM chatbots based on retrieval-augmented generation, building a visual AI agent, or deploying AI-based robots, the Jetson Orin Nano Super is an ideal solution.

Advantech Introduces Its GPU Server SKY-602E3 With NVIDIA H200 NVL

Advantech, a leading global provider of industrial edge AI solutions, is excited to introduce its GPU server SKY-602E3 equipped with the NVIDIA H200 NVL platform. This powerful combination is set to accelerate offline LLMs for manufacturing, providing unprecedented levels of performance and efficiency. The NVIDIA H200 NVL, which requires 600 W of passive cooling, is fully supported by the compact and efficient SKY-602E3 GPU server, making it an ideal solution for demanding edge AI applications.

Core of Factory LLM Deployment: AI Vision
The SKY-602E3 GPU server excels in supporting large language models (LLMs) for AI inference and training. It features four PCIe 5.0 x16 slots, delivering high bandwidth for intensive tasks, and four PCIe 5.0 x8 slots, providing enhanced flexibility for GPU and frame grabber card expansion. The half-width design of the SKY-602E3 makes it an excellent choice for workstation environments. Additionally, the server can be equipped with the NVIDIA H200 NVL platform, which offers 1.7x more performance than the NVIDIA H100 NVL, freeing up additional PCIe slots for other expansion needs.

Amazon AWS Announces General Availability of Trainium2 Instances, Reveals Details of Next Gen Trainium3 Chip

At AWS re:Invent, Amazon Web Services, Inc. (AWS), an Amazon.com, Inc. company, today announced the general availability of AWS Trainium2-powered Amazon Elastic Compute Cloud (Amazon EC2) instances and introduced new Trn2 UltraServers, enabling customers to train and deploy today's latest AI models as well as future large language models (LLMs) and foundation models (FMs) with exceptional levels of performance and cost efficiency. The company also unveiled its next-generation Trainium3 chips.

"Trainium2 is purpose built to support the largest, most cutting-edge generative AI workloads, for both training and inference, and to deliver the best price performance on AWS," said David Brown, vice president of Compute and Networking at AWS. "With models approaching trillions of parameters, we understand customers also need a novel approach to train and run these massive workloads. New Trn2 UltraServers offer the fastest training and inference performance on AWS and help organizations of all sizes to train and deploy the world's largest models faster and at a lower cost."

Microsoft Office Tools Reportedly Collect Data for AI Training, Requiring Manual Opt-Out

Microsoft's Office suite is a staple among productivity tools, with millions of users entering sensitive personal and company data into Excel and Word. According to @nixCraft, an author at Cyberciti.biz, Microsoft leaves its "Connected Experiences" feature enabled by default, reportedly using user-generated content to train the company's AI models. Because the feature is on by default, data from Word and Excel files may be used in AI development unless users manually opt out. This default raises security concerns, especially for businesses and government workers relying on Microsoft Office for proprietary work, as it allows documents such as articles, government data, and other confidential files to be included in AI training, creating ethical and legal challenges regarding consent and intellectual property.

Disabling the feature requires going to: File > Options > Trust Center > Trust Center Settings > Privacy Options > Privacy Settings > Optional Connected Experiences, and unchecking the box. Beyond the unnecessarily long opt-out process, the European Union's GDPR, with which Microsoft must comply, requires such settings to be opt-in rather than opt-out by default. This apparent contradiction of EU GDPR rules could prompt an investigation from the EU. Microsoft has yet to confirm whether user content is actively being used to train its AI models. However, its Services Agreement includes a clause granting the company a "worldwide and royalty-free intellectual property license" to use user-generated content for purposes such as improving Microsoft products. The controversy is not new, as more and more companies leverage user data for AI development, often without explicit consent.

Aetina Debuts at SC24 With NVIDIA MGX Server for Enterprise Edge AI

Aetina, a subsidiary of the Innodisk Group and an expert in edge AI solutions, is pleased to announce its debut at Supercomputing (SC24) in Atlanta, Georgia, showcasing the innovative SuperEdge NVIDIA MGX short-depth edge AI server, AEX-2UA1. By integrating an enterprise-class on-premises large language model (LLM) with the advanced retrieval-augmented generation (RAG) technique, Aetina NVIDIA MGX short-depth server demonstrates exceptional enterprise edge AI performance, setting a new benchmark in Edge AI innovation. The server is powered by the latest Intel Xeon 6 processor and dual high-end double-width NVIDIA GPUs, delivering ultimate AI computing power in a compact 2U form factor, accelerating Gen AI at the edge.

The SuperEdge NVIDIA MGX server expands Aetina's product portfolio from specialized edge devices to comprehensive AI server solutions, propelling a key milestone in Innodisk Group's AI roadmap, from sensors and storage to AI software, computing platforms, and now AI edge servers.

Hypertec Introduces the World's Most Advanced Immersion-Born GPU Server

Hypertec proudly announces the launch of its latest breakthrough product, the TRIDENT iG series, an immersion-born GPU server line that brings extreme density, sustainability, and performance to the AI and HPC community. Purpose-built for the most demanding AI applications, this cutting-edge server is optimized for generative AI, machine learning (ML), deep learning (DL), large language model (LLM) training, inference, and beyond. With up to six of the latest NVIDIA GPUs in a 2U form factor, a staggering 8 TB of memory with enhanced RDMA capabilities, and groundbreaking density supporting up to 200 GPUs per immersion tank, the TRIDENT iG server line is a game-changer for AI infrastructure.

Additionally, the server's innovative design features a single or dual root complex, enabling greater flexibility and efficiency for GPU usage in complex workloads.

Q.ANT Introduces First Commercial Photonic Processor

Q.ANT, the leading startup for photonic computing, today announced the launch of its first commercial product - a photonics-based Native Processing Unit (NPU) built on the company's compute architecture LENA - Light Empowered Native Arithmetics. The product is fully compatible with today's existing computing ecosystem as it comes on the industry-standard PCI-Express. The Q.ANT NPU executes complex, non-linear mathematics natively using light instead of electrons, promising to deliver at least 30 times greater energy efficiency and significant computational speed improvements over traditional CMOS technology. Designed for compute-intensive applications such as AI Inference, machine learning, and physics simulation, the Q.ANT NPU has been proven to solve real-world challenges, including number recognition for deep neural network inference (see the recent press release regarding Cloud Access to NPU).

"With our photonic chip technology now available on the standard PCIe interface, we're bringing the incredible power of photonics directly into real-world applications. For us, this is not just a processor—it's a statement of intent: Sustainability and performance can go hand in hand," said Dr. Michael Förtsch, CEO of Q.ANT. "For the first time, developers can create AI applications and explore the capabilities of photonic computing, particularly for complex, nonlinear calculations. For example, experts calculated that one GPT-4 query today uses 10 times more electricity than a regular internet search request. Our photonic computing chips offer the potential to reduce the energy consumption for that query by a factor of 30."

IBM Expands Its AI Accelerator Offerings; Announces Collaboration With AMD

IBM and AMD have announced a collaboration to deploy AMD Instinct MI300X accelerators as a service on IBM Cloud. This offering, which is expected to be available in the first half of 2025, aims to enhance performance and power efficiency for Gen AI models and high-performance computing (HPC) applications for enterprise clients. The collaboration will also enable support for AMD Instinct MI300X accelerators within IBM's watsonx AI and data platform, as well as Red Hat Enterprise Linux AI inferencing support.

"As enterprises continue adopting larger AI models and datasets, it is critical that the accelerators within the system can process compute-intensive workloads with high performance and flexibility to scale," said Philip Guido, executive vice president and chief commercial officer, AMD. "AMD Instinct accelerators combined with AMD ROCm software offer wide support including IBM watsonx AI, Red Hat Enterprise Linux AI and Red Hat OpenShift AI platforms to build leading frameworks using these powerful open ecosystem tools. Our collaboration with IBM Cloud will aim to allow customers to execute and scale Gen AI inferencing without hindering cost, performance or efficiency."

GIGABYTE Launches AMD Radeon PRO W7800 AI TOP 48G Graphics Card

GIGABYTE TECHNOLOGY Co. Ltd, a leading manufacturer of premium gaming hardware, today launched the cutting-edge GIGABYTE AMD Radeon PRO W7800 AI TOP 48G graphics card, featuring AMD's RDNA 3 architecture and a massive 48 GB of GDDR6 memory. This significant increase in memory capacity, compared to its predecessor, provides workstation professionals, creators, and AI developers with incredible computational power to effortlessly handle complex design, rendering, and AI model training tasks.

GIGABYTE stands as the AMD professional graphics partner in the market, with a proven ability to design and manufacture the entire Radeon PRO series. Our dedication to quality products, unwavering business commitment, and comprehensive customer service empower us to deliver professional-grade GPU solutions, expanding users' choices in workstation and AI computing.

Anthropic Develops AI Model That Can Use Computers, Updates Claude 3.5 Sonnet

The age of automation is upon us. Anthropic, the company behind the top-performing Claude large language models that compete directly with OpenAI's GPT, has today announced updates to its models and a new feature: computer use. Computer use allows the Claude 3.5 Sonnet model to access the user's system by looking at the screen, moving the cursor, typing text, and clicking buttons. While still experimental, the system is prone to errors and makes "dumb" mistakes. However, it enables one very important capability: driving an operating system designed for humans using artificial intelligence.

OSWorld is a benchmark that evaluates an AI model's ability to use a computer the way a human does on a human-centered operating system. Claude 3.5 Sonnet managed to score 14.9% in the screenshot-only category and 22.0% in tasks that require more steps. A typical human scores around 72.36%, so the test is difficult even for natural intelligence. However, this is only the beginning, as these models are advancing rapidly. Until now, these models have mostly worked with other types of data, like text and static images, processing them and computing over them. Working on computers designed for human interaction is a great leap in the capabilities of AI models.
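For readers curious what this looks like from the developer side, the sketch below shows roughly how the computer-use tool is requested through Anthropic's Messages API beta; the model string, tool type, and beta flag reflect Anthropic's public beta documentation at the time and should be treated as assumptions rather than a definitive reference.

```python
# Rough sketch of requesting Claude's computer-use beta via the Anthropic
# Python SDK. Model name, tool type, and beta flag are assumptions based on
# Anthropic's public beta docs; the calling app must execute the returned
# tool_use actions (screenshots, clicks, keystrokes) and feed results back.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=[{
        "type": "computer_20241022",  # beta computer-use tool
        "name": "computer",
        "display_width_px": 1280,
        "display_height_px": 800,
    }],
    messages=[{"role": "user", "content": "Open a browser and check the weather."}],
    betas=["computer-use-2024-10-22"],
)

print(response.content)  # contains text and tool_use blocks to act upon
```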

Intel Won't Compete Against NVIDIA's High-End AI Dominance Soon, Starts Laying Off Over 2,200 Workers Across US

Intel's taking a different path with its Gaudi 3 accelerator chips. It's staying away from the high-demand market for training big AI models, which has made NVIDIA so successful. Instead, Intel wants to help businesses that need cheaper AI solutions to train and run smaller specific models and open-source options. At a recent event, Intel talked up Gaudi 3's "price performance advantage" over NVIDIA's H100 GPU for inference tasks. Intel says Gaudi 3 is faster and more cost-effective than the H100 when running Llama 3 and Llama 2 models of different sizes.

Intel also claims that Gaudi 3 is as power-efficient as the H100 for large language model (LLM) inference with small token outputs and does even better with larger outputs. The company even suggests Gaudi 3 beats NVIDIA's newer H200 in LLM inference throughput for large token outputs. However, Gaudi 3 doesn't match up to the H100 in overall floating-point operation throughput for 16-bit and 8-bit formats. For bfloat16 and 8-bit floating-point precision matrix math, Gaudi 3 hits 1,835 TFLOPS in each format, while the H100 reaches 1,979 TFLOPS for BF16 and 3,958 TFLOPS for FP8.

NVIDIA Fine-Tunes Llama3.1 Model to Beat GPT-4o and Claude 3.5 Sonnet with Only 70 Billion Parameters

NVIDIA has officially released its Llama-3.1-Nemotron-70B-Instruct model. Based on Meta's Llama 3.1 70B, the Nemotron model is a large language model customized by NVIDIA to improve the helpfulness of LLM-generated responses. NVIDIA uses structured fine-tuning data to steer the model and allow it to generate more helpful responses. With only 70 billion parameters, the model is punching far above its weight class. The company claims that the model beats the current top models from leading labs, such as OpenAI's GPT-4o and Anthropic's Claude 3.5 Sonnet, which lead many AI benchmarks. In evaluations such as Arena Hard, Llama-3.1-Nemotron-70B scores 85 points, while GPT-4o and Claude 3.5 Sonnet score 79.3 and 79.2, respectively. In other benchmarks, like AlpacaEval and MT-Bench, NVIDIA also holds the top spot, with scores of 57.6 and 8.98, respectively; Claude and GPT reach 52.4 / 8.81 and 57.5 / 8.74, just below Nemotron.

This language model underwent training using reinforcement learning from human feedback (RLHF), specifically employing the REINFORCE algorithm. The process involved a reward model based on a large language model architecture and custom preference prompts designed to guide the model's behavior. Training started from a pre-existing instruction-tuned language model: Llama-3.1-70B-Instruct served as the initial policy, trained against the Llama-3.1-Nemotron-70B-Reward model on HelpSteer2-Preference prompts. Running the model locally requires either four 40 GB or two 80 GB VRAM GPUs and 150 GB of free disk space. We managed to take it for a spin on NVIDIA's website to say hello to TechPowerUp readers. The model also passes the infamous "strawberry" test, where it has to count the occurrences of a specific letter in a word; however, it appears this test was part of the fine-tuning data, as it fails the next test, shown in the image below.
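As a rough illustration of what running it locally involves (assuming the Hugging Face checkpoint nvidia/Llama-3.1-Nemotron-70B-Instruct-HF and the VRAM noted above), a minimal transformers sketch might look like this:

```python
# Minimal sketch: loading the Nemotron 70B checkpoint with Hugging Face
# transformers. The model ID is assumed from NVIDIA's Hugging Face release;
# device_map="auto" shards the ~140 GB of BF16 weights across the available
# GPUs (e.g., two 80 GB or four 40 GB cards, as noted above).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Hello, TechPowerUp readers!"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```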

MSI Unveils AI Servers Powered by NVIDIA MGX at OCP 2024

MSI, a leading global provider of high-performance server solutions, proudly announced it is showcasing new AI servers powered by the NVIDIA MGX platform—designed to address the increasing demand for scalable, energy-efficient AI workloads in modern data centers—at the OCP Global Summit 2024, booth A6. This collaboration highlights MSI's continued commitment to advancing server solutions, focusing on cutting-edge AI acceleration and high-performance computing (HPC).

The NVIDIA MGX platform offers a flexible architecture that enables MSI to deliver purpose-built solutions optimized for AI, HPC, and LLMs. By leveraging this platform, MSI's AI server solutions provide exceptional scalability, efficiency, and enhanced GPU density—key factors in meeting the growing computational demands of AI workloads. Tapping into MSI's engineering expertise and NVIDIA's advanced AI technologies, these AI servers based on the MGX architecture deliver unparalleled compute power, positioning data centers to maximize performance and power efficiency while paving the way for the future of AI-driven infrastructure.

Arm and Partners Develop AI CPU: Neoverse V3 CSS Made on 2 nm Samsung GAA FET

Yesterday, Arm announced significant progress in its Total Design initiative. The program, launched a year ago, aims to accelerate the development of custom silicon for data centers by fostering collaboration among industry partners. The ecosystem has now grown to include nearly 30 participating companies, with recent additions such as Alcor Micro, Egis, PUF Security, and SEMIFIVE. A notable development is a partnership between Arm, Samsung Foundry, ADTechnology, and Rebellions to create an AI CPU chiplet platform. This collaboration aims to deliver a solution for cloud, HPC, and AI/ML workloads, combining Rebellions' AI accelerator with ADTechnology's compute chiplet, implemented using Samsung Foundry's 2 nm Gate-All-Around (GAA) FET technology. The platform is expected to offer significant efficiency gains for generative AI workloads, with estimates suggesting a 2-3x improvement over standard CPU designs for LLMs like Llama 3.1 with 405 billion parameters.

Arm's approach emphasizes the importance of CPU compute in supporting the complete AI stack, including data pre-processing, orchestration, and advanced techniques like retrieval-augmented generation (RAG). The company's Compute Subsystems (CSS) are designed to address these requirements, providing a foundation for partners to build diverse chiplet solutions. Several companies, including Alcor Micro and Alphawave, have already announced plans to develop CSS-powered chiplets for various AI and high-performance computing applications. The initiative also focuses on software readiness, ensuring that major frameworks and operating systems are compatible with Arm-based systems. Recent efforts include the introduction of Arm Kleidi technology, which optimizes CPU-based inference for open-source projects like PyTorch and Llama.cpp. Notably, Google claims that most AI workloads are inferenced on CPUs, so building the most efficient and most performant CPUs for AI makes a lot of sense; a minimal sketch of what CPU-only inference looks like from the framework side follows below.
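The sketch below is a generic, hedged example of CPU-only text generation with PyTorch and Hugging Face transformers; the tiny model used is just a stand-in, and the point is simply that unmodified framework code runs on the CPU, where Arm-optimized kernels (such as those Kleidi provides) can be picked up by builds that support them.

```python
# Generic sketch of CPU-only text generation with PyTorch + transformers.
# The tiny "gpt2" checkpoint is just a stand-in; the same code path is what
# Arm-optimized CPU kernels (e.g., Kleidi-backed ones) would accelerate on
# framework builds that include them.
import torch
from transformers import pipeline

torch.set_num_threads(8)  # pin inference to a fixed number of CPU threads

generator = pipeline("text-generation", model="gpt2", device="cpu")

print(generator("CPUs can serve AI inference because",
                max_new_tokens=40)[0]["generated_text"])
```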