News Posts matching #LLM


Cerebras Launches the World's Fastest AI Inference

Today, Cerebras Systems, the pioneer in high-performance AI compute, announced Cerebras Inference, the fastest AI inference solution in the world. Delivering 1,800 tokens per second for Llama 3.1 8B and 450 tokens per second for Llama 3.1 70B, Cerebras Inference is 20 times faster than NVIDIA GPU-based solutions in hyperscale clouds. Starting at just 10 cents per million tokens, Cerebras Inference is priced at a fraction of GPU solutions, providing 100x higher price-performance for AI workloads.

Unlike alternative approaches that compromise accuracy for performance, Cerebras offers the fastest performance while maintaining state of the art accuracy by staying in the 16-bit domain for the entire inference run. Cerebras Inference is priced at a fraction of GPU-based competitors, with pay-as-you-go pricing of 10 cents per million tokens for Llama 3.1 8B and 60 cents per million tokens for Llama 3.1 70B.
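The quoted pay-as-you-go rates translate into per-job costs with simple arithmetic. The sketch below works through an example; the 50-million-token workload is a hypothetical figure, not one from the announcement.

```python
# Illustrative arithmetic for the pricing quoted above. The pay-as-you-go
# rates come from the announcement; the workload size is an assumption.

def cost_per_million_tokens(price_per_m: float, tokens: float) -> float:
    """Cost in dollars to generate `tokens` tokens at `price_per_m` $/M tokens."""
    return price_per_m * tokens / 1_000_000

llama_8b_price = 0.10   # $/M tokens, Llama 3.1 8B
llama_70b_price = 0.60  # $/M tokens, Llama 3.1 70B

# Generating a hypothetical 50M tokens on Llama 3.1 70B:
print(cost_per_million_tokens(llama_70b_price, 50_000_000))  # 30.0 dollars
```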

FuriosaAI Unveils RNGD Power-Efficient AI Processor at Hot Chips 2024

Today at Hot Chips 2024, FuriosaAI is pulling back the curtain on RNGD (pronounced "Renegade"), our new AI accelerator designed for high-performance, highly efficient large language model (LLM) and multimodal model inference in data centers. As part of his Hot Chips presentation, Furiosa co-founder and CEO June Paik is sharing technical details and providing the first hands-on look at the fully functioning RNGD card.

With a TDP of 150 watts, a novel chip architecture, and advanced memory technology like HBM3, RNGD is optimized for inference with demanding LLMs and multimodal models. It's built to deliver high performance, power efficiency, and programmability all in a single product - a trifecta that the industry has struggled to achieve in GPUs and other AI chips.

AMD Completes Acquisition of Silo AI

AMD today announced the completion of its acquisition of Silo AI, the largest private AI lab in Europe. The all-cash transaction valued at approximately $665 million furthers the company's commitment to deliver end-to-end AI solutions based on open standards and in strong partnership with the global AI ecosystem. Silo AI brings a team of world-class AI scientists and engineers to AMD experienced in developing cutting-edge AI models, platforms and solutions for large enterprise customers including Allianz, Philips, Rolls-Royce and Unilever. Their expertise spans diverse markets and they have created state-of-the-art open source multilingual Large Language Models (LLMs) including Poro and Viking on AMD platforms. The Silo AI team will join the AMD Artificial Intelligence Group (AIG), led by AMD Senior Vice President Vamsi Boppana.

"AI is our number one strategic priority, and we continue to invest in both the talent and software capabilities to support our growing customer deployments and roadmaps," said Vamsi Boppana, AMD senior vice president, AIG. "The Silo AI team has developed state-of-the-art language models that have been trained at scale on AMD Instinct accelerators and they have broad experience developing and integrating AI models to solve critical problems for end customers. We expect their expertise and software capabilities will directly improve the experience for customers in delivering the best performing AI solutions on AMD platforms."

AI SSD Procurement Capacity Estimated to Exceed 45 EB in 2024; NAND Flash Suppliers Accelerate Process Upgrades

TrendForce's latest report on enterprise SSDs reveals that a surge in demand for AI has led AI server customers to significantly increase their orders for enterprise SSDs over the past two quarters. Upstream suppliers have been accelerating process upgrades and planning for 2YY products—slated to enter mass production in 2025—in order to meet the growing demand for SSDs in AI applications.

TrendForce observes that increased orders for enterprise SSDs from AI server customers have resulted in contract prices for this category rising by over 80% from 4Q23 to 3Q24. SSDs play a crucial role in AI development. In AI model training, SSDs primarily store model parameters, including evolving weights and deviations.

Intel Announces Arc A760A Automotive-grade GPU

In a strategic move to empower automakers with groundbreaking opportunities, Intel unveiled its first discrete graphics processing unit (dGPU), the Intel Arc Graphics for Automotive, at its AI Cockpit Innovation Experience event. To advance automotive AI, the product will be commercially deployed in vehicles as soon as 2025, accelerating automobile technology and unlocking a new era of AI-driven cockpit experiences and enhanced personalization for manufacturers and drivers alike.

Intel's entry into automotive discrete GPUs addresses growing demand for compute power in increasingly sophisticated vehicle cockpits. By adding the Intel Arc graphics for Automotive to its existing portfolio of AI-enhanced software-defined vehicle (SDV) system-on-chips (SoCs), Intel offers automakers an open, flexible and scalable platform solution that brings next-level, high-fidelity experiences to the vehicle.

Intel Releases AI Playground, a Unified Generative AI and Chat App for Intel Arc GPUs

Intel on Monday rolled out the first public release of AI Playground, an AI productivity suite the company showcased in its 2024 Computex booth. AI Playground is a well-packaged suite of generative AI applications and a chatbot, designed to leverage Intel Arc discrete GPUs with at least 8 GB of video memory. All utilities in the suite are built on the OpenVINO framework and take advantage of the XMX cores of Arc A-series discrete GPUs. Currently, only three GPU models from the lineup come with 8 GB or more of video memory: the A770, A750, and A580, plus their mobile variants. The company is working on a variant of the suite that can work on Intel Core Ultra-H series processors, where it uses a combination of the NPU and the iGPU for acceleration. AI Playground is open source. Intel put in effort to make the suite as client-friendly as possible by giving it a packaged installer that handles installation of all software dependencies.

Intel AI Playground's tools include a generative AI image tool that can turn prompts into standard or HD images, based on Stable Diffusion backed by the DreamShaper 8 and Juggernaut XL models. It also supports Phi3, LCM LoRA, and LCM LoRA SDXL, all optimized for acceleration on Arc "Alchemist" GPUs. The suite also includes an AI image enhancement utility that can be used for upscaling along with detail reconstruction, styling, inpainting and outpainting, and certain kinds of image manipulation. The third major tool is an AI text chatbot that supports popular LLMs.

DOWNLOAD: Intel AI Playground

Tenstorrent Launches Next Generation Wormhole-based Developer Kits and Workstations

Tenstorrent is launching its next-generation Wormhole chip in PCIe cards and workstations designed for developers interested in scalable multi-chip development using Tenstorrent's open-source software stacks.

These Wormhole-based cards and systems are now available for immediate order on tenstorrent.com:
  • Wormhole n150, powered by a single processor
  • Wormhole n300, powered by two processors
  • TT-LoudBox, a developer workstation powered by four Wormhole n300s (eight processors)

Gigabyte AI TOP Utility Reinventing Your Local AI Fine-tuning

GIGABYTE TECHNOLOGY Co. Ltd, a leading manufacturer of motherboards, graphics cards, and hardware solutions, released its exclusive AI TOP Utility. With reworked workflows, a user-friendly interface, and real-time progress monitoring, AI TOP Utility reinvents local AI model training and fine-tuning. It features a variety of technologies that can be easily adopted by beginners or experts, for most common open-source LLMs, anywhere, even on your desk.

GIGABYTE AI TOP is an all-round solution for local AI model fine-tuning. Running training and fine-tuning locally on sensitive data provides greater privacy and security, with maximum flexibility and real-time adjustment. Pairing GIGABYTE AI TOP hardware with the AI TOP Utility addresses the common constraint of insufficient GPU VRAM when fine-tuning locally. With GIGABYTE AI TOP series motherboards, PSUs, and SSDs, as well as GIGABYTE graphics cards covering the NVIDIA GeForce RTX 40 Series, AMD Radeon RX 7900 Series, and Radeon Pro W7900 and W7800 series, open-source LLMs of up to 236B parameters and more can now be fine-tuned.

HP is Betting on AI for their Notebooks and Desktops

HP Inc. today introduced two new innovations—the world's highest performance AI PC and the first integration of a trust framework into an AI model development platform. Both announcements expand HP's efforts to make AI real for companies and people with new and transformative AI experiences across the company's PCs, software, and partner ecosystem.

HP is empowering everyone, from corporate knowledge workers to freelancers and students, to unlock the power of AI. Users can connect with anyone in the world with real time translation to 40 languages, become master presenters with their personal communication coach, and quickly create videos like a pro.

AMD to Acquire Silo AI to Expand Enterprise AI Solutions Globally

AMD today announced the signing of a definitive agreement to acquire Silo AI, the largest private AI lab in Europe, in an all-cash transaction valued at approximately $665 million. The agreement represents another significant step in the company's strategy to deliver end-to-end AI solutions based on open standards and in strong partnership with the global AI ecosystem. The Silo AI team consists of world-class AI scientists and engineers with extensive experience developing tailored AI models, platforms and solutions for leading enterprises spanning cloud, embedded and endpoint computing markets.

Silo AI CEO and co-founder Peter Sarlin will continue to lead the Silo AI team as part of the AMD Artificial Intelligence Group, reporting to AMD senior vice president Vamsi Boppana. The acquisition is expected to close in the second half of 2024.

Gigabyte Launches AMD Radeon PRO W7000 Series Graphics Cards

GIGABYTE TECHNOLOGY Co. Ltd, a leading manufacturer of premium gaming hardware, today launched the cutting-edge AMD Radeon PRO W7000 series workstation graphics cards, including the flagship GIGABYTE Radeon PRO W7900 Dual Slot AI TOP 48G as well as the GIGABYTE Radeon PRO W7800 AI TOP 32G. Powered by AMD RDNA 3 architecture, these graphics cards offer a massive 48 GB and 32 GB of GDDR6 memory, respectively, delivering cutting-edge performance and exceptional experiences for workstation professionals, creators and AI developers.⁠⁠

GIGABYTE stands as the AMD professional graphics partner in the market, with a proven ability to design and manufacture the entire Radeon PRO series. Our dedication to quality products, unwavering business commitment, and comprehensive customer service empower us to deliver professional-grade GPU solutions, expanding users' choices in workstation and AI computing.

NVIDIA MLPerf Training Results Showcase Unprecedented Performance and Elasticity

The full-stack NVIDIA accelerated computing platform has once again demonstrated exceptional performance in the latest MLPerf Training v4.0 benchmarks. NVIDIA more than tripled the performance on the large language model (LLM) benchmark, based on GPT-3 175B, compared to the record-setting NVIDIA submission made last year. Using an AI supercomputer featuring 11,616 NVIDIA H100 Tensor Core GPUs connected with NVIDIA Quantum-2 InfiniBand networking, NVIDIA achieved this remarkable feat through larger scale - more than triple that of the 3,584 H100 GPU submission a year ago - and extensive full-stack engineering.

Thanks to the scalability of the NVIDIA AI platform, Eos can now train massive AI models like GPT-3 175B even faster, and this great AI performance translates into significant business opportunities. For example, in NVIDIA's recent earnings call, we described how LLM service providers can turn a single dollar invested into seven dollars in just four years running the Llama 3 70B model on NVIDIA HGX H200 servers. This return assumes an LLM service provider serving Llama 3 70B at $0.60/M tokens, with an HGX H200 server throughput of 24,000 tokens/second.
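The "$1 in, $7 out" claim above can be checked with back-of-envelope arithmetic. The sketch below recomputes the four-year serving revenue from the stated throughput and price; the implied server cost is inferred from the 7x ratio and is not stated in the source.

```python
# Sanity-checking the "$1 invested returns $7" claim using the figures quoted
# above. The HGX H200 server cost is NOT given in the text; it is inferred.

price_per_m_tokens = 0.60       # $ per million tokens served (Llama 3 70B)
throughput = 24_000             # tokens/second per HGX H200 server
seconds = 4 * 365 * 24 * 3600   # four years of continuous serving

revenue = throughput * seconds / 1_000_000 * price_per_m_tokens
print(round(revenue))  # 1816474 dollars over four years

# A 7x return implies total cost of ownership around revenue / 7:
implied_cost = revenue / 7
print(round(implied_cost))  # 259496 dollars (inferred, not stated in the source)
```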

Intel Submits Gaudi 2 Results on MLCommons' Newest Benchmark

Today, MLCommons published results of its industry AI performance benchmark, MLPerf Training v4.0. Intel's results demonstrate the choice that Intel Gaudi 2 AI accelerators give enterprises and customers. Community-based software simplifies generative AI (GenAI) development and industry-standard Ethernet networking enables flexible scaling of AI systems. For the first time on the MLPerf benchmark, Intel submitted results on a large Gaudi 2 system (1,024 Gaudi 2 accelerators) trained in Intel Tiber Developer Cloud to demonstrate Gaudi 2 performance and scalability and Intel's cloud capacity for training MLPerf's GPT-3 175B-parameter benchmark model.

"The industry has a clear need: address the gaps in today's generative AI enterprise offerings with high-performance, high-efficiency compute options. The latest MLPerf results published by MLCommons illustrate the unique value Intel Gaudi brings to market as enterprises and customers seek more cost-efficient, scalable systems with standard networking and open software, making GenAI more accessible to more customers," said Zane Ball, Intel corporate vice president and general manager, DCAI Product Management.

SK hynix Showcases Its Next-Gen Solutions at Computex 2024

SK hynix presented its leading AI memory solutions at COMPUTEX Taipei 2024 from June 4-7. As one of Asia's premier IT shows, COMPUTEX Taipei 2024 welcomed around 1,500 global participants including tech companies, venture capitalists, and accelerators under the theme "Connecting AI". Making its debut at the event, SK hynix underlined its position as a first mover and leading AI memory provider through its lineup of next-generation products.

"Connecting AI" With the Industry's Finest AI Memory Solutions
Themed "Memory, The Power of AI," SK hynix's booth featured its advanced AI server solutions, groundbreaking technologies for on-device AI PCs, and outstanding consumer SSD products. HBM3E, the fifth generation of HBM, was among the AI server solutions on display. Offering industry-leading data processing speeds of 1.18 terabytes (TB) per second, vast capacity, and advanced heat dissipation capability, HBM3E is optimized to meet the requirements of AI servers and other applications. Another technology which has become crucial for AI servers is CXL, as it can increase system bandwidth and processing capacity. SK hynix highlighted the strength of its CXL portfolio by presenting its CXL Memory Module-DDR5 (CMM-DDR5), which significantly expands system bandwidth and capacity compared to systems only equipped with DDR5. Other AI server solutions on display included the server DRAM products DDR5 RDIMM and MCR DIMM. In particular, SK hynix showcased its tall 128-gigabyte (GB) MCR DIMM for the first time at an exhibition.
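A rough illustration of why HBM bandwidth matters for LLM inference: generating each token requires streaming roughly all model weights from memory, so bandwidth caps tokens per second. The model size and precision below are assumptions for illustration, not figures from the announcement.

```python
# Back-of-envelope: memory bandwidth as an upper bound on single-stream
# LLM inference speed. The 1.18 TB/s figure is from the text; the model
# size and precision are assumed for illustration.

bandwidth_bytes_s = 1.18e12      # HBM3E per-stack bandwidth (1.18 TB/s)
params = 70e9                    # hypothetical 70B-parameter model
bytes_per_param = 2              # FP16 weights

weight_bytes = params * bytes_per_param           # 140 GB of weights
seconds_per_token = weight_bytes / bandwidth_bytes_s
print(round(1 / seconds_per_token, 1))  # 8.4 tokens/s upper bound per stack
```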

New Performance Optimizations Supercharge NVIDIA RTX AI PCs for Gamers, Creators and Developers

NVIDIA today announced at Microsoft Build new AI performance optimizations and integrations for Windows that help deliver maximum performance on NVIDIA GeForce RTX AI PCs and NVIDIA RTX workstations. Large language models (LLMs) power some of the most exciting new use cases in generative AI and now run up to 3x faster with ONNX Runtime (ORT) and DirectML using the new NVIDIA R555 Game Ready Driver. ORT and DirectML are high-performance tools used to run AI models locally on Windows PCs.

WebNN, an application programming interface for web developers to deploy AI models, is now accelerated with RTX via DirectML, enabling web apps to incorporate fast, AI-powered capabilities. And PyTorch will support DirectML execution backends, enabling Windows developers to train and infer complex AI models on Windows natively. NVIDIA and Microsoft are collaborating to scale performance on RTX GPUs. These advancements build on NVIDIA's world-leading AI platform, which accelerates more than 500 applications and games on over 100 million RTX AI PCs and workstations worldwide.

AMD Instinct MI300X Accelerators Power Microsoft Azure OpenAI Service Workloads and New Azure ND MI300X V5 VMs

Today at Microsoft Build, AMD (NASDAQ: AMD) showcased its latest end-to-end compute and software capabilities for Microsoft customers and developers. By using AMD solutions such as AMD Instinct MI300X accelerators, ROCm open software, Ryzen AI processors and software, and Alveo MA35D media accelerators, Microsoft is able to provide a powerful suite of tools for AI-based deployments across numerous markets. The new Microsoft Azure ND MI300X virtual machines (VMs) are now generally available, giving customers like Hugging Face, access to impressive performance and efficiency for their most demanding AI workloads.

"The AMD Instinct MI300X and ROCm software stack is powering the Azure OpenAI Chat GPT 3.5 and 4 services, which are some of the world's most demanding AI workloads," said Victor Peng, president, AMD. "With the general availability of the new VMs from Azure, AI customers have broader access to MI300X to deliver high-performance and efficient solutions for AI applications."

Lenovo Supercharges Copilot+ PCs with Latest Yoga Slim 7x and ThinkPad T14s Gen 6 (May 20, 2024)

Today, Lenovo launched the Lenovo Yoga Slim 7x and Lenovo ThinkPad T14s Gen 6, its first next-generation Copilot+ PCs powered by Snapdragon X Elite. As the PC industry enters a new phase of the artificial intelligence era, Lenovo is poised to offer new levels of personalization in personal computing across its PC portfolio. Combining intelligent software-powered local processing of tasks with increased productivity, creativity, and security, these Copilot+ PCs deliver a whole new experience in PC interaction. Lenovo is expanding its already comprehensive portfolio of AI-ready devices, software, and optimized services with two new laptops for consumers and business users—the Lenovo Yoga Slim 7x and the Lenovo ThinkPad T14s Gen 6.

Powered by Qualcomm Technologies' new Snapdragon X Elite processor featuring the 12-core Qualcomm Oryon CPU, Qualcomm Adreno GPU and a dedicated Qualcomm Hexagon NPU (neural processing unit), the new laptops deliver leading PC performance per watt with the fastest to date AI NPU processing up to 45 trillion operations per second (TOPS). With the latest enhancements from Microsoft and Copilot+, users can now access Large Language Model (LLM) capabilities even when offline, offering seamless productivity and creativity. The latest Lenovo laptops allow users to tap into the extensive Copilot+ knowledge base, empowering them to explore endless creative possibilities. By leveraging generative AI and machine learning, Copilot+ assists in composing compelling text, crafting engaging visuals, and streamlining common productivity tasks. With the ability to work offline with the same fluidity as online, the Yoga Slim 7x and the ThinkPad T14s Gen 6 set new standards in AI PC innovation, promising a futuristic and streamlined user experience for end users.

Ampere Scales AmpereOne Product Family to 256 Cores

Ampere Computing today released its annual update on upcoming products and milestones, highlighting the company's continued innovation and invention around sustainable, power efficient computing for the Cloud and AI. The company also announced that they are working with Qualcomm Technologies, Inc. to develop a joint solution for AI inferencing using Qualcomm Technologies' high-performance, low power Qualcomm Cloud AI 100 inference solutions and Ampere CPUs.

Semiconductor industry veteran and Ampere CEO Renee James said the increasing power requirements and energy challenge of AI is bringing Ampere's silicon design approach around performance and efficiency into focus more than ever. "We started down this path six years ago because it is clear it is the right path," James said. "Low power used to be synonymous with low performance. Ampere has proven that isn't true. We have pioneered the efficiency frontier of computing and delivered performance beyond legacy CPUs in an efficient computing envelope."

Report: 3 Out of 4 Laptop PCs Sold in 2027 will be AI Laptop PCs

Personal computers (PCs) have been used as the major productivity device for several decades. But now we are entering a new era of PCs based on artificial intelligence (AI), thanks to the boom witnessed in generative AI (GenAI). We believe the inventory correction and demand weakness in the global PC market have already normalized, with the impacts from COVID-19 largely being factored in. All this has created a comparatively healthy backdrop for reshaping the PC industry. Counterpoint estimates that almost half a billion AI laptop PCs will be sold during the 2023-2027 period, with AI PCs reviving the replacement demand.

Counterpoint separates GenAI laptop PCs into three categories - AI basic laptop, AI-advanced laptop and AI-capable laptop - based on different levels of computational performance, corresponding use cases, and computational efficiency. We believe AI basic laptops, which are already in the market, can perform basic AI tasks but not full GenAI tasks. Starting this year, they will be supplanted by AI-advanced and AI-capable models with enough TOPS (tera operations per second), powered by an NPU (neural processing unit) or GPU (graphics processing unit), to perform advanced GenAI tasks well.

Apple Reportedly Developing Custom Data Center Processors with Focus on AI Inference

Apple is reportedly working on creating in-house chips designed explicitly for its data centers. This news comes from a recent report by the Wall Street Journal, which highlights the company's efforts to enhance its data processing capabilities and reduce dependency on third parties for infrastructure. Under the internal project called Apple Chips in Data Center (ACDC), which started in 2018, Apple set out to design data center processors to handle its massive user base and expand the company's service offerings. Recent advances in AI mean that Apple will probably serve LLMs processed in its own data centers. The chips will most likely focus on inference of AI models rather than training.

The AI chips are expected to play a crucial role in improving the efficiency and speed of Apple's data centers, which handle vast amounts of data generated by the company's various services and products. By developing these custom chips, Apple aims to optimize its data processing and storage capabilities, ultimately leading to better user experiences across its ecosystem. The move by Apple to develop AI-enhanced chips for data centers is seen as a strategic step in the company's efforts to stay ahead in the competitive tech landscape. Almost all major tech companies, famously called the big seven, have products that use AI in silicon and in software processing. However, Apple is the one that seemingly lacked that. Now, the company is integrating AI across the entire vertical, from the upcoming iPhone integration to M4 chips for Mac devices and ACDC chips for data centers.

We Tested NVIDIA's new ChatRTX: Your Own GPU-accelerated AI Assistant with Photo Recognition, Speech Input, Updated Models

NVIDIA today unveiled ChatRTX, the AI assistant that runs locally on your machine, accelerated by your GeForce RTX GPU. NVIDIA originally launched this as "Chat with RTX" back in February 2024, when it was regarded more as a public tech demo. We reviewed the application in our feature article. The ChatRTX rebranding is probably aimed at making the name sound more like ChatGPT, which is what the application aims to be—except it runs completely on your machine and is exhaustively customizable. The most obvious advantage of a locally run AI assistant is privacy—you are interacting with an assistant that processes your prompts locally, accelerated by your GPU. The second is that you're not held back by the performance bottlenecks of cloud-based assistants.

ChatRTX is a major update over the Chat with RTX tech demo from February. To begin with, the application has several stability refinements over Chat with RTX, which felt a little rough around the edges. NVIDIA has significantly updated the LLMs included with the application, including Mistral 7B INT4 and Llama 2 7B INT4. Support is also added for additional LLMs, including Gemma, a local LLM trained by Google, based on the same technology used to make Google's flagship Gemini model. ChatRTX now also supports ChatGLM3, for both English and Chinese prompts. Perhaps the biggest upgrade to ChatRTX is its ability to recognize images on your machine, as it incorporates CLIP (contrastive language-image pre-training) from OpenAI. CLIP is a vision-language model that recognizes what it's seeing in image collections. Using this feature, you can interact with your image library without the need for metadata. ChatRTX doesn't just take text input—you can speak to it. It now accepts natural voice input, as it integrates the Whisper speech-to-text model.
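The CLIP-based image search described above boils down to embedding text and images into a shared vector space and ranking images by cosine similarity to the query. A minimal sketch with toy, hand-made embeddings; a real system would compute them with a CLIP encoder:

```python
# CLIP-style retrieval sketch: rank images by cosine similarity between a
# text query embedding and pre-computed image embeddings. The vectors and
# filenames below are invented toy data, not real CLIP output.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Hypothetical pre-computed image embeddings (filename -> vector)
image_index = {
    "dog.jpg":    [0.9, 0.1, 0.0],
    "beach.jpg":  [0.1, 0.8, 0.3],
    "sunset.jpg": [0.0, 0.3, 0.9],
}

def search(query_embedding, index, top_k=1):
    ranked = sorted(index, key=lambda name: cosine(query_embedding, index[name]),
                    reverse=True)
    return ranked[:top_k]

# A query embedding close to "dog" retrieves dog.jpg without any metadata.
print(search([1.0, 0.0, 0.1], image_index))  # ['dog.jpg']
```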
DOWNLOAD: NVIDIA ChatRTX

Intel Builds World's Largest Neuromorphic System to Enable More Sustainable AI

Today, Intel announced that it has built the world's largest neuromorphic system. Code-named Hala Point, this large-scale neuromorphic system, initially deployed at Sandia National Laboratories, utilizes Intel's Loihi 2 processor, aims to support research for future brain-inspired artificial intelligence (AI), and tackles challenges related to the efficiency and sustainability of today's AI. Hala Point advances Intel's first-generation large-scale research system, Pohoiki Springs, with architectural improvements delivering over 10 times more neuron capacity and up to 12 times higher performance.

"The computing cost of today's AI models is rising at unsustainable rates. The industry needs fundamentally new approaches capable of scaling. For that reason, we developed Hala Point, which combines deep learning efficiency with novel brain-inspired learning and optimization capabilities. We hope that research with Hala Point will advance the efficiency and adaptability of large-scale AI technology." -Mike Davies, director of the Neuromorphic Computing Lab at Intel Labs

NVIDIA Issues Patches for ChatRTX AI Chatbot, Susceptible to Improper Privilege Management

Just a month after releasing the 0.1 beta preview of Chat with RTX, now called ChatRTX, NVIDIA has swiftly addressed critical security vulnerabilities discovered in its cutting-edge AI chatbot. The chatbot was found to be susceptible to cross-site scripting attacks (CWE-79) and improper privilege management attacks (CWE-269) in version 0.2 and all prior releases. The identified vulnerabilities posed significant risks to users' personal data and system security. Cross-site scripting attacks could allow malicious actors to inject scripts into the chatbot's interface, potentially compromising sensitive information. The improper privilege management flaw could also enable attackers to escalate their privileges and gain administrative control over users' systems and files.

Upon becoming aware of these vulnerabilities, NVIDIA promptly released an updated version of ChatRTX 0.2, available for download from its official website. The latest iteration of the software addresses these security issues, providing users with a more secure experience. As ChatRTX utilizes retrieval augmented generation (RAG) and NVIDIA TensorRT-LLM software to allow users to train the chatbot on their personal data, the presence of such vulnerabilities is particularly concerning. Users are strongly advised to update their ChatRTX software to the latest version to mitigate potential risks and protect their personal information. ChatRTX remains in beta, with no official release candidate timeline announced. As NVIDIA continues to develop and refine this AI chatbot, the company must prioritize security and promptly address any vulnerabilities that arise, ensuring a safe and reliable user experience.
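The RAG pattern mentioned above can be sketched in a few lines: retrieve the most relevant chunks of the user's documents and prepend them to the prompt before it reaches the LLM. The scoring below is naive word overlap and the documents are invented examples; production systems use vector embeddings, and the model call itself is omitted.

```python
# Minimal retrieval augmented generation (RAG) sketch: rank document chunks
# by word overlap with the query, then build an augmented prompt. Real
# systems use vector embeddings; the LLM call is left out entirely.
import re

def words(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, chunks: list, top_k: int = 2) -> list:
    # Rank chunks by the number of query words they share.
    return sorted(chunks,
                  key=lambda c: len(words(query) & words(c)),
                  reverse=True)[:top_k]

def build_prompt(query: str, chunks: list) -> str:
    context = "\n".join(retrieve(query, chunks))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# Invented example documents standing in for a user's personal data.
docs = [
    "The quarterly report shows revenue grew 12 percent.",
    "Meeting notes: discuss the security audit next Tuesday.",
    "Recipe: mix flour and water, then bake.",
]
prompt = build_prompt("When is the security audit?", docs)
print("security audit" in prompt)  # True
```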

NVIDIA Hopper Leaps Ahead in Generative AI at MLPerf

It's official: NVIDIA delivered the world's fastest platform in industry-standard tests for inference on generative AI. In the latest MLPerf benchmarks, NVIDIA TensorRT-LLM—software that speeds and simplifies the complex job of inference on large language models—boosted the performance of NVIDIA Hopper architecture GPUs on the GPT-J LLM nearly 3x over their results just six months ago. The dramatic speedup demonstrates the power of NVIDIA's full-stack platform of chips, systems and software to handle the demanding requirements of running generative AI. Leading companies are using TensorRT-LLM to optimize their models. And NVIDIA NIM—a set of inference microservices that includes inferencing engines like TensorRT-LLM—makes it easier than ever for businesses to deploy NVIDIA's inference platform.

Raising the Bar in Generative AI
TensorRT-LLM running on NVIDIA H200 Tensor Core GPUs—the latest, memory-enhanced Hopper GPUs—delivered the fastest performance running inference in MLPerf's biggest test of generative AI to date. The new benchmark uses the largest version of Llama 2, a state-of-the-art large language model packing 70 billion parameters. The model is more than 10x larger than the GPT-J LLM first used in the September benchmarks. The memory-enhanced H200 GPUs, in their MLPerf debut, used TensorRT-LLM to produce up to 31,000 tokens/second, a record on MLPerf's Llama 2 benchmark. The H200 GPU results include up to 14% gains from a custom thermal solution. It's one example of innovations beyond standard air cooling that systems builders are applying to their NVIDIA MGX designs to take the performance of Hopper GPUs to new heights.

Tiny Corp. Prepping Separate AMD & NVIDIA GPU-based AI Compute Systems

George Hotz and his startup operation (Tiny Corporation) appeared ready to completely abandon AMD Radeon GPUs last week, after experiencing a period of firmware-related headaches. The original plan involved the development of a pre-orderable $15,000 TinyBox AI compute cluster that housed six XFX Speedster MERC310 RX 7900 XTX graphics cards, but software/driver issues prompted experimentation via alternative hardware routes. A lot of media coverage has focused on the unusual adoption of consumer-grade GPUs—Tiny Corp.'s struggles with RDNA 3 (rather than CDNA 3) were maneuvered further into public view, after top AMD brass pitched in.

The startup's social media feed is very transparent about showcasing everyday tasks, problem-solving and important decision-making. Several Acer Predator BiFrost Arc A770 OC cards were purchased and promptly integrated into a colorfully-lit TinyBox prototype, but Hotz & Co. swiftly moved onto Team Green pastures. Tiny Corp. has begrudgingly adopted NVIDIA GeForce RTX 4090 GPUs. Earlier today, it was announced that work on the AMD-based system has resumed—although customers were forewarned about anticipated teething problems. The surprising message arrived in the early hours: "a hard to find 'umr' repo has turned around the feasibility of the AMD TinyBox. It will be a journey, but it gives us an ability to debug. We're going to sell both, red for $15,000 and green for $25,000. When you realize your pre-order you'll choose your color. Website has been updated. If you like to tinker and feel pain, buy red. The driver still crashes the GPU and hangs sometimes, but we can work together to improve it."
Dec 21st, 2024 07:22 EST
