News Posts matching #Stable Diffusion


NVIDIA Launches the RTX A400 and A1000 Professional Graphics Cards

AI integration across design and productivity applications is becoming the new standard, fueling demand for advanced computing performance. This means professionals and creatives will need to tap into increased compute power, regardless of the scale, complexity or scope of their projects. To meet this growing need, NVIDIA is expanding its RTX professional graphics offerings with two new NVIDIA Ampere architecture-based GPUs for desktops: the NVIDIA RTX A400 and NVIDIA RTX A1000.

They expand access to AI and ray tracing technology, equipping professionals with the tools they need to transform their daily workflows. The RTX A400 GPU introduces accelerated ray tracing and AI to the RTX 400 series GPUs. With 24 Tensor Cores for AI processing, it surpasses traditional CPU-based solutions, enabling professionals to run cutting-edge AI applications, such as intelligent chatbots and copilots, directly on their desktops. The GPU delivers real-time ray tracing, so creators can build vivid, physically accurate 3D renders that push the boundaries of creativity and realism.

ASRock Reveals AI QuickSet 2024 Q1 Update With Two New AI Tools

Leading global motherboard manufacturer ASRock has, since the end of last year, released AI QuickSet software for the Microsoft Windows 10/11 and Canonical Ubuntu Linux platforms, helping users quickly download, install, and configure artificial intelligence software. Following a strong market response, ASRock today revealed the 2024 Q1 update of AI QuickSet, which adds two new AI tools, Whisper Desktop and AudioCraft, letting users of ASRock AMD Radeon RX 7000 series graphics cards experience a wider range of AI applications.

ASRock AI QuickSet 1.2.4 for Windows supports the Microsoft Windows 10/11 64-bit operating system, while the 1.1.6 Linux version supports Canonical Ubuntu 22.04.4 Desktop (64-bit). Both rely on ASRock AMD Radeon RX 7000 series graphics cards and the AMD ROCm software platform to provide the compute power behind a variety of well-known AI applications. The 1.2.4 Windows version supports image generation tools such as DirectML Shark and Stable Diffusion web UI, as well as the newly added Whisper Desktop speech recognition tool. The 1.1.6 Linux version supports Image/Manga Translator, the Stable Diffusion CLI & web UI image generation tools, the Text generation web UI tool built on Meta's Llama 2 language model, the Ultralytics YOLOv8 object recognition tool, and the newly added AudioCraft audio generation tool.
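For readers curious about the kind of workload a tool like Whisper Desktop packages up, the short sketch below runs the open-source openai-whisper Python library directly; the model size and audio file name are illustrative assumptions, and this script is not part of ASRock's AI QuickSet itself.

```python
# Minimal speech-to-text sketch using the open-source "openai-whisper" package,
# illustrating the type of workload a Whisper-based desktop tool automates.
# Assumptions: the package is installed (pip install openai-whisper) and
# "meeting.mp3" is a local audio file; both are hypothetical examples.
import whisper

model = whisper.load_model("base")          # small multilingual model
result = model.transcribe("meeting.mp3")    # feature extraction + decoding
print(result["text"])                       # plain-text transcript
```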

NVIDIA Hopper Leaps Ahead in Generative AI at MLPerf

It's official: NVIDIA delivered the world's fastest platform in industry-standard tests for inference on generative AI. In the latest MLPerf benchmarks, NVIDIA TensorRT-LLM—software that speeds and simplifies the complex job of inference on large language models—boosted the performance of NVIDIA Hopper architecture GPUs on the GPT-J LLM nearly 3x over their results just six months ago. The dramatic speedup demonstrates the power of NVIDIA's full-stack platform of chips, systems and software to handle the demanding requirements of running generative AI. Leading companies are using TensorRT-LLM to optimize their models. And NVIDIA NIM—a set of inference microservices that includes inferencing engines like TensorRT-LLM—makes it easier than ever for businesses to deploy NVIDIA's inference platform.
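As a rough illustration of how TensorRT-LLM can be driven from Python, the sketch below uses the high-level LLM API that newer TensorRT-LLM releases expose; the model ID and sampling settings are illustrative assumptions, and this is not the harness NVIDIA used for its MLPerf submissions.

```python
# Hedged sketch of the high-level Python "LLM" API available in newer
# TensorRT-LLM releases (not the exact interface behind the MLPerf results).
# Assumes tensorrt_llm is installed on a supported NVIDIA GPU and that the
# Hugging Face model ID below is an illustrative, downloadable example.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-chat-hf")   # builds/loads a TensorRT engine
params = SamplingParams(max_tokens=128, temperature=0.8)

outputs = llm.generate(["Explain what MLPerf inference measures."], params)
print(outputs[0].outputs[0].text)
```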

Raising the Bar in Generative AI
TensorRT-LLM running on NVIDIA H200 Tensor Core GPUs—the latest, memory-enhanced Hopper GPUs—delivered the fastest performance running inference in MLPerf's biggest test of generative AI to date. The new benchmark uses the largest version of Llama 2, a state-of-the-art large language model packing 70 billion parameters. The model is more than 10x larger than the GPT-J LLM first used in the September benchmarks. The memory-enhanced H200 GPUs, in their MLPerf debut, used TensorRT-LLM to produce up to 31,000 tokens/second, a record on MLPerf's Llama 2 benchmark. The H200 GPU results include up to 14% gains from a custom thermal solution. It's one example of innovations beyond standard air cooling that systems builders are applying to their NVIDIA MGX designs to take the performance of Hopper GPUs to new heights.

UL Announces the Procyon AI Image Generation Benchmark Based on Stable Diffusion

We're excited to announce we're expanding our AI Inference benchmark offerings with the UL Procyon AI Image Generation Benchmark, coming Monday, 25th March. AI has the potential to be one of the most significant new technologies hitting the mainstream this decade, and many industry leaders are competing to deliver the best AI Inference performance through their hardware. Last year, we launched the first of our Procyon AI Inference Benchmarks for Windows, which measured AI Inference performance with a workload using Computer Vision.

The upcoming UL Procyon AI Image Generation Benchmark provides a consistent, accurate and understandable workload for measuring the AI performance of high-end hardware, built with input from members of the industry to ensure fair and comparable results across all supported hardware.
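To give a sense of the workload such a benchmark standardizes, here is a minimal timed text-to-image run using the open-source Hugging Face diffusers library; the model ID, prompt, and step count are illustrative assumptions, and UL's actual harness, models, and scoring method are separate from this sketch.

```python
# Rough sketch of a timed Stable Diffusion text-to-image generation, the kind
# of workload an AI image generation benchmark measures. Uses the open-source
# "diffusers" library, not UL's benchmark; model, prompt and steps are examples.
import time
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

start = time.perf_counter()
image = pipe("a photo of an astronaut riding a horse",
             num_inference_steps=50).images[0]
print(f"one 512x512 image in {time.perf_counter() - start:.1f} s")
image.save("out.png")
```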

Intel Gaudi2 Accelerator Beats NVIDIA H100 at Stable Diffusion 3 by 55%

Stability AI, the developers behind the popular Stable Diffusion generative AI model, have run first-party performance benchmarks for Stable Diffusion 3 on popular data-center AI GPUs, including the NVIDIA H100 "Hopper" 80 GB, the A100 "Ampere" 80 GB, and Intel's Gaudi2 96 GB accelerator. Unlike the H100, which is a super-scalar CUDA+Tensor core GPU, the Gaudi2 is purpose-built to accelerate generative AI and LLMs. Stability AI published its performance findings in a blog post, which reveals the Intel Gaudi2 96 GB posting roughly 56% higher performance than the H100 80 GB.

With 2 nodes, 16 accelerators, and a constant batch size of 16 per accelerator (256 in all), the Intel Gaudi2 array generates 927 images per second, compared to 595 images per second for the H100 array and 381 images per second for the A100 array, keeping accelerator and node counts constant. Scaling up to 32 nodes and 256 accelerators, with the same batch size of 16 per accelerator (4,096 in total), the Gaudi2 array posts 12,654 images per second, or 49.4 images per second per device, compared to 3,992 images per second, or 15.6 images per second per device, for the older-generation A100 "Ampere" array.
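The per-device figures follow directly from the totals Stability AI reported; a quick arithmetic check:

```python
# Back-of-the-envelope check of the figures quoted above, using only the
# numbers reported in Stability AI's blog post.
h100_imgs, gaudi2_imgs = 595, 927                 # 2-node, 16-accelerator run
print(f"Gaudi2 vs H100: {gaudi2_imgs / h100_imgs - 1:.1%} higher throughput")  # ~55.8%

gaudi2_large, a100_large, devices = 12_654, 3_992, 256   # 32-node, 256-accelerator run
print(f"Gaudi2: {gaudi2_large / devices:.1f} images/s per device")             # ~49.4
print(f"A100:   {a100_large / devices:.1f} images/s per device")               # ~15.6
```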

AMD Readying Feature-enriched ROCm 6.1

The latest version of AMD's open-source GPU compute stack, ROCm, is due for launch soon according to a Phoronix article—chief author, Michael Larabel, has been poring over Team Red's public GitHub repositories over the past couple of days. AMD ROCm version 6.0 was released last December—bringing official support for the AMD Instinct MI300A/MI300X, alongside PyTorch improvements, expanded AI libraries, and many other upgrades and optimizations. The v6.0 milestone placed Team Red in a more competitive position next to NVIDIA's very mature CUDA software layer. A mid-February 2024 update added support for Radeon PRO W7800 and RX 7900 GRE GPUs, as well as ONNX Runtime.

Larabel believes that "ROCm 6.1" is in for an imminent release, given his tracking of increased activity on publicly visible developer platforms: "For MIOpen 3.1 with ROCm 6.1 there's been many additions including new solvers, an AI-based parameter prediction model for the conv_hip_igemm_group_fwd_xdlops solver, numerous fixes, and other updates. AMD MIGraphX will see an important update with ROCm 6.1. For the next ROCm release, MIGraphX 2.9 brings FP8 support, support for more operators, documentation examples for Whisper / Llama-2 / Stable Diffusion 2.1, new ONNX examples, BLAS auto-tuning for GEMMs, and initial code for MIGraphX running on Microsoft Windows." The change-logs and documentation updates also point to several HIPIFY improvements for ROCm 6.1, including the addition of CUDA 12.3.2 support.
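For users who want to confirm that a ROCm-enabled PyTorch build actually sees their Radeon or Instinct GPU before trying the workloads mentioned above, a common sanity check looks like the sketch below; this is a generic ROCm/PyTorch check, not something specific to ROCm 6.1.

```python
# Quick sanity check that a PyTorch install is the ROCm (HIP) build and that a
# Radeon/Instinct GPU is visible. ROCm-enabled PyTorch reuses the torch.cuda
# namespace, so no CUDA-specific code is involved despite the name.
import torch

print("HIP runtime:", torch.version.hip)            # None on CUDA/CPU-only builds
print("GPU available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```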

AMD Ryzen 7 8700G AI Performance Enhanced by Overclocked DDR5 Memory

We already know that the AMD Ryzen 7 8700G APU responds well to overclocked memory—early reviews demonstrated the graphical benefits gained by fiddling with the "iGPU engine clock and the processor's memory frequency." While gamers can enjoy an integrated graphics solution whose 1080p performance is comparable to a discrete Radeon RX 6500 XT GPU, AI enthusiasts are eager to experiment with the "Hawk Point" part's Radeon 780M iGPU and Neural Processing Unit (NPU)—the first-generation Ryzen XDNA inference engine can deliver up to 16 AI TOPS. One individual, chi11eddog, posted their findings through social media channels earlier today, coinciding with the official launch of Ryzen 8000G processors. The initial set of results concentrates on the Radeon 780M; NPU-centric data may arrive at a later date.

They performed quick tests on AMD's freshly released Ryzen 7 8700G desktop processor, combined with an MSI B650 Gaming Plus WiFi motherboard and two 16 GB sticks of DDR5-4800 memory. The MSI-exclusive "Memory Try It" feature was used to reach and gauge several higher system RAM frequency settings. Here is chi11eddog's succinct interpretation of the benchmark results: "7600 MT/s is 15% faster than 4800 MT/s in UL Procyon AI Inference Benchmark and 4% faster in GIMP with Stable Diffusion." The processor's default memory state produces 210 Float32 TOPS, according to chi11eddog's inference chart. The 6000 MT/s setting delivers a 7% improvement over baseline, while 7200 MT/s pushes that to 11%—the flagship APU's Radeon 780M iGPU appears to be quite dependent on memory bandwidth. Their GIMP with Stable Diffusion benchmarks also taxed the integrated RDNA 3 graphics solution, and again it was deemed fairly bandwidth hungry.
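To put those gains in context, the sketch below sets the reported Procyon improvements against theoretical dual-channel DDR5 bandwidth; the bandwidth formula and the two-channel, 64-bit-per-channel assumption are ours, while the percentage gains are chi11eddog's published figures.

```python
# Rough comparison of theoretical dual-channel DDR5 bandwidth against the
# reported UL Procyon AI inference gains. The 2x 64-bit channel assumption is
# ours; the percentage gains come from chi11eddog's charts.
def bandwidth_gbps(mt_s, channels=2, bus_bits=64):
    return mt_s * 1e6 * channels * (bus_bits / 8) / 1e9

for speed, gain in [(4800, 0.00), (6000, 0.07), (7200, 0.11), (7600, 0.15)]:
    bw = bandwidth_gbps(speed)
    extra_bw = speed / 4800 - 1
    print(f"DDR5-{speed}: {bw:5.1f} GB/s (+{extra_bw:4.0%} bandwidth) -> +{gain:.0%} AI score")
```

The numbers suggest the Radeon 780M converts only part of the extra memory bandwidth into AI throughput, which is consistent with the article's "fairly bandwidth hungry, but not purely bandwidth bound" reading.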

NVIDIA Announces up to 5x Faster TensorRT-LLM for Windows, and ChatGPT API-like Interface

Even as CPU vendors work to mainstream accelerated AI for client PCs, and Microsoft sets the pace for more AI in everyday applications with the Windows 11 23H2 Update, NVIDIA is out there reminding you that every GeForce RTX GPU is an AI accelerator. This is thanks to its Tensor cores and the SIMD muscle of the ubiquitous CUDA cores. NVIDIA has been making these for over five years now and has an install base of over 100 million. The company is hence focusing on bringing generative AI acceleration to more client- and enthusiast-relevant use cases, such as large language models.

NVIDIA at the Microsoft Ignite event announced new optimizations, models, and resources to bring accelerated AI to everyone with an NVIDIA GPU that meets the hardware requirements. To begin with, the company introduced an update to TensorRT-LLM for Windows, a library that leverages the NVIDIA RTX architecture to accelerate large language models (LLMs). The new TensorRT-LLM version 0.6.0 will release later this month and improves LLM inference performance by up to 5x in terms of tokens per second, compared to the initial release of TensorRT-LLM from October 2023. In addition, TensorRT-LLM 0.6.0 introduces support for popular LLMs, including Mistral 7B and Nemotron-3 8B. Accelerating these two will require a GeForce RTX 30-series "Ampere" or 40-series "Ada" GPU with at least 8 GB of video memory.
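For readers who want to see where the tokens-per-second metric comes from, the hedged sketch below measures it with the open-source Hugging Face transformers stack rather than TensorRT-LLM itself; the Mistral 7B model ID and prompt are examples only, and absolute numbers will differ from NVIDIA's optimized path.

```python
# Illustrative measurement of the "tokens per second" figure quoted above,
# using the open-source Hugging Face transformers stack (not TensorRT-LLM).
# The model ID and prompt are examples; a CUDA-capable GPU is assumed.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.1"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda")

inputs = tok("Summarize what TensorRT-LLM does.", return_tensors="pt").to("cuda")
start = time.perf_counter()
out = model.generate(**inputs, max_new_tokens=256)
elapsed = time.perf_counter() - start

new_tokens = out.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/s")
```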