News Posts matching #LLM

AMD Instinct MI300X Accelerators Power Microsoft Azure OpenAI Service Workloads and New Azure ND MI300X V5 VMs

Today at Microsoft Build, AMD (NASDAQ: AMD) showcased its latest end-to-end compute and software capabilities for Microsoft customers and developers. By using AMD solutions such as AMD Instinct MI300X accelerators, ROCm open software, Ryzen AI processors and software, and Alveo MA35D media accelerators, Microsoft is able to provide a powerful suite of tools for AI-based deployments across numerous markets. The new Microsoft Azure ND MI300X virtual machines (VMs) are now generally available, giving customers like Hugging Face access to impressive performance and efficiency for their most demanding AI workloads.

"The AMD Instinct MI300X and ROCm software stack is powering the Azure OpenAI Chat GPT 3.5 and 4 services, which are some of the world's most demanding AI workloads," said Victor Peng, president, AMD. "With the general availability of the new VMs from Azure, AI customers have broader access to MI300X to deliver high-performance and efficient solutions for AI applications."

Lenovo Supercharges Copilot+ PCs with Latest Yoga Slim 7x and ThinkPad T14s Gen 6 (20 May 2024)

Today, Lenovo launched the Lenovo Yoga Slim 7x and Lenovo ThinkPad T14s Gen 6, its first next-generation Copilot+ PCs powered by Snapdragon X Elite. As the PC industry enters a new phase of the artificial intelligence era, Lenovo is poised to offer new levels of personalization in personal computing across its PC portfolio. Combining intelligent software-powered local processing of tasks with increased productivity, creativity, and security, these Copilot+ PCs deliver a whole new experience in PC interaction. Lenovo is expanding its already comprehensive portfolio of AI-ready devices, software, and optimized services with two new laptops for consumers and business users: the Lenovo Yoga Slim 7x and the Lenovo ThinkPad T14s Gen 6.

Powered by Qualcomm Technologies' new Snapdragon X Elite processor featuring the 12-core Qualcomm Oryon CPU, Qualcomm Adreno GPU and a dedicated Qualcomm Hexagon NPU (neural processing unit), the new laptops deliver leading PC performance per watt with the fastest AI NPU processing to date, at up to 45 trillion operations per second (TOPS). With the latest enhancements from Microsoft and Copilot+, users can now access Large Language Model (LLM) capabilities even when offline, offering seamless productivity and creativity. The latest Lenovo laptops allow users to tap into the extensive Copilot+ knowledge base, empowering them to explore endless creative possibilities. By leveraging generative AI and machine learning, Copilot+ assists in composing compelling text, crafting engaging visuals, and streamlining common productivity tasks. With the ability to work offline with the same fluidity as online, the Yoga Slim 7x and the ThinkPad T14s Gen 6 set new standards in AI PC innovation, promising a futuristic and streamlined user experience for end users.

Ampere Scales AmpereOne Product Family to 256 Cores

Ampere Computing today released its annual update on upcoming products and milestones, highlighting the company's continued innovation and invention around sustainable, power efficient computing for the Cloud and AI. The company also announced that they are working with Qualcomm Technologies, Inc. to develop a joint solution for AI inferencing using Qualcomm Technologies' high-performance, low power Qualcomm Cloud AI 100 inference solutions and Ampere CPUs.

Semiconductor industry veteran and Ampere CEO Renee James said the increasing power requirements and energy challenge of AI is bringing Ampere's silicon design approach around performance and efficiency into focus more than ever. "We started down this path six years ago because it is clear it is the right path," James said. "Low power used to be synonymous with low performance. Ampere has proven that isn't true. We have pioneered the efficiency frontier of computing and delivered performance beyond legacy CPUs in an efficient computing envelope."

Report: 3 Out of 4 Laptop PCs Sold in 2027 will be AI Laptop PCs

Personal computers (PCs) have been used as the major productivity device for several decades. But now we are entering a new era of PCs based on artificial intelligence (AI), thanks to the boom witnessed in generative AI (GenAI). We believe the inventory correction and demand weakness in the global PC market have already normalized, with the impacts from COVID-19 largely being factored in. All this has created a comparatively healthy backdrop for reshaping the PC industry. Counterpoint estimates that almost half a billion AI laptop PCs will be sold during the 2023-2027 period, with AI PCs reviving the replacement demand.

Counterpoint separates GenAI laptop PCs into three categories (AI basic laptop, AI-advanced laptop and AI-capable laptop) based on different levels of computational performance, corresponding use cases and the efficiency of computational performance. We believe AI basic laptops, which are already in the market, can perform basic AI tasks but not full GenAI tasks and, starting this year, will be supplanted by more AI-advanced and AI-capable models with enough TOPS (tera operations per second), delivered by an NPU (neural processing unit) or GPU (graphics processing unit), to perform advanced GenAI tasks well.

Apple Reportedly Developing Custom Data Center Processors with Focus on AI Inference

Apple is reportedly working on creating in-house chips designed explicitly for its data centers. This news comes from a recent report by the Wall Street Journal, which highlights the company's efforts to enhance its data processing capabilities and reduce dependency on third parties to supply the infrastructure. In the internal project called Apple Chips in Data Center (ACDC), which started in 2018, Apple wanted to design data center processors to handle the massive user base and increase the company's service offerings. The most recent advancement in AI means that Apple will probably serve an LLM processed in Apple's data center. The chip will most likely focus on inference of AI models rather than training.

The AI chips are expected to play a crucial role in improving the efficiency and speed of Apple's data centers, which handle vast amounts of data generated by the company's various services and products. By developing these custom chips, Apple aims to optimize its data processing and storage capabilities, ultimately leading to better user experiences across its ecosystem. The move by Apple to develop AI-enhanced chips for data centers is seen as a strategic step in the company's efforts to stay ahead in the competitive tech landscape. Almost all major tech companies, famously called the big seven, have products that use AI in silicon and in software processing. However, Apple is the one that seemingly lacked that. Now, the company is integrating AI across the entire vertical, from the upcoming iPhone integration to M4 chips for Mac devices and ACDC chips for data centers.

We Tested NVIDIA's new ChatRTX: Your Own GPU-accelerated AI Assistant with Photo Recognition, Speech Input, Updated Models

NVIDIA today unveiled ChatRTX, an AI assistant that runs locally on your machine and is accelerated by your GeForce RTX GPU. NVIDIA originally launched it as "Chat with RTX" in February 2024, when it was regarded more as a public tech demo. We reviewed the application in our feature article. The ChatRTX rebranding is probably aimed at making the name sound more like ChatGPT, which is what the application aims to be, except that it runs completely on your machine and is exhaustively customizable. The most obvious advantage of a locally run AI assistant is privacy: your prompt is processed locally and accelerated by your GPU. The second is that you're not held back by the performance bottlenecks of cloud-based assistants.

ChatRTX is a major update over the Chat with RTX tech demo from February. To begin with, the application has several stability refinements over Chat with RTX, which felt a little rough around the edges. NVIDIA has significantly updated the LLMs included with the application, including Mistral 7B INT4 and Llama 2 7B INT4. Support is also added for additional LLMs, including Gemma, a local LLM trained by Google, based on the same technology used to make Google's flagship Gemini model. ChatRTX now also supports ChatGLM3, for both English and Chinese prompts. Perhaps the biggest upgrade to ChatRTX is its ability to recognize images on your machine, as it incorporates CLIP (contrastive language-image pre-training) from OpenAI. CLIP is a vision-language model that matches images against text descriptions, so it can recognize what it's seeing in image collections. Using this feature, you can interact with your image library without the need for metadata. ChatRTX doesn't just take text input; you can also speak to it. It now accepts natural voice input, as it integrates the Whisper speech recognition model.

Intel Builds World's Largest Neuromorphic System to Enable More Sustainable AI

Today, Intel announced that it has built the world's largest neuromorphic system. Code-named Hala Point, this large-scale neuromorphic system, initially deployed at Sandia National Laboratories, utilizes Intel's Loihi 2 processor, aims at supporting research for future brain-inspired artificial intelligence (AI), and tackles challenges related to the efficiency and sustainability of today's AI. Hala Point advances Intel's first-generation large-scale research system, Pohoiki Springs, with architectural improvements to achieve over 10 times more neuron capacity and up to 12 times higher performance.

"The computing cost of today's AI models is rising at unsustainable rates. The industry needs fundamentally new approaches capable of scaling. For that reason, we developed Hala Point, which combines deep learning efficiency with novel brain-inspired learning and optimization capabilities. We hope that research with Hala Point will advance the efficiency and adaptability of large-scale AI technology." -Mike Davies, director of the Neuromorphic Computing Lab at Intel Labs

NVIDIA Issues Patches for ChatRTX AI Chatbot, Susceptible to Improper Privilege Management

Just a month after releasing the 0.1 beta preview of Chat with RTX, now called ChatRTX, NVIDIA has swiftly addressed critical security vulnerabilities discovered in its cutting-edge AI chatbot. The chatbot was found to be susceptible to cross-site scripting attacks (CWE-79) and improper privilege management attacks (CWE-269) in version 0.2 and all prior releases. The identified vulnerabilities posed significant risks to users' personal data and system security. Cross-site scripting attacks could allow malicious actors to inject scripts into the chatbot's interface, potentially compromising sensitive information. The improper privilege management flaw could also enable attackers to escalate their privileges and gain administrative control over users' systems and files.

Upon becoming aware of these vulnerabilities, NVIDIA promptly released an updated version of ChatRTX 0.2, available for download from its official website. The latest iteration of the software addresses these security issues, providing users with a more secure experience. As ChatRTX utilizes retrieval augmented generation (RAG) and NVIDIA TensorRT-LLM software to let users ground the chatbot's answers in their personal data, the presence of such vulnerabilities is particularly concerning. Users are strongly advised to update their ChatRTX software to the latest version to mitigate potential risks and protect their personal information. ChatRTX remains in beta, with no official release candidate timeline announced. As NVIDIA continues to develop and refine this AI chatbot, the company must prioritize security and promptly address any vulnerabilities that may arise, ensuring a safe and reliable user experience.
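
To illustrate why retrieval augmented generation puts personal files directly in the chatbot's path, here is a minimal Python sketch of the general RAG idea: rank local documents against a query, then paste the best match into the prompt. It is not ChatRTX's implementation. The function names (`vectorize`, `cosine`, `retrieve`, `build_prompt`), the word-count similarity, and the sample documents are all assumptions made for illustration; a real system would use learned embeddings and an actual LLM call.

```python
import math
from collections import Counter

def vectorize(text):
    """Turn text into a bag-of-words count vector (toy stand-in for an embedding)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = vectorize(query)
    ranked = sorted(documents, key=lambda d: cosine(q, vectorize(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, documents):
    """Prepend the retrieved context to the user's question."""
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}"

# Hypothetical personal documents, invented for this example.
docs = [
    "the invoice for march is due on friday",
    "my cat likes tuna",
    "gpu driver release notes",
]
print(build_prompt("when is the invoice due", docs))
```

Because the retrieved text is inserted verbatim into the prompt, anything that can read or tamper with that pipeline can reach the underlying files, which is why the privilege flaw matters here.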

NVIDIA Hopper Leaps Ahead in Generative AI at MLPerf

It's official: NVIDIA delivered the world's fastest platform in industry-standard tests for inference on generative AI. In the latest MLPerf benchmarks, NVIDIA TensorRT-LLM—software that speeds and simplifies the complex job of inference on large language models—boosted the performance of NVIDIA Hopper architecture GPUs on the GPT-J LLM nearly 3x over their results just six months ago. The dramatic speedup demonstrates the power of NVIDIA's full-stack platform of chips, systems and software to handle the demanding requirements of running generative AI. Leading companies are using TensorRT-LLM to optimize their models. And NVIDIA NIM—a set of inference microservices that includes inferencing engines like TensorRT-LLM—makes it easier than ever for businesses to deploy NVIDIA's inference platform.

Raising the Bar in Generative AI
TensorRT-LLM running on NVIDIA H200 Tensor Core GPUs—the latest, memory-enhanced Hopper GPUs—delivered the fastest performance running inference in MLPerf's biggest test of generative AI to date. The new benchmark uses the largest version of Llama 2, a state-of-the-art large language model packing 70 billion parameters. The model is more than 10x larger than the GPT-J LLM first used in the September benchmarks. The memory-enhanced H200 GPUs, in their MLPerf debut, used TensorRT-LLM to produce up to 31,000 tokens/second, a record on MLPerf's Llama 2 benchmark. The H200 GPU results include up to 14% gains from a custom thermal solution. It's one example of innovations beyond standard air cooling that systems builders are applying to their NVIDIA MGX designs to take the performance of Hopper GPUs to new heights.

Tiny Corp. Prepping Separate AMD & NVIDIA GPU-based AI Compute Systems

George Hotz and his startup operation (Tiny Corporation) appeared ready to completely abandon AMD Radeon GPUs last week, after experiencing a period of firmware-related headaches. The original plan involved the development of a pre-orderable $15,000 TinyBox AI compute cluster that housed six XFX Speedster MERC310 RX 7900 XTX graphics cards, but software/driver issues prompted experimentation via alternative hardware routes. A lot of media coverage has focused on the unusual adoption of consumer-grade GPUs—Tiny Corp.'s struggles with RDNA 3 (rather than CDNA 3) were maneuvered further into public view, after top AMD brass pitched in.

The startup's social media feed is very transparent about showcasing everyday tasks, problem-solving and important decision-making. Several Acer Predator BiFrost Arc A770 OC cards were purchased and promptly integrated into a colorfully-lit TinyBox prototype, but Hotz & Co. swiftly moved onto Team Green pastures. Tiny Corp. has begrudgingly adopted NVIDIA GeForce RTX 4090 GPUs. Earlier today, it was announced that work on the AMD-based system has resumed—although customers were forewarned about anticipated teething problems. The surprising message arrived in the early hours: "a hard to find 'umr' repo has turned around the feasibility of the AMD TinyBox. It will be a journey, but it gives us an ability to debug. We're going to sell both, red for $15,000 and green for $25,000. When you realize your pre-order you'll choose your color. Website has been updated. If you like to tinker and feel pain, buy red. The driver still crashes the GPU and hangs sometimes, but we can work together to improve it."

Qualcomm Announces the Snapdragon 7+ Gen 3, Featuring Exceptional On-Device AI Capabilities

Qualcomm Technologies, Inc., unveiled today the Snapdragon 7+ Gen 3 Mobile Platform, bringing on-device generative AI into the Snapdragon 7 series. The Mobile Platform supports a wide range of AI models including large language models (LLMs) such as Baichuan-7B, Llama 2, and Gemini Nano. Fueling extraordinary entertainment capabilities, Snapdragon 7+ Gen 3 also brings new select Snapdragon Elite Gaming features to the 7-series including Game Post Processing Accelerator and Adreno Frame Motion Engine 2, enhancing game effects and upscaling gaming content for desktop-level visuals. Plus, this platform brings top-notch photography features with our industry-leading 18-bit cognitive ISP.

"Today, we embark on the latest expansion in the 7-series to create new levels of entertainment for consumers - integrating next-generation technologies for richer experiences," said Chris Patrick, senior vice president and general manager of mobile handsets, Qualcomm Technologies, Inc. "Snapdragon 7+ Gen 3 is packed with support for incredible on-device generative AI features and provides incredible performance and power efficiency, while bringing Wi-Fi 7 to the Snapdragon 7 Series for the first time."

Samsung Roadmaps UFS 5.0 Storage Standard, Predicts Commercialization by 2027

Mobile tech tipster Revegnus has highlighted an interesting Samsung presentation slide. According to machine translation, the company's electronics division is already responding to an anticipated growth of "client-side large language model" service development. This market trend will demand improved Universal Flash Storage (UFS) interface speeds, and Samsung engineers are currently engaged in "developing a new product that uses UFS 4.0 technology, but increases the number of channels from the current 2 to 4." The upcoming "more advanced" UFS 4.0 storage chips could be beefy enough to be utilized alongside next-gen mobile processors in 2025. For example, Arm is gearing up "Blackhawk," the Cortex-X4's successor; industry watchers reckon that the semiconductor firm's new core is designed to deliver "great Large Language Model (LLM) performance" on future smartphones. Samsung's roadmap outlines another major R&D goal, but this prospect is far from finalization: the chart reveals an anticipated 2027 rollout. The slide's body of text included a brief teaser: "at the same time, we are also actively participating in discussions on the UFS 5.0 standard."

Tiny Corp. Pauses Development of AMD Radeon GPU-based Tinybox AI Cluster

George Hotz and his Tiny Corporation colleagues were pinning their hopes on AMD delivering some good news earlier this month. The development of a "TinyBox" AI compute cluster project hit some major roadblocks a couple of weeks ago—at the time, Radeon RX 7900 XTX GPU firmware was not gelling with Tiny Corp.'s setup. Hotz expressed "70% confidence" in AMD approving open-sourcing certain bits of firmware. At the time of writing this has not transpired—this week the Tiny Corp. social media account has, once again, switched to an "all guns blazing" mode. Hotz and Co. have publicly disclosed that they were dabbling with Intel Arc graphics cards, as of a few weeks ago. NVIDIA hardware is another possible route, according to freshly posted open thoughts.

Yesterday, it was confirmed that the young startup organization had paused its utilization of XFX Speedster MERC310 RX 7900 XTX graphics cards: "the driver is still very unstable, and when it crashes or hangs we have no way of debugging it. We have no way of dumping the state of a GPU. Apparently it isn't just the MES causing these issues, it's also the Command Processor (CP). After seeing how open Tenstorrent is, it's hard to deal with this. With Tenstorrent, I feel confident that if there's an issue, I can debug and fix it. With AMD, I don't." The $15,000 TinyBox system relies on "cheaper" gaming-oriented GPUs, rather than traditional enterprise solutions—this oddball approach has attracted a number of customers, but the latest announcements likely signal another delay. Yesterday's tweet continued to state: "we are exploring Intel, working on adding Level Zero support to tinygrad. We also added a $400 bounty for XMX support. We are also (sadly) exploring a 6x GeForce RTX 4090 GPU box. At least we know the software is good there. We will revisit AMD once we have an open and reproducible build process for the driver and firmware. We are willing to dive really deep into hardware to make it amazing. But without access, we can't."

Ubisoft Exploring Generative AI, Could Revolutionize NPC Narratives

Have you ever dreamed of having a real conversation with an NPC in a video game? Not just one gated within a dialogue tree of pre-determined answers, but an actual conversation, conducted through spontaneous action and reaction? Lately, a small R&D team at Ubisoft's Paris studio, in collaboration with Nvidia's Audio2Face application and Inworld's Large Language Model (LLM), have been experimenting with generative AI in an attempt to turn this dream into a reality. Their project, NEO NPC, uses GenAI to prod at the limits of how a player can interact with an NPC without breaking the authenticity of the situation they are in, or the character of the NPC itself.

Considering that word—authenticity—the project has had to be a hugely collaborative effort across artistic and scientific disciplines. Generative AI is a hot topic of conversation in the videogame industry, and Senior Vice President of Production Technology Guillemette Picard is keen to stress that the goal behind all genAI projects at Ubisoft is to bring value to the player; and that means continuing to focus on human creativity behind the scenes. "The way we worked on this project, is always with our players and our developers in mind," says Picard. "With the player in mind, we know that developers and their creativity must still drive our projects. Generative AI is only of value if it has value for them."

NVIDIA Digital Human Technologies Bring AI Characters to Life

NVIDIA announced today that leading AI application developers across a wide range of industries are using NVIDIA digital human technologies to create lifelike avatars for commercial applications and dynamic game characters. The results are on display at GTC, the global AI conference held this week in San Jose, Calif., and can be seen in technology demonstrations from Hippocratic AI, Inworld AI, UneeQ and more.

NVIDIA Avatar Cloud Engine (ACE) for speech and animation, NVIDIA NeMo for language, and NVIDIA RTX for ray-traced rendering are the building blocks that enable developers to create digital humans capable of AI-powered natural language interactions, making conversations more realistic and engaging.

NVIDIA Blackwell Platform Arrives to Power a New Era of Computing

Powering a new era of computing, NVIDIA today announced that the NVIDIA Blackwell platform has arrived—enabling organizations everywhere to build and run real-time generative AI on trillion-parameter large language models at up to 25x less cost and energy consumption than its predecessor.

The Blackwell GPU architecture features six transformative technologies for accelerated computing, which will help unlock breakthroughs in data processing, engineering simulation, electronic design automation, computer-aided drug design, quantum computing and generative AI—all emerging industry opportunities for NVIDIA.

Gigabyte Unveils Comprehensive and Powerful AI Platforms at NVIDIA GTC

GIGABYTE Technology and Giga Computing, a subsidiary of GIGABYTE and an industry leader in enterprise solutions, will showcase their solutions at the GIGABYTE booth #1224 at NVIDIA GTC, a global AI developer conference running through March 21. This event will offer GIGABYTE the chance to connect with its valued partners and customers, and together explore what the future in computing holds.

The GIGABYTE booth will focus on GIGABYTE's enterprise products that demonstrate AI training and inference delivered by versatile computing platforms based on NVIDIA solutions, as well as direct liquid cooling (DLC) for improved compute density and energy efficiency. Also not to be missed at the NVIDIA booth is the MGX Pavilion, which features a rack of GIGABYTE servers for the NVIDIA GH200 Grace Hopper Superchip architecture.

Report: Apple to Use Google's Gemini AI for iPhones

In the world where the largest companies are riding the AI train, the biggest of them all—Apple—seemed to stay quiet for a while. Even with many companies announcing their systems/models, Apple has stayed relatively silent about the use of LLMs in their products. However, according to Bloomberg, Apple is not pushing out an AI model of its own; rather, it will license Google's leading Gemini models for its iPhone smartphones. Gemini is Google's leading AI model with three variants: Gemini Nano 1/2, Gemini Pro, and Gemini Ultra. The Gemini Nano 1 and Nano 2 are designed to run locally on hardware like smartphones. At the same time, Gemini Pro and Ultra are inferenced from Google's servers onto a local device using API and the internet.

Apple could use a local Gemini Nano for basic tasks while also utilizing Gemini Pro or Ultra for more complex tasks, where a router sends user input to the available model. That way, users could use AI capabilities both online and offline. Since Apple is readying a suite of changes for iOS version 18, backed by the Neural Engine inside A-series Bionic chips, the LLM game of Apple's iPhones might get a significant upgrade with the Google partnership. While we still don't know the size of the deal, it surely is a massive deal for Google to tap into millions of devices Apple ships every year and for Apple to give its users a more optimized experience.
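
The routing idea described above can be sketched in a few lines of Python. This is purely illustrative: the `Route` type, the `route` function, the word-count heuristic and the `max_local_words` threshold are all hypothetical, and nothing here reflects how Apple or Google would actually decide between an on-device and a server-side model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    model: str    # label for the model tier (hypothetical)
    runs_on: str  # "device" or "server"

LOCAL = Route(model="nano", runs_on="device")
CLOUD = Route(model="pro-or-ultra", runs_on="server")

def route(prompt: str, online: bool, max_local_words: int = 40) -> Route:
    """Pick a model tier for a prompt.

    Assumed policy: short prompts stay on the device, long prompts go to
    the server, and everything stays on the device when there is no network.
    """
    is_simple = len(prompt.split()) <= max_local_words
    if not online or is_simple:
        return LOCAL
    return CLOUD

print(route("set a timer for ten minutes", online=True))
print(route("summarize this report " * 30, online=True))
print(route("summarize this report " * 30, online=False))
```

A real router would likely weigh more than prompt length (task type, battery, privacy settings), but the offline fallback shown here is the property the article highlights.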

MemVerge and Micron Boost NVIDIA GPU Utilization with CXL Memory

MemVerge, a leader in AI-first Big Memory Software, has joined forces with Micron to unveil a groundbreaking solution that leverages intelligent tiering of CXL memory, boosting the performance of large language models (LLMs) by offloading from GPU HBM to CXL memory. This innovative collaboration is being showcased in Micron booth #1030 at GTC, where attendees can witness firsthand the transformative impact of tiered memory on AI workloads.

Charles Fan, CEO and Co-founder of MemVerge, emphasized the critical importance of overcoming the bottleneck of HBM capacity. "Scaling LLM performance cost-effectively means keeping the GPUs fed with data," stated Fan. "Our demo at GTC demonstrates that pools of tiered memory not only drive performance higher but also maximize the utilization of precious GPU resources."

MAINGEAR Introduces PRO AI Workstations Featuring aiDAPTIV+ For Cost-Effective Large Language Model Training

MAINGEAR, a leading provider of high-performance custom PC systems, and Phison, a global leader in NAND controllers and storage solutions, today unveiled groundbreaking MAINGEAR PRO AI workstations with Phison's aiDAPTIV+ technology. Specifically engineered to democratize Large Language Model (LLM) development and training for small and medium-sized businesses (SMBs), these ultra-powerful workstations incorporate aiDAPTIV+ technology to deliver supercomputer LLM training capabilities at a fraction of the cost of traditional AI training servers.

As the demand for large-scale generative AI models continues to surge and their complexity increases, the potential for LLMs also expands. However, this rapid advancement in LLM AI technology has led to a notable boost in hardware requirements, making model training cost-prohibitive and inaccessible for many small to medium businesses.

Cerebras & G42 Break Ground on Condor Galaxy 3 - an 8 exaFLOPs AI Supercomputer

Cerebras Systems, the pioneer in accelerating generative AI, and G42, the Abu Dhabi-based leading technology holding group, today announced the build of Condor Galaxy 3 (CG-3), the third cluster of their constellation of AI supercomputers, the Condor Galaxy. Featuring 64 of Cerebras' newly announced CS-3 systems - all powered by the industry's fastest AI chip, the Wafer-Scale Engine 3 (WSE-3) - Condor Galaxy 3 will deliver 8 exaFLOPs of AI with 58 million AI-optimized cores. The Cerebras and G42 strategic partnership already delivered 8 exaFLOPs of AI supercomputing performance via Condor Galaxy 1 and Condor Galaxy 2, each amongst the largest AI supercomputers in the world. Located in Dallas, Texas, Condor Galaxy 3 brings the current total of the Condor Galaxy network to 16 exaFLOPs.
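
The headline numbers can be cross-checked with simple arithmetic. The Python sketch below uses the per-chip figures Cerebras quotes for the WSE-3 (125 petaflops of peak AI performance and 900,000 AI cores) and assumes one WSE-3 per CS-3 system; the variable names are ours.

```python
CS3_SYSTEMS = 64               # CS-3 systems in Condor Galaxy 3
PEAK_PFLOPS_PER_WSE3 = 125     # peak AI petaflops per WSE-3 chip
CORES_PER_WSE3 = 900_000       # AI cores per WSE-3 chip
EXISTING_EXAFLOPS = 8          # Condor Galaxy 1 and 2 combined

# 1 exaFLOP = 1,000 petaflops
cg3_exaflops = CS3_SYSTEMS * PEAK_PFLOPS_PER_WSE3 / 1000
cg3_cores_millions = CS3_SYSTEMS * CORES_PER_WSE3 / 1_000_000
network_exaflops = EXISTING_EXAFLOPS + cg3_exaflops

print(f"Condor Galaxy 3: {cg3_exaflops:.0f} exaFLOPs, {cg3_cores_millions:.1f} million cores")
print(f"Condor Galaxy network total: {network_exaflops:.0f} exaFLOPs")
```

The core count works out to 57.6 million, which the announcement rounds to 58 million.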

"With Condor Galaxy 3, we continue to achieve our joint vision of transforming the worldwide inventory of AI compute through the development of the world's largest and fastest AI supercomputers," said Kiril Evtimov, Group CTO of G42. "The existing Condor Galaxy network has trained some of the leading open-source models in the industry, with tens of thousands of downloads. By doubling the capacity to 16 exaFLOPs, we look forward to seeing the next wave of innovation Condor Galaxy supercomputers can enable." At the heart of Condor Galaxy 3 are 64 Cerebras CS-3 Systems. Each CS-3 is powered by the new 4 trillion transistor, 900,000 AI core WSE-3. Manufactured at TSMC at the 5-nanometer node, the WSE-3 delivers twice the performance at the same power and for the same price as the previous generation part. Purpose built for training the industry's largest AI models, WSE-3 delivers an astounding 125 petaflops of peak AI performance per chip.

ZOTAC Expands Computing Hardware with GPU Server Product Line for the AI-Bound Future

ZOTAC Technology Limited, a global leader in innovative technology solutions, expands its product portfolio with the GPU Server Series. The first series of products in ZOTAC's Enterprise lineup offers organizations affordable and high-performance computing solutions for a wide range of demanding applications, from core-to-edge inferencing and data visualization to model training, HPC modeling, and simulation.

The ZOTAC series of GPU Servers comes in a diverse range of form factors and configurations, featuring both Tower Workstations and Rack Mount Servers, as well as both Intel and AMD processor configurations. With support for up to 10 GPUs, modular design for easier access to internal hardware, a high space-to-performance ratio, and industry-standard features like redundant power supplies and extensive cooling options, ZOTAC's enterprise solutions can ensure optimal performance and durability, even under sustained intense workloads.

Cerebras Systems Unveils World's Fastest AI Chip with 4 Trillion Transistors and 900,000 AI cores

Cerebras Systems, the pioneer in accelerating generative AI, has doubled down on its existing world record of fastest AI chip with the introduction of the Wafer Scale Engine 3. The WSE-3 delivers twice the performance of the previous record-holder, the Cerebras WSE-2, at the same power draw and for the same price. Purpose built for training the industry's largest AI models, the 5 nm-based, 4 trillion transistor WSE-3 powers the Cerebras CS-3 AI supercomputer, delivering 125 petaflops of peak AI performance through 900,000 AI optimized compute cores.

Intel Gaudi2 Accelerator Beats NVIDIA H100 at Stable Diffusion 3 by 55%

Stability AI, the developers behind the popular Stable Diffusion generative AI model, have run some first-party performance benchmarks for Stable Diffusion 3 using popular data-center AI GPUs, including the NVIDIA H100 "Hopper" 80 GB, A100 "Ampere" 80 GB, and Intel's Gaudi2 96 GB accelerator. Unlike the H100, which is a super-scalar CUDA+Tensor core GPU, the Gaudi2 is purpose-built to accelerate generative AI and LLMs. Stability AI published its performance findings in a blog post, which reveals that the Intel Gaudi2 96 GB is posting roughly 56% higher performance than the H100 80 GB.

With 2 nodes, 16 accelerators, and a constant batch size of 16 per accelerator (256 in all), the Intel Gaudi2 array is able to generate 927 images per second, compared to 595 images per second for the H100 array and 381 images per second for the A100 array, keeping accelerator and node counts constant. Scaling up to 32 nodes and 256 accelerators, with a batch size of 16 per accelerator (total batch size of 4,096), the Gaudi2 array posts 12,654 images per second, or 49.4 images per second per device, compared to 3,992 images per second, or 15.6 images per second per device, for the older-gen A100 "Ampere" array.
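
The percentages and per-device rates follow directly from the array throughput figures quoted above. A short Python sketch of the arithmetic (the helper names `per_device` and `speedup_percent` are ours, not Stability AI's):

```python
# Whole-array throughput in images per second, as quoted in the article.
small_run = {"Gaudi2": 927, "H100": 595, "A100": 381}  # 2 nodes, 16 accelerators
large_run = {"Gaudi2": 12654, "A100": 3992}            # 32 nodes, 256 accelerators

def per_device(images_per_second, accelerators):
    """Average throughput of a single accelerator in the array."""
    return images_per_second / accelerators

def speedup_percent(fast, slow):
    """How much faster `fast` is than `slow`, in percent."""
    return (fast / slow - 1.0) * 100.0

gaudi2_vs_h100 = speedup_percent(small_run["Gaudi2"], small_run["H100"])
print(f"Gaudi2 vs H100 at 16 accelerators: {gaudi2_vs_h100:.1f}% faster")

for name, images_per_second in large_run.items():
    rate = per_device(images_per_second, 256)
    print(f"{name} at 256 accelerators: {rate:.1f} images/s per device")
```

The Gaudi2 lead over the H100 comes out at about 55.8%, which is where the "roughly 56%" figure comes from.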

AMD Publishes User Guide for LM Studio - a Local AI Chatbot

AMD has caught up with NVIDIA and Intel in the race to get a locally run AI chatbot up and running on its own hardware. Team Red's community hub welcomed a new blog entry on Wednesday: AI staffers published a handy "How to run a Large Language Model (LLM) on your AMD Ryzen AI PC or Radeon Graphics Card" step-by-step guide. They recommend that interested parties download the correct version of LM Studio. With the CPU-bound Windows variant, designed for higher-end Phoenix and Hawk Point chips, compatible Ryzen AI PCs can deploy instances of a GPT-based LLM-powered AI chatbot. The LM Studio ROCm technical preview functions similarly, but is reliant on Radeon RX 7000 graphics card ownership. Supported GPU targets include: gfx1100, gfx1101 and gfx1102.

AMD believes that: "AI assistants are quickly becoming essential resources to help increase productivity, efficiency or even brainstorm for ideas." Their blog also puts a spotlight on LM Studio's offline functionality: "Not only does the local AI chatbot on your machine not require an internet connection—but your conversations stay on your local machine." The six-step guide invites curious members to experiment with a handful of large language models—most notably Mistral 7b and LLAMA v2 7b. They thoroughly recommend that you select options with "Q4 K M" (AKA 4-bit quantization). You can learn about spooling up "your very own AI chatbot" here.