News Posts matching #Tensor

NVIDIA Outlines Cost Benefits of Inference Platform

Businesses across every industry are rolling out AI services this year. For Microsoft, Oracle, Perplexity, Snap and hundreds of other leading companies, using the NVIDIA AI inference platform—a full stack comprising world-class silicon, systems and software—is the key to delivering high-throughput and low-latency inference and enabling great user experiences while lowering cost. NVIDIA's advancements in inference software optimization and the NVIDIA Hopper platform are helping industries serve the latest generative AI models, delivering excellent user experiences while optimizing total cost of ownership. The Hopper platform also helps deliver up to 15x more energy efficiency for inference workloads compared to previous generations.

AI inference is notoriously difficult, as it requires many steps to strike the right balance between throughput and user experience. But the underlying goal is simple: generate more tokens at a lower cost. Tokens are the units of text, whole words or fragments of words, that a large language model (LLM) processes and generates. With AI inference services typically charging for every million tokens generated, this goal offers the most visible return on AI investments and on the energy used per task. Full-stack software optimization is the key to improving AI inference performance and achieving this goal.
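The arithmetic behind that goal is straightforward to sketch. The Python snippet below uses purely illustrative numbers (the price, throughput and server cost are assumptions, not NVIDIA figures) to show why generating more tokens per second on the same hardware directly improves the return per dollar spent:

```python
# Token-economics sketch with illustrative numbers; none of these
# figures come from NVIDIA, they are assumptions for demonstration.
PRICE_PER_MILLION_TOKENS = 2.00  # USD billed per 1M generated tokens (assumed)
TOKENS_PER_SECOND = 5_000        # aggregate server throughput (assumed)
SERVER_COST_PER_HOUR = 10.00     # fully loaded hourly server cost (assumed)

tokens_per_hour = TOKENS_PER_SECOND * 3_600
revenue_per_hour = tokens_per_hour / 1_000_000 * PRICE_PER_MILLION_TOKENS
margin_per_hour = revenue_per_hour - SERVER_COST_PER_HOUR

print(f"Tokens per hour:  {tokens_per_hour:,}")
print(f"Revenue per hour: ${revenue_per_hour:,.2f}")
print(f"Margin per hour:  ${margin_per_hour:,.2f}")
# Doubling TOKENS_PER_SECOND doubles revenue while the hourly cost stays
# fixed, which is exactly the "more tokens at a lower cost" goal.
```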

ADLINK Launches the DLAP Supreme Series

ADLINK Technology Inc., a global leader in edge computing, unveiled its new "DLAP Supreme Series", an edge generative AI platform. By integrating Phison's innovative aiDAPTIV+ AI solution, the series overcomes memory limitations in edge generative AI applications, significantly enhancing AI computing capabilities on edge devices. Without incurring high additional hardware costs, the DLAP Supreme series achieves notable AI performance improvements, helping enterprises lower the cost barriers of AI deployment and accelerating the adoption of generative AI across various industries, especially in edge computing.

Lower AI Computing Costs and Significantly Improved Performance
As generative AI continues to penetrate various industries, many edge devices hit performance bottlenecks when running large language models because their DRAM capacity is insufficient, degrading model operation and even causing issues such as inadequate token lengths. The DLAP Supreme series, leveraging aiDAPTIV+ technology, effectively overcomes these limitations and significantly improves computing performance. It also supports generative language model training on the edge device itself, giving these devices AI model training capabilities and improving their autonomous learning and adaptability.
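ADLINK and Phison do not detail aiDAPTIV+ internals here, but the general technique of trading scarce DRAM for flash can be sketched: keep only the layer currently being computed in memory and stream the rest from an SSD on demand. The sketch below is a conceptual illustration of that idea, not Phison's implementation; the class, file names and shapes are hypothetical.

```python
# Conceptual sketch of DRAM-to-flash offloading for LLM layers.
# This illustrates the general technique only; it is NOT Phison's
# aiDAPTIV+ implementation. File names and shapes are hypothetical.
import numpy as np

class OffloadedLayer:
    """Keeps a layer's weights on flash, resident in DRAM only while in use."""
    def __init__(self, path):
        self.path = path      # weight file stored on the SSD
        self.weights = None   # not loaded into DRAM yet

    def forward(self, x):
        self.weights = np.load(self.path)   # stream weights in from flash
        y = x @ self.weights                # placeholder for the real layer math
        self.weights = None                 # release DRAM before the next layer
        return y

def run_model(layers, x):
    # Peak DRAM use is roughly one layer's weights, not the whole model.
    for layer in layers:
        x = layer.forward(x)
    return x

# Demo: write two tiny "layers" to disk, then run them with minimal DRAM.
for i in range(2):
    np.save(f"layer_{i:02d}.npy", np.random.randn(8, 8).astype(np.float32))

layers = [OffloadedLayer(f"layer_{i:02d}.npy") for i in range(2)]
print(run_model(layers, np.random.randn(1, 8).astype(np.float32)))
```

The obvious cost of this trade is latency, since flash is far slower than DRAM; the value of a product like aiDAPTIV+ lies in hiding that cost well enough for edge workloads.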

NVIDIA NIM Microservices and AI Blueprints Usher in New Era of Local AI

Over the past year, generative AI has transformed the way people live, work and play, enhancing everything from writing and content creation to gaming, learning and productivity. PC enthusiasts and developers are leading the charge in pushing the boundaries of this groundbreaking technology. Countless times, industry-defining technological breakthroughs have been invented in one place—a garage. This week marks the start of the RTX AI Garage series, which will offer regular content for developers and enthusiasts looking to learn more about NVIDIA NIM microservices and AI Blueprints, and how to build AI agents, creative workflows, digital humans, productivity apps and more on AI PCs. Welcome to the RTX AI Garage.

This first installment spotlights announcements made earlier this week at CES, including new AI foundation models available on NVIDIA RTX AI PCs that take digital humans, content creation, productivity and development to the next level. These models—offered as NVIDIA NIM microservices—are accelerated by the new GeForce RTX 50 Series GPUs. Built on the NVIDIA Blackwell architecture, RTX 50 Series GPUs deliver up to 3,352 trillion AI operations per second, carry up to 32 GB of VRAM and feature FP4 compute, doubling AI inference performance and enabling generative AI models to run locally with a smaller memory footprint.
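NIM microservices expose an OpenAI-compatible HTTP API, so a model served locally on an RTX AI PC can be queried like any hosted endpoint. Below is a minimal sketch, assuming a NIM container is already running and listening on localhost port 8000, and that the model name shown matches the model the container actually hosts:

```python
# Minimal sketch of querying a locally running NIM microservice through
# its OpenAI-compatible endpoint. Assumes a NIM container is already
# serving on localhost:8000; the model name below is an example and must
# match whatever model the container actually hosts.
import requests

response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "meta/llama-3.1-8b-instruct",  # example model name (assumption)
        "messages": [
            {"role": "user", "content": "Summarize what a NIM microservice is."}
        ],
        "max_tokens": 128,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```

Because the endpoint follows the OpenAI request schema, existing client libraries and tools can usually be pointed at the local URL without other changes.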

Cisco Unveils Plug-and-Play AI Solutions Powered by NVIDIA H100 and H200 Tensor Core GPUs

Today, Cisco announced new additions to its data center infrastructure portfolio: an AI server family purpose-built for GPU-intensive AI workloads with NVIDIA accelerated computing, and AI PODs to simplify and de-risk AI infrastructure investment. They give organizations an adaptable and scalable path to AI, supported by Cisco's industry-leading networking capabilities.

"Enterprise customers are under pressure to deploy AI workloads, especially as we move toward agentic workflows and AI begins solving problems on its own," said Jeetu Patel, Chief Product Officer, Cisco. "Cisco innovations like AI PODs and the GPU server strengthen the security, compliance, and processing power of those workloads as customers navigate their AI journeys from inferencing to training."

Google's Upcoming Tensor G5 and G6 Specs Might Have Been Revealed Early

Details of what are claimed to be Google's upcoming Tensor G5 and G6 SoCs have popped up over on Notebookcheck.net, and the site claims to have found the specs on a public platform, without going into any further detail. Those who were betting on the Tensor G5—codenamed Laguna—delivering vastly improved performance over the Tensor G4 are likely to be disappointed, at least on the CPU side of things. As previous rumours have suggested, the chip is expected to be manufactured by TSMC on its N3E process node, but the Tensor G5 will retain a single Arm Cortex-X4 core, although it will see a slight upgrade to five Cortex-A725 cores versus the three Cortex-A720 cores of the Tensor G4; the G5 loses two Cortex-A520 cores in favour of the extra Cortex-A725 cores. The Cortex-X4 will also remain clocked at the same peak 3.1 GHz as in the Tensor G4.

Interestingly, it looks like Google will drop the Arm Mali GPU in favour of an Imagination Technologies DXT GPU, although the specs listed by Notebookcheck don't match any of the configurations listed by Imagination Technologies. The G5 will continue to support 4x 16-bit LPDDR5 or LPDDR5X memory chips, and Google has added support for UFS 4.0 storage, the absence of which has been a point of complaint with the Tensor G4. Other new additions include support for 10 Gbps USB 3.2 Gen 2 and PCI Express 4.0. Some improvements to the camera logic have also been made, with support for up to 200-megapixel sensors, or 108 megapixels with zero shutter lag, though whether Google will use such a camera is anyone's guess at this point.