News Posts matching #inference


OnLogic Reveals the Axial AX300 Edge Server

OnLogic, a leading provider of edge computing solutions, has launched the Axial AX300, a highly customizable and powerful edge server. The AX300 is engineered to help businesses of any size better leverage their on-site data and unlock the potential of AI by placing powerful computing capabilities on-site.

The Axial AX300 empowers organizations to seamlessly move computing resources closer to the data source, providing significant advantages in performance, latency, operational efficiency, and total cost of ownership over cloud-based data management. With its robust design, flexible configuration options, and advanced security features, the Axial AX300 is the ideal platform for a wide range of high-impact edge computing applications, including:
  • AI/ML inference and training: Leveraging the power of AI/ML at the edge for real-time insights, predictive maintenance, and improved decision-making.
  • Data analytics: Processing and analyzing data generated by IoT devices and sensors in real-time to improve operational efficiency.
  • Virtualization: Consolidating multiple workloads onto a single server, optimizing resource utilization and simplifying deployment and management.

ASRock Industrial iEP-6010E Series Now Powered by NVIDIA Jetson Orin NX & Nano Super Mode

The future of AI at the edge has arrived, and ASRock Industrial is leading the way. With Super Mode support for the iEP-6010E Series and iEP-6010E Series Dev Kit, powered by NVIDIA Jetson Orin NX and Jetson Orin Nano modules and optimized with the NVIDIA JetPack 6.2 SDK, this upgrade supercharges AI acceleration, unlocking up to 2X faster generative AI performance. Delivering unprecedented inference speeds, real-time decision-making, and superior power efficiency, Super Mode redefines edge AI computing, powering everything from predictive maintenance and anomaly detection to AI-driven medical imaging, intelligent surveillance, and autonomous mobility, pushing performance beyond limits: faster, smarter, and more efficient than ever before. Whether driving edge AI computing, real-time vision processing, intelligent autonomous vehicles, advanced robotics, industrial automation, or smart city infrastructure, ASRock Industrial's latest innovation empowers industries with cutting-edge AI capabilities that drive real-world transformation.

A Quantum Leap in Edge AI Performance
The iEP-6010E Series and iEP-6010E Series Dev Kit, now optimized with Super Mode, provide:
  • Up to 2X Faster Gen AI Processing - Up to 157 TOPS (Sparse) on Orin NX 16 GB (Super) and 67 TOPS (Sparse) on Orin Nano 8 GB (Super), delivering blazing-fast generative AI performance for automation, robotics, and real-time vision systems.
  • Optimized Power Boost and Efficiency - Supports power modes up to 40 W / MAXN Super on Orin NX and 25 W / MAXN Super on Orin Nano, while maintaining high-performance AI workloads in power-sensitive environments, making it ideal for industrial automation and smart city infrastructure.
  • NVIDIA JetPack 6.2 Support - Fully compatible with the latest NVIDIA JetPack 6.2 SDK, offering enhanced libraries, frameworks, and tools to accelerate advanced AI development.

CoreWeave Launches Debut Wave of NVIDIA GB200 NVL72-based Cloud Instances

AI reasoning models and agents are set to transform industries, but delivering their full potential at scale requires massive compute and optimized software. The "reasoning" process involves multiple models, generating many additional tokens, and demands infrastructure with a combination of high-speed communication, memory and compute to ensure real-time, high-quality results. To meet this demand, CoreWeave has launched NVIDIA GB200 NVL72-based instances, becoming the first cloud service provider to make the NVIDIA Blackwell platform generally available. With rack-scale NVIDIA NVLink across 72 NVIDIA Blackwell GPUs and 36 NVIDIA Grace CPUs, scaling to up to 110,000 GPUs with NVIDIA Quantum-2 InfiniBand networking, these instances provide the scale and performance needed to build and deploy the next generation of AI reasoning models and agents.

NVIDIA GB200 NVL72 on CoreWeave
NVIDIA GB200 NVL72 is a liquid-cooled, rack-scale solution with a 72-GPU NVLink domain, which enables the six dozen GPUs to act as a single massive GPU. NVIDIA Blackwell features many technological breakthroughs that accelerate inference token generation, boosting performance while reducing service costs. For example, fifth-generation NVLink enables 130 TB/s of GPU bandwidth in one 72-GPU NVLink domain, and the second-generation Transformer Engine enables FP4 for faster AI performance while maintaining high accuracy. CoreWeave's portfolio of managed cloud services is purpose-built for Blackwell. CoreWeave Kubernetes Service optimizes workload orchestration by exposing NVLink domain IDs, ensuring efficient scheduling within the same rack. Slurm on Kubernetes (SUNK) supports the topology block plug-in, enabling intelligent workload distribution across GB200 NVL72 racks. In addition, CoreWeave's Observability Platform provides real-time insights into NVLink performance, GPU utilization and temperatures.
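As a quick sanity check on that bandwidth figure, the arithmetic below assumes the commonly cited 1.8 TB/s of fifth-generation NVLink bandwidth per Blackwell GPU (an assumption of this sketch, not a CoreWeave disclosure):

```python
# Back-of-the-envelope check of the quoted 130 TB/s NVLink-domain bandwidth.
# Assumes ~1.8 TB/s of fifth-generation NVLink bandwidth per Blackwell GPU.
gpus_per_domain = 72
nvlink_bw_per_gpu_tbs = 1.8  # TB/s per GPU (assumed)

domain_bw_tbs = gpus_per_domain * nvlink_bw_per_gpu_tbs
print(f"Aggregate NVLink bandwidth: {domain_bw_tbs:.1f} TB/s")  # ~129.6 TB/s
```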

Moore Threads Teases Excellent Performance of DeepSeek-R1 Model on MTT GPUs

Moore Threads, a Chinese manufacturer of proprietary GPU designs, is (reportedly) the latest company to jump onto the DeepSeek-R1 bandwagon. Since late January, NVIDIA, Microsoft and AMD have swooped in with their own interpretations/deployments. By global standards, Moore Threads GPUs trail behind Western-developed offerings—early 2024 evaluations showed the firm's MTT S80 dedicated desktop graphics card struggling against an AMD integrated solution: the Radeon 760M. The recent emergence of DeepSeek's open-source models has signaled a shift away from reliance on extremely powerful and expensive AI-crunching hardware (often accessed via the cloud)—widespread excitement has been generated by DeepSeek solutions being relatively frugal in terms of processing requirements. Tom's Hardware has observed cases of open-source AI models running (locally) on "inexpensive hardware, like the Raspberry Pi."

According to recent Chinese press coverage, Moore Threads has announced a successful deployment of DeepSeek's R1-Distill-Qwen-7B distilled model on the aforementioned MTT S80 GPU. The company also revealed that it had taken similar steps with its MTT S4000 datacenter-oriented graphics hardware. On the subject of adaptation, a Moore Threads spokesperson stated: "based on the Ollama open source framework, Moore Threads completed the deployment of the DeepSeek-R1-Distill-Qwen-7B distillation model and demonstrated excellent performance in a variety of Chinese tasks, verifying the versatility and CUDA compatibility of Moore Threads' self-developed full-featured GPU." Exact performance figures, benchmark results and technical details were not disclosed to the Chinese public, so Moore Threads appears to be teasing the prowess of its MTT GPU designs. ITHome reported that: "users can also perform inference deployment of the DeepSeek-R1 distillation model based on MTT S80 and MTT S4000. Some users have previously completed the practice manually on MTT S80." Moore Threads believes that its: "self-developed high-performance inference engine, combined with software and hardware co-optimization technology, significantly improves the model's computing efficiency and resource utilization through customized operator acceleration and memory management. This engine not only supports the efficient operation of the DeepSeek distillation model, but also provides technical support for the deployment of more large-scale models in the future."
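Since the deployment runs through the standard Ollama front end, a minimal sketch of querying the distilled model looks like the following. The upstream "deepseek-r1:7b" tag and the official Python `ollama` client are assumptions of this example; Moore Threads has not published the exact model tags for its build:

```python
# Minimal sketch of querying a DeepSeek-R1 distilled model through Ollama's
# Python client (pip install ollama). Moore Threads' port uses the same
# Ollama front end; the model tag on their build may differ from the
# upstream "deepseek-r1:7b" assumed here.
import ollama

response = ollama.chat(
    model="deepseek-r1:7b",  # DeepSeek-R1-Distill-Qwen-7B on upstream Ollama
    messages=[{"role": "user", "content": "用一句话解释什么是蒸馏模型。"}],
)
print(response["message"]["content"])
```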

Huawei Delivers Record $118 Billion Revenue with 22% Yearly Growth Despite US Sanctions

Huawei Technologies reported a robust 22% year-over-year revenue increase for 2024, reaching 860 billion yuan ($118.27 billion), demonstrating remarkable resilience amid continued US-imposed trade restrictions. The Chinese tech giant's resurgence was primarily driven by its revitalized smartphone division, which captured 16% of China's domestic market share, overtaking Apple in regional sales. This achievement was notably accomplished by deploying domestically produced chipsets, marking a significant milestone for the company. In collaboration with China's SMIC, Huawei delivers in-house silicon that integrates with HarmonyOS for complete vertical integration. The company's strategic diversification into automotive technology has emerged as a crucial growth vector, with its smart car solutions unit delivering autonomous driving software and specialized chips to Chinese EV manufacturers.

In parallel, Huawei's Ascend AI 910B/C platform recently gained compatibility with DeepSeek's R1 large language model, with availability on Chinese AI cloud providers like SiliconFlow. Through a strategic partnership with AI infrastructure startup SiliconFlow, Huawei is enhancing its Ascend cloud service capabilities, further strengthening its competitive position in the global AI hardware market despite ongoing international trade challenges. Even if the company can't compete on raw performance with the latest solutions from NVIDIA and AMD, due to the lack of advanced manufacturing required for AI accelerators, it can compete on cost and deliver solutions with a much stronger price/performance ratio. Huawei's Ascend AI solutions deliver modest performance, but the pricing makes AI model inference very cheap, with API costs of around one yuan per million input tokens and four yuan per million output tokens on DeepSeek R1.

ASUS AI POD With NVIDIA GB200 NVL72 Platform Ready to Ramp Up Production for Scheduled Shipment in March

ASUS is proud to announce that the ASUS AI POD, featuring the NVIDIA GB200 NVL72 platform, is ready to ramp up production for a scheduled shipping date of March 2025. ASUS remains dedicated to providing comprehensive end-to-end solutions and software services, encompassing everything from AI supercomputing to cloud services. With a strong focus on fostering AI adoption across industries, ASUS is positioned to empower clients in accelerating their time to market by offering a full spectrum of solutions.

Proof of concept, funded by ASUS
Honoring its commitment to delivering exceptional value to clients, ASUS is set to launch a proof of concept (POC) for the groundbreaking ASUS AI POD, powered by the NVIDIA Blackwell platform. This exclusive opportunity is now open to a select group of innovators who are eager to harness the full potential of AI computing. Innovators and enterprises can experience firsthand the full potential of AI and deep learning solutions at exceptional scale. To take advantage of this limited-time offer, please complete the survey at: forms.office.com/r/FrAbm5BfH2. The expert ASUS team of NVIDIA GB200 specialists will guide users through the next steps.

NVIDIA GeForce RTX 50 Series AI PCs Accelerate DeepSeek Reasoning Models

The recently released DeepSeek-R1 model family has brought a new wave of excitement to the AI community, allowing enthusiasts and developers to run state-of-the-art reasoning models with problem-solving, math and code capabilities, all from the privacy of local PCs. With up to 3,352 trillion operations per second of AI horsepower, NVIDIA GeForce RTX 50 Series GPUs can run the DeepSeek family of distilled models faster than anything on the PC market.

A New Class of Models That Reason
Reasoning models are a new class of large language models (LLMs) that spend more time on "thinking" and "reflecting" to work through complex problems, while describing the steps required to solve a task. The fundamental principle is that any problem can be solved with deep thought, reasoning and time, just like how humans tackle problems. By spending more time—and thus compute—on a problem, the LLM can yield better results. This phenomenon is known as test-time scaling, where a model dynamically allocates compute resources during inference to reason through problems. Reasoning models can enhance user experiences on PCs by deeply understanding a user's needs, taking actions on their behalf and allowing them to provide feedback on the model's thought process—unlocking agentic workflows for solving complex, multi-step tasks such as analyzing market research, performing complicated math problems, debugging code and more.
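A small sketch makes test-time scaling concrete: sample several long reasoning chains and keep the best-scoring answer. The `generate` and `score` callables below are hypothetical placeholders for an LLM call and an answer verifier; they are not part of any NVIDIA API:

```python
# Conceptual sketch of test-time scaling: spend more inference compute by
# sampling several long reasoning chains and keeping the best answer.
# `generate` and `score` are hypothetical placeholders, not a real API.
from typing import Callable

def solve_with_test_time_scaling(
    prompt: str,
    generate: Callable[[str, int], str],  # (prompt, max_tokens) -> completion
    score: Callable[[str], float],        # rates a candidate answer
    samples: int = 8,
    max_tokens: int = 4096,               # generous budget for "thinking" tokens
) -> str:
    # More samples and a larger token budget mean more compute per query,
    # which is exactly the trade-off test-time scaling exploits.
    candidates = [generate(prompt, max_tokens) for _ in range(samples)]
    return max(candidates, key=score)
```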

NVIDIA AI Helps Fight Against Fraud Across Many Sectors

Companies and organizations are increasingly using AI to protect their customers and thwart the efforts of fraudsters around the world. Voice security company Hiya found that 550 million scam calls were placed per week in 2023, with INTERPOL estimating that scammers stole $1 trillion from victims that same year. In the U.S., one in four calls from numbers outside a user's contact list was flagged as suspected spam, with fraudsters often luring people into Venmo-related or extended warranty scams.

Traditional methods of fraud detection include rules-based systems, statistical modeling and manual reviews. These methods have struggled to scale to the growing volume of fraud in the digital era without sacrificing speed and accuracy. For instance, rules-based systems often have high false-positive rates, statistical modeling can be time-consuming and resource-intensive, and manual reviews can't scale rapidly enough.

Advantech Unveils New Lineup Powered by AMD Ryzen Embedded 8000 Series Processors

Advantech, a global leader in embedded IoT computing, is excited to launch its latest Edge AI solutions featuring the advanced AMD Ryzen Embedded 8000 Series processors. The SOM-6873 (COMe Compact), AIMB-2210 (Mini-ITX motherboards), and AIR-410 (AI Inference System) deliver exceptional AI performance leveraging the first AMD embedded devices with integrated Neural Processing Units. These integrated NPUs are optimized to enhance AI inference efficiency and precision. Together with traditional CPU and GPU elements, the architecture delivers performance of up to 39 TOPS. The platforms also support dual-channel DDR5 memory and PCIe Gen 4, providing ample computing power with flexible TDP options and thermal solutions.

Value-added software services, such as the Edge AI SDK that integrates AMD Ryzen AI software, accelerate the model-porting procedure. These features make the solutions ideal for edge applications such as HMI, machine vision in industrial automation, smart management and interactive services in urban entertainment systems, and healthcare devices such as ultrasound machines.

NVIDIA NeMo AI Guardrails Upgraded with Latest NIM Microservices

AI agents are poised to transform productivity for the world's billion knowledge workers with "knowledge robots" that can accomplish a variety of tasks. To develop AI agents, enterprises need to address critical concerns like trust, safety, security and compliance. New NVIDIA NIM microservices for AI guardrails—part of the NVIDIA NeMo Guardrails collection of software tools—are portable, optimized inference microservices that help companies improve the safety, precision and scalability of their generative AI applications.

Central to the orchestration of the microservices is NeMo Guardrails, part of the NVIDIA NeMo platform for curating, customizing and guardrailing AI. NeMo Guardrails helps developers integrate and manage AI guardrails in large language model (LLM) applications. Industry leaders Amdocs, Cerence AI and Lowe's are among those using NeMo Guardrails to safeguard AI applications. Developers can use the NIM microservices to build more secure, trustworthy AI agents that provide safe, appropriate responses within context-specific guidelines and are bolstered against jailbreak attempts. Deployed in customer service across industries like automotive, finance, healthcare, manufacturing and retail, the agents can boost customer satisfaction and trust.
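NeMo Guardrails itself is open source, and wiring it around an LLM application is brief. In the minimal sketch below, the `./config` directory holding the rails definitions (where the guardrail NIM microservices would be referenced) is an assumption of this example:

```python
# Minimal sketch of applying NeMo Guardrails to an LLM application
# (pip install nemoguardrails). The "./config" directory, containing the
# rails definitions (config.yml plus Colang flows), is this example's
# assumption; guardrail NIM microservices are configured there.
from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./config")
rails = LLMRails(config)

# Requests and responses pass through the configured input/output rails.
reply = rails.generate(messages=[
    {"role": "user", "content": "How do I reset my account password?"}
])
print(reply["content"])
```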

Advantech Enhances AI and Hybrid Computing With Intel Core Ultra Processors (Series 2) S-Series

Advantech, a leader in embedded IoT computing solutions, is excited to introduce its AI-integrated desktop platform series: the Mini-ITX AIMB-2710 and Micro-ATX AIMB-589. These platforms are powered by Intel Core Ultra Processors (Series 2) S-Series, featuring the first desktop processor with an integrated NPU, delivering up to 36 TOPS for superior AI acceleration. Designed for data visualization and image analysis, both models offer PCIe Gen 5 support for high-performance GPU cards and feature 6400 MHz DDR5 memory, USB4 Type-C, multiple USB 3.2 Gen 2 and 2.5GbE LAN ports for fast and accurate data transmission in real time.

The AIMB-2710 and AIMB-589 excel in high-speed computing and AI-driven performance, making them ideal for applications such as medical imaging, automated optical inspection, and semiconductor testing. Backed by comprehensive software support, these platforms are engineered for the next wave of AI innovation.

NVIDIA NIM Microservices and AI Blueprints Usher in New Era of Local AI

Over the past year, generative AI has transformed the way people live, work and play, enhancing everything from writing and content creation to gaming, learning and productivity. PC enthusiasts and developers are leading the charge in pushing the boundaries of this groundbreaking technology. Countless times, industry-defining technological breakthroughs have been invented in one place—a garage. This week marks the start of the RTX AI Garage series, which will offer regular content for developers and enthusiasts looking to learn more about NVIDIA NIM microservices and AI Blueprints, and how to build AI agents, creative workflows, digital humans, productivity apps and more on AI PCs. Welcome to the RTX AI Garage.

This first installment spotlights announcements made earlier this week at CES, including new AI foundation models available on NVIDIA RTX AI PCs that take digital humans, content creation, productivity and development to the next level. These models—offered as NVIDIA NIM microservices—are powered by new GeForce RTX 50 Series GPUs. Built on the NVIDIA Blackwell architecture, RTX 50 Series GPUs deliver up to 3,352 trillion AI operations per second of performance, 32 GB of VRAM and feature FP4 compute, doubling AI inference performance and enabling generative AI to run locally with a smaller memory footprint.
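The memory-footprint claim follows from simple arithmetic: weight storage is roughly parameter count times bits per weight. The sketch below uses a hypothetical 8-billion-parameter model and ignores activations, KV cache, and quantization scale metadata:

```python
# Rough weight-memory arithmetic behind the FP4 claim: bytes = params * bits/8.
# Ignores activations, KV cache, and quantization scale metadata.
def weight_gb(params_billions: float, bits: int) -> float:
    return params_billions * 1e9 * bits / 8 / 1e9

for bits in (16, 8, 4):
    print(f"8B params @ FP{bits}: {weight_gb(8, bits):.0f} GB")
# FP16: 16 GB, FP8: 8 GB, FP4: 4 GB -- FP4 halves the footprint vs FP8,
# which is how larger models fit within the RTX 50 Series' 32 GB of VRAM.
```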

Qualcomm Launches On-Prem AI Appliance Solution and Inference Suite at CES 2025

At CES 2025, Qualcomm Technologies, Inc. announced the Qualcomm AI On-Prem Appliance Solution, an on-premises desktop or wall-mounted hardware solution, and the Qualcomm AI Inference Suite, a set of software and services for AI inferencing spanning from near-edge to cloud. Together, these new offerings allow small and medium businesses, enterprises and industrial organizations to run custom and off-the-shelf AI applications on their premises, including generative workloads. Running AI inference on premises can deliver significant savings in operational costs and overall total cost of ownership (TCO), compared to the cost of renting third-party AI infrastructure.

Using the AI On-Prem Appliance Solution in concert with the AI Inference Suite, customers can now use generative AI leveraging their proprietary data, fine-tuned models, and technology infrastructure to automate human and machine processes and applications in virtually any end environment, such as retail stores, quick service restaurants, shopping outlets, dealerships, hospitals, factories and shop floors - where the workflow is well established, repeatable and ready for automation.

Microsoft Acquired Nearly 500,000 NVIDIA "Hopper" GPUs This Year

Microsoft is heavily investing in enabling its company and cloud infrastructure to support the massive AI expansion. The Redmond giant has acquired nearly half a million of the NVIDIA "Hopper" family of GPUs to support this effort. According to market research company Omdia, Microsoft was the biggest hyperscaler, with data center CapEx and GPU expenditure reaching a record high. The company acquired precisely 485,000 NVIDIA "Hopper" GPUs, including H100, H200, and H20, resulting in more than $30 billion spent on servers alone. To put things into perspective, this is about double the haul of the next-biggest GPU purchaser, China's ByteDance, which acquired about 230,000 sanction-abiding H800 GPUs, plus regular H100s sourced from third parties.

Among US-based companies, the only ones that have come close to Microsoft's acquisition rate are Meta, Tesla/xAI, Amazon, and Google, which have acquired around 200,000 GPUs each on average while significantly boosting their in-house chip design efforts. "NVIDIA GPUs claimed a tremendously high share of the server capex," Vlad Galabov, director of cloud and data center research at Omdia, noted, adding, "We're close to the peak." Hyperscalers like Amazon, Google, and Meta have been working on their custom solutions for AI training and inference. For example, Google has its TPU, Amazon has its Trainium and Inferentia chips, and Meta has its MTIA. Hyperscalers are eager to develop their in-house solutions, but NVIDIA's grip on the software stack, paired with timely product updates, seems hard to break. The latest "Blackwell" chips are projected to get even bigger orders, so only the sky (and the local power plant) is the limit.
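For perspective, the quoted figures imply a server-level cost per GPU, treating the "more than $30 billion" as a round $30 billion lower bound:

```python
# Implied per-GPU server spend from the quoted Omdia figures.
total_server_spend_usd = 30e9   # "more than $30 billion" (lower bound)
hopper_gpus = 485_000
print(f"~${total_server_spend_usd / hopper_gpus:,.0f} per GPU at the server level")
# ~$61,856 -- full server cost per GPU, not the GPU's unit price alone
```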

Axelera AI Partners with Arduino for Edge AI Solutions

Axelera AI - a leading edge-inference company - and Arduino, the global leader in open-source hardware and software, today announced a strategic partnership to make high-performance AI at the edge more accessible than ever, building advanced technology solutions based on inference and an open ecosystem. This furthers Axelera AI's strategy to democratize artificial intelligence everywhere.

The collaboration will combine the strengths of Axelera AI's Metis AI Platform with the powerful SOMs from the Arduino Pro range to provide customers with easy-to-use hardware and software to innovate around AI. Users will enjoy the freedom to dictate their own AI journey, thanks to tools that provide unique digital in-memory computing and RISC-V controlled dataflow technology, delivering high performance and usability at a fraction of the cost and power of other solutions available today.

Advantech Unveils Hailo-8 Powered AI Acceleration Modules for High-Efficiency Vision AI Applications

Advantech, a leading provider of AIoT platforms and services, proudly unveils its latest AI acceleration modules: the EAI-1200 and EAI-3300, powered by Hailo-8 AI processors. These modules deliver AI performance of up to 52 TOPS while achieving more than 12 times the power efficiency of comparable AI modules and GPU cards. Designed in standard M.2 and PCIe form factors, the EAI-1200 and EAI-3300 can be seamlessly integrated with diverse x86 and Arm-based platforms, enabling quick upgrades of existing systems and boards to incorporate AI capabilities. With these AI acceleration modules, developers can run inference efficiently on the Hailo-8 NPU while handling application processing primarily on the CPU, optimizing resource allocation. The modules are paired with user-friendly software toolkits, including the Edge AI SDK for seamless integration with HailoRT, the Dataflow Compiler for converting existing models, and TAPPAS, which offers pre-trained application examples. These features accelerate the development of edge-based vision AI applications.

EAI-1200 M.2 AI Module: Accelerating Development for Vision AI Security
The EAI-1200 is an M.2 AI module powered by a single Hailo-8 VPU, delivering up to 26 TOPS of computing performance while consuming approximately 5 watts of power. An optional heatsink supports operation in temperatures ranging from -40 to 65°C, ensuring easy integration. This cost-effective module is especially designed to bundle with Advantech's systems and boards, such as the ARK-1221L, AIR-150, and AFE-R770, enhancing AI applications including baggage screening, workforce safety, and autonomous mobile robots (AMR).
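The quoted numbers translate directly into efficiency terms, as the short calculation below shows; the implied figure for the compared GPU cards is an inference from the ">12x" claim, not an Advantech measurement:

```python
# Efficiency arithmetic from the quoted EAI-1200 figures: TOPS per watt.
eai_1200_tops, eai_1200_watts = 26, 5  # per the EAI-1200 description
efficiency = eai_1200_tops / eai_1200_watts
print(f"EAI-1200: {efficiency:.1f} TOPS/W")           # ~5.2 TOPS/W
# The ">12x power efficiency" claim implies the compared hardware lands near:
print(f"Implied comparison point: {efficiency / 12:.2f} TOPS/W")
```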

SPEC Delivers Major SPECworkstation 4.0 Benchmark Update, Adds AI/ML Workloads

The Standard Performance Evaluation Corporation (SPEC), the trusted global leader in computing benchmarks, today announced the availability of the SPECworkstation 4.0 benchmark, a major update to SPEC's comprehensive tool designed to measure all key aspects of workstation performance. This significant upgrade from version 3.1 incorporates cutting-edge features to keep pace with the latest workstation hardware and the evolving demands of professional applications, including the increasing reliance on data analytics, AI and machine learning (ML).

The new SPECworkstation 4.0 benchmark provides a robust, real-world measure of CPU, graphics, accelerator, and disk performance, ensuring professionals have the data they need to make informed decisions about their hardware investments. The benchmark caters to the diverse needs of engineers, scientists, and developers who rely on workstation hardware for daily tasks. It includes real-world applications like Blender, Handbrake, LLVM and more, providing a comprehensive performance measure across seven different industry verticals, each focusing on specific use cases and subsystems critical to workstation users. The SPECworkstation 4.0 benchmark also marks a significant milestone for measuring workstation AI performance, providing an unbiased, real-world, application-driven tool for measuring how workstations handle AI/ML workloads.

Amazon AWS Announces General Availability of Trainium2 Instances, Reveals Details of Next Gen Trainium3 Chip

At AWS re:Invent, Amazon Web Services, Inc. (AWS), an Amazon.com, Inc. company, today announced the general availability of AWS Trainium2-powered Amazon Elastic Compute Cloud (Amazon EC2) instances and introduced new Trn2 UltraServers, enabling customers to train and deploy today's latest AI models, as well as future large language models (LLMs) and foundation models (FMs), with exceptional levels of performance and cost efficiency. The company also unveiled its next-generation Trainium3 chips.

"Trainium2 is purpose built to support the largest, most cutting-edge generative AI workloads, for both training and inference, and to deliver the best price performance on AWS," said David Brown, vice president of Compute and Networking at AWS. "With models approaching trillions of parameters, we understand customers also need a novel approach to train and run these massive workloads. New Trn2 UltraServers offer the fastest training and inference performance on AWS and help organizations of all sizes to train and deploy the world's largest models faster and at a lower cost."

AMD Releases ROCm 6.3 with SGLang, Fortran Compiler, Multi-Node FFT, Vision Libraries, and More

AMD has released ROCm 6.3, which introduces several new features and optimizations, including SGLang integration for accelerated AI inferencing, a re-engineered FlashAttention-2 for optimized AI training and inference, the introduction of multi-node Fast Fourier Transform (FFT), a new Fortran compiler, and enhanced computer vision libraries like rocDecode, rocJPEG, and rocAL.

According to AMD, SGLang, a runtime now supported by ROCm 6.3, is purpose-built for optimizing inference on models like LLMs and VLMs on AMD Instinct GPUs, and promises 6x higher throughput and much easier usage thanks to Python-integrated, pre-configured ROCm Docker containers. ROCm 6.3 also brings further transformer optimizations with FlashAttention-2, which should deliver significant forward- and backward-pass improvements over FlashAttention-1; a new AMD Fortran compiler with direct GPU offloading, backward compatibility, and integration with HIP kernels and ROCm libraries; new multi-node FFT support in rocFFT, which simplifies multi-node scaling and improves scalability; and enhanced computer vision libraries - rocDecode, rocJPEG, and rocAL - adding AV1 codec support, GPU-accelerated JPEG decoding, and better audio augmentation.
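For reference, a minimal sketch of SGLang's Python front end is shown below. It assumes a server launched separately (for example via `python -m sglang.launch_server`); the model path, port, and prompt are assumptions of this example, and on AMD hardware the pre-configured ROCm Docker containers supply the runtime:

```python
# Minimal sketch of SGLang's Python front end. Assumes a server was started
# separately, e.g.:
#   python -m sglang.launch_server --model-path <model> --port 30000
# (model path, port, and prompt are this example's assumptions).
import sglang as sgl

@sgl.function
def answer(s, question):
    s += sgl.user(question)
    s += sgl.assistant(sgl.gen("reply", max_tokens=256))

sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))
state = answer.run(question="Summarize what rocFFT does.")
print(state["reply"])
```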

Corsair by d-Matrix Enables GPU-Free AI Inference

d-Matrix today unveiled Corsair, an entirely new computing paradigm designed from the ground up for the next era of AI inference in modern datacenters. Corsair leverages d-Matrix's innovative Digital In-Memory Compute (DIMC) architecture, an industry first, to accelerate AI inference workloads with industry-leading real-time performance, energy efficiency, and cost savings compared to GPUs and other alternatives.

The emergence of reasoning agents and interactive video generation represents the next level of AI capabilities. These leverage more inference computing power to enable models to "think" more and produce higher quality outputs. Corsair is the ideal inference compute solution with which enterprises can unlock new levels of automation and intelligence without compromising on performance, cost or power.

Hypertec Introduces the World's Most Advanced Immersion-Born GPU Server

Hypertec proudly announces the launch of its latest breakthrough product, the TRIDENT iG series, an immersion-born GPU server line that brings extreme density, sustainability, and performance to the AI and HPC community. Purpose-built for the most demanding AI applications, this cutting-edge server is optimized for generative AI, machine learning (ML), deep learning (DL), large language model (LLM) training, inference, and beyond. With up to six of the latest NVIDIA GPUs in a 2U form factor, a staggering 8 TB of memory with enhanced RDMA capabilities, and groundbreaking density supporting up to 200 GPUs per immersion tank, the TRIDENT iG server line is a game-changer for AI infrastructure.

Additionally, the server's innovative design features a single or dual root complex, enabling greater flexibility and efficiency for GPU usage in complex workloads.

ASUS Presents All-New Storage-Server Solutions to Unleash AI Potential at SC24

ASUS today announced its groundbreaking next-generation infrastructure solutions at SC24, featuring a comprehensive lineup powered by AMD and Intel, as well as liquid-cooling solutions designed to accelerate the future of AI. By continuously pushing the limits of innovation, ASUS simplifies the complexities of AI and high-performance computing (HPC) through adaptive server solutions paired with expert cooling and software-development services, tailored for the exascale era and beyond. As a total-solution provider with a distinguished history in pioneering AI supercomputing, ASUS is committed to delivering exceptional value to its customers.

Comprehensive Lineup for AI and HPC Success
To fuel enterprise digital transformation through HPC and AI-driven architecture, ASUS provides a full lineup of server systems powered by AMD and Intel. Startups, research institutions, large enterprises and government organizations can all find adaptive solutions to unlock value from big data and accelerate business agility.

Q.ANT Introduces First Commercial Photonic Processor

Q.ANT, the leading startup for photonic computing, today announced the launch of its first commercial product - a photonics-based Native Processing Unit (NPU) built on the company's compute architecture LENA (Light Empowered Native Arithmetics). The product is fully compatible with today's existing computing ecosystem, as it ships on an industry-standard PCI Express card. The Q.ANT NPU executes complex, non-linear mathematics natively using light instead of electrons, promising to deliver at least 30 times greater energy efficiency and significant computational speed improvements over traditional CMOS technology. Designed for compute-intensive applications such as AI inference, machine learning, and physics simulation, the Q.ANT NPU has been proven to solve real-world challenges, including number recognition for deep neural network inference (see the recent press release regarding Cloud Access to NPU).

"With our photonic chip technology now available on the standard PCIe interface, we're bringing the incredible power of photonics directly into real-world applications. For us, this is not just a processor—it's a statement of intent: Sustainability and performance can go hand in hand," said Dr. Michael Förtsch, CEO of Q.ANT. "For the first time, developers can create AI applications and explore the capabilities of photonic computing, particularly for complex, nonlinear calculations. For example, experts calculated that one GPT-4 query today uses 10 times more electricity than a regular internet search request. Our photonic computing chips offer the potential to reduce the energy consumption for that query by a factor of 30."

Etched Introduces AI-Powered Games Without GPUs, Displays Minecraft Replica

The gaming industry is about to get massively disrupted. Instead of using game engines to power games, we are now witnessing an entirely new and crazy concept. A startup specializing in ASICs designed specifically for the Transformer architecture, the foundation behind generative AI models like GPT/Claude/Stable Diffusion, has showcased a demo, in partnership with Decart, of a Minecraft clone being entirely generated and operated by AI instead of a traditional game engine. While we already use AI to create fairly realistic images and videos from text descriptions, having an AI model produce an entire playable game is something different. Oasis is the first playable, real-time, open-world AI model that takes user input and generates real-time gameplay, including physics, game rules, and graphics.

An interesting thing to point out is the hardware that powers this setup. Using a single NVIDIA H100 GPU, the 500-million-parameter Oasis model can run at 720p resolution at 20 generated frames per second. Due to the limitations of accelerators like NVIDIA's H100/B200, gameplay at 4K is almost impossible. However, Etched has its own accelerator called Sohu, which specializes in accelerating transformer architectures. Eight NVIDIA H100 GPUs can serve five Oasis instances to five users, while eight Sohu cards can serve 65 Oasis runs to 65 users. This is more than a 10x increase in inference capability compared to NVIDIA's hardware on this single use case alone. The accelerator is designed to run much larger models, like future 100-billion-parameter generative AI video game models that can output 4K at 30 FPS, thanks to 144 GB of HBM3E memory, yielding 1,152 GB in an eight-accelerator server configuration.
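The quoted serving figures pencil out as follows:

```python
# Serving-density arithmetic from the quoted figures.
h100_users = 5    # users served by eight H100 GPUs
sohu_users = 65   # users served by eight Sohu cards
print(f"Sohu advantage: {sohu_users / h100_users:.0f}x")  # 13x, i.e. "more than 10x"

hbm3e_per_card_gb = 144
print(f"HBM3E per 8-card server: {hbm3e_per_card_gb * 8} GB")  # 1,152 GB
```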

Cisco Unveils Plug-and-Play AI Solutions Powered by NVIDIA H100 and H200 Tensor Core GPUs

Today, Cisco announced new additions to its data center infrastructure portfolio: an AI server family purpose-built for GPU-intensive AI workloads with NVIDIA accelerated computing, and AI PODs to simplify and de-risk AI infrastructure investment. They give organizations an adaptable and scalable path to AI, supported by Cisco's industry-leading networking capabilities.

"Enterprise customers are under pressure to deploy AI workloads, especially as we move toward agentic workflows and AI begins solving problems on its own," said Jeetu Patel, Chief Product Officer, Cisco. "Cisco innovations like AI PODs and the GPU server strengthen the security, compliance, and processing power of those workloads as customers navigate their AI journeys from inferencing to training."