News Posts matching #inference


AMD Introduces GAIA - an Open-Source Project That Runs Local LLMs on Ryzen AI NPUs

AMD has launched a new open-source project called GAIA (pronounced /ˈɡaɪ.ə/), an application that leverages the power of the Ryzen AI Neural Processing Unit (NPU) to run private, local large language models (LLMs). In this blog, we'll dive into the features and benefits of GAIA, and show how you can adopt the open-source project into your own applications.

Introduction to GAIA
GAIA is a generative AI application designed to run local, private LLMs on Windows PCs and is optimized for AMD Ryzen AI hardware (AMD Ryzen AI 300 Series processors). This integration allows for faster, more efficient processing (i.e., lower power) while keeping your data local and secure. On Ryzen AI PCs, GAIA interacts with the NPU and iGPU to run models seamlessly, using the open-source Lemonade (LLM-Aid) SDK from ONNX TurnkeyML for LLM inference. GAIA supports a variety of local LLMs optimized to run on Ryzen AI PCs. Popular models like Llama and Phi derivatives can be tailored for different use cases, such as Q&A, summarization, and complex reasoning tasks.
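To give a feel for the kind of local-LLM workflow GAIA wraps, here is a minimal, hedged sketch using the generic Hugging Face transformers API on CPU; GAIA itself routes inference through the Lemonade SDK to the NPU and iGPU, and the model ID below is only a placeholder:

```python
# Minimal sketch of local LLM inference; GAIA itself routes this through
# the Lemonade SDK to the Ryzen AI NPU/iGPU. The model ID is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"  # placeholder Phi derivative
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Summarize in one sentence why local inference keeps data private."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```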

NVIDIA & Storage Industry Leaders Unveil New Class of Enterprise Infrastructure for the AI Era

At GTC 2025, NVIDIA announced the NVIDIA AI Data Platform, a customizable reference design that leading providers are using to build a new class of AI infrastructure for demanding AI inference workloads: enterprise storage platforms with AI query agents fueled by NVIDIA accelerated computing, networking and software. Using the NVIDIA AI Data Platform, NVIDIA-Certified Storage providers can build infrastructure to speed AI reasoning workloads with specialized AI query agents. These agents help businesses generate insights from data in near real time, using NVIDIA AI Enterprise software—including NVIDIA NIM microservices for the new NVIDIA Llama Nemotron models with reasoning capabilities—as well as the new NVIDIA AI-Q Blueprint.

Storage providers can optimize their infrastructure to power these agents with NVIDIA Blackwell GPUs, NVIDIA BlueField DPUs, NVIDIA Spectrum-X networking and the NVIDIA Dynamo open-source inference library. Leading data platform and storage providers—including DDN, Dell Technologies, Hewlett Packard Enterprise, Hitachi Vantara, IBM, NetApp, Nutanix, Pure Storage, VAST Data and WEKA—are collaborating with NVIDIA to create customized AI data platforms that can harness enterprise data to reason and respond to complex queries. "Data is the raw material powering industries in the age of AI," said Jensen Huang, founder and CEO of NVIDIA. "With the world's storage leaders, we're building a new class of enterprise infrastructure that companies need to deploy and scale agentic AI across hybrid data centers."

Tencent Will Launch Hunyuan T1 Inference Model on March 21

Tencent's large language model (LLM) specialist division has announced the imminent launch of their T1 AI inference model. The Chinese technology giant's Hunyuan social media accounts revealed a grand arrival, scheduled to take place on Friday (March 21). A friendly reminder was issued to interested parties, regarding the upcoming broadcast/showcase: "please set aside your valuable time. Let's step into T1 together." Earlier in the week, the Tencent AI team started to tease their "first ultra-large Mamba-powered reasoning model." Local news reports have highlighted Hunyuan's claim of Mamba architecture being applied losslessly to a super-large Mixture of Experts (MoE) model.

Late last month, the company released its Hunyuan Turbo S AI model—advertised as offering faster replies than DeepSeek's R1 system. Tencent's plucky solution has quickly climbed the Chatbot Arena LLM Leaderboard. The Hunyuan team was in a boastful mood earlier today, loudly proclaiming that their proprietary Turbo S model had charted in fifteenth place. At the time of writing, DeepSeek R1 is ranked seventh on the leaderboard. As explained by ITHome, this community-driven platform works by having users interact: "with multiple models anonymously, voting to decide which model is better, and then generating a ranking list based on the scores. This kind of evaluation is also seen as an arena for big models to compete directly, which is simple and direct."

Supermicro Adds Portfolio for Next Wave of AI with NVIDIA Blackwell Ultra Solutions

Supermicro, Inc., a Total IT Solution Provider for AI, Cloud, Storage, and 5G/Edge, is announcing new systems and rack solutions powered by the NVIDIA Blackwell Ultra platform, featuring the NVIDIA HGX B300 NVL16 and NVIDIA GB300 NVL72 platforms. Supermicro's and NVIDIA's new AI solutions strengthen leadership in AI by delivering breakthrough performance for the most compute-intensive AI workloads, including AI reasoning, agentic AI, and video inference applications.

"At Supermicro, we are excited to continue our long-standing partnership with NVIDIA to bring the latest AI technology to market with the NVIDIA Blackwell Ultra Platforms," said Charles Liang, president and CEO, Supermicro. "Our Data Center Building Block Solutions approach has streamlined the development of new air and liquid-cooled systems, optimized to the thermals and internal topology of the NVIDIA HGX B300 NVL16 and GB300 NVL72. Our advanced liquid-cooling solution delivers exceptional thermal efficiency, operating with 40℃ warm water in our 8-node rack configuration, or 35℃ warm water in double-density 16-node rack configuration, leveraging our latest CDUs. This innovative solution reduces power consumption by up to 40% while conserving water resources, providing both environmental and operational cost benefits for enterprise data centers."

Google Teams up with MediaTek for Next-Generation TPU v7 Design

According to Reuters, citing The Information, Google will collaborate with MediaTek to develop its seventh-generation Tensor Processing Unit (TPU), which is also known as TPU v7. Google maintains its existing partnership with Broadcom despite the new MediaTek collaboration. The AI accelerator is scheduled for production in 2026, and TSMC is handling manufacturing duties. Google will lead the core architecture design while MediaTek manages I/O and peripheral components, as Economic Daily News reports. This differs from Google's ongoing relationship with Broadcom, which co-develops core TPU architecture. The MediaTek partnership reportedly stems from the company's strong TSMC relationship and lower costs compared to Broadcom.

There is also a possibility that MediaTek could design inference-focused TPU v7 chips while Broadcom focuses on training architecture. Nonetheless, TPU development is a massive market: Google deploys so many chips that it could, hypothetically, justify a third design partner. The TPU program continues Google's vertical integration strategy for AI infrastructure. Google reduces dependency on NVIDIA hardware by designing proprietary AI chips for internal R&D and cloud operations, while competitors like OpenAI, Anthropic, and Meta rely heavily on NVIDIA's processors for AI training and inference. At Google's scale, serving billions of queries a day, designing custom chips makes sense both financially and technologically. Google designs for its own specific workloads, and translating those into hardware acceleration is a game Google has been playing for years now.

MSI IPC Introduces High-Performance ATX Motherboard MS-CF05, V2.0 for Industrial and Embedded Applications

MSI IPC, a global leader in industrial computing solutions, proudly unveils the MS-CF05, V2.0, a high-performance ATX motherboard engineered for industrial automation, edge computing, and embedded applications. Designed to support 14th, 13th, and 12th Gen Intel Core, Pentium, and Celeron processors, the MS-CF05, V2.0 delivers exceptional computing power, scalability, and I/O flexibility, including a PCI slot for legacy support, making it ideal for modern industrial needs.

AMD Recommends EPYC Processors for Everyday AI Server Tasks

Ask a typical IT professional today whether they're leveraging AI, and there's a good chance they'll say yes - after all, they have reputations to protect! Kidding aside, many will report that their teams use web-based tools like ChatGPT, or even have internal chatbots serving employees on their intranet, but that beyond this, not much AI is really being implemented at the infrastructure level. As it turns out, the true answer is a bit different. AI tools and techniques have embedded themselves firmly into standard enterprise workloads and are a more common, everyday phenomenon than even many IT people realize. Assembly line operations now include computer vision-powered inspections. Supply chains use AI for demand forecasting, making businesses move faster. And of course, AI note-taking and meeting summarization are embedded in virtually every variant of collaboration and meeting software.

Increasingly, critical enterprise software tools incorporate built-in recommendation systems, virtual agents, or some other form of AI-enabled assistance. AI is truly becoming a pervasive, complementary tool for everyday business. At the same time, today's enterprises are navigating a hybrid landscape where traditional, mission-critical workloads coexist with innovative AI-driven tasks. This "mixed enterprise and AI" workload environment calls for infrastructure that can handle both types of processing seamlessly. Robust, general-purpose CPUs like the AMD EPYC processors are designed to be powerful, secure, and flexible to address this need. They handle everyday tasks—running databases, web servers, ERP systems—and offer strong security features crucial for enterprise operations augmented with AI workloads. In essence, modern enterprise infrastructure is about creating a balanced ecosystem. AMD EPYC CPUs play a pivotal role in creating this balance, delivering high performance, efficiency, and security features that underpin both traditional enterprise workloads and advanced AI operations.

Meta Reportedly Reaches Test Phase with First In-house AI Training Chip

According to a Reuters technology report, Meta's engineering department is engaged in the testing of their "first in-house chip for training artificial intelligence systems." Two inside sources have disclosed this significant development milestone, which involves a small-scale deployment of early samples. The owner of Facebook could ramp up production if initial batches pass muster. Despite a recent showcasing of an open-architecture NVIDIA "Blackwell" GB200 system for enterprise, Meta leadership is reportedly pursuing proprietary solutions. Multiple big players in the field of artificial intelligence are attempting to break away from total reliance on Team Green. Last month, press outlets concentrated on OpenAI's alleged finalization of an in-house design, with rumored involvement from Broadcom and TSMC.

One of the Reuters industry moles believes that Meta has signed up with TSMC—supposedly, the Taiwanese foundry was responsible for the production of test batches. Tom's Hardware reckons that Meta and Broadcom worked together on the tape-out of the social media giant's "first AI training accelerator." Development of the company's "Meta Training and Inference Accelerator" (MTIA) series stretches back a couple of years—according to Reuters, the multi-part project: "had a wobbly start for years, and at one point scrapped a chip at a similar phase of development...Meta last year, started using an MTIA chip to perform inference, or the process involved in running an AI system as users interact with it, for the recommendation systems that determine which content shows up on Facebook and Instagram news feeds." Leadership is reportedly aiming to get custom silicon solutions up and running for AI training by next year. Past examples of MTIA hardware were deployed with open-source RISC-V cores (for inference tasks), but it is not clear whether this architecture will form the basis of Meta's latest AI chip design.

SOPHGO Unveils New Products at the 2025 China RISC-V Ecosystem Conference

On February 27-28, the 2025 China RISC-V Ecosystem Conference was held at the Zhongguancun International Innovation Center in Beijing. As a core promoter in the RISC-V field, SOPHGO was invited to deliver a speech and launched a series of new products based on the SG2044 chip, sharing the company's cutting-edge practices in the heterogeneous fusion of AI and RISC-V and contributing to the development of the global open-source instruction set ecosystem. During the conference, SOPHGO set up a distinctive exhibition area that drew many industry attendees.

Focusing on AI Integration, Leading Breakthroughs in RISC-V Technology
At the main forum of the conference, the Vice President of SOPHGO's RISC-V division delivered a speech titled "RISC-V Breakthroughs Driven by AI: Integration + Heterogeneous Innovation," elaborating on SOPHGO's innovative achievements in the deep integration of the RISC-V architecture and artificial intelligence technology. He pointed out that current AI innovations are driving market changes: the emergence of DeepSeek has ignited a trillion-scale computing power market. New technical paradigms and the penetration of large models into various sectors will lead to explosive growth in inference demand, changing the structure of computing power demand. This will reshape the landscape of the computing power market and bring significant business opportunities to domestic computing power enterprises, while RISC-V high-performance computing enters a fast track of development driven by AI.

NVIDIA Explains How CUDA Libraries Bolster Cybersecurity With AI

Traditional cybersecurity measures are proving insufficient for addressing emerging cyber threats such as malware, ransomware, phishing and data access attacks. Moreover, future quantum computers pose a security risk to today's data through "harvest now, decrypt later" attack strategies. Cybersecurity technology powered by NVIDIA accelerated computing and high-speed networking is transforming the way organizations protect their data, systems and operations. These advanced technologies not only enhance security but also drive operational efficiency, scalability and business growth.

Accelerated AI-Powered Cybersecurity
Modern cybersecurity relies heavily on AI for predictive analytics and automated threat mitigation. NVIDIA GPUs are essential for training and deploying AI models due to their exceptional computational power.
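As a toy illustration of the predictive-analytics side (a generic, CPU-based sketch, not NVIDIA's accelerated stack), an unsupervised anomaly detector can flag unusual network flows:

```python
# Toy sketch of AI-based threat detection: flag anomalous network "flows"
# with an unsupervised Isolation Forest. Illustrative only; production
# pipelines (including GPU-accelerated ones) use far richer features.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(loc=[500, 60], scale=[50, 5], size=(1000, 2))   # bytes, duration
attacks = rng.normal(loc=[5000, 2], scale=[500, 1], size=(10, 2))   # bursty transfers
flows = np.vstack([normal, attacks])

detector = IsolationForest(contamination=0.01, random_state=0).fit(flows)
flagged = flows[detector.predict(flows) == -1]  # -1 marks outliers
print(f"flagged {len(flagged)} suspicious flows out of {len(flows)}")
```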

Advantech Unveils New AI, Industrial and Network Edge Servers Powered by Intel Xeon 6 Processors

Advantech, a global leader in industrial and embedded computing, today announced the launch of seven new server platforms built on Intel Xeon 6 processors, optimized for industrial, transportation, and communications applications. Designed to meet the increasing demands of complex AI, storage, and networking workloads, these innovative edge servers and network appliances provide superior performance, reliability, and scalability for system integrators, solution providers, and service providers.

Intel Xeon 6 - Exceptional Performance for the Widest Range of Workloads
Intel Xeon 6 processors feature advanced performance and efficiency cores, delivering up to 86 cores per CPU, DDR5 memory support with speeds up to 6400 MT/s, and PCIe Gen 5 lanes for high-speed connectivity. Designed to optimize both compute-intensive and scale-out workloads, these processors ensure seamless integration across a wide array of applications.

MSI Announces New Server Platforms Supporting Intel Xeon 6 Family of Processors

MSI introduces new server platforms powered by the latest Intel Xeon 6 family of processors with the Performance Cores (P-Cores). Engineered for high-density performance, seamless scalability, and energy-efficient operations, these servers deliver exceptional throughput, dynamic workload flexibility, and optimized power efficiency. Optimized for AI-driven applications, modern data centers, and cloud-native workloads, MSI's new platforms help lower total cost of ownership (TCO) while maximizing infrastructure efficiency and resource optimization.

"As data-driven transformation accelerates across industries, businesses require solutions that not only deliver performance but also enable sustainable growth and operational agility," said Danny Hsu, General Manager of MSI's Enterprise Platform Solutions. "Our Intel Xeon 6 processor-based servers are designed to support this shift by offering high-core scalability, energy-efficient performance, and dynamic workload optimization. These capabilities empower organizations to maximize compute density, streamline their digital ecosystems, and respond to evolving market demands with greater speed and efficiency."

OnLogic Reveals the Axial AX300 Edge Server

OnLogic, a leading provider of edge computing solutions, has launched the Axial AX300, a highly customizable and powerful edge server. The AX300 is engineered to help businesses of any size better leverage their on-site data and unlock the potential of AI by placing powerful computing capabilities on-site.

The Axial AX300 empowers organizations to seamlessly move computing resources closer to the data source, providing significant advantages in performance, latency, operational efficiency, and total cost of ownership over cloud-based data management. With its robust design, flexible configuration options, and advanced security features, the Axial AX300 is the ideal platform for a wide range of highly-impactful edge computing applications, including:
  • AI/ML inference and training: Leveraging the power of AI/ML at the edge for real-time insights, predictive maintenance, and improved decision-making.
  • Data analytics: Processing and analyzing data generated by IoT devices and sensors in real-time to improve operational efficiency.
  • Virtualization: Consolidating multiple workloads onto a single server, optimizing resource utilization and simplifying deployment and management.

ASRock Industrial iEP-6010E Series Now Powered by NVIDIA Jetson Orin NX & Nano Super Mode

The future of AI at the edge has arrived, and ASRock Industrial is leading the way. With Super Mode support for the iEP-6010E Series and iEP-6010E Series Dev Kit, powered by NVIDIA Jetson Orin NX and Jetson Orin Nano modules and optimized with the NVIDIA JetPack 6.2 SDK, this upgrade supercharges AI acceleration, unlocking up to 2X faster generative AI performance. Delivering high inference speeds, real-time decision-making, and superior power efficiency, Super Mode redefines edge AI computing, powering everything from predictive maintenance and anomaly detection to AI-driven medical imaging, intelligent surveillance, and autonomous mobility. Whether driving edge AI computing, real-time vision processing, intelligent autonomous vehicles, advanced robotics, industrial automation, or smart city infrastructure, ASRock Industrial's latest innovation empowers industries with cutting-edge AI capabilities that drive real-world transformation.

A Quantum Leap in Edge AI Performance
The iEP-6010E Series and iEP-6010E Series Dev Kit, now optimized with Super Mode, provide:
  • Up to 2X Faster Gen AI Processing - Up to 157 TOPS (Sparse) on Orin NX 16 GB (Super) and 67 TOPS (Sparse) on Orin Nano 8 GB (Super), delivering blazing-fast generative AI performance for automation, robotics, and real-time vision systems.
  • Optimized Power Boost and Efficiency - Supports power modes up to 40 W / MAXN Super on Orin NX and 25 W / MAXN Super on Orin Nano, while maintaining high-performance AI workloads in power-sensitive environments, making it ideal for industrial automation and smart city infrastructure.
  • NVIDIA JetPack 6.2 Support - Fully compatible with the latest NVIDIA JetPack 6.2 SDK, offering enhanced libraries, frameworks, and tools to accelerate advanced AI development.

CoreWeave Launches Debut Wave of NVIDIA GB200 NVL72-based Cloud Instances

AI reasoning models and agents are set to transform industries, but delivering their full potential at scale requires massive compute and optimized software. The "reasoning" process involves multiple models, generating many additional tokens, and demands infrastructure with a combination of high-speed communication, memory and compute to ensure real-time, high-quality results. To meet this demand, CoreWeave has launched NVIDIA GB200 NVL72-based instances, becoming the first cloud service provider to make the NVIDIA Blackwell platform generally available. With rack-scale NVIDIA NVLink across 72 NVIDIA Blackwell GPUs and 36 NVIDIA Grace CPUs, scaling to up to 110,000 GPUs with NVIDIA Quantum-2 InfiniBand networking, these instances provide the scale and performance needed to build and deploy the next generation of AI reasoning models and agents.

NVIDIA GB200 NVL72 on CoreWeave
NVIDIA GB200 NVL72 is a liquid-cooled, rack-scale solution with a 72-GPU NVLink domain, which enables the six dozen GPUs to act as a single massive GPU. NVIDIA Blackwell features many technological breakthroughs that accelerate inference token generation, boosting performance while reducing service costs. For example, fifth-generation NVLink enables 130 TB/s of GPU bandwidth in one 72-GPU NVLink domain, and the second-generation Transformer Engine enables FP4 for faster AI performance while maintaining high accuracy. CoreWeave's portfolio of managed cloud services is purpose-built for Blackwell. CoreWeave Kubernetes Service optimizes workload orchestration by exposing NVLink domain IDs, ensuring efficient scheduling within the same rack. Slurm on Kubernetes (SUNK) supports the topology block plug-in, enabling intelligent workload distribution across GB200 NVL72 racks. In addition, CoreWeave's Observability Platform provides real-time insights into NVLink performance, GPU utilization and temperatures.
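To make the topology-aware scheduling idea concrete, here is a hypothetical sketch of NVLink-domain-aware placement; the data shapes and field names are illustrative assumptions, not CoreWeave's actual API:

```python
# Hypothetical sketch of NVLink-domain-aware scheduling, in the spirit of
# CoreWeave Kubernetes Service exposing NVLink domain IDs. Field names and
# data are invented for illustration.
from collections import defaultdict

nodes = [
    {"name": "node-a", "nvlink_domain": "rack-1", "free_gpus": 4},
    {"name": "node-b", "nvlink_domain": "rack-1", "free_gpus": 4},
    {"name": "node-c", "nvlink_domain": "rack-2", "free_gpus": 8},
]

def pick_domain(nodes, gpus_needed):
    # Prefer placing the whole job inside one NVLink domain so GPU-to-GPU
    # traffic stays on high-bandwidth NVLink instead of the inter-rack fabric.
    capacity = defaultdict(int)
    for n in nodes:
        capacity[n["nvlink_domain"]] += n["free_gpus"]
    fits = {d: c for d, c in capacity.items() if c >= gpus_needed}
    return min(fits, key=fits.get) if fits else None  # tightest feasible fit

print(pick_domain(nodes, 8))  # both domains have 8 free GPUs; prints "rack-1"
```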

Moore Threads Teases Excellent Performance of DeepSeek-R1 Model on MTT GPUs

Moore Threads, a Chinese manufacturer of proprietary GPU designs, is (reportedly) the latest company to jump onto the DeepSeek-R1 bandwagon. Since late January, NVIDIA, Microsoft, and AMD have swooped in with their own interpretations/deployments. By global standards, Moore Threads GPUs trail behind Western-developed offerings—early 2024 evaluations showed the firm's MTT S80 dedicated desktop graphics card struggling against an AMD integrated solution: the Radeon 760M. The recent emergence of DeepSeek's open-source models has signalled a shift away from reliance on extremely powerful and expensive AI-crunching hardware (often accessed via the cloud)—widespread excitement has been generated by DeepSeek solutions being relatively frugal in terms of processing requirements. Tom's Hardware has observed cases of open-source AI models running (locally) on "inexpensive hardware, like the Raspberry Pi."

According to recent Chinese press coverage, Moore Threads has announced a successful deployment of DeepSeek's R1-Distill-Qwen-7B distilled model on the aforementioned MTT S80 GPU. The company also revealed that it had taken similar steps with its MTT S4000 datacenter-oriented graphics hardware. On the subject of adaptation, a Moore Threads spokesperson stated: "based on the Ollama open source framework, Moore Threads completed the deployment of the DeepSeek-R1-Distill-Qwen-7B distillation model and demonstrated excellent performance in a variety of Chinese tasks, verifying the versatility and CUDA compatibility of Moore Threads' self-developed full-featured GPU." Exact performance figures, benchmark results and technical details were not disclosed to the Chinese public, so Moore Threads appears to be teasing the prowess of its MTT GPU designs. ITHome reported that: "users can also perform inference deployment of the DeepSeek-R1 distillation model based on MTT S80 and MTT S4000. Some users have previously completed the practice manually on MTT S80." Moore Threads believes that its: "self-developed high-performance inference engine, combined with software and hardware co-optimization technology, significantly improves the model's computing efficiency and resource utilization through customized operator acceleration and memory management. This engine not only supports the efficient operation of the DeepSeek distillation model, but also provides technical support for the deployment of more large-scale models in the future."
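Since the deployment is built on the open-source Ollama framework, the client-side workflow presumably resembles the standard Ollama flow; a minimal sketch with the official Ollama Python client follows, where the model tag is an assumption and MTT GPU support comes from Moore Threads' own adaptation rather than stock Ollama:

```python
# Minimal sketch of querying a DeepSeek-R1 distill through Ollama.
# Assumes an Ollama server is running and the model tag is pulled locally;
# Moore Threads' MTT GPU backend is their own adaptation, not stock Ollama.
import ollama

response = ollama.chat(
    model="deepseek-r1:7b",  # assumed tag for an R1-Distill-Qwen-7B build
    messages=[{"role": "user", "content": "Explain model distillation in one sentence."}],
)
print(response["message"]["content"])
```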

Huawei Delivers Record $118 Billion Revenue with 22% Yearly Growth Despite US Sanctions

Huawei Technologies reported a robust 22% year-over-year revenue increase for 2024, reaching 860 billion yuan ($118.27 billion), demonstrating remarkable resilience amid continued US-imposed trade restrictions. The Chinese tech giant's resurgence was primarily driven by its revitalized smartphone division, which captured 16% of China's domestic market share, overtaking Apple in regional sales. This achievement was notably accomplished by deploying domestically produced chipsets, marking a significant milestone for the company. In collaboration with China's SMIC, Huawei delivers in-house silicon that integrates with HarmonyOS for complete vertical integration. The company's strategic diversification into automotive technology has emerged as a crucial growth vector, with its smart car solutions unit delivering autonomous driving software and specialized chips to Chinese EV manufacturers.

In parallel, Huawei's Ascend AI 910B/C platform recently announced compatibility with DeepSeek's R1 large language model, along with availability on Chinese AI cloud providers like SiliconFlow. Through a strategic partnership with AI infrastructure startup SiliconFlow, Huawei is enhancing its Ascend cloud service capabilities, further strengthening its competitive position in the global AI hardware market despite ongoing international trade challenges. Even if the company can't compete on performance against the latest solutions from NVIDIA and AMD, due to its lack of access to the advanced manufacturing required for AI accelerators, it can compete on cost and deliver solutions with a much more competitive price/performance ratio. Huawei's Ascend AI solutions deliver modest performance, but the pricing makes AI model inference very cheap, with API costs of around one yuan per million input tokens and four yuan per million output tokens on DeepSeek R1.
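At those quoted rates, a back-of-the-envelope cost estimate is simple arithmetic; a quick sketch using only the figures above (exchange rates ignored):

```python
# Back-of-the-envelope API cost at the quoted Ascend-hosted DeepSeek R1
# rates: 1 yuan per 1M input tokens, 4 yuan per 1M output tokens.
def cost_yuan(input_tokens: int, output_tokens: int) -> float:
    return input_tokens / 1e6 * 1.0 + output_tokens / 1e6 * 4.0

# e.g., a workload with 2M input tokens and 0.5M output tokens:
print(cost_yuan(2_000_000, 500_000))  # -> 4.0 yuan
```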

ASUS AI POD With NVIDIA GB200 NVL72 Platform Ready to Ramp-Up Production for Scheduled Shipment in March

ASUS is proud to announce that ASUS AI POD, featuring the NVIDIA GB200 NVL72 platform, is ready to ramp-up production for a scheduled shipping date of March 2025. ASUS remains dedicated to providing comprehensive end-to-end solutions and software services, encompassing everything from AI supercomputing to cloud services. With a strong focus on fostering AI adoption across industries, ASUS is positioned to empower clients in accelerating their time to market by offering a full spectrum of solutions.

Proof of concept, funded by ASUS
Honoring the commitment to delivering exceptional value to clients, ASUS is set to launch a proof of concept (POC) for the groundbreaking ASUS AI POD, powered by the NVIDIA Blackwell platform. This exclusive opportunity is now open to a select group of innovators who are eager to harness the full potential of AI computing. Innovators and enterprises can experience firsthand the full potential of AI and deep learning solutions at exceptional scale. To take advantage of this limited-time offer, please complete this survey at: forms.office.com/r/FrAbm5BfH2. The expert ASUS team of NVIDIA GB200 specialists will guide users through the next steps.

NVIDIA GeForce RTX 50 Series AI PCs Accelerate DeepSeek Reasoning Models

The recently released DeepSeek-R1 model family has brought a new wave of excitement to the AI community, allowing enthusiasts and developers to run state-of-the-art reasoning models with problem-solving, math and code capabilities, all from the privacy of local PCs. With up to 3,352 trillion operations per second of AI horsepower, NVIDIA GeForce RTX 50 Series GPUs can run the DeepSeek family of distilled models faster than anything on the PC market.

A New Class of Models That Reason
Reasoning models are a new class of large language models (LLMs) that spend more time on "thinking" and "reflecting" to work through complex problems, while describing the steps required to solve a task. The fundamental principle is that any problem can be solved with deep thought, reasoning and time, just like how humans tackle problems. By spending more time—and thus compute—on a problem, the LLM can yield better results. This phenomenon is known as test-time scaling, where a model dynamically allocates compute resources during inference to reason through problems. Reasoning models can enhance user experiences on PCs by deeply understanding a user's needs, taking actions on their behalf and allowing them to provide feedback on the model's thought process—unlocking agentic workflows for solving complex, multi-step tasks such as analyzing market research, performing complicated math problems, debugging code and more.
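To make test-time scaling concrete, here is a hedged toy sketch of one common recipe, self-consistency: sample several reasoning chains and majority-vote the final answer, so that spending more inference compute buys reliability. The `generate` callable is a stand-in for any LLM call:

```python
# Toy sketch of test-time scaling via self-consistency: more samples
# (more inference compute) -> more reliable answers. `generate` is a
# stand-in for any LLM call, e.g., a locally hosted reasoning model.
from collections import Counter

def self_consistent_answer(generate, prompt: str, n_samples: int = 8) -> str:
    answers = [generate(prompt) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]  # majority vote

# Usage with a hypothetical sampling function:
# answer = self_consistent_answer(lambda p: my_model.sample(p), "17 * 23 = ?")
```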

NVIDIA AI Helps Fight Against Fraud Across Many Sectors

Companies and organizations are increasingly using AI to protect their customers and thwart the efforts of fraudsters around the world. Voice security company Hiya found that 550 million scam calls were placed per week in 2023, with INTERPOL estimating that scammers stole $1 trillion from victims that same year. In the U.S., one in four non-contact-list calls was flagged as suspected spam, with fraudsters often luring people into Venmo-related or extended warranty scams.

Traditional methods of fraud detection include rules-based systems, statistical modeling and manual reviews. These methods have struggled to scale to the growing volume of fraud in the digital era without sacrificing speed and accuracy. For instance, rules-based systems often have high false-positive rates, statistical modeling can be time-consuming and resource-intensive, and manual reviews can't scale rapidly enough.
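As a hedged toy contrast between the two approaches (features and thresholds invented purely for illustration):

```python
# Toy contrast: a brittle hand-written rule vs. a learned fraud classifier.
# Features, thresholds, and data are invented for illustration only.
from sklearn.linear_model import LogisticRegression

# (amount_usd, minutes_since_last_txn) per transaction, with fraud labels
X = [[20, 300], [35, 240], [900, 2], [15, 600], [1200, 1], [50, 120]]
y = [0, 0, 1, 0, 1, 0]

def rule_based(amount, gap):  # high false-positive rule: "big and fast = fraud"
    return amount > 500 and gap < 10

model = LogisticRegression().fit(X, y)  # learns the boundary from data
print(rule_based(700, 5), model.predict([[700, 5]])[0])
```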

Advantech Unveils New Lineup Powered by AMD Ryzen Embedded 8000 Series Processors

Advantech, a global leader in embedded IoT computing, is excited to launch its latest Edge AI solutions featuring the advanced AMD Ryzen Embedded 8000 Series processors. The SOM-6873 (COMe Compact), AIMB-2210 (Mini-ITX motherboards), and AIR-410 (AI Inference System) deliver exceptional AI performance leveraging the first AMD embedded devices with integrated Neural Processing Units. These integrated NPUs are optimized to enhance AI inference efficiency and precision. Together with traditional CPU and GPU elements, the architecture delivers performance up to 39 TOPS. They also support dual-channel DDR5 memory and PCIe Gen 4, providing ample computing power with flexible TDP options and thermal solutions.

Value-added software services, such as the Edge AI SDK integrating AMD Ryzen AI software, accelerate the model-porting procedure. These features make the solutions ideal for edge applications such as HMI, machine vision in industrial automation, smart management and interactive services in urban entertainment systems, and healthcare devices such as ultrasound machines.

NVIDIA NeMo AI Guardrails Upgraded with Latest NIM Microservices

AI agents are poised to transform productivity for the world's billion knowledge workers with "knowledge robots" that can accomplish a variety of tasks. To develop AI agents, enterprises need to address critical concerns like trust, safety, security and compliance. New NVIDIA NIM microservices for AI guardrails—part of the NVIDIA NeMo Guardrails collection of software tools—are portable, optimized inference microservices that help companies improve the safety, precision and scalability of their generative AI applications.

Central to the orchestration of the microservices is NeMo Guardrails, part of the NVIDIA NeMo platform for curating, customizing and guardrailing AI. NeMo Guardrails helps developers integrate and manage AI guardrails in large language model (LLM) applications. Industry leaders Amdocs, Cerence AI and Lowe's are among those using NeMo Guardrails to safeguard AI applications. Developers can use the NIM microservices to build more secure, trustworthy AI agents that provide safe, appropriate responses within context-specific guidelines and are bolstered against jailbreak attempts. Deployed in customer service across industries like automotive, finance, healthcare, manufacturing and retail, the agents can boost customer satisfaction and trust.
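For developers, wiring rails into an application follows the NeMo Guardrails open-source library's documented pattern; a minimal sketch, assuming a ./config directory with rail definitions (models, flows) already in place:

```python
# Minimal sketch of NeMo Guardrails in an LLM app. Assumes a ./config
# directory containing a rails configuration is already defined.
from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./config")   # load rail definitions
rails = LLMRails(config)                     # wrap the configured LLM

reply = rails.generate(messages=[
    {"role": "user", "content": "How do I reset my account password?"}
])
print(reply["content"])  # response filtered through the guardrails
```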

Advantech Enhances AI and Hybrid Computing With Intel Core Ultra Processors (Series 2) S-Series

Advantech, a leader in embedded IoT computing solutions, is excited to introduce their AI-integrated desktop platform series: the Mini-ITX AIMB-2710 and Micro-ATX AIMB-589. These platforms are powered by Intel Core Ultra Processors (Series 2) S-Series, featuring the first desktop processor with an integrated NPU, delivering up to 36 TOPS for superior AI acceleration. Designed for data visualization and image analysis, both models offer PCIe Gen 5 support for high-performance GPU cards and feature 6400 MHz DDR5 memory, USB4 Type-C, multiple USB 3.2 Gen 2 and 2.5GbE LAN ports for fast and accurate data transmission in real time.

The AIMB-2710 and AIMB-589 excel in high-speed computing and AI-driven performance, making them ideal for applications such as medical imaging, automated optical inspection, and semiconductor testing. Backed by comprehensive software support, these platforms are engineered for the next wave of AI innovation.

NVIDIA NIM Microservices and AI Blueprints Usher in New Era of Local AI

Over the past year, generative AI has transformed the way people live, work and play, enhancing everything from writing and content creation to gaming, learning and productivity. PC enthusiasts and developers are leading the charge in pushing the boundaries of this groundbreaking technology. Countless times, industry-defining technological breakthroughs have been invented in one place—a garage. This week marks the start of the RTX AI Garage series, which will offer routine content for developers and enthusiasts looking to learn more about NVIDIA NIM microservices and AI Blueprints, and how to build AI agents, creative workflow, digital human, productivity apps and more on AI PCs. Welcome to the RTX AI Garage.

This first installment spotlights announcements made earlier this week at CES, including new AI foundation models available on NVIDIA RTX AI PCs that take digital humans, content creation, productivity and development to the next level. These models—offered as NVIDIA NIM microservices—are powered by new GeForce RTX 50 Series GPUs. Built on the NVIDIA Blackwell architecture, RTX 50 Series GPUs deliver up to 3,352 trillion AI operations per second of performance, 32 GB of VRAM and feature FP4 compute, doubling AI inference performance and enabling generative AI to run locally with a smaller memory footprint.
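The footprint claim is easy to sanity-check with back-of-the-envelope arithmetic; a quick sketch counting parameter storage only (activations and KV cache ignored):

```python
# Rough weight-memory footprint by precision (parameters only; ignores
# activations, KV cache, and quantization overhead such as scale factors).
def weights_gb(params_billion: float, bits: int) -> float:
    return params_billion * 1e9 * bits / 8 / 1e9

for bits in (16, 8, 4):  # FP16 -> FP8 -> FP4
    print(f"8B-parameter model @ {bits}-bit: {weights_gb(8, bits):.1f} GB")
# FP4 halves the footprint relative to FP8, so larger models fit in 32 GB.
```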

Qualcomm Launches On-Prem AI Appliance Solution and Inference Suite at CES 2025

At CES 2025, Qualcomm Technologies, Inc., today announced Qualcomm AI On-Prem Appliance Solution, an on-premises desktop or wall-mounted hardware solution, and Qualcomm AI Inference Suite, a set of software and services for AI inferencing spanning from near-edge to cloud. The combination of these new offerings allows for small and medium businesses, enterprises and industrial organizations to run custom and off-the-shelf AI applications on their premises, including generative workloads. Running AI inference on premises can deliver significant savings in operational costs and overall total cost of ownership (TCO), compared to the cost of renting third-party AI infrastructure.

Using the AI On-Prem Appliance Solution in concert with the AI Inference Suite, customers can now use generative AI leveraging their proprietary data, fine-tuned models, and technology infrastructure to automate human and machine processes and applications in virtually any end environment, such as retail stores, quick service restaurants, shopping outlets, dealerships, hospitals, factories and shop floors - where the workflow is well established, repeatable and ready for automation.