T0@st
News Editor
Ask a typical IT professional today whether they're leveraging AI, and there's a good chance they'll say yes; after all, they have reputations to protect! Kidding aside, many will report that their teams use web-based tools like ChatGPT, or even have internal chatbots serving employees on the company intranet, but that beyond this not much AI is really being implemented at the infrastructure level. As it turns out, the true answer is a bit different. AI tools and techniques have embedded themselves firmly in standard enterprise workloads and are a more common, everyday phenomenon than even many IT people realize. Assembly line operations now include computer vision-powered inspections. Supply chains use AI for demand forecasting, making business move faster. And of course, AI note-taking and meeting summaries are embedded in virtually every variant of collaboration and meeting software.
Increasingly, critical enterprise software tools incorporate built-in recommendation systems, virtual agents, or some other form of AI-enabled assistance. AI is truly becoming a pervasive, complementary tool for everyday business. At the same time, today's enterprises are navigating a hybrid landscape where traditional, mission-critical workloads coexist with innovative AI-driven tasks. This "mixed enterprise and AI" workload environment calls for infrastructure that can handle both types of processing seamlessly. Robust, general-purpose CPUs like AMD EPYC processors are designed to be powerful, secure, and flexible enough to address this need. They handle everyday tasks, such as running databases, web servers, and ERP systems, and offer the strong security features crucial for enterprise operations augmented with AI workloads. In essence, modern enterprise infrastructure is about creating a balanced ecosystem, and AMD EPYC CPUs play a pivotal role in creating that balance, delivering the performance, efficiency, and security features that underpin both traditional enterprise workloads and advanced AI operations.


When CPU inference makes sense
Determining which workloads are good fits for CPU inference comes down to four use case characteristics:
- High Memory Capacity: room for larger models and for the more extensive state information maintained during inference
- Low Latency: small and medium models serving real-time, sporadic, or low-concurrency inference requests
- Batch/Offline Processing: unbounded latency, or cases where batch processing can be leveraged to handle high-volume workloads
- Cost and Energy Efficiency: sensitivity to energy consumption and cost, both CAPEX and OPEX
These characteristics make 5th Gen AMD EPYC processors a strategic choice for handling AI inference. It's no coincidence: with the highest core counts among x86 CPUs in the industry, these processors can support the parallelized architectures fundamental to AI models. Additionally, the proximity, speed, and total capacity of memory give AI models quick and easy access to the key-value cache, helping models run efficiently. It's also no surprise that AMD EPYC CPUs have won hundreds of performance and efficiency world records, demonstrating leadership across a wide array of general-purpose computing tasks.
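To make the memory capacity point concrete, here is a back-of-the-envelope key-value cache calculation; the model dimensions below are invented for illustration and are not tied to any particular product:

```python
# Rough KV-cache sizing for transformer inference (illustrative numbers).
num_layers = 32        # transformer blocks
num_kv_heads = 8       # key/value heads (grouped-query attention)
head_dim = 128         # dimension per head
seq_len = 8192         # context length in tokens
batch = 4              # concurrent sequences
bytes_per_elem = 2     # FP16/BF16 storage

# Two tensors (K and V) are cached per layer, per token, per sequence.
kv_bytes = 2 * num_layers * num_kv_heads * head_dim * seq_len * batch * bytes_per_elem
print(f"KV cache: {kv_bytes / 2**30:.1f} GiB")  # 4.0 GiB for these numbers
```

Longer contexts and larger batches scale this linearly, which is why large system memory keeps the cache close to the cores rather than spilling it to slower tiers.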

Workloads that fit CPU inference
As we've seen, a workload's characteristics determine whether it is well suited to a CPU. The workload types most commonly run on CPUs are classical machine learning, recommendation systems, natural language processing, generative AI (such as language models), and collaborative prompt-based pre-processing. Below, we take a deeper look at each of these and why they are a good fit for inference on 5th Gen AMD EPYC processors.
Classical Machine Learning
Common examples of classical machine learning models are decision trees and linear regression. These algorithms typically have a more sequential structure than deep neural networks, relying on matrix operations and rule-based logic rather than stacked layers of learned weights. CPUs are well suited to efficiently handling the scalar operations and branching logic involved. Additionally, classical ML algorithms work on structured datasets that fit in memory, and the low memory access latency and large memory capacity of CPUs serve them extremely well.
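As a minimal sketch of the point, here is CPU-only decision tree inference with scikit-learn; the dataset and hyperparameters are arbitrary choices for illustration:

```python
# Minimal CPU-only classical ML inference with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A trained tree executes as branchy, scalar-heavy code: a natural CPU workload.
model = DecisionTreeClassifier(max_depth=4).fit(X_train, y_train)
print("accuracy:", model.score(X_test, y_test))
```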
Recommendation Systems
Consider how social media feeds and online shopping results are curated with recommendations. These systems use diverse algorithms, such as collaborative filtering and content-based filtering, and must process a wide variety of data, from item features to user demographics and interaction history. Supporting this variety requires flexibility, for which CPUs are ideal. Recommendation systems also benefit from large, low-latency memory, so that entire datasets and embedding tables can be held in memory for fast, frequent access; CPUs are well suited to this as well.
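As a toy illustration of that memory-resident pattern, here is a minimal item-based collaborative filtering sketch in NumPy; the ratings matrix is invented for the example:

```python
import numpy as np

# Toy user-item ratings matrix (rows: users, cols: items); 0 = unrated.
ratings = np.array([
    [5, 4, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
], dtype=float)

# Item-item cosine similarity: dense linear algebra that CPUs handle well
# when the whole table fits in memory.
norms = np.linalg.norm(ratings, axis=0, keepdims=True)
sim = (ratings.T @ ratings) / (norms.T @ norms + 1e-9)

# Score unrated items for user 0 as a similarity-weighted sum of their ratings.
user = ratings[0]
scores = sim @ user
scores[user > 0] = -np.inf  # mask items the user has already rated
print("recommend item:", int(np.argmax(scores)))
```

Production systems swap the toy matrix for embedding tables, but the access pattern (frequent lookups into a large in-memory structure) is the same.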
Natural Language Processing
Chatbots and text-to-speech or speech-to-text applications often run natural language processing models. These models are compact and intended for real-time conversational scenarios. Since human conversational response times are measured in seconds, these applications do not, from a compute standpoint, require sub-millisecond responses, making them a great fit for CPU inference. Furthermore, on high core count 5th Gen AMD EPYC CPUs, multiple concurrent instances can fit on a single processor, delivering compelling price-performance efficiency for these workloads.
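For instance, a compact sentiment model served on a CPU with the Hugging Face transformers library; the model choice is illustrative, and device=-1 pins the pipeline to the CPU:

```python
from transformers import pipeline

# device=-1 keeps inference on the CPU; per-request latency well under a second
# is far below conversational response-time expectations.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    device=-1,
)
print(classifier("The new server cut our inference costs in half."))
```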
Generative AI Including Language Models
Many enterprise applications have moved beyond small chatbots and now use generative models to streamline and speed up content creation. The most common type of generative model is the language model, and small and medium language models run efficiently on CPUs. The high core count and memory capacity of 5th Gen AMD EPYC processors can support real-time inference that is responsive enough for common use cases like chatbots or search engines, and they are ideal for batch/offline inference with relaxed response time needs. AMD EPYC optimized libraries can provide additional parallelism and options to run multiple instances, enhancing throughput.
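A minimal sketch of small-model text generation on a CPU, using Hugging Face transformers with GPT-2 as a stand-in small language model; the thread count and model choice are illustrative assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Spread inference across available cores; on a high-core-count CPU,
# several such instances can run side by side for throughput.
torch.set_num_threads(8)  # assumption: 8 cores assigned to this instance

tok = AutoTokenizer.from_pretrained("gpt2")            # small, CPU-friendly model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

inputs = tok("Enterprise AI inference on CPUs", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=30, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
```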
Collaborative Prompt-Based Pre-Processing
Collaborative models are a newer category: very small, efficient models that pre-process data or the user's prompt to streamline the inference work for a larger model downstream. These small models, used in retrieval augmented generation (RAG) and speculative decoding solutions, are great fits for a CPU and often run in a "mixed" scenario, with inference on the host CPU supporting the GPUs that run the large inference workloads.
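A simplified sketch of the pre-processing step in a RAG flow, assuming the sentence-transformers library and a small embedding model; the document list and the hand-off to the downstream model are placeholders:

```python
from sentence_transformers import SentenceTransformer, util

# A compact embedding model runs comfortably on the host CPU.
encoder = SentenceTransformer("all-MiniLM-L6-v2", device="cpu")

docs = [
    "5th Gen AMD EPYC supports 12 memory channels per socket.",
    "ZenDNN accelerates inference on AMD EPYC CPUs.",
    "Our cafeteria menu changes every Tuesday.",
]
doc_emb = encoder.encode(docs, convert_to_tensor=True)

# Pre-process the user's prompt: retrieve the most relevant context...
question = "How many memory channels does EPYC have?"
query_emb = encoder.encode(question, convert_to_tensor=True)
best = int(util.cos_sim(query_emb, doc_emb).argmax())

# ...then hand prompt plus context to the large downstream model (e.g., on a GPU).
augmented_prompt = f"Context: {docs[best]}\n\nQuestion: {question}"
print(augmented_prompt)
```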
These workload types span a wide variety of uses, and each appears in multiple applications across industry segments. The set of end applications where they fit is effectively endless, which makes the applications for CPU-based inference effectively endless as well. Be it streamlining supply chains with demand forecasting powered by time series and classical machine learning models, reducing carbon footprints with predictive models like XGBoost to forecast emissions, or improving customer experience with in-store deal and coupon delivery, CPUs power everyday AI inference. While each of these workload types can comfortably run on a CPU, in every example the high core count, the high memory capacity, an architecture built to balance serialized and parallelized workloads, and the flexibility to support multiple workloads and data types make 5th Gen AMD EPYC processors the ideal choice for CPU inference.


Speaking of flexibility, once you do start using accelerators, high frequency 5th Gen AMD EPYC processors are also the best host processor. Compared with the Intel Xeon 8592+, the AMD EPYC 9575F boasts roughly 28% higher maximum core frequency (5.0 GHz vs 3.9 GHz), up to 50% more memory bandwidth (12 memory channels vs 8), and 1.6x the high-speed PCIe Gen 5 lanes (128 vs 80 in single-socket configurations) for data movement.
To top this off, AMD offers a full portfolio of products to choose from, including AMD Instinct GPUs, for the ideal mix of compute engines. At the same time, a growing number of AMD EPYC CPU-based servers are certified to run NVIDIA GPUs, giving you the choice to run the infrastructure you want.
Solutions for the Evolving Spectrum of AI
AMD EPYC processors can give you the headroom to grow and evolve. Not only do they help consolidate legacy servers in your data center to free up space and power, but they also offer the flexibility to meet your AI workload needs at any size and scale. For smaller AI deployments, 5th Gen AMD EPYC CPUs deliver exceptional price-performance efficiency; for large deployments, whether they require one GPU or hundreds of thousands, they help extract maximum throughput for your AI workload.
Progress doesn't stand still. The future is uncertain, so whether models get smaller and more efficient, or larger and more capable (or both!), 5th Gen AMD EPYC CPUs offer the flexibility to adapt to the evolving AI landscape. To offer your customers the best products and services at the right price, you must be able to adapt, and an AMD EPYC CPU-based server will adapt with you. Get started running AI on AMD EPYC with out-of-the-box support for PyTorch models, and see how ZenDNN can help optimize your performance.
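As a starting-point sketch of that last step, assuming AMD's zentorch plug-in, which exposes ZenDNN to PyTorch through torch.compile (the package and backend names may differ across ZenDNN releases, so treat this as illustrative and consult AMD's documentation):

```python
import torch
import torch.nn as nn
import zentorch  # assumption: AMD's ZenDNN plug-in for PyTorch, installed separately

# Stand-in model; any eager-mode PyTorch model can be compiled the same way.
model = nn.Sequential(nn.Linear(512, 1024), nn.ReLU(), nn.Linear(1024, 10)).eval()
example = torch.randn(32, 512)

# Route inference through the ZenDNN-optimized backend.
compiled = torch.compile(model, backend="zentorch")
with torch.no_grad():
    print(compiled(example).shape)  # torch.Size([32, 10])
```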
View at TechPowerUp Main Site | Source