
Intel Announces "Cooper Lake" 4P-8P Xeons, New Optane Memory, PCIe 4.0 SSDs, and FPGAs for AI

btarunr

Editor & Senior Moderator
Staff member
Joined
Oct 9, 2007
Messages
47,229 (7.55/day)
Location
Hyderabad, India
System Name RBMK-1000
Processor AMD Ryzen 7 5700G
Motherboard ASUS ROG Strix B450-E Gaming
Cooling DeepCool Gammax L240 V2
Memory 2x 8GB G.Skill Sniper X
Video Card(s) Palit GeForce RTX 2080 SUPER GameRock
Storage Western Digital Black NVMe 512GB
Display(s) BenQ 1440p 60 Hz 27-inch
Case Corsair Carbide 100R
Audio Device(s) ASUS SupremeFX S1220A
Power Supply Cooler Master MWE Gold 650W
Mouse ASUS ROG Strix Impact
Keyboard Gamdias Hermes E2
Software Windows 11 Pro
Intel today introduced its 3rd Gen Intel Xeon Scalable processors and additions to its hardware and software AI portfolio, enabling customers to accelerate the development and use of AI and analytics workloads running in data center, network and intelligent-edge environments. As the industry's first mainstream server processor with built-in bfloat16 support, the new 3rd Gen Intel Xeon Scalable processor makes artificial intelligence (AI) inference and training more widely deployable on general-purpose CPUs for applications that include image classification, recommendation engines, speech recognition and language modeling.

"The ability to rapidly deploy AI and data analytics is essential for today's businesses. We remain committed to enhancing built-in AI acceleration and software optimizations within the processor that powers the world's data center and edge solutions, as well as delivering an unmatched silicon foundation to unleash insight from data," said Lisa Spelman, Intel corporate vice president and general manager, Xeon and Memory Group.



AI and analytics open new opportunities for customers across a broad range of industries, including finance, healthcare, industrial, telecom and transportation. IDC predicts that by 2021, 75% of commercial enterprise apps will use AI. And by 2025, IDC estimates that roughly a quarter of all data generated will be created in real time, with various internet of things (IoT) devices creating 95% of that volume growth.

Unequaled Portfolio Breadth and Ecosystem Support for AI and Analytics
Intel's new data platforms, coupled with a thriving ecosystem of partners using Intel AI technologies, are optimized for businesses to monetize their data through the deployment of intelligent AI and analytics services.
  • New 3rd Gen Intel Xeon Scalable Processors: Intel is further extending its investment in built-in AI acceleration in the new 3rd Gen Intel Xeon Scalable processors through the integration of bfloat16 support into the processor's unique Intel DL Boost technology. bfloat16 is a compact numeric format that uses half the bits of today's FP32 format yet achieves comparable model accuracy with minimal (if any) software changes required (a short illustrative sketch of the format follows this list). The addition of bfloat16 support accelerates both AI training and inference performance on the CPU. Intel-optimized distributions of leading deep learning frameworks (including TensorFlow and PyTorch) support bfloat16 and are available through the Intel AI Analytics toolkit. Intel also delivers bfloat16 optimizations in its OpenVINO toolkit and the ONNX Runtime environment to ease inference deployments.
  • The 3rd Gen Intel Xeon Scalable processors (codenamed "Cooper Lake") evolve Intel's 4- and 8-socket processor offering. The processor is designed for deep learning, virtual machine (VM) density, in-memory database, mission-critical applications and analytics-intensive workloads. Customers refreshing aging infrastructure can expect an average estimated gain of 1.9x on popular workloads and up to 2.2x more VMs compared with 5-year-old, 4-socket platform equivalents.
  • New Intel Optane Persistent Memory: As part of the 3rd Gen Intel Xeon Scalable platform, the company also announced the Intel Optane persistent memory 200 series, providing customers up to 4.5 TB of memory per socket to manage data-intensive workloads, such as in-memory databases, dense virtualization, analytics and high-performance computing.
  • New Intel 3D NAND SSDs: For systems that store data in all-flash arrays, Intel announced the availability of its next-generation high-capacity Intel 3D NAND SSDs, the Intel SSD D7-P5500 and P5600. These 3D NAND SSDs are built with Intel's latest triple-level cell (TLC) 3D NAND technology and an all-new low-latency PCIe controller to meet the intense I/O requirements of AI and analytics workloads, and include advanced features to improve IT efficiency and data security.
  • First Intel AI-Optimized FPGA: Intel disclosed its upcoming Intel Stratix 10 NX FPGAs, Intel's first AI-optimized FPGAs targeted for high-bandwidth, low-latency AI acceleration. These FPGAs will offer customers customizable, reconfigurable and scalable AI acceleration for compute-demanding applications such as natural language processing and fraud detection. Intel Stratix 10 NX FPGAs include integrated high-bandwidth memory (HBM), high-performance networking capabilities and new AI-optimized arithmetic blocks called AI Tensor Blocks, which contain dense arrays of lower-precision multipliers typically used for AI model arithmetic.
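For readers unfamiliar with the format mentioned above, here is a minimal sketch (plain NumPy, not Intel's DL Boost instruction path) of how bfloat16 relates to FP32: it keeps FP32's sign bit and 8-bit exponent but only 7 mantissa bits, i.e. the top 16 bits of the FP32 encoding. The helper names are illustrative only, and real hardware typically rounds rather than truncates.

```python
import numpy as np

def to_bfloat16_bits(x):
    """Truncate float32 values to their upper 16 bits (1 sign, 8 exponent, 7 mantissa bits)."""
    return (np.asarray(x, dtype=np.float32).view(np.uint32) >> 16).astype(np.uint16)

def from_bfloat16_bits(b):
    """Zero-fill the lower 16 bits to get back an ordinary float32."""
    return (np.asarray(b, dtype=np.uint16).astype(np.uint32) << 16).astype(np.uint32).view(np.float32)

x = np.array([3.14159265, 0.1, 65504.0], dtype=np.float32)
print(x)                                        # [3.1415927  0.1  65504.0]
print(from_bfloat16_bits(to_bfloat16_bits(x)))  # ~[3.140625  0.0996  65280.0]
```

The round trip shows the trade-off the press release alludes to: bfloat16 keeps FP32's full exponent range (so very large and very small values survive), but only about three significant decimal digits.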
oneAPI Cross-Architecture Development for Ongoing AI Innovation: As Intel expands its advanced AI product portfolio to meet diverse customer needs, it is also paving the way to simplify heterogeneous programming for developers with its oneAPI cross-architecture tools portfolio, which accelerates performance and increases productivity. With these tools, developers can accelerate AI workloads across Intel CPUs, GPUs, and FPGAs, and future-proof their code for current and future generations of Intel processors and accelerators.

Enhanced Intel Select Solutions Portfolio Addresses IT's Top Requirements: Intel has enhanced its Select Solutions portfolio to accelerate deployment against IT's most urgent requirements, highlighting the value of pre-verified solution delivery in today's rapidly evolving business climate. Announced today are three new and five enhanced Intel Select Solutions focused on analytics, AI and hyper-converged infrastructure. The enhanced Intel Select Solution for Genomics Analytics is being used around the world in the search for a COVID-19 vaccine, and the new Intel Select Solution for VMware Horizon VDI on vSAN is being used to enhance remote learning.

The 3rd Gen Intel Xeon Scalable processors and Intel Optane persistent memory 200 series are shipping to customers today. In May, Facebook announced that 3rd Gen Intel Xeon Scalable processors are the foundation of its newest Open Compute Platform (OCP) servers, and other leading cloud service providers, including Alibaba, Baidu and Tencent, have announced they are adopting the next-generation processors. General OEM system availability is expected in 2H 2020. The Intel SSD D7-P5500 and P5600 3D NAND SSDs are available today, and the Intel Stratix 10 NX FPGA is expected to be available in 2H 2020.



Complete Slide Deck


 
Joined
Jul 25, 2017
Messages
59 (0.02/day)
So compared to a 5-year-old system, you will get 2x the performance?

That is innovation...

And funny how they haven't got any charts comparing them to Epyc/Rome.
 

Aquinus

Resident Wat-man
Joined
Jan 28, 2012
Messages
13,167 (2.81/day)
Location
Concord, NH, USA
System Name Apollo
Processor Intel Core i9 9880H
Motherboard Some proprietary Apple thing.
Memory 64GB DDR4-2667
Video Card(s) AMD Radeon Pro 5600M, 8GB HBM2
Storage 1TB Apple NVMe, 4TB External
Display(s) Laptop @ 3072x1920 + 2x LG 5k Ultrafine TB3 displays
Case MacBook Pro (16", 2019)
Audio Device(s) AirPods Pro, Sennheiser HD 380s w/ FIIO Alpen 2, or Logitech 2.1 Speakers
Power Supply 96w Power Adapter
Mouse Logitech MX Master 3
Keyboard Logitech G915, GL Clicky
Software MacOS 12.1
To quote someone from the Phoronix forums:
Hear ye, hear ye! Announcing our NewLake processors! They are the exact same uarch as OldLake, same process node as OldLake, same gfx as OldLake, but the clock is 100 MHz faster! Innovation!! Upgrade now for only $499!!
 
Joined
Jun 19, 2018
Messages
848 (0.36/day)
System Name Batman's CaseLabs Mercury S8 Work Computer
Processor 8086K 5.3Ghz binned delidded by Siliconlottery.com 5.5Ghz 6c12t 5.6Ghz 6c6t on ambient air
Motherboard EVGA Z390 DARK
Cooling Noctua C14S for all overclocking so far Noctua Industrial PWM fan 2000rpm rated (700rpm inaudible)
Memory Gskill Trident Z Royal Silver F4-4600C18D-16GTRS running at 4500Mhz 17-17-17-37 (new mem OC) : )
Video Card(s) AMD WX 4100 Workstation Card (AMD W5400 7nm workstation card coming soon)
Storage Intel Optane 900P 280GB PCIe card as Primary OS drive / (4) Samsung 860Pro 256GB SATA internal
Display(s) Planar 27in 2560x1440 Glossy LG panel with glass bonded to panel for increased clarity
Case CaseLabs Mercury S8 open bench chassis two-tone black front cover with gunmetal frame
Audio Device(s) Creative $25 2.1 speakers lol
Power Supply Seasonic Prime Titanium 700watt fanless
Mouse Logitech MX Master 3 graphite / Glorious Model D matte black / Razer Invicta mousing mat gunmetal
Keyboard HHKB Hybrid Type-S black printed keycaps
Software Work Apps text and statistical
Benchmark Scores Single Thread scores at 5.6Ghz: Cinebench R15 ST - 249 CPU-Z ST - 676 PassMark CPU ST - 3389
When can we purchase Optane DDR5 memory modules for client builds?

And new Optane PCIe 4.0 SSDs with the gen 2 controllers?

Those are the more pertinent questions for simple builders like us. :)
 
Joined
Feb 18, 2005
Messages
5,847 (0.81/day)
Location
Ikenai borderline!
System Name Firelance.
Processor Threadripper 3960X
Motherboard ROG Strix TRX40-E Gaming
Cooling IceGem 360 + 6x Arctic Cooling P12
Memory 8x 16GB Patriot Viper DDR4-3200 CL16
Video Card(s) MSI GeForce RTX 4060 Ti Ventus 2X OC
Storage 2TB WD SN850X (boot), 4TB Crucial P3 (data)
Display(s) 3x AOC Q32E2N (32" 2560x1440 75Hz)
Case Enthoo Pro II Server Edition (Closed Panel) + 6 fans
Power Supply Fractal Design Ion+ 2 Platinum 760W
Mouse Logitech G602
Keyboard Razer Pro Type Ultra
Software Windows 10 Professional x64
When can we purchase Optane DDR5 memory modules for client builds?

Considering DDR5 is still in the prototype stage... a while.
 
Joined
Feb 27, 2007
Messages
51 (0.01/day)
Location
Huntington, NY
System Name Home PC
Processor AMD Ryzen 7 1700
Motherboard ASRock Fatal1ty X370 Gaming K4 AM4
Cooling AMD Wraith Spire
Memory 16 GB Corsair Vengeance PC3000 DDR4
Video Card(s) PowerColor RED DRAGON Radeon RX Vega 56
Storage Samsung 850 Evo 1TB, Crucial MX300 500GB
Display(s) Dell S2719DGF 1440p
Case Phanteks Enthoo Pro Series PH-ES614P
Audio Device(s) Onboard
Power Supply SeaSonic M12II 620 Bronze
Mouse Logitech G9X
Keyboard Dell
Software Windows 10 Pro
New PCIe 4.0 SSDs, and no Intel platform to take advantage of them!
 
Joined
Jan 14, 2019
Messages
12,337 (5.77/day)
Location
Midlands, UK
System Name Nebulon B
Processor AMD Ryzen 7 7800X3D
Motherboard MSi PRO B650M-A WiFi
Cooling be quiet! Dark Rock 4
Memory 2x 24 GB Corsair Vengeance DDR5-4800
Video Card(s) AMD Radeon RX 6750 XT 12 GB
Storage 2 TB Corsair MP600 GS, 2 TB Corsair MP600 R2
Display(s) Dell S3422DWG, 7" Waveshare touchscreen
Case Kolink Citadel Mesh black
Audio Device(s) Logitech Z333 2.1 speakers, AKG Y50 headphones
Power Supply Seasonic Prime GX-750
Mouse Logitech MX Master 2S
Keyboard Logitech G413 SE
Software Bazzite (Fedora Linux) KDE
Hey, a new "Lake"! Just what everyone has been waiting for! :laugh:
 
Joined
Nov 25, 2019
Messages
141 (0.08/day)
"Copper Lake"?

Actually, it has two O's and one P, not the other way around.
 
Joined
Aug 30, 2006
Messages
7,221 (1.08/day)
System Name ICE-QUAD // ICE-CRUNCH
Processor Q6600 // 2x Xeon 5472
Memory 2GB DDR // 8GB FB-DIMM
Video Card(s) HD3850-AGP // FireGL 3400
Display(s) 2 x Samsung 204Ts = 3200x1200
Audio Device(s) Audigy 2
Software Windows Server 2003 R2 as a Workstation now migrated to W10 with regrets.
Bfloat16 is a disgusting number format. It is going to cause all kinds of awful "glitches" in the future. It has one purpose: to shoehorn in additional performance using tricks and shortcuts to save money. It uses half the memory of FP32 and requires less silicon, allowing more parallel BF16 calculations on the same die area, and it is a little quicker to calculate. The cost? Inaccuracy. That doesn't matter if you are Google and using it to process data to target advertising. Who cares if you get the wrong ad, an odd YouTube recommendation, or some video or photo loses a little quality? For Google, processing on less silicon and less memory probably saves money. So what's the problem?

Libraries and obfuscation of calculation.

BF16 is often not used singularly and exclusively, but is mixed with FP32 in an ad hoc fashion to obtain speed gains. Look, says the coder, I gained 30% throughput by optimising the algorithm using BF16 on the multipliers. Great, let's bake that into production.

Years later, with libraries layered on libraries and APIs linked over networks or the internet, some important application will use one of these now-standard libraries and deliver accurate results most of the time, but occasionally inaccurate results with potentially major consequences, and the developer or user will be none the wiser when things go wrong. Security? Financial markets? Rocket trajectories? https://www.bbc.com/future/article/20150505-the-numbers-that-lead-to-disaster

Einstein said that everything should be made as simple as possible, but not simpler. There is a huge risk that BF16 will be tomorrow's Year 2000 problem, or the GPS rollover problem, or other such cases. We will regret it if it gets out of its special-case box.
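As a rough illustration of the kind of drift being described, here is a naive sketch that simulates bf16 by truncating float32 values to their top 16 bits. It is only an approximation: real hardware rounds rather than truncates, and frameworks normally keep FP32 accumulators precisely to avoid this failure mode.

```python
import numpy as np

def to_bf16(a):
    """Simulate bfloat16 storage by truncating float32 values to their top 16 bits."""
    a = np.asarray(a, dtype=np.float32)
    return ((a.view(np.uint32) >> 16) << 16).astype(np.uint32).view(np.float32)

# Accumulate 100,000 small increments, rounding the running sum to bf16 each step.
n, step = 100_000, 0.001
acc = np.float32(0.0)
for _ in range(n):
    acc = to_bf16(np.float32(acc) + np.float32(step))

print("bf16-rounded running sum:", float(acc))  # stalls around 0.25 with this truncation model
print("exact answer:            ", n * step)    # 100.0
```

The naive running sum stalls once a single step drops below one bf16 ulp, which is exactly why mixed-precision recipes keep master weights and accumulators in FP32.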
 
Joined
Jul 29, 2019
Messages
77 (0.04/day)
Bfloat16 is a disgusting number format. It is going to cause all kinds of awful "glitches" in the future. It has one purpose: to shoehorn in additional performance using tricks and shortcuts to save money. It uses half the memory of FP32 and requires less silicon, allowing more parallel BF16 calculations on the same die area, and it is a little quicker to calculate. The cost? Inaccuracy. That doesn't matter if you are Google and using it to process data to target advertising. Who cares if you get the wrong ad, an odd YouTube recommendation, or some video or photo loses a little quality? For Google, processing on less silicon and less memory probably saves money. So what's the problem?

Libraries and obfuscation of calculation.

BF16 is often not used singularly and exclusively, but is mixed with FP32 in an ad hoc fashion to obtain speed gains. Look, says the coder, I gained 30% throughput by optimising the algorithm using BF16 on the multipliers. Great, let's bake that into production.

Years later, with libraries layered on libraries and APIs linked over networks or the internet, some important application will use one of these now-standard libraries and deliver accurate results most of the time, but occasionally inaccurate results with potentially major consequences, and the developer or user will be none the wiser when things go wrong. Security? Financial markets? Rocket trajectories? https://www.bbc.com/future/article/20150505-the-numbers-that-lead-to-disaster

Einstein said that everything should be made as simple as possible, but not simpler. There is a huge risk that BF16 will be tomorrow's Year 2000 problem, or the GPS rollover problem, or other such cases. We will regret it if it gets out of its special-case box.
Interesting observation, thanks for the link!
 
Joined
Jan 8, 2017
Messages
9,428 (3.28/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
Bfloat16 is a disgusting number format. It is going to cause all kinds of awful "glitches" in the future. It has one purpose: to shoehorn in additional performance using tricks and shortcuts to save money. It uses half the memory of FP32 and requires less silicon, allowing more parallel BF16 calculations on the same die area, and it is a little quicker to calculate. The cost? Inaccuracy. That doesn't matter if you are Google and using it to process data to target advertising. Who cares if you get the wrong ad, an odd YouTube recommendation, or some video or photo loses a little quality? For Google, processing on less silicon and less memory probably saves money. So what's the problem?

Libraries and obfuscation of calculation.

BF16 is often not used singularly and exclusively, but is mixed with FP32 in an ad hoc fashion to obtain speed gains. Look, says the coder, I gained 30% throughput by optimising the algorithm using BF16 on the multipliers. Great, let's bake that into production.

Years later, with libraries layered on libraries and APIs linked over networks or the internet, some important application will use one of these now-standard libraries and deliver accurate results most of the time, but occasionally inaccurate results with potentially major consequences, and the developer or user will be none the wiser when things go wrong. Security? Financial markets? Rocket trajectories? https://www.bbc.com/future/article/20150505-the-numbers-that-lead-to-disaster

Einstein said that everything should be made as simple as possible, but not simpler. There is a huge risk that BF16 will be tomorrow's Year 2000 problem, or the GPS rollover problem, or other such cases. We will regret it if it gets out of its special-case box.

Half-precision floats are obviously only used when the accuracy is good enough. Some networks tolerate very low precision; some classification models need as little as 8-bit integers, or even 4-bit (see the sketch below).

Neural networks are never "accurate" no matter the floating-point precision; they can't be, by definition, since they are statistical models.
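As a rough illustration of how coarse that can get, below is a minimal NumPy sketch of generic symmetric per-tensor int8 quantization. This is a textbook scheme, not any particular framework's implementation, and the helper names are made up for the example.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization: map the largest-magnitude weight to +/-127."""
    scale = float(np.max(np.abs(w))) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=1000).astype(np.float32)   # stand-in for one layer's weights
q, scale = quantize_int8(w)
print("worst-case rounding error:", np.max(np.abs(w - dequantize(q, scale))))  # about scale / 2
```

The worst-case error per weight is roughly half the quantization step, which many classification networks tolerate with little or no accuracy loss.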
 
Joined
Jan 6, 2013
Messages
350 (0.08/day)
So compared to a 5-year-old system, you will get 2x the performance?

That is innovation...

And funny how they haven't got any charts comparing them to Epyc/Rome.
What do you expect? They just doubled the core count from 28 to 56... same silicon.
 