
Intel "Cannon Lake" Confirmed to Feature AVX-512 Instruction-Set

btarunr

Editor & Senior Moderator
Staff member
Joined
Oct 9, 2007
Messages
47,343 (7.51/day)
Location
Hyderabad, India
System Name RBMK-1000
Processor AMD Ryzen 7 5700G
Motherboard ASUS ROG Strix B450-E Gaming
Cooling DeepCool Gammax L240 V2
Memory 2x 8GB G.Skill Sniper X
Video Card(s) Palit GeForce RTX 2080 SUPER GameRock
Storage Western Digital Black NVMe 512GB
Display(s) BenQ 1440p 60 Hz 27-inch
Case Corsair Carbide 100R
Audio Device(s) ASUS SupremeFX S1220A
Power Supply Cooler Master MWE Gold 650W
Mouse ASUS ROG Strix Impact
Keyboard Gamdias Hermes E2
Software Windows 11 Pro
Intel updated the ARK information page for its stealthily launched 10 nm production chip, the Core i3-8121U "Cannon Lake," to confirm that the chip supports the new AVX-512 instruction-set. This is the first "mainstream" client-segment processor by the company to feature the extremely advanced instruction-set that, if implemented properly on the software side, can double performance/Watt over AVX2 in tasks that can take advantage of it.

The instruction-set made its debut with the Xeon Phi "Knights Landing" HPC processor, and made its client-segment debut with the Core X "Skylake X" HEDT processors. It remains to be seen if the implementation of AVX-512 on "Cannon Lake" is complete, or if some instructions found on HPC processors such as the Xeon Phi are omitted due to irrelevance to the client platform.
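
Whether a given chip implements all of AVX-512 can be checked from software, since the instruction set is split into subsets that are reported individually. A minimal sketch, assuming a Linux system where `/proc/cpuinfo` lists one flag per subset; the helper name `avx512_subsets` and the sample line are made up for illustration and are not taken from a real Core i3-8121U:

```python
# Subset names as Linux spells them in the "flags" line of /proc/cpuinfo.
AVX512_SUBSETS = (
    "avx512f", "avx512cd", "avx512bw", "avx512dq", "avx512vl",
    "avx512ifma", "avx512vbmi", "avx512er", "avx512pf",
)


def avx512_subsets(cpuinfo_text):
    """Return the AVX-512 subset flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # Everything after the first colon is a space-separated flag list.
            _, _, rest = line.partition(":")
            flags = set(rest.split())
            return [name for name in AVX512_SUBSETS if name in flags]
    return []


# Hypothetical sample line, NOT real output from any specific CPU.
sample = "flags\t\t: fpu sse2 avx avx2 avx512f avx512cd avx512vl"
print(avx512_subsets(sample))
```

On a real Linux machine the same function could be fed `open("/proc/cpuinfo").read()`; an empty list would mean no AVX-512 subset is advertised.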



 
Joined
Jun 10, 2014
Messages
3,006 (0.78/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
This confirms again that Cannon Lake will bring AVX-512 to the mainstream platforms, but keep in mind that this doesn't mean that the mainstream platforms will get the complete feature set.

Now we just have to wait for productive software to start utilizing this…
 
Joined
Sep 15, 2007
Messages
3,946 (0.62/day)
Location
Police/Nanny State of America
Processor OCed 5800X3D
Motherboard Asucks C6H
Cooling Air
Memory 32GB
Video Card(s) OCed 6800XT
Storage NVMees
Display(s) 32" Dull curved 1440
Case Freebie glass idk
Audio Device(s) Sennheiser
Power Supply Don't even remember
This confirms again that Cannon Lake will bring AVX-512 to the mainstream platforms, but keep in mind that this doesn't mean that the mainstream platforms will get the complete feature set.

Now we just have to wait for productive software to start utilizing this…

As if it even matters since you won't get real desktop parts until 2020 LOL. What are you gonna do, run blender on a quad core in 2019? LOL
 
Joined
Feb 19, 2009
Messages
1,162 (0.20/day)
Location
I live in Norway
Processor R9 5800x3d | R7 3900X | 4800H | 2x Xeon gold 6142
Motherboard Asrock X570M | AB350M Pro 4 | Asus Tuf A15
Cooling Air | Air | duh laptop
Memory 64gb G.skill SniperX @3600 CL16 | 128gb | 32GB | 192gb
Video Card(s) RTX 4080 |Quadro P5000 | RTX2060M
Storage Many drives
Display(s) AW3423dwf.
Case Jonsbo D41
Power Supply Corsair RM850x
Mouse g502 Lightspeed
Keyboard G913 tkl
Software win11, proxmox
This confirms again that Cannon Lake will bring AVX-512 to the mainstream platforms, but keep in mind that this doesn't mean that the mainstream platforms will get the complete feature set.

Now we just have to wait for productive software to start utilizing this…


Prerequisites for AVX-512 to work at all:
it not using lots of power, and it not clocking down.

Unless you have a batch job, AVX is a total waste of time and a performance loss; even in a batch job the improvements are lacking.
A 25% IPC boost is what everyone sees: OH WOW, 25% FASTER!!!
People go like: "Hey, software company/dude, why don't you have the new, much faster AVX stuff implemented?"
Or: "AVX is revolutionary so you need to buy Skylake-S, Ryzen sucks cause it doesn't have AVX-512", blabla.

Well, 25% sounds great, doesn't it? But that is 25% IPC, with 15% lower clocks due to power consumption, so not all of the improvement is still there.
And even at 15% lower clocks, the power consumption is still above that of a normal, nice workload using SSE instruction sets.
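
The arithmetic behind that claim can be written out. This is a rough sketch that simply takes the 25% per-clock gain and the 15% clock reduction quoted in this post at face value; `net_speedup` is a hypothetical helper and the model ignores everything else (memory, thermals, how much of the code is vectorized):

```python
def net_speedup(per_clock_gain, clock_drop):
    """Combine a per-clock gain with a clock-speed reduction.

    Both arguments are fractions, e.g. 0.25 for 25%.
    """
    return (1.0 + per_clock_gain) * (1.0 - clock_drop)


# Figures quoted in the post: +25% per clock, -15% clock speed.
speedup = net_speedup(0.25, 0.15)
print(f"net speedup: {speedup:.4f}x")
```

Under those two numbers alone the net result is 1.25 × 0.85 = 1.0625, i.e. roughly 6% faster rather than 25%.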

Click this link to read about the major issues with AVX, and why people who bought into the promised 25% performance may be getting an "oh.. right" moment:
avx info from a dev

I'm not saying AVX cannot be great or be used, but currently it's a bit of a problem, and where you might find AVX great, we have CUDA and OpenCL with GPU acceleration doing most of the work, so it's not too important in my eyes.
The major issue is getting VRM circuits ready for massive AVX use...
 
Joined
Jun 10, 2014
Messages
3,006 (0.78/day)
Unless you have a batch job, AVX is a total waste of time and a performance loss; even in a batch job the improvements are lacking.
You are mistaken, AVX is essential for most workloads like video encoding, graphical editing, 3D modeling and CAD.
It does not matter at all for your games or your browser running Facebook, but it makes a huge difference for work.

I'm not saying AVX cannot be great or be used, but currently it's a bit of a problem, and where you might find AVX great, we have CUDA and OpenCL with GPU acceleration doing most of the work, so it's not too important in my eyes.

The major issue is getting VRM circuits ready for massive AVX use...
Then you lack a basic understanding of how AVX works.
CUDA and OpenCL are great whenever you have a large chunk of data which can be processed by the GPU without being transferred back and forth very frequently. Switching between normal CPU registers and AVX registers is very cheap, and nearly "free" compared to transferring data between CPU cores or to a GPU.
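
The trade-off described here can be put as a rough break-even check: offloading only pays off when the GPU's compute time plus the transfer time beats the CPU's time. A minimal sketch; the function name and all timings below are made-up illustrative numbers, not measurements:

```python
def gpu_offload_pays_off(cpu_seconds, gpu_seconds, transfer_seconds):
    """True if GPU compute plus the round-trip data transfer beats the CPU."""
    return gpu_seconds + transfer_seconds < cpu_seconds


# Hypothetical small task: the transfer overhead dominates.
small = gpu_offload_pays_off(cpu_seconds=20e-6, gpu_seconds=2e-6,
                             transfer_seconds=50e-6)

# Hypothetical large batch: the transfer cost is amortized.
large = gpu_offload_pays_off(cpu_seconds=2.0, gpu_seconds=0.2,
                             transfer_seconds=0.1)

print(small, large)
```

With these assumed numbers the small task stays on the CPU and the large batch goes to the GPU, which is the distinction the post is drawing.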
 
Joined
Jan 8, 2017
Messages
9,525 (3.26/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
AVX was never meant for consumer products. The perfect use case for it is a server CPU that typically runs at a constant low clock rate with a constant workload; anything outside of that and you need a great deal of power consumption management.

I noticed that for the last couple of years Intel has had trouble figuring out what features they should put in their products. They went from locking down AVX instructions on their lower-end CPUs to now turning everything into a Xeon Phi.
Compute should run on a GPU, that's what they are designed for from the ground up, and they do have a mostly useless, poorly designed slab of silicon on almost all their CPUs to do that. Of course Intel doesn't like that, so they keep shoving in wider and wider SIMD extensions that come with convoluted trade-offs, since you can't have a high-clocked CPU with many cores and very wide SIMD instructions at the same time. Too many design tensions.
 
Joined
Apr 12, 2013
Messages
1,192 (0.28/day)
Processor 11700
Motherboard TUF z590
Memory G.Skill 32gb 3600mhz
Video Card(s) ROG Vega 56
Case Deepcool
Power Supply RM 850
Checked the ARK and it has Hyper-Threading enabled, that's nice.
 
Joined
Jan 8, 2017
Messages
9,525 (3.26/day)
Checked the ARK and it has Hyper-Threading enabled, that's nice.
Was expecting it to finally be a full-fledged quad core though, but for 15W it's good enough I guess.
 
Joined
Feb 19, 2009
Messages
1,162 (0.20/day)
You are mistaken, AVX is essential for most workloads like video encoding, graphical editing, 3D modeling and CAD.
It does not matter at all for your games or your browser running Facebook, but it makes a huge difference for work.


Then you lack a basic understanding of how AVX works.
CUDA and OpenCL are great whenever you have a large chunk of data which can be processed by the GPU without being transferred back and forth very frequently. Switching between normal CPU registers and AVX registers is very cheap, and nearly "free" compared to transferring data between CPU cores or to a GPU.


By the time I've spent milliseconds starting my task, the GPU has already been able to do the same, and once it starts it finishes in a shorter time.
The point was that AVX is meant to replace SSE for tasks running for minutes; at that point a GPU might as well be used.
Also, smaller tasks using heavy AVX are worthless.

Heavy AVX is still beneficial when no GPU is present to do said work, but even at that point the AVX improvements have been lost to lower clock speeds, and the touted 25% IPC is IPC, not efficiency or speed, which was my point against AVX being our saviour.

I won't write off AVX as a failure, just that it faces some major hurdles and issues in gaining more traction, and it is generally misunderstood because many say 25% IPC and that it's a game changer when it's not. It has its niches, but that's about it.
 
Joined
Jun 10, 2014
Messages
3,006 (0.78/day)
By the time I've spent milliseconds starting my task, the GPU has already been able to do the same, and once it starts it finishes in a shorter time.

The point was that AVX is meant to replace SSE for tasks running for minutes; at that point a GPU might as well be used.
No, that's incorrect. The overhead of using AVX/SSE is a few clocks, which means ns scale, not ms (keep in mind that 1 ms = 1,000 µs = 1,000,000 ns).
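
To put the scale in numbers, here is a small sketch converting clock cycles to nanoseconds. The 4 GHz clock and the 10-cycle overhead are assumed values picked for illustration only:

```python
def cycles_to_ns(cycles, clock_ghz):
    """At N GHz a core runs N cycles per nanosecond."""
    return cycles / clock_ghz


NS_PER_MS = 1_000_000

overhead_ns = cycles_to_ns(10, clock_ghz=4.0)   # assumed 10-cycle overhead
cycles_per_ms = NS_PER_MS * 4.0                 # cycles in 1 ms at 4 GHz

print(overhead_ns, "ns")
print(int(cycles_per_ms), "cycles in one millisecond")
```

Under these assumptions a few clocks amount to a couple of nanoseconds, while a single millisecond spans millions of cycles.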

Also, smaller tasks using heavy AVX are worthless.

Heavy AVX is still beneficial when no GPU is present to do said work, but even at that point the AVX improvements have been lost to lower clock speeds, and the touted 25% IPC is IPC, not efficiency or speed, which was my point against AVX being our saviour.
No, the size of the task is irrelevant for AVX. What matters is whether the code is computationally intensive and cache optimized, the latter of which more or less means the code has to be something like C or C++ with specific design considerations.

I don't know where you got your "touted 25% IPC" figure. Do you even know what IPC means? People commonly throw around "IPC measurements" for various benchmarks which don't measure IPC at all. IPC means Instructions Per Clock. Vector operations such as SSE and AVX exploit data-level parallelism; they don't increase the instructions per clock at all, but they do increase the computational throughput.
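
A toy count makes the IPC-versus-throughput distinction concrete: with the same number of instructions retired per clock, a wider vector simply needs fewer instructions for the same data. This sketch assumes 32-bit elements (so 8 lanes in a 256-bit register and 16 in a 512-bit one) and ignores loads, stores and loop overhead; `vector_instructions` is a hypothetical helper:

```python
import math


def vector_instructions(elements, lanes):
    """Arithmetic instructions needed to touch every element once."""
    return math.ceil(elements / lanes)


n = 1024  # number of 32-bit values to process
for name, lanes in (("scalar", 1), ("256-bit", 8), ("512-bit", 16)):
    print(name, vector_instructions(n, lanes))
```

The instruction count drops from 1024 to 128 to 64, while the instructions-per-clock figure itself does not have to change at all.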

AVX doesn't even give a "25%" performance increase. AVX2 offers an 8× increase over ALUs/FPUs for 32-bit operations, and AVX-512 offers 16× for 32-bit operations. The performance gains of AVX depend on the application, ranging from ~30-50% up to >10×, all depending on how computationally intensive the application may be, and also how many AVX units the CPU features (e.g. Skylake-X has dual AVX-512 FMA units per core).
Edit: Gains in AVX can be even greater with FMA, which combines calculations like a + b × c into a single operation. Normally a CPU would calculate b × c, then wait several clocks for the result and do the addition. Doing this in a single operation saves a huge amount of otherwise wasted CPU cycles.
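
The 8× and 16× figures follow from register width divided by element width, and the FMA units multiply that further. A minimal sketch of the peak-rate arithmetic, assuming 32-bit elements and counting one FMA as two floating-point operations; the helper names are made up, and real code rarely reaches this theoretical peak:

```python
def simd_lanes(register_bits, element_bits=32):
    """How many elements fit in one vector register."""
    return register_bits // element_bits


def peak_flops_per_cycle(register_bits, fma_units, element_bits=32):
    """Theoretical per-core peak; an FMA counts as a multiply plus an add."""
    return simd_lanes(register_bits, element_bits) * fma_units * 2


print(simd_lanes(256))                          # AVX2 lanes
print(simd_lanes(512))                          # AVX-512 lanes
print(peak_flops_per_cycle(512, fma_units=2))   # dual AVX-512 FMA units
```

For a core with two 512-bit FMA units this gives 16 × 2 × 2 = 64 single-precision operations per clock as an upper bound.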

As mentioned, AVX is used in most heavy workloads, including video encoding, graphical editing, 3D modeling and CAD. Examples include Blender and FFmpeg.

AVX may also be used in libraries used by many applications, giving a smaller but still appreciated performance boost. Look at this benchmark of Ubuntu vs. Intel Clear Linux, which is recompiled with some AVX optimizations in libraries such as glibc etc. These benchmarks compare the competing 16-core Threadripper 1950X vs. the 10-core i9 7900X, and even though this is a Linux distribution made by Intel, it clearly displays examples where the 16-core Threadripper jumps from lagging behind the 10-core i9 to taking the lead. AVX is amazing and it should excite even AMD fans; if you still don't get it, you should get educated on this matter. When Zen eventually gets AVX-512 too, it will scale incredibly well in some future applications.
 
Joined
Nov 13, 2007
Messages
10,877 (1.74/day)
Location
Austin Texas
System Name stress-less
Processor 9800X3D @ 5.42GHZ
Motherboard MSI PRO B650M-A Wifi
Cooling Thermalright Phantom Spirit EVO
Memory 64GB DDR5 6400 1:1 CL30-36-36-76 FCLK 2200
Video Card(s) RTX 4090 FE
Storage 2TB WD SN850, 4TB WD SN850X
Display(s) Alienware 32" 4k 240hz OLED
Case Jonsbo Z20
Audio Device(s) Yes
Power Supply Corsair SF750
Mouse DeathadderV2 X Hyperspeed
Keyboard 65% HE Keyboard
Software Windows 11
Benchmark Scores They're pretty good, nothing crazy.
My AVX-512 Time Spy Extreme @ 4.4 GHz is about 80-100% faster than regular calculations @ 4.7 GHz and uses less power.

These instructions seem to be very power efficient for the amount of performance they bring.
 
Joined
Apr 8, 2010
Messages
1,012 (0.19/day)
Processor Intel Core i5 8400
Motherboard Gigabyte Z370N-Wifi
Cooling Silverstone AR05
Memory Micron Crucial 16GB DDR4-2400
Video Card(s) Gigabyte GTX1080 G1 Gaming 8G
Storage Micron Crucial MX300 275GB
Display(s) Dell U2415
Case Silverstone RVZ02B
Power Supply Silverstone SSR-SX550
Keyboard Ducky One Red Switch
Software Windows 10 Pro 1909
By the time I've spent milliseconds starting my task, the GPU has already been able to do the same, and once it starts it finishes in a shorter time.
The point was that AVX is meant to replace SSE for tasks running for minutes; at that point a GPU might as well be used.
Also, smaller tasks using heavy AVX are worthless.

Heavy AVX is still beneficial when no GPU is present to do said work, but even at that point the AVX improvements have been lost to lower clock speeds, and the touted 25% IPC is IPC, not efficiency or speed, which was my point against AVX being our saviour.

I won't write off AVX as a failure, just that it faces some major hurdles and issues in gaining more traction, and it is generally misunderstood because many say 25% IPC and that it's a game changer when it's not. It has its niches, but that's about it.

That's my question regarding AVX as well. I know very little about it, while I do a lot of GPU computing at work. What is it that AVX can do better than CUDA, OpenCL, compute shaders, or Metal kernels?
 
Joined
Jan 8, 2017
Messages
9,525 (3.26/day)
What is it that AVX can do better than CUDA, OpenCL, compute shaders, or Metal kernels?

Very little. It's also much more difficult to add AVX support: you have to resort to inline assembly and mnemonics, all of which are a deterrent to most programmers.
 
Joined
Apr 12, 2013
Messages
1,192 (0.28/day)
Was expecting it to finally be a full-fledged quad core though, but for 15W it's good enough I guess.

Now that would have been extremely nice if it was a full 4-core, but then again, too good to be true.
 
Joined
Feb 16, 2012
Messages
415 (0.09/day)
Location
Sweden
By the time I've spent milliseconds starting my task, the GPU has already been able to do the same, and once it starts it finishes in a shorter time.
The point was that AVX is meant to replace SSE for tasks running for minutes; at that point a GPU might as well be used.
Also, smaller tasks using heavy AVX are worthless.

Heavy AVX is still beneficial when no GPU is present to do said work, but even at that point the AVX improvements have been lost to lower clock speeds, and the touted 25% IPC is IPC, not efficiency or speed, which was my point against AVX being our saviour.

I won't write off AVX as a failure, just that it faces some major hurdles and issues in gaining more traction, and it is generally misunderstood because many say 25% IPC and that it's a game changer when it's not. It has its niches, but that's about it.
Having AVX-512 makes these processors very capable of AI inferencing. So it's of much greater use than what you are considering.
 
Joined
Feb 1, 2013
Messages
1,273 (0.29/day)
System Name Gentoo64 /w Cold Coffee
Processor 9900K 5.2GHz @1.312v
Motherboard MXI APEX
Cooling Raystorm Pro + 1260mm Super Nova
Memory 2x16GB TridentZ 4000-14-14-28-2T @1.6v
Video Card(s) RTX 4090 LiquidX Barrow 3015MHz @1.1v
Storage 660P 1TB, 860 QVO 2TB
Display(s) LG C1 + Predator XB1 QHD
Case Open Benchtable V2
Audio Device(s) SB X-Fi
Power Supply MSI A1000G
Mouse G502
Keyboard G815
Software Gentoo/Windows 10
Benchmark Scores Always only ever very fast
Was going to suggest that AVX-512 doesn't have a place in an i3, but the internet landscape isn't the same anymore. Not too long ago, Google search went full-bore HTTPS. Browsers are enforcing this as well, by flagging HTTP sites and warning about un-certified HTTPS certificates. As a result, the low-end CPUs do need a boost in cryptographic efficiency to keep up.

https://blog.cloudflare.com/on-the-dangers-of-intels-frequency-scaling/
 
Joined
Jun 10, 2014
Messages
3,006 (0.78/day)
Was going to suggest that AVX-512 doesn't have a place in an i3, but the internet landscape isn't the same anymore. Not too long ago, Google search went full-bore HTTPS. Browsers are enforcing this as well, by flagging HTTP sites and warning about un-certified HTTPS certificates. As a result, the low-end CPUs do need a boost in cryptographic efficiency to keep up.

https://blog.cloudflare.com/on-the-dangers-of-intels-frequency-scaling/
Do you know why AES is faster without AVX? It's because modern CPUs have dedicated hardware to do AES. CPUs are in fact a mixture of scalar ALUs and FPUs, vector units (SIMD) and application-specific hardware accelerators for AES etc. Have you ever wondered how your phone manages to play videos at ~1W or your Blu-ray player manages to decode 4K H.265 with ease? It's all due to application-specific hardware, either ASICs or integrated into the CPU. (This is also how people do Bitcoin mining…) Such chips will always offer the ultimate performance for that specific workload, but will be 100% useless for anything else.

This application-specific hardware is essentially a whole algorithm implemented in hardware, and that's fine when you design a piece of hardware for a dedicated use case. But this hardware can't be upgraded in software to support new algorithms/codecs/standards. This might not be a big deal for a cell phone you toss away every other year anyway, but it is annoying for hardware which is meant to last, in both the consumer and professional space. Intel has offered acceleration for AES-256, SHA-1 and various video codecs since Sandy Bridge, and has since then extended that to include "dead" formats like JPEG(!). The problem with this is an ever-increasing amount of die space and power consumed by dedicated accelerators, which all eventually become less relevant. This die space could have been spent on general ALUs, FPUs and vector/SIMD units, which can accelerate anything.

Regarding the complaint about frequency scaling with AVX: even though AVX instructions may operate at lower clocks than non-AVX instructions, they outperform any ALU/FPU operations by a factor of 10-20× or more, and are superior in terms of energy efficiency. The only thing beating SIMD in efficiency is application-specific hardware, but then again this is application specific, making it useless for anything else. CPUs today have hit a frequency wall, and the amount of voltage needed to operate at ~5 GHz vs. ~4 GHz is extreme. There is no way we can continue to scale like this, and the aggressive boost from both Intel and AMD is already pushing it too far. The only way forward is increasing and balancing single ALUs/FPUs and SIMD units. A CPU with more actual throughput at lower clocks is still faster than one trying to push a little higher clocks.
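
The voltage point can be illustrated with the usual first-order rule that dynamic power scales with voltage squared times frequency. This is only a sketch: the helper is hypothetical and the 1.0 V and 1.3 V figures are assumptions picked for illustration, not values for any real CPU.

```python
def relative_dynamic_power(volts, ghz, base_volts, base_ghz):
    """First-order model: dynamic power ~ V^2 * f, as a ratio to a baseline."""
    return (volts / base_volts) ** 2 * (ghz / base_ghz)


# Assumed operating points: 4 GHz at 1.0 V versus 5 GHz at 1.3 V.
ratio = relative_dynamic_power(1.3, 5.0, base_volts=1.0, base_ghz=4.0)
print(f"{ratio:.2f}x power for {5.0 / 4.0:.2f}x clock")
```

With these assumed voltages, 25% more clock costs a bit over twice the dynamic power, which is the kind of scaling the post argues cannot continue.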
 