
Intel "Raptor Lake" is a 24-core (8 Big + 16 Little) Processor

Aquinus

Resident Wat-man
Joined
Jan 28, 2012
Messages
13,173 (2.78/day)
Location
Concord, NH, USA
System Name Apollo
Processor Intel Core i9 9880H
Motherboard Some proprietary Apple thing.
Memory 64GB DDR4-2667
Video Card(s) AMD Radeon Pro 5600M, 8GB HBM2
Storage 1TB Apple NVMe, 4TB External
Display(s) Laptop @ 3072x1920 + 2x LG 5k Ultrafine TB3 displays
Case MacBook Pro (16", 2019)
Audio Device(s) AirPods Pro, Sennheiser HD 380s w/ FIIO Alpen 2, or Logitech 2.1 Speakers
Power Supply 96w Power Adapter
Mouse Logitech MX Master 3
Keyboard Logitech G915, GL Clicky
Software MacOS 12.1
You are really grasping at straws here.
Doing the same work without AVX (or other SIMD) would usually require >20x the instructions, and you want to offset that extra power required by running the core at a very low clock speed, probably making it about 100x slower. This is not a very realistic usage scenario.
The fact remains that AVX is more power efficient.
Once again, you're assuming the vector is completely occupied, which is a bad assumption. I'm not saying you're wrong, I'm saying your premise is bad. You're also assuming that the increased time to execute is going to harm performance when you have no idea if it's the bottleneck for the application. If it's running on a low power core, it's probably not, otherwise it wouldn't be running there. The reality is that the case you're describing is the case that'd already be hitting the high power cores.
I assume you are still talking in the context of auto-vectorizing here.
Your assumptions here about saturating the vector units are fundamentally flawed. Auto-vectorizing only happens when the data is dense and the operations in a loop easily translate to AVX operations. It's not like the compiler will take random FPU operations and stuff them together in vectors.

Auto-vectorization will not hurt your efficiency or performance, but there are some considerations:
- Sometimes the gains are negligible, because the code is too bloated, the data isn't dense and/or the operations inside the loops aren't simple enough.
- If FMA is enabled, the data produced will no longer be binary compatible, which may or may not be a problem.
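For anyone following along, here is a rough sketch in C of the distinction drawn above. The function names and the `node` type are made up for illustration, and whether a given compiler actually vectorizes the first loop depends on the compiler, flags, and target, so treat this as an assumption rather than a guarantee.

```c
#include <stddef.h>

/* Dense, contiguous data and a simple loop body: a typical candidate for
   auto-vectorization. `restrict` tells the compiler the arrays do not
   overlap. If FMA contraction is enabled, a[i] * k + b[i] may be compiled
   to a fused multiply-add, which rounds once instead of twice, so the
   low bits of the result can differ from a non-FMA build. */
void scale_add(float *restrict out, const float *restrict a,
               const float *restrict b, float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] * k + b[i];
}

/* Hypothetical linked-list node: the values are scattered in memory and
   each step depends on the previous pointer load, so the data is not
   dense and the loop is a poor fit for vector instructions. */
struct node {
    float value;
    struct node *next;
};

float list_sum(const struct node *head)
{
    float sum = 0.0f;
    for (; head != NULL; head = head->next)
        sum += head->value;
    return sum;
}
```

Both functions do simple floating point work, but only the first has the dense layout and regular access pattern that the quoted post describes.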
Agreed on FMA. If you vectorize an operation though, there is a power penalty from driving the full width of AVX if you're not using the entire thing. Even at half occupancy, in situations where time isn't the limiting factor, a smaller, slim core is likely going to use less power. I think you underestimate how many more transistors it takes to implement AVX and the cache backing to support it. Just because auto-vectorization runs doesn't mean that it's always a perfect situation where you'll get 100% occupancy. Sure, performance might not get worse, but power consumption can.
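To put a number on "occupancy", here is a minimal sketch in C. It assumes a simplified model where elements are processed in fixed-width chunks and the last chunk is padded or masked; real compilers often use a scalar remainder loop instead, and this says nothing about actual power draw. The function name `lane_occupancy` is hypothetical.

```c
#include <stddef.h>

/* Fraction of vector lanes doing useful work when n elements are processed
   in chunks of `lanes` elements, with the final chunk padded or masked.
   For 256-bit AVX and 32-bit floats, lanes would be 8. */
double lane_occupancy(size_t n, size_t lanes)
{
    if (n == 0 || lanes == 0)
        return 0.0;
    size_t chunks = (n + lanes - 1) / lanes;  /* ceil(n / lanes) */
    return (double)n / (double)(chunks * lanes);
}
```

Under this model, 4 floats in an 8-lane vector gives 0.5 (half occupancy), while 16 floats gives 1.0.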
ARM achieves efficiency with special instructions to accelerate specific workloads, and yes, an ASIC will beat SIMD in efficiency, but SIMD is general purpose.
...and Apple specifically made the choice to make slim cores that couldn't do everything for that case. It's not like the slim cores are ASIC circuits. That hardware is going to be in the high power cores. Once again, you're focusing on full load, high power situations. Outside of that context, it doesn't make sense to have that kind of bulk in the slim cores because you're negating the advantage they have and you might as well just have a bunch of high power cores instead, but the reality is that most CPUs aren't running full tilt most of the time and there are efficiency benefits to be had from that.
Hell, I learned a fair bit about AVX from you nerds arguing.
Keep it up, education is good.
Yes it is, but so is properly understanding the problem being solved.
 