
Intel "Alder Lake-S" Confirmed to Introduce LGA1700 Socket, Technical Docs Out for Partners

What I think Intel should do is connect the two chips with traces through the substrate itself and call it hyper tunneling. Basically, convert hyper threading into actual physical cores on another package, with a chip that matches the base clock performance and activates when the turbo boost performance heat throttles. Going further, since voltages naturally rise and fall in peaks and dips, they could make each physical core have 3 threads, sync them to match, and put a physical core on each dip representing base clock performance, with the peak representing the turbo boost performance. That way, when the turbo boost performance throttles, the two physical cores on the rising and falling signal take over, allowing the turbo performance to cool down and kick back in sooner. Squeeze more turbo cores onto a single package and supplement that performance with more base clock cores from another package in the form of hyper threading, with the turbo performance sandwiched in between.

The cool thing is the two CPU packages could ping-pong the power throttling off and on between inactivity and activity, so when one package gets engaged the other can disengage to reduce heat and energy. If they can do that and sync it well, it could be quite effective, much like the fan profiles on GPUs. At least when set up and working right, those are quite nice, from the 0 dB fan profiles, to when they trigger higher fan RPMs, to how long they run to cool things down before winding the fan RPMs back down after they've lowered the GPU temps.

I'm not so sure that'll work as well as you think it might. Plus it'll get expensive from a price-per-package standpoint.

I think clock skew between the two would be hell and a half to compensate for and manage.
 
Yeah, IDK what makes the most sense given the scheduler isn't perfect in the first place in terms of leveraging flexibility to adapt to best-case user scenarios. They need to come up with some kind of practical perk to utilize a big.LITTLE design if that's what they are aiming at leveraging. If they could improve hyper threading via a 2nd package, maybe that's an option, but whether it's practical or possible I'm not certain, and I'm certainly not a technical design engineer on the matter. I mean, if they took some instruction sets off one package and placed them on the other, and used that die space to better leverage the remaining things it already does well, I can see that being a possibility perhaps. In a scenario like that, say you have 4 CPU die packages with some different instructions between them: some might have universal instruction sets that they all share, while some might only have specific ones. For example, perhaps SSE 4.1/4.2/AVX2/FMA3 only go on one package while the other lacks those but makes up for it in other ways. Perhaps Intel puts a new FML instruction set on one package that covers security flaws with the chips' designs; who knows, it's Intel, the rabbit hole's the limit.
 
Yeah, IDK what makes the most sense given the scheduler isn't perfect in the first place in terms of leveraging flexibility to adapt to best-case user scenarios. They need to come up with some kind of practical perk to utilize a big.LITTLE design if that's what they are aiming at leveraging. If they could improve hyper threading via a 2nd package, maybe that's an option, but whether it's practical or possible I'm not certain, and I'm certainly not a technical design engineer on the matter.
My issue with big-little designs is the state of OS schedulers (the ancient Windows scheduler in particular), and how far we should expect OS schedulers to be optimized for specific microarchitectures.

Just balancing HT is bad enough; hopefully, if Intel chooses a big-little design on some or all CPUs, they will drop HT, as the combination of the two would be a scheduling nightmare. If anything, big-little might be easier to balance than HT, if done properly. HT also has complex security considerations, as we've come to learn over the past couple of years, and HT sometimes causes latency issues and cache pollution, which does negatively impact some tasks.
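As an aside, this is part of why latency-sensitive software today sometimes sidesteps the scheduler and pins threads so they never share a physical core with an HT sibling. A minimal Linux-only sketch of that idea (the sysfs path is real; the single-core framing is just for illustration):

/* Read which logical CPUs share physical core 0, then pin the
 * calling thread to the first sibling only, so none of our own
 * threads contend with it on the same physical core. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    /* thread_siblings_list looks like "0,8" or "0-1": the logical
     * CPUs (HT siblings) sharing one physical core. */
    FILE *f = fopen("/sys/devices/system/cpu/cpu0/topology/thread_siblings_list", "r");
    if (!f) { perror("sysfs"); return 1; }
    int a = 0, b = -1;
    int n = fscanf(f, "%d%*c%d", &a, &b); /* %*c skips ',' or '-' */
    fclose(f);

    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(a, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0) { /* 0 = this thread */
        perror("sched_setaffinity");
        return 1;
    }
    if (n == 2)
        printf("pinned to CPU %d; avoiding its HT sibling CPU %d\n", a, b);
    else
        printf("pinned to CPU %d; no HT sibling found\n", a);
    /* latency-sensitive work would go here */
    return 0;
}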

I mean, if they took some instruction sets off one package and placed them on the other, and used that die space to better leverage the remaining things it already does well, I can see that being a possibility perhaps. In a scenario like that, say you have 4 CPU die packages with some different instructions between them: some might have universal instruction sets that they all share, while some might only have specific ones. For example, perhaps SSE 4.1/4.2/AVX2/FMA3 only go on one package while the other lacks those but makes up for it in other ways. Perhaps Intel puts a new FML instruction set on one package that covers security flaws with the chips' designs; who knows, it's Intel, the rabbit hole's the limit.
I'm very skeptical about having different instruction sets on different cores. I don't know if executables have all ISA features flagged in their headers, but this would be a requirement.
An alternative would be to implement slower FPUs which use fewer transistors and more clock cycles for the little cores, but retain ISA compatibility.
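To illustrate why mixed ISAs per core would be such a problem: the common pattern is to probe CPU features once at startup and cache a dispatch decision, which is only safe if every core reports the same features. A rough sketch of that pattern (GCC/Clang builtin; the AVX2 "kernels" are just stand-ins):

/* The usual "detect once, dispatch forever" pattern. It is only safe
 * because every core in today's x86 CPUs reports the same features;
 * if a thread could migrate to a core lacking AVX2, the cached
 * function pointer below would hit an illegal-instruction fault. */
#include <stdio.h>

static void kernel_avx2(void)   { puts("AVX2 path"); }   /* stand-in */
static void kernel_scalar(void) { puts("scalar path"); } /* stand-in */

int main(void)
{
    /* GCC/Clang builtin backed by CPUID, queried once at startup. */
    void (*kernel)(void) = __builtin_cpu_supports("avx2") ? kernel_avx2
                                                          : kernel_scalar;
    kernel(); /* from here on, the OS may run us on any core */
    return 0;
}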
 
My issue with big-little designs is the state of OS schedulers (the ancient Windows scheduler in particular), and how far we should expect OS schedulers to be optimized for specific microarchitectures.

Just balancing HT is bad enough; hopefully, if Intel chooses a big-little design on some or all CPUs, they will drop HT, as the combination of the two would be a scheduling nightmare. If anything, big-little might be easier to balance than HT, if done properly. HT also has complex security considerations, as we've come to learn over the past couple of years, and HT sometimes causes latency issues and cache pollution, which does negatively impact some tasks.


I'm very skeptical about having different instruction sets on different cores. I don't know if executables have all ISA features flagged in their headers, but this would be a requirement.
An alternative would be to implement slower FPUs which use fewer transistors and more clock cycles for the little cores, but retain ISA compatibility.
To that I'll argue that I think we should certainly expect OS schedulers to improve, in particular the ancient Windows one. I think HT is likely on its way to being phased back out in favor of more physical cores, doing properly what HT was a stop-gap solution for in the first place, rather than remaining a convoluted scheduling mess, especially on an OS like Windows that's poorly optimized in that area. I see HT as adding a layer of complexity that doesn't even achieve what it sets out to in the first place. When it works it's fine, but when it doesn't it's a mess. HT takes up some die space as well, I'm sure, that might be better used for more legitimate resources. I think the bigger issue with the Windows scheduler is scaling; moving forward, it clearly looks to be at a bit of an impasse at the very high end for some of these extremely multi-core AMD chips. Basically, AMD has pushed the core count much higher than Microsoft seemingly anticipated, and Microsoft has been caught with its pants down. It's to the point where HT on the AMD chips is a real bottleneck and you're better off outright disabling it to avoid all the thread contention, or that was my takeaway from some of Linus's benchmarks on one of those AMD Uber FX chips.

I think with all the thread contention in mind, getting rid of HT entirely could make more sense going forward, especially as we're able to utilize more legitimate physical cores today anyway. It's my belief that it'll lead to more consistent and reliable performance as a whole. There are of course middle-ground solutions, like taking a single HT thread and spreading it adjacently between two CPU cores, to be utilized in a round-robin manner on an as-needed basis. By doing it that way, AMD/Intel could diminish the overall scheduler contention issue in extreme core count scenarios until, or if, Microsoft is able to better resolve those concerns and issues.

I think the big thing is that different options need to be on the table, presented and considered; the CPU has to evolve if it wishes to improve. I think big.LITTLE certainly presents itself as an option to be inserted somewhere in the overall grand scheme of things going forward, but where it fits is hard to say, and the first design of something radically different always has the biggest learning curve.
 
I think HT is likely on its way to being phased back out in favor of more physical cores, doing properly what HT was a stop-gap solution for in the first place, rather than remaining a convoluted scheduling mess, especially on an OS like Windows that's poorly optimized in that area. I see HT as adding a layer of complexity that doesn't even achieve what it sets out to in the first place. When it works it's fine, but when it doesn't it's a mess. HT takes up some die space as well, I'm sure, that might be better used for more legitimate resources.
At the time, adding HT only cost a few percent extra transistors, and allowed the CPU to utilize some of the stalled clock cycles for other threads. As CPUs have grown more efficient, this waste has been reduced, so there are fewer and fewer free cycles to use. Additionally, CPUs are only growing more reliant on cache and prefetching, so having two threads share these can certainly hurt performance. Thirdly, the ever-advancing CPU front-ends result in more and more complexity in handling HT/SMT safely (which they have failed to do). I believe we're at the point where it should be cut, as it makes less and less sense for non-server workloads.

One interesting thing is the rumors of AMD moving to 4-way SMT. I sincerely hope this is either untrue or limited to server CPUs; it would be the wrong move.
 
I think big.LITTLE makes a lot of sense, especially considering the work Apple did with the M1, which smokes Intel's previous offerings on the platform and runs circles around most, if not all, solutions currently on the market when running native apps. A non-symmetrical core design seems the way to go to improve both power efficiency and performance. And if Apple could do it and implement it in iOS, I don't see why Microsoft couldn't.
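For what it's worth, the software side of that cooperation on Apple platforms looks roughly like this: apps don't pick P- or E-cores themselves, they tag work with a quality-of-service class and let the scheduler place it. A minimal macOS-only sketch (the housekeeping thread is hypothetical, purely for illustration):

/* On Apple silicon, a thread declares its QoS class and the kernel
 * decides placement, generally steering background-QoS work onto
 * the efficiency cores when it can. */
#include <pthread.h>
#include <pthread/qos.h>
#include <stdio.h>

static void *housekeeping(void *arg)
{
    (void)arg;
    /* Hint that this thread is low-priority background work. */
    pthread_set_qos_class_self_np(QOS_CLASS_BACKGROUND, 0);
    puts("running at background QoS");
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, housekeeping, NULL);
    pthread_join(t, NULL);
    return 0;
}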
 
I think big.LITTLE makes a lot of sense, especially considering the work Apple did with the M1, which smokes Intel's previous offerings on the platform and runs circles around most, if not all, solutions currently on the market when running native apps. A non-symmetrical core design seems the way to go to improve both power efficiency and performance. And if Apple could do it and implement it in iOS, I don't see why Microsoft couldn't.
I guess you don't know anything about Apple, because they control everything from the hardware to the software.
I guess you don't know anything about Microsoft either; they only control the software. It's kinda hard for Microsoft and/or Intel to do something like Apple did with the M1; all the companies would have to sit down and agree to a joint multi-company agreement, and good luck with that.
Before your next post, I would advise teaching yourself more about tech companies.
 
I guess you don't know anything about Apple, because they control everything from the hardware to the software.
I guess you don't know anything about Microsoft either; they only control the software. It's kinda hard for Microsoft and/or Intel to do something like Apple did with the M1; all the companies would have to sit down and agree to a joint multi-company agreement, and good luck with that.
Before your next post, I would advise teaching yourself more about tech companies.
Yeap. AMD wrote about this as the main problem with big/little: it's useless on Windows because the scheduler doesn't know how to manage or make use of it.
 
Yeap. AMD wrote about this as the main problem with big/little: it's useless on Windows because the scheduler doesn't know how to manage or make use of it.
It's funny how a TPU "news editor" would write that and think it would be easy.
 
It's funny how a TPU "news editor" would write that and think it would be easy.
Well, it does make sense, but it's not practical given this is MS we're talking about, and their scheduler. And for AMD's part, they were talking about developing a way of doing it in hardware since... well, MSFT. lol
 