Monday, December 9th 2024
Intel 18A Yields Are Actually Okay, And The Math Checks Out
A few days ago, we published a report about Intel's 18A yields supposedly sitting at an abysmal 10%. This sparked quite a lot of discussion in the tech community, as well as responses from industry analysts and Intel's now-ex-CEO Pat Gelsinger. Today, we are diving into what is publicly known about Intel's 18A node and estimating what the yields of possible products could be, using tools such as the Die Yield Calculator from SemiAnalysis. First, we know that the defect density of the 18A node is 0.4 defects per cm². This figure dates from August, and the current number could be much lower, since semiconductor nodes tend to keep improving even after they are production-ready. To estimate yields, manufacturers feed figures like this 0.4 defect density into various yield models. Defect density, expressed in defects per square centimeter (def/cm²), measures manufacturing process quality by quantifying the average number of defects present in each unit area of a semiconductor wafer.
Measuring yields is a complex task. Manufacturers design smaller chips for mobile and bigger chips for HPC tasks, and the two will have different yields: bigger chips require more silicon area and are therefore more prone to defects. Smaller mobile chips occupy less area, so a given number of wafer defects still leaves more usable chips than wasted silicon. Stating that a node yields only x% usable chips is therefore only one side of the story if the size of the test chip is not known. For example, NVIDIA's H100 die measures 814 mm², a size that pushes modern manufacturing to its limits. A modern photomask, the pattern mask used to print a chip design onto the silicon wafer, covers only 858 mm² (26x33 mm); that is the hard limit before a design exceeds the reticle and needs to be split up. At that size, nodes yield far fewer usable chips than something like a 100 mm² mobile chip, where defects don't wreak havoc on the yield curve.

Next, there is the actual design of the chip. Each chip carries its own signature pattern to be etched onto the silicon wafer, and each design has its own problems and defect rates: one specific placement of wires and transistors can accumulate far more defects than another. Even when a chip is designed and ready, it sometimes gets small redesigns to improve final yields. Remember that customers like NVIDIA pay TSMC the same per wafer regardless of yield, so improving the design to raise yields saves NVIDIA silicon that would otherwise go to waste or trickle down to non-flagship SKUs.
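To make the die-size effect concrete, here is a quick sketch of our own (not the article's calculator) using the standard Poisson yield model, Y = exp(-A·D), comparing an H100-sized HPC die against a small mobile die at the reported 18A defect density:

```python
import math

def poisson_yield(area_mm2: float, d0_per_cm2: float) -> float:
    """Poisson yield model: Y = exp(-A * D), with area converted to cm^2."""
    return math.exp(-(area_mm2 / 100.0) * d0_per_cm2)

D0 = 0.4  # def/cm^2, Intel's reported 18A figure from August
print(f"814 mm^2 HPC die:    {poisson_yield(814, D0):.1%}")  # ≈ 3.9%
print(f"100 mm^2 mobile die: {poisson_yield(100, D0):.1%}")  # ≈ 67%
```

The same defect density that cripples a reticle-sized die still leaves roughly two-thirds of small mobile dies intact, which is why a bare "x% yield" number means little without the die area behind it.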
To calculate yields, manufacturers use various yield models: mathematical equations that help fabs understand and predict yield loss by translating defect density into yield predictions. While several models exist (including the Murphy, Exponential, and Seeds models), the fundamental Poisson yield model, Y = e^(-A·D), assumes randomly distributed point defects, where Y is yield, A is chip area, and D is defect density. It is derived from the Poisson probability distribution and calculates the likelihood of zero defects landing on a chip. However, the Poisson model is often considered pessimistic because real-world defects tend to cluster rather than distribute randomly, so actual yields typically come in higher than it predicts. Manufacturers pick a model by comparing the predictions of several against their actual fab data and selecting the one that best fits their specific process and conditions.

Using the older Intel 18A figure of D0 = 0.4, we must check a few different designs and compare their yields before drawing any conclusions. Today we are measuring yields with the SemiAnalysis Die Yield Calculator, which integrates all the common yield models: Murphy, Exponential, Seeds, and Poisson. At the EUV reticle size limit of 858 mm² and with the 18A node's 0.4 defect density applied, the most pessimistic estimate comes from the Poisson model at 3.23%, while the most optimistic (Seeds) model yields 22.56%. The default Murphy model yields only 7.95% usable chips, which works out to five good dies on a 300 mm wafer with 59 lost to defects. A while back, we covered a leak of "Panther Lake," Intel's next-generation Core Ultra 300 series CPUs. The leaked package thankfully included die sizes, which we can feed into the calculator, assuming Intel is manufacturing the Panther Lake compute tiles on the 18A node with the aforementioned D0 of 0.4.
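The reticle-limit numbers above can be reproduced in a few lines of Python. The three closed-form models below (Poisson, Murphy, Seeds) are standard textbook formulas; the function names are ours, and the printed values match the calculator's output for A = 858 mm², D0 = 0.4:

```python
import math

def poisson(area_mm2: float, d0: float) -> float:
    """Poisson model: Y = exp(-A*D), area converted from mm^2 to cm^2."""
    ad = (area_mm2 / 100.0) * d0
    return math.exp(-ad)

def murphy(area_mm2: float, d0: float) -> float:
    """Murphy's model: Y = ((1 - exp(-A*D)) / (A*D))^2."""
    ad = (area_mm2 / 100.0) * d0
    return ((1.0 - math.exp(-ad)) / ad) ** 2

def seeds(area_mm2: float, d0: float) -> float:
    """Seeds model: Y = 1 / (1 + A*D)."""
    ad = (area_mm2 / 100.0) * d0
    return 1.0 / (1.0 + ad)

A, D0 = 858.0, 0.4  # reticle-limit die, August 18A defect density
for name, model in [("Poisson", poisson), ("Murphy", murphy), ("Seeds", seeds)]:
    print(f"{name:>7}: {model(A, D0):.2%}")
# Poisson ≈ 3.23%, Murphy ≈ 7.95%, Seeds ≈ 22.56%, matching the article's figures
```

The spread between 3.23% and 22.56% for the same inputs illustrates why a single leaked yield percentage, with no model or die size attached, tells us very little on its own.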
Die number four, with the CPU and NPU on it, measures 8.004x14.288 mm (a 114.304 mm² silicon die) and gives a yield of 64.4% on the default Murphy model. Moore's model is the most pessimistic, with only about 50% usable dies. Die number five of Panther Lake, housing the Xe3 GPU, is even smaller at only 53.6 mm², yielding an impressive 81% usable dies; even under the most pessimistic assumptions, the yield only drops to about 60%.

If we assume that Intel has refined its 18A node further since August, we can conclude that even some larger designs approaching the EUV reticle limit of 858 mm² could hit the 50% yield mark. For reference, even if the industry leader TSMC achieves a 0.1 defect density, yields of 858 mm² chips only hover around the 50% mark across the available models. That is for a fully functioning silicon die, of course; dies that are not fully functional are later repurposed for lower-end SKUs. With modern Foveros and EMIB packaging, Intel could also use smaller dies and interconnect them on a larger interposer to act as a single uniform chip, saving costs and boosting yields. However, this is only part of the silicon manufacturing story, as manufacturers use other techniques to save costs as well.
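The Panther Lake tile yields, and the yield advantage of splitting one big die into chiplets, can be checked with Murphy's model directly. This is our own sketch (the 4x214.5 mm² chiplet split is a hypothetical illustration, not a leaked Intel configuration):

```python
import math

def murphy_yield(area_mm2: float, d0_per_cm2: float) -> float:
    """Murphy's yield model: Y = ((1 - exp(-A*D)) / (A*D))^2, area in cm^2."""
    ad = (area_mm2 / 100.0) * d0_per_cm2
    return ((1.0 - math.exp(-ad)) / ad) ** 2

D0 = 0.4  # August's reported 18A defect density
# Leaked Panther Lake tile sizes, assumed to be on 18A
print(f"CPU+NPU tile, 114.304 mm^2: {murphy_yield(114.304, D0):.1%}")  # ≈ 64.4%
print(f"Xe3 GPU tile, 53.6 mm^2:    {murphy_yield(53.6, D0):.1%}")     # ≈ 81.0%

# Hypothetical chiplet split: one reticle-limit die vs. four quarter-size dies
print(f"Monolithic 858 mm^2 die:    {murphy_yield(858.0, D0):.1%}")    # ≈ 8.0%
print(f"Each 214.5 mm^2 chiplet:    {murphy_yield(214.5, D0):.1%}")    # ≈ 45.1%
```

Per-die yield jumps from roughly 8% to roughly 45% just by quartering the die, which is the economic logic behind Foveros and EMIB: stitch small, high-yielding dies together instead of betting a whole reticle on zero defects.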
Broadcom's initially reported disappointment with the Intel 18A node predates PDK 1.0, which we assume is now in customers' hands so they can optimize their designs for version 1.0. Additionally, Intel should have enabled a few more optimizations since the initial defect density was reported in August. Indeed, the revamp of Intel Foundry has been difficult, and the ousting of CEO Pat Gelsinger may have been a premature move. The semiconductor manufacturing supply chain takes years to fix; it isn't exactly a kids' toy supply chain but rather the world's most advanced and complex industry. We assume the Intel 18A node is fully functional and that external customers will pick up Intel's foundry business significantly. Of course, in the beginning, Intel will be its own biggest customer until more fabless designers start rolling in.
38 Comments on Intel 18A Yields Are Actually Okay, And The Math Checks Out
I am not at all opposed to mocking Intel. And that post was old material; Intel has produced plenty of meme material since then (anyone watching GN would have a couple of ideas).
timesofindia.indiatimes.com/technology/tech-news/fired-intel-ceo-will-fast-this-week-for-100000-intel-employees-as-they-/articleshow/116126521.cms
diit-cz.translate.goog/clanek/vyteznost-v-procentech-nebo-denzita-defektu?_x_tr_sl=auto&_x_tr_tl=en&_x_tr_hl=cs&_x_tr_pto=wapp
Clock speeds, leak voltages and other parameters are usually handled and phrased separately from yield. The 10% yield number matches very well with defect rate Intel has publicly stated, if the die size is very large. Broadcom does have enormous dies for the chips like the one the news bit said was being evaluated. Lacking other details this is the easiest conclusion to come to.
Lunar Lake - same as Arrow Lake - came now, at a time when, by Intel's own claim, 18A is only just becoming ready. Given the delay between starting production and launching a product, neither 20A nor 18A was possible for either of those. The question for Lunar Lake and Arrow Lake might be what is going on with Intel 3, but that one most likely has a very simple answer - Xeons.
But if the yields were fine Intel would be using 18A instead of TSMC for Arrow Lake.
Remember that in August alongside news they did show Panther Lake with at least compute die supposedly produced on 18A. If I remember correctly Panther Lake is on roadmap for 2025H2. Production is only starting and it is quite a few quarters to go before we see a launched product.
Stakeholders don't give a **** about booting. They don't even know what the phrase "to boot" means.
All they want to know is information on their investments, meaning how many chips can be produced at what cost and what will they sell for.
Everything is about money, after all. Delivering on technology is not the end goal. The end goal is being financially successful.
With the pile of shit Intel is dealing with right now, I think their press release department would put out anything just to mitigate this desperate situation.
The huge layoffs and the firing of Gelsinger in the middle of the process are nothing but proof that Intel is facing serious trouble right now.
There's a saying: A drowning man will clutch at a straw.