
NVIDIA cuLitho Computational Lithography Platform is Moving to Production at TSMC

GFreeman

News Editor
Staff member
Joined
Mar 6, 2023
Messages
1,527 (2.44/day)
TSMC, the world leader in semiconductor manufacturing, is moving to production with NVIDIA's computational lithography platform, called cuLitho, to accelerate manufacturing and push the limits of physics for the next generation of advanced semiconductor chips. A critical step in the manufacture of computer chips, computational lithography governs how circuitry is transferred onto silicon. It requires complex computation involving electromagnetic physics, photochemistry, computational geometry, iterative optimization and distributed computing. A typical foundry dedicates massive data centers to this computation, and yet this step has traditionally been a bottleneck in bringing new technology nodes and computer architectures to market.

Computational lithography is also the most compute-intensive workload in the entire semiconductor design and manufacturing process. It consumes tens of billions of hours per year on CPUs in the leading-edge foundries. A typical mask set for a chip can take 30 million or more hours of CPU compute time, necessitating large data centers within semiconductor foundries. With accelerated computing, 350 NVIDIA H100 Tensor Core GPU-based systems can now replace 40,000 CPU systems, accelerating production time, while reducing costs, space and power.
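As a quick sanity check of those figures (my own arithmetic; the 30-million-hour and 350-vs-40,000 numbers are NVIDIA's claims, not independently verified):

```python
# Back-of-the-envelope check of the quoted figures. Assumes the 30M
# CPU-hours spread perfectly across all 40,000 CPU systems, which is
# optimistic, so real wall-clock time would be longer.
cpu_hours_per_mask_set = 30_000_000   # "30 million or more hours of CPU compute"
cpu_systems = 40_000                  # CPU systems NVIDIA says can be replaced
gpu_systems = 350                     # H100-based systems replacing them

wall_clock_days = cpu_hours_per_mask_set / cpu_systems / 24
consolidation = cpu_systems / gpu_systems
print(f"~{wall_clock_days:.0f} days per mask set on {cpu_systems:,} CPU systems")
print(f"each GPU system stands in for ~{consolidation:.0f} CPU systems")
```

Even spread across the whole farm, a single mask set works out to roughly a month of wall-clock compute, which makes the claimed consolidation ratio of roughly a hundred to one easy to believe as a cost and power argument.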



NVIDIA cuLitho brings accelerated computing to the field of computational lithography. Moving cuLitho to production is enabling TSMC to accelerate the development of next-generation chip technology, just as current production processes are nearing the limits of what physics makes possible.

"Our work with NVIDIA to integrate GPU-accelerated computing in the TSMC workflow has resulted in great leaps in performance, dramatic throughput improvement, shortened cycle time and reduced power requirements," said Dr. C.C. Wei, CEO of TSMC, at the GTC conference earlier this year.

NVIDIA has also developed algorithms to apply generative AI to enhance the value of the cuLitho platform. A new generative AI workflow has been shown to deliver an additional 2x speedup on top of the accelerated processes enabled through cuLitho.

The application of generative AI enables creation of a near-perfect inverse mask or inverse solution to account for diffraction of light involved in computational lithography. The final mask is then derived by traditional and physically rigorous methods, speeding up the overall optical proximity correction process by 2x.
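To make the "inverse mask" idea concrete, here is a toy 1-D sketch (my own illustration, not cuLitho's algorithm): treat the optics as a known blur, then iteratively adjust the mask so that the blurred result matches the target pattern. Real OPC/ILT uses rigorous electromagnetic and resist models rather than a simple convolution, and the kernel below is an arbitrary stand-in.

```python
import numpy as np

def blur(x, kernel):
    # Stand-in for the optical projection: the printed image is roughly
    # the mask convolved with the tool's point-spread function.
    return np.convolve(x, kernel, mode="same")

kernel = np.array([0.05, 0.25, 0.4, 0.25, 0.05])  # toy point-spread function
target = np.zeros(64)
target[20:30] = 1.0                               # pattern we want on the wafer

mask = target.copy()                              # start from the raw design
for _ in range(2000):                             # gradient descent on L2 print error
    resid = blur(mask, kernel) - target
    mask -= 0.5 * blur(resid, kernel[::-1])       # kernel is symmetric: adjoint = blur

naive_err = np.abs(blur(target, kernel) - target).max()
opt_err = np.abs(blur(mask, kernel) - target).max()
print(f"max print error: naive mask {naive_err:.2f}, optimized mask {opt_err:.2f}")
```

Note that the optimized mask is no longer a copy of the design: it develops over- and undershoots around edges, which is loosely why production masks look so different from the GDSII layout. The generative-AI angle in the PR is that a model proposes a near-perfect starting mask, so the rigorous solver has far less iterating to do.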

The use of optical proximity correction in semiconductor lithography is now three decades old. While the field has benefited from numerous contributions over this period, rarely has it seen a transformation quite as rapid as the one provided by the twin technologies of accelerated computing and AI. These together allow for the more accurate simulation of physics and the realization of mathematical techniques that were once prohibitively resource-intensive.

This enormous speedup of computational lithography accelerates the creation of every single mask in the fab, which speeds the total cycle time for developing a new technology node. More importantly, it makes possible new calculations that were previously impractical.

For example, while inverse lithography techniques have been described in the scientific literature for two decades, an accurate realization at full chip scale has been largely precluded because the computation takes too long. With cuLitho, that's no longer the case. Leading-edge foundries will use it to ramp up inverse and curvilinear solutions that will help create the next generation of powerful semiconductors.

View at TechPowerUp Main Site | Source
 
Joined
Jan 3, 2021
Messages
3,484 (2.46/day)
Location
Slovenia
Processor i5-6600K
Motherboard Asus Z170A
Cooling some cheap Cooler Master Hyper 103 or similar
Memory 16GB DDR4-2400
Video Card(s) IGP
Storage Samsung 850 EVO 250GB
Display(s) 2x Oldell 24" 1920x1200
Case Bitfenix Nova white windowless non-mesh
Audio Device(s) E-mu 1212m PCI
Power Supply Seasonic G-360
Mouse Logitech Marble trackball, never had a mouse
Keyboard Key Tronic KT2000, no Win key because 1994
Software Oldwin
Wow.

(That's my comment to the PR, not to the first comment.)
 
I don't understand why it needs such a huge amount of computation. A single photomask is a one-bit very high resolution image, and processing involves some special kind of sharpening and blurring, right? But the image contains a huge number of repeating patterns (such as SRAM cells), so it should be possible to reuse the results very many times.
 
Joined
Mar 16, 2017
Messages
235 (0.08/day)
Location
behind you
Processor Threadripper 1950X
Motherboard ASRock X399 Professional Gaming
Cooling IceGiant ProSiphon Elite
Memory 48GB DDR4 2934MHz
Video Card(s) MSI GTX 1080
Storage 4TB Crucial P3 Plus NVMe, 1TB Samsung 980 NVMe, 1TB Inland NVMe, 2TB Western Digital HDD
Display(s) 2x 4K60
Power Supply Cooler Master Silent Pro M (1000W)
Mouse Corsair Ironclaw Wireless
Keyboard Corsair K70 MK.2
VR HMD HTC Vive Pro
Software Windows 10, QubesOS
About the technology itself, I don't understand why it needs such a huge amount of computation. A single photomask is a one-bit very high resolution image, and processing involves some special kind of sharpening and blurring, right? But the image contains a huge number of repeating patterns (such as SRAM cells), so it should be possible to reuse the results very many times.
The problem is that feature sizes in lithography today are well below the wavelength of the light used to expose the photoresist. As the post suggests, you normally need to perform full electromagnetic wave simulations to get usable lithography masks. What's on the masks is very different from what's in the GDSII files that are given to the fab. And while repeating patterns like SRAM can in principle help, chips contain more than that; you need to generate very, very high resolution masks, and with multiple patterning you may need to generate multiple masks for every layer (of which there are quite a few these days).
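To put "well below the wavelength" in numbers, a rough sketch (public ballpark figures for the industry, not TSMC specifics):

```python
# ArF immersion DUV exposes at 193 nm yet prints features around 38 nm
# half-pitch (with multiple patterning), i.e. far below the wavelength
# of the light doing the printing. That is why plain ray optics fails
# and full diffraction simulation is needed.
duv_wavelength_nm = 193.0
duv_half_pitch_nm = 38.0
ratio = duv_half_pitch_nm / duv_wavelength_nm
print(f"feature size is ~{ratio:.2f}x the exposure wavelength")
```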
 
Joined
May 3, 2018
Messages
2,881 (1.20/day)
About the technology itself, I don't understand why it needs such a huge amount of computation. A single photomask is a one-bit very high resolution image, and processing involves some special kind of sharpening and blurring, right? But the image contains a huge number of repeating patterns (such as SRAM cells), so it should be possible to reuse the results very many times.
How do you design the mask in the first place? And then you have to write the mask, which requires ludicrously complex optics that only get more ludicrous for High-NA EUV.

AnandTech had a nice little article on cuLitho last year. Here's a short excerpt:

Modern process technologies push wafer fab equipment to its limits and often require finer resolution than is physically possible, which is where computational lithography comes into play. The primary purpose of computational lithography is to enhance the achievable resolution in photolithography processes without modifying the tools. To do so, CL employs algorithms that simulate the production process, incorporating crucial data from ASML's equipment and shuttle (test) wafers. These simulations aid in refining the reticle (photomask) by deliberately altering the patterns to counteract the physical and chemical influences that arise throughout the lithography and patterning steps.
 
Joined
Dec 25, 2020
Messages
6,693 (4.69/day)
Location
São Paulo, Brazil
System Name "Icy Resurrection"
Processor 13th Gen Intel Core i9-13900KS Special Edition
Motherboard ASUS ROG MAXIMUS Z790 APEX ENCORE
Cooling Noctua NH-D15S upgraded with 2x NF-F12 iPPC-3000 fans and Honeywell PTM7950 TIM
Memory 32 GB G.SKILL Trident Z5 RGB F5-6800J3445G16GX2-TZ5RK @ 7600 MT/s 36-44-44-52-96 1.4V
Video Card(s) ASUS ROG Strix GeForce RTX™ 4080 16GB GDDR6X White OC Edition
Storage 500 GB WD Black SN750 SE NVMe SSD + 4 TB WD Red Plus WD40EFPX HDD
Display(s) 55-inch LG G3 OLED
Case Pichau Mancer CV500 White Edition
Power Supply EVGA 1300 G2 1.3kW 80+ Gold
Mouse Microsoft Classic Intellimouse
Keyboard Generic PS/2
Software Windows 11 IoT Enterprise LTSC 24H2
Benchmark Scores I pulled a Qiqi~
I recall they did something similar with Hairworks, and it was proven it was done on purpose, since it would bog down other GPUs but not theirs for no good reason. So maybe something similar now?

This is an advanced chipmaking tool, despite the funny name. Sabotaging this in any way sabotages only themselves.

Hairworks was never intentionally designed to cripple Radeon cards; rather, it was the Radeons that had (and to an extent still have, if we're talking DirectX 11) poor instancing performance.

TressFX was then designed to largely solve the problem, although if you recall the few games that use it (such as Tomb Raider 2013) still perform relatively poorly on hardware like Tahiti and Hawaii with TressFX on.
 
Joined
Dec 6, 2022
Messages
381 (0.53/day)
Location
NYC
System Name GameStation
Processor AMD R5 5600X
Motherboard Gigabyte B550
Cooling Artic Freezer II 120
Memory 16 GB
Video Card(s) Sapphire Pulse 7900 XTX
Storage 2 TB SSD
Case Cooler Master Elite 120
This is an advanced chipmaking tool, despite the funny name. Sabotaging this in any way sabotages only themselves.

Hairworks was never intentionally designed to cripple Radeon cards; rather, it was the Radeons that had (and to an extent still have, if we're talking DirectX 11) poor instancing performance.

TressFX was then designed to largely solve the problem, although if you recall the few games that use it (such as Tomb Raider 2013) still perform relatively poorly on hardware like Tahiti and Hawaii with TressFX on.
You are correct, it shouldn't apply in this scenario.

About Hairworks, knowing Ngreedia, we cannot say so firmly that they didn't have a second intention. Remember, ever since they bought Ageia and removed access to PhysX, they have been trying to lock people into their hardware with shenanigans like that.

Anyway, I did a quick search, and it was refreshing and frustrating to recall what I normally say here: back then, consumers and reviewers would call out such proprietary tech, instead of the blind worshipping we now see of DLSS.

Here are a couple of links, not to expand much, since this is definitely off topic:

https://www.reddit.com/r/witcher/comments/36jpe9
https://www.reddit.com/r/pcgaming/comments/36xyst
Funny how history repeats itself: people used to say that Hairworks and even TressFX were just gimmicks not worth the performance hit. Sounds like RT today. :D:D
 
The problem is that feature sizes in lithography today are well below the wavelength of the light used to expose the photoresist. As the post suggests, you normally need to perform full electromagnetic wave simulations to get usable lithography masks. What's on the masks is very different from what's in the GDSII files that are given to the fab. And while repeating patterns like SRAM can in principle help, chips contain more than that; you need to generate very, very high resolution masks, and with multiple patterning you may need to generate multiple masks for every layer (of which there are quite a few these days).
Photoshop with a specific sharpening plug-in would easily handle that on a single PC with a GPU... for a single 10-megapixel image. That's after the simulations of physical and chemical effects are done, including pesky non-linear effects, which requires much more than a PC, but only needs to be done once.
But I understand this is on another scale. The real images here could be around 10 megapixels in each direction, so a hundred terapixels or so, and there are a few masks for EUV plus a few tens for DUV. Waiting weeks for the computations alone is a lot of money wasted, so a few hundred H100 accelerators are probably a good investment. Even more so if TSMC really has had such an inefficient system until now. 40,000 CPUs, huh? Does that mean the processing was done on CPUs, not GPUs?
It's unclear what the role of AI is in this processing. Maybe it can identify many more repeating patterns than other methods, so it can reduce the amount of computation by 2x.
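For what it's worth, the terapixel estimate in the post above is easy to sanity-check (my arithmetic, using the post's assumed pixel counts):

```python
# A mask digitized at roughly 10^7 pixels per side comes to ~10^14
# pixels total, i.e. the "hundred terapixels or so" estimated above.
pixels_per_side = 10_000_000          # assumed ~10 megapixels in each direction
total_pixels = pixels_per_side ** 2
print(f"{total_pixels:.0e} pixels, i.e. {total_pixels / 1e12:.0f} terapixels")
```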
 