
NVIDIA cuLitho Computational Lithography Platform is Moving to Production at TSMC

GFreeman

News Editor
Staff member
Joined
Mar 6, 2023
Messages
1,423 (2.43/day)
TSMC, the world leader in semiconductor manufacturing, is moving to production with NVIDIA's computational lithography platform, called cuLitho, to accelerate manufacturing and push the limits of physics for the next generation of advanced semiconductor chips. A critical step in the manufacture of computer chips, computational lithography is involved in the transfer of circuitry onto silicon. It requires complex computation - involving electromagnetic physics, photochemistry, computational geometry, iterative optimization and distributed computing. A typical foundry dedicates massive data centers for this computation, and yet this step has traditionally been a bottleneck in bringing new technology nodes and computer architectures to market.

Computational lithography is also the most compute-intensive workload in the entire semiconductor design and manufacturing process. It consumes tens of billions of hours per year on CPUs in the leading-edge foundries. A typical mask set for a chip can take 30 million or more hours of CPU compute time, necessitating large data centers within semiconductor foundries. With accelerated computing, 350 NVIDIA H100 Tensor Core GPU-based systems can now replace 40,000 CPU systems, accelerating production time, while reducing costs, space and power.
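The figures in the paragraph above can be turned into a quick back-of-the-envelope calculation. The sketch below uses only the numbers quoted in the press release; the function names are made up for illustration, and the wall-clock estimate assumes (unrealistically) that one "CPU system" delivers exactly one CPU-hour per hour and that the work parallelises perfectly.

```python
# Figures quoted in the press release above.
CPU_SYSTEMS = 40_000
GPU_SYSTEMS = 350
CPU_HOURS_PER_MASK_SET = 30_000_000  # "30 million or more hours"


def consolidation_ratio(cpu_systems: int, gpu_systems: int) -> float:
    """How many CPU systems each GPU system stands in for."""
    return cpu_systems / gpu_systems


def ideal_wall_clock_days(cpu_hours: float, cpu_systems: int) -> float:
    """Idealised wall-clock days for one mask set.

    ASSUMPTION: one CPU-hour per system per hour and perfect scaling.
    Real systems have many cores and imperfect scaling, so treat this
    as an order-of-magnitude figure only.
    """
    return cpu_hours / cpu_systems / 24


ratio = consolidation_ratio(CPU_SYSTEMS, GPU_SYSTEMS)
days = ideal_wall_clock_days(CPU_HOURS_PER_MASK_SET, CPU_SYSTEMS)
print(f"Each GPU system replaces roughly {ratio:.0f} CPU systems")
print(f"Idealised time per mask set on the CPU farm: {days:.2f} days")
```

Under those assumptions the claimed replacement works out to a bit over a hundred CPU systems per GPU system, and a single mask set would occupy the whole CPU farm for about a month.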



NVIDIA cuLitho brings accelerated computing to the field of computational lithography. Moving cuLitho to production is enabling TSMC to accelerate the development of next-generation chip technology, just as current production processes are nearing the limits of what physics makes possible.

"Our work with NVIDIA to integrate GPU-accelerated computing in the TSMC workflow has resulted in great leaps in performance, dramatic throughput improvement, shortened cycle time and reduced power requirements," said Dr. C.C. Wei, CEO of TSMC, at the GTC conference earlier this year.

NVIDIA has also developed algorithms to apply generative AI to enhance the value of the cuLitho platform. A new generative AI workflow has been shown to deliver an additional 2x speedup on top of the accelerated processes enabled through cuLitho.

The application of generative AI enables creation of a near-perfect inverse mask or inverse solution to account for diffraction of light involved in computational lithography. The final mask is then derived by traditional and physically rigorous methods, speeding up the overall optical proximity correction process by 2x.
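The description above (an AI-generated near-final guess, then a rigorous method to finish) resembles what is generally called warm-starting an iterative solver. The toy below is not NVIDIA's algorithm and does not reproduce the 2x figure; the quadratic objective, `refine`, `TARGET`, and both starting points are assumptions chosen purely to show that a better initial guess needs fewer refinement iterations.

```python
from typing import Callable, Tuple


def refine(x0: float, grad: Callable[[float], float], step: float = 0.1,
           tol: float = 1e-6, max_iter: int = 10_000) -> Tuple[float, int]:
    """Plain gradient descent; returns (solution, iterations used)."""
    x = x0
    for i in range(max_iter):
        g = grad(x)
        if abs(g) < tol:      # converged: gradient is close to zero
            return x, i
        x -= step * g         # move against the gradient
    return x, max_iter


TARGET = 3.0  # hypothetical "ideal" value the rigorous solver converges to


def grad(x: float) -> float:
    """Gradient of the toy objective (x - TARGET)**2."""
    return 2.0 * (x - TARGET)


_, cold_iters = refine(0.0, grad)   # naive starting point
_, warm_iters = refine(2.9, grad)   # "near-perfect" guess, standing in for the AI output
print(f"cold start: {cold_iters} iterations, warm start: {warm_iters} iterations")
```

The final answer is the same either way, since the rigorous refinement still has the last word; only the amount of work to get there changes.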

The use of optical proximity correction in semiconductor lithography is now three decades old. While the field has benefited from numerous contributions over this period, rarely has it seen a transformation quite as rapid as the one provided by the twin technologies of accelerated computing and AI. These together allow for the more accurate simulation of physics and the realization of mathematical techniques that were once prohibitively resource-intensive.

This enormous speedup of computational lithography accelerates the creation of every single mask in the fab, which speeds the total cycle time for developing a new technology node. More importantly, it makes possible new calculations that were previously impractical.

For example, while inverse lithography techniques have been described in the scientific literature for two decades, an accurate realization at full chip scale has been largely precluded because the computation takes too long. With cuLitho, that's no longer the case. Leading-edge foundries will use it to ramp up inverse and curvilinear solutions that will help create the next generation of powerful semiconductors.

 
Joined
Jan 3, 2021
Messages
3,343 (2.43/day)
Location
Slovenia
Processor i5-6600K
Motherboard Asus Z170A
Cooling some cheap Cooler Master Hyper 103 or similar
Memory 16GB DDR4-2400
Video Card(s) IGP
Storage Samsung 850 EVO 250GB
Display(s) 2x Oldell 24" 1920x1200
Case Bitfenix Nova white windowless non-mesh
Audio Device(s) E-mu 1212m PCI
Power Supply Seasonic G-360
Mouse Logitech Marble trackball, never had a mouse
Keyboard Key Tronic KT2000, no Win key because 1994
Software Oldwin
Wow.

(That's my comment to the PR, not to the first comment.)
 
Low quality post by Steevo
Joined
Nov 4, 2005
Messages
11,939 (1.73/day)
System Name Compy 386
Processor 7800X3D
Motherboard Asus
Cooling Air for now.....
Memory 64 GB DDR5 6400Mhz
Video Card(s) 7900XTX 310 Merc
Storage Samsung 990 2TB, 2 SP 2TB SSDs, 24TB Enterprise drives
Display(s) 55" Samsung 4K HDR
Audio Device(s) ATI HDMI
Mouse Logitech MX518
Keyboard Razer
Software A lot.
Benchmark Scores Its fast. Enough.
Joined
Jan 3, 2021
Messages
3,343 (2.43/day)
Location
Slovenia
Processor i5-6600K
Motherboard Asus Z170A
Cooling some cheap Cooler Master Hyper 103 or similar
Memory 16GB DDR4-2400
Video Card(s) IGP
Storage Samsung 850 EVO 250GB
Display(s) 2x Oldell 24" 1920x1200
Case Bitfenix Nova white windowless non-mesh
Audio Device(s) E-mu 1212m PCI
Power Supply Seasonic G-360
Mouse Logitech Marble trackball, never had a mouse
Keyboard Key Tronic KT2000, no Win key because 1994
Software Oldwin
I don't understand why it needs such a huge amount of computation. A single photomask is a one-bit very high resolution image, and processing involves some special kind of sharpening and blurring, right? But the image contains a huge number of repeating patterns (such as SRAM cells), so it should be possible to reuse the results very many times.
 
Joined
Mar 16, 2017
Messages
224 (0.08/day)
Location
behind you
Processor Threadripper 1950X (4.0 GHz OC)
Motherboard ASRock X399 Professional Gaming
Cooling Enermax Liqtech TR4
Memory 48GB DDR4 2934MHz
Video Card(s) Nvidia GTX 1080, GTX 660TI
Storage 2TB Western Digital HDD, 500GB Samsung 850 EVO SSD, 280GB Intel Optane 900P
Display(s) 2x 1920x1200
Power Supply Cooler Master Silent Pro M (1000W)
Mouse Logitech G602
Keyboard Corsair K70 MK.2
Software Windows 10
About the technology itself, I don't understand why it needs such a huge amount of computation. A single photomask is a one-bit very high resolution image, and processing involves some special kind of sharpening and blurring, right? But the image contains a huge number of repeating patterns (such as SRAM cells), so it should be possible to reuse the results very many times.
The problem is that lithography today works well below the wavelength of the light used to expose the photoresist. As the post suggests, you normally need to perform full electromagnetic wave simulations to get usable lithography masks. What's on the masks is very different from what's in the GDSII files that are given to the fab. And while repeating patterns like SRAM can in principle help, chips contain more than that. You need to generate very, very high resolution masks, and with multiple patterning you may need to generate multiple masks for every layer (of which there are quite a few these days).
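To make the reuse idea from the quoted question concrete, here is a rough sketch of caching results for repeated tiles. It is an assumption-laden toy, not how any real OPC tool works: the layout is a 1D string of `0`/`1`, `expensive_correction` is a hypothetical stand-in that just returns the tile unchanged, and `tile` and `halo` are made-up parameters. The one point it tries to show is that the cache key has to include the tile's neighbourhood (the halo), because the correct result for a tile depends on what sits next to it.

```python
from typing import Dict, Tuple


def expensive_correction(window: str, offset: int, tile: int) -> str:
    """Hypothetical stand-in for a costly simulation of one tile.

    It receives the tile plus its surrounding halo and returns the
    (here: unchanged) tile. A real tool would run optical and resist
    models at this point.
    """
    return window[offset:offset + tile]


def correct_layout(layout: str, tile: int, halo: int) -> Tuple[str, int]:
    """Process the layout tile by tile, reusing results for identical windows.

    Returns the processed layout and the number of windows actually computed.
    """
    cache: Dict[Tuple[str, int], str] = {}
    computed = 0
    out = []
    for start in range(0, len(layout), tile):
        lo = max(0, start - halo)
        hi = min(len(layout), start + tile + halo)
        # Key on the tile AND its neighbourhood, plus where the tile sits in it.
        key = (layout[lo:hi], start - lo)
        if key not in cache:
            cache[key] = expensive_correction(layout[lo:hi], start - lo, tile)
            computed += 1
        out.append(cache[key])
    return "".join(out), computed


layout = "0110" * 1000            # a highly repetitive, SRAM-like pattern
result, computed = correct_layout(layout, tile=4, halo=2)
print(f"{len(layout) // 4} tiles, {computed} windows actually computed")
```

A perfectly regular array collapses to a handful of distinct windows, which is the saving the question is after. The catch is that the halo in a real design is large compared with a cell, and anything irregular nearby (array edges, routing, logic) produces a new key, so the hit rate on a full chip is much lower than this toy suggests.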
 
Joined
May 3, 2018
Messages
2,849 (1.21/day)
About the technology itself, I don't understand why it needs such a huge amount of computation. A single photomask is a one-bit very high resolution image, and processing involves some special kind of sharpening and blurring, right? But the image contains a huge number of repeating patterns (such as SRAM cells), so it should be possible to reuse the results very many times.
How do you design the mask in the first place? And then you have to write the mask, which requires ludicrously complex optics that only get more ludicrous for High-NA EUV.

Anandtech had a nice little article on cuLitho last year. Here's a short excerpt:

Modern process technologies push wafer fab equipment to its limits and often require finer resolution than is physically possible, which is where computational lithography comes into play. The primary purpose of computational lithography is to enhance the achievable resolution in photolithography processes without modifying the tools. To do so, CL employs algorithms that simulate the production process, incorporating crucial data from ASML's equipment and shuttle (test) wafers. These simulations aid in refining the reticle (photomask) by deliberately altering the patterns to counteract the physical and chemical influences that arise throughout the lithography and patterning steps.
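The "deliberately altering the patterns" part of the excerpt can be shown with a tiny 1D toy. Everything here is an assumption for illustration: the optics are replaced by a box blur, the resist by a fixed threshold, and the correction by uniformly widening the line, whereas real tools use rigorous optical and chemical models and move individual edge segments. The names `blur`, `printed`, `dilate`, and `correct` are made up.

```python
from typing import List


def blur(mask: List[int], radius: int) -> List[float]:
    """Box blur standing in for the optical system (pixels outside count as 0)."""
    width = 2 * radius + 1
    return [sum(mask[max(0, i - radius): i + radius + 1]) / width
            for i in range(len(mask))]


def printed(mask: List[int], radius: int = 2, threshold: float = 0.7) -> List[int]:
    """What ends up on the wafer: blurred intensity above the resist threshold."""
    return [1 if v >= threshold else 0 for v in blur(mask, radius)]


def dilate(mask: List[int]) -> List[int]:
    """Widen every feature by one pixel on each side."""
    return [1 if any(mask[max(0, i - 1): i + 2]) else 0 for i in range(len(mask))]


def correct(target: List[int], max_steps: int = 10) -> List[int]:
    """Widen the mask until the simulated print matches the target (toy OPC loop)."""
    mask = list(target)
    for _ in range(max_steps):
        if printed(mask) == target:
            break
        mask = dilate(mask)
    return mask


target = [0] * 6 + [1] * 3 + [0] * 6   # the 3-pixel line we want on the wafer
naive = printed(target)                # using the target itself as the mask
corrected_mask = correct(target)

print("target        ", target)
print("naive print   ", naive)
print("corrected mask", corrected_mask)
print("its print     ", printed(corrected_mask))
```

With these made-up settings the uncorrected line does not print at all, while a wider line on the mask prints at the intended width, so the mask ends up looking different from the design. The simulate, compare, and adjust loop is the part that gets repeated across an entire chip.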
 
Joined
Dec 25, 2020
Messages
6,297 (4.54/day)
Location
São Paulo, Brazil
System Name "Icy Resurrection"
Processor 13th Gen Intel Core i9-13900KS Special Edition
Motherboard ASUS ROG MAXIMUS Z790 APEX ENCORE
Cooling Noctua NH-D15S upgraded with 2x NF-F12 iPPC-3000 fans and Honeywell PTM7950 TIM
Memory 32 GB G.SKILL Trident Z5 RGB F5-6800J3445G16GX2-TZ5RK @ 7600 MT/s 36-44-44-52-96 1.4V
Video Card(s) ASUS ROG Strix GeForce RTX™ 4080 16GB GDDR6X White OC Edition
Storage 500 GB WD Black SN750 SE NVMe SSD + 4 TB WD Red Plus WD40EFPX HDD
Display(s) 55-inch LG G3 OLED
Case Pichau Mancer CV500 White Edition
Power Supply EVGA 1300 G2 1.3kW 80+ Gold
Mouse Microsoft Classic Intellimouse
Keyboard Generic PS/2
Software Windows 11 IoT Enterprise LTSC 24H2
Benchmark Scores I pulled a Qiqi~
I recall they did something similar with Hairworks and it was proven it was done on purpose since it would bog down other gpus but not theirs for no good reason. So maybe something similar now?

This is an advanced chipmaking tool, despite the funny name. Sabotaging this in any way sabotages only themselves.

Hairworks was never intentionally designed to cripple Radeon cards; instead, it was the Radeons that had (and to an extent still have, if we're talking DirectX 11) poor instancing performance.

TressFX was then designed to largely solve the problem, although if you recall the few games that use it (such as Tomb Raider 2013) still perform relatively poorly on hardware like Tahiti and Hawaii with TressFX on.
 
Joined
Dec 6, 2022
Messages
333 (0.49/day)
Location
NYC
System Name GameStation
Processor AMD R5 5600X
Motherboard Gigabyte B550
Cooling Artic Freezer II 120
Memory 16 GB
Video Card(s) Sapphire Pulse 7900 XTX
Storage 2 TB SSD
Case Cooler Master Elite 120
This is an advanced chipmaking tool, despite the funny name. Sabotaging this in any way sabotages only themselves.

Hairworks was never intentionally designed to cripple Radeon cards; instead, it was the Radeons that had (and to an extent still have, if we're talking DirectX 11) poor instancing performance.

TressFX was then designed to largely solve the problem, although if you recall the few games that use it (such as Tomb Raider 2013) still perform relatively poorly on hardware like Tahiti and Hawaii with TressFX on.
You are correct, it shouldn't apply to this scenario.

About Hairworks, knowing Ngreedia, we cannot say so firmly that they didn't have an ulterior motive. Please remember that since they bought Ageia and removed access to PhysX, they have been trying to get people locked into their hardware with such shenanigans.

Anyways, I did a quick search, and it was both refreshing and frustrating to be reminded of what I normally say here: back then, consumers and reviewers would call out such proprietary tech, unlike the blind worshipping of DLSS we see now.

Here are a couple of links. I won't expand much, since it is definitely off topic:

https://www.reddit.com/r/witcher/comments/36jpe9
https://www.reddit.com/r/pcgaming/comments/36xyst
Funny how history seems to repeat itself: people used to say that Hairworks and even TressFX were just gimmicks not worth the performance hit. Sounds like RT to me today. :D
 
Joined
Jan 3, 2021
Messages
3,343 (2.43/day)
Location
Slovenia
Processor i5-6600K
Motherboard Asus Z170A
Cooling some cheap Cooler Master Hyper 103 or similar
Memory 16GB DDR4-2400
Video Card(s) IGP
Storage Samsung 850 EVO 250GB
Display(s) 2x Oldell 24" 1920x1200
Case Bitfenix Nova white windowless non-mesh
Audio Device(s) E-mu 1212m PCI
Power Supply Seasonic G-360
Mouse Logitech Marble trackball, never had a mouse
Keyboard Key Tronic KT2000, no Win key because 1994
Software Oldwin
The problem is that lithography today works well below the wavelength of the light used to expose the photoresist. As the post suggests, you normally need to perform full electromagnetic wave simulations to get usable lithography masks. What's on the masks is very different from what's in the GDSII files that are given to the fab. And while repeating patterns like SRAM can in principle help, chips contain more than that. You need to generate very, very high resolution masks, and with multiple patterning you may need to generate multiple masks for every layer (of which there are quite a few these days).
Photoshop with a specific sharpening plug-in would easily handle that on a single PC with a GPU ... for a single 10-megapixel image. That's after the simulations of physical and chemical effects are done, including pesky non-linear effects, which requires much more than a PC - but needs to be done only once.
But I understand this is on another scale. The real images here could be about 10 million pixels in each direction, so a hundred terapixels or so, and there are a few masks for EUV plus a few tens for DUV. Waiting weeks for the computations alone is a lot of money wasted. Therefore a few hundred H100 accelerators are probably a good investment, all the more so if TSMC had a very inefficient system until now. 40,000 CPUs, huh? Does that mean processing on CPUs, not GPUs?
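The scale estimate above is easy to check. The sketch below only redoes the arithmetic; the 10-million-pixel side length is the guess from this post, not a published mask specification, and one bit per pixel is the simplification from the earlier question.

```python
SIDE_PIXELS = 10_000_000          # guessed pixels per side (from the post above)
PHOTO_PIXELS = 10_000_000         # the 10-megapixel image used for comparison

total_pixels = SIDE_PIXELS ** 2               # pixels in one mask image
terapixels = total_pixels / 10**12            # expressed in terapixels
raw_bytes = total_pixels // 8                 # at one bit per pixel
terabytes = raw_bytes / 10**12                # decimal terabytes
photo_equivalents = total_pixels // PHOTO_PIXELS

print(f"{terapixels:.0f} terapixels per mask")
print(f"{terabytes:.1f} TB uncompressed at 1 bit per pixel")
print(f"equivalent to {photo_equivalents:,} ten-megapixel images")
```

So the "hundred terapixels" figure holds under that guess, and a single mask is the equivalent of ten million of those Photoshop-sized images before any simulation is run on it.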
It's unclear what the role of AI is in this processing. Maybe it can identify many more repeating patterns than other methods, so it can reduce the amount of computation by 2x.
 