Monday, May 29th 2023

TSMC N3 Nodes Show SRAM Scaling is Hitting the Wall

When TSMC introduced its N3 lineup, the company talked only about the logic scaling of the two new semiconductor manufacturing processes. It turns out there was a reason for that: WikiChip confirms that the SRAM bit cells of the N3 nodes are almost identical to those of the N5 nodes. At its 2023 Technology Symposium, TSMC presented additional details about the N3 lineup, including logic and SRAM density. For starters, N3 is TSMC's "3 nm" node family and comprises two processes: the base N3 node (N3B) and the enhanced N3 node (N3E). The base N3B uses a self-aligned contact (SAC) scheme — new for TSMC, though Intel introduced it back in 2011 with its 22 nm node — which improves the node's yield.

Regardless of N3's logic density improvements over the previous-generation N5, its SRAM density is almost identical. TSMC initially claimed N3B SRAM density was 1.2x that of the N5 process; more recent information shows the actual improvement is merely about 5%. With SRAM consuming a large portion of a processor's transistor and area budget, N3B's soaring manufacturing costs are harder to justify when there is almost no SRAM area improvement. SRAM scaling had been lagging logic scaling for some time; the two have now completely decoupled.
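As a rough illustration of that 5% figure, the high-density bit-cell areas WikiChip has published for these nodes (approximately 0.021 µm² for N5 and 0.0199 µm² for N3B; treat the exact values as approximate) work out as follows:

```python
# Rough check of the reported SRAM scaling, using approximate published
# high-density (HD) bit-cell areas. Figures are approximate, not official specs.
N5_BITCELL_UM2 = 0.0210   # N5 HD SRAM bit cell, ~um^2
N3B_BITCELL_UM2 = 0.0199  # N3B HD SRAM bit cell, ~um^2

# Density scales inversely with bit-cell area.
density_gain = N5_BITCELL_UM2 / N3B_BITCELL_UM2
print(f"N3B vs N5 SRAM density: {density_gain:.3f}x "
      f"(~{(density_gain - 1) * 100:.1f}% improvement)")
```

A far cry from the originally claimed 1.2x.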
Source: WikiChip

31 Comments on TSMC N3 Nodes Show SRAM Scaling is Hitting the Wall

#1
TristanX
There is no problem with the small size of caches; the problem is unoptimized software.
For well-optimized software, a few megabytes of cache is sufficient.
Posted on Reply
#2
mechtech
When you take the time to look at the graph and realize it's logarithmic on the area axis, it has been flatlined since 5 nm, and if you include 7 nm, it's still a pretty flat line.
Posted on Reply
#3
Wye
TristanX: there is no problem with small size of caches, but problem with unoptimized software.
For well optimized software, few magabytes of cache is sufficient
MAGA bytes now? Damn, the Trump fans are getting desperate.
Posted on Reply
#4
Count von Schwalbe
How long until the SRAM is on a separate chip entirely (think X3D style) and the logic chip is only cores and interconnect?
Posted on Reply
#5
AnotherReader
Backside power delivery, or PowerVia in Intel parlance, should help with SRAM scaling. Nanosheet transistors will also help, but these are all slated for either Intel's 20A node or TSMC's N2P node. These aren't expected to be available until 2024 and 2026 respectively.
Count von Schwalbe: How long until the SRAM is on a separate chip entirely (think X3D style) and the logic chip is only cores and interconnect?
That will increase SRAM latency, as off-chip communication is costly in both latency and power. It could only be done with large, last-level caches like AMD's LLC for RDNA3. Smaller caches like L1 and L2 will remain on-chip.
Posted on Reply
#6
TumbleGeorge
It's a miracle that some SRAM scaling still fits between 7 nm and 3 nm. ASML's 3000-series (3400 & 3600) lithography scanners both use exactly the same wavelength.
Posted on Reply
#7
AnotherReader
TumbleGeorge: It's a miracle that some SRAM scaling still fits between 7nm and 3nm. ASML's 3000 series(3400&3600) lithography scanners are both fully identical wavelengths.
It's not a miracle. The light source is a necessary part of the process, but it doesn't govern the minimum feature size of current processes, which are all greater than 13.5 nm. Besides, N7 doesn't use EUV; it uses light with a wavelength of 193 nm.
Posted on Reply
#8
thegnome
TumbleGeorge: It's a miracle that some SRAM scaling still fits between 7nm and 3nm. ASML's 3000 series(3400&3600) lithography scanners are both fully identical wavelengths.
Any chance of any new scanners having shorter wavelengths, then? If we can't go further than that, we'll be stuck with chips only getting tiny improvements.
Posted on Reply
#10
TumbleGeorge
AnotherReader: N7 doesn't use EUV
Yes, N7 doesn't use EUV, but there is more than one "7 nm" variant.
Posted on Reply
#11
TheoneandonlyMrK
Count von Schwalbe: How long until the SRAM is on a separate chip entirely (think X3D style) and the logic chip is only cores and interconnect?
You answered your own question: X3D already brought that.

The first off-die cache would be L3; they're not getting the L1/L2 caches off-die. Optical interconnect chips or another massive in-memory-compute evolution would be necessary to change that, I think.
Posted on Reply
#12
AnotherReader
TristanX: there is no problem with small size of caches, but problem with unoptimized software.
For well optimized software, few magabytes of cache is sufficient
In the real world, the working set of most programs isn't defined by their code size. Perhaps you have heard of servers that usually have hundreds of GB of RAM. Do you think they would do fine with CPUs with less than 10 MB of last-level cache?
TumbleGeorge: Yes, N7 doesn't use EUV, but there is much more than one "7"nm variants.
True, but the most popular variant is the one that forgoes EUV.
Posted on Reply
#13
cchi
AnotherReader: Backside power delivery, or PowerVia in Intel parlance, should help with SRAM scaling. Nanosheet transistors will also help, but these are all slated for either Intel's 20A node or TSMC's N2P node. These aren't expected to be available until 2024 and 2026 respectively.

That will increase latency of SRAM as off-chip communication is costly in both latency and power. It could only be done with large, last level caches like AMD's LLC for RDNA3. Smaller caches like L1 and L2 will remain on-chip.
With proper die stacking there is no large latency penalty; heck, it might even be lower due to the shorter distance in the z direction compared to x-y.

What is a problem, though, is heat dissipation, which is why stacking is currently limited to the LLC of Zen 3/4: the LLC has a lower power density than the core area.
Still, the X3D chips run much hotter due to the structural silicon pieces, but they would be even hotter if the cache were covered with active silicon.
Posted on Reply
#14
AnotherReader
cchi: With proper die stacking there is no large latency penalty, heck it might even be lower due to lower distance in z direction compared to x-y.

What is a problem though is heat dissipation, which is why it currently is limited to the LLC of Zen3/4, because of its lower power density compared to the core area.
Still the X3D chips run much hotter due to the structural silicon pieces, but would be even hotter if it was covered with active silicon.
I was thinking of non-stacked chips, but you're right; die stacking solves the downsides of off-chip cache, though in its current form it brings new issues too.
Posted on Reply
#15
kondamin
Can't the MOSFETs be stacked so the SRAM cell is flipped 90°?
Posted on Reply
#16
TumbleGeorge
thegnome: Any chance of any new scanners having shorter wavelengths then? If we can't go further than that we'll be stuck with the chips only getting tiny improvements.
Yes, the 5000 series. The very first 5000-series scanners have been delivered to Intel; the first 5200 units will be delivered in 2024.
Posted on Reply
#17
PhantomTaco
thegnome: Any chance of any new scanners having shorter wavelengths then? If we can't go further than that we'll be stuck with the chips only getting tiny improvements.
In all of lithography, moving to a "shorter wavelength" means either an optical improvement (lenses/mirrors) or a new light source. At this point, there aren't many good candidates for a new light source below 13.5 nm. Like someone else said in the thread, the ASML EXE platform is the next step on the optics side of things. The platform is also called High NA (Numerical Aperture), and it essentially allows printing critical dimensions down to around 8 nm. The core design of the light source, however, remains the same as in the current EUV tools.

For more information on how these minimum resolutions are calculated, you can look into the Rayleigh criterion, which is basically what governs all of this in terms of minimum critical dimension.
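To make the Rayleigh criterion concrete: CD = k1 · λ / NA. A quick sketch (the k1 value below is an assumed typical single-exposure factor, not a figure from this thread):

```python
def min_critical_dimension(wavelength_nm: float, na: float, k1: float = 0.28) -> float:
    """Rayleigh criterion: CD = k1 * lambda / NA.
    k1 ~ 0.28 is an assumed aggressive single-exposure process factor."""
    return k1 * wavelength_nm / na

# Current EUV scanners: lambda = 13.5 nm, NA = 0.33
print(min_critical_dimension(13.5, 0.33))  # ~11.5 nm
# High-NA EUV (ASML EXE platform): same 13.5 nm wavelength, NA = 0.55
print(min_critical_dimension(13.5, 0.55))  # ~6.9 nm
```

Note that the High-NA gain comes entirely from the larger NA in the denominator; the wavelength stays at 13.5 nm.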
Posted on Reply
#18
Wirko
kondamin: Can't the mosfet's be stacked so the sram cell is flipped 90°?
That would describe the CFET (complementary FET), which is a stack of two transistors. Yes, just two. And I'm not sure anyone has produced even an experimental working chip with those.
Posted on Reply
#19
Panther_Seraphin
thegnome: Any chance of any new scanners having shorter wavelengths then? If we can't go further than that we'll be stuck with the chips only getting tiny improvements.
From what I heard, 13.5 nm is the optimal wavelength for current materials, as anything shorter tends to go through the material rather than reflect/expose.

So it will probably take a massive leap in materials technology again to get the next "leap" vs just optimising 13.5 nm utilisation.
Posted on Reply
#20
Wirko
AnotherReader: Smaller caches like L1 and L2 will remain on-chip.
AMD said the stacked L3 chip adds four clock cycles to access latency. Assuming the same were true for L2, it might actually be beneficial if a Zen core could have, for example, 1 MB plus stacked 2 MB of L2 compared to just 1 MB of faster L2.
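A back-of-the-envelope way to see when that trade-off pays off is to compare average access times: the extra cycles are worth it if the larger L2 raises the hit rate enough. All numbers below are made-up illustrations, not AMD figures:

```python
def avg_access_cycles(hit_rate: float, hit_cycles: int, miss_cycles: int) -> float:
    """Average access time for one cache level (illustrative model)."""
    return hit_rate * hit_cycles + (1 - hit_rate) * miss_cycles

# Hypothetical numbers: 1 MB L2 at 14 cycles vs 3 MB partly-stacked L2 at
# 14 + 4 = 18 cycles, with a 50-cycle penalty on an L2 miss (go to L3).
small_l2 = avg_access_cycles(hit_rate=0.80, hit_cycles=14, miss_cycles=50)
large_l2 = avg_access_cycles(hit_rate=0.92, hit_cycles=18, miss_cycles=50)
print(small_l2, large_l2)  # the slower-but-larger L2 wins if its hit rate is high enough
```

With these assumed numbers the bigger L2 comes out ahead despite the four-cycle penalty, which is the scenario being suggested.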
Posted on Reply
#21
Count von Schwalbe
L1 and L2 are nothing compared to the vast expanse of L3.

What seems likely is a "blank area" where the L3 sits currently, with interconnects on-chip but no actual transistors. Then the L3, made on a larger node, is laid over the same area but with considerably higher capacity.
Posted on Reply
#22
Wirko
Count von Schwalbe: L1 and L2 are nothing compared to the vast expanse of L3.
What do you mean, nothing? 1 MB of L2 is about one third the size of a slice of L3 (= 4 MB next to each core).
Posted on Reply
#23
Count von Schwalbe
Wirko: What do you mean, nothing? 1 MB of L2 is about one third the size of a slice of L3 (= 4 MB next to each core).
You have 4X the L3 as L2, and that is on Zen 4. I understand that L3 sizes are going to increase again pretty soon.
Posted on Reply
#24
lexluthermiester
Panther_Seraphin: So it will probably take a massive leap in materials technology again to get the next "leap" vs just optimising 13.5nm utilisation.
This. We need a replacement for strained silicon.
Posted on Reply
#25
Chrispy_
This is why AMD pushed the cache and memory controllers for Navi 31 onto chiplets made on an older process node: cache and memory-controller scaling on new process nodes has been up against diminishing returns for several years. The fact that TSMC are admitting almost non-existent cache scaling isn't news; it's been godawful for the last half-dozen node shrinks.

IMO the first-gen GPU chiplet design barely justified the effort, but it ought to improve with subsequent generations.
Posted on Reply