Tuesday, November 1st 2022

Micron Ships World's Most Advanced DRAM Technology With 1-Beta Node

Micron Technology, Inc., announced today that it is shipping qualification samples of its 1β (1-beta) DRAM technology to select smartphone manufacturers and chipset partners and has achieved mass production readiness with the world's most advanced DRAM technology node. The company is debuting its next generation of process technology on its low-power double data rate 5X (LPDDR5X) mobile memory, delivering top speed grades of 8.5 gigabits (Gb) per second. The node delivers significant gains across performance, bit density and power efficiency that will have sweeping market benefits. Beyond mobile, 1β delivers the low-latency, low-power, high-performance DRAM that is essential to support highly responsive applications, real-time services, personalization and contextualization of experiences, from intelligent vehicles to data centers.

The world's most advanced DRAM process node, 1β represents an advancement of the company's market leadership cemented with the volume shipment of 1α (1-alpha) in 2021. The node delivers around a 15% power efficiency improvement and more than a 35% bit density improvement with a 16Gb per die capacity. "The launch of our 1-beta DRAM signals yet another leap forward for memory innovation, brought to life by our proprietary multi-patterning lithography in combination with leading-edge process technology and advanced materials capabilities," said Scott DeBoer, executive vice president of technology and products at Micron. "In delivering the world's most advanced DRAM technology with more bits per memory wafer than ever before, this node lays the foundation to usher in a new generation of data-rich, intelligent and energy-efficient technologies from the edge to the cloud."
This milestone also follows quickly on the heels of Micron's shipment of the world's first 232-layer NAND in July, architected to drive unprecedented performance and areal density for storage. With these new firsts, Micron continues to set the pace for the market across memory and storage innovations—both made possible by the company's deep roots in cutting-edge research and development (R&D) and manufacturing process technology.

With sampling of LPDDR5X, the mobile ecosystem will be the first to reap the benefits of 1β DRAM's significant gains, which will unlock next-generation mobile innovation and advanced smartphone experiences—all while consuming less power. With 1β's speed and density, high-bandwidth use cases will be responsive and smooth during downloads, launches and simultaneous use of data-hungry 5G and artificial intelligence (AI) applications. Additionally, 1β-based LPDDR5X will not only enhance smartphone camera launch, night mode and portrait mode with speed and clarity, but it will enable shake-free, high-resolution 8K video recording and intuitive in-phone video editing.

The low power-per-bit consumption of 1β process technology delivers the most power-efficient memory technology on the market for smartphones yet. This allows smartphone manufacturers to design devices with longer battery life—crucial as consumers look to keep their phones running through energy-draining, data-intensive apps.

The power savings are also enabled by the implementation of new JEDEC enhanced dynamic voltage and frequency scaling core (eDVFSC) techniques on this 1β-based LPDDR5X. The addition of eDVFSC at a doubled frequency tier of up to 3,200 megabits per second provides improved power-saving controls, enabling more efficient use of power based on individual end-user usage patterns.

Micron challenges laws of physics with sophisticated lithography and nanomanufacturing
Micron's industry-first 1β node allows higher memory capacity in a smaller footprint—enabling lower cost per bit of data. DRAM scaling has largely been defined by this ability to deliver more and faster memory per square millimeter of semiconductor area, which requires shrinking the circuits to fit billions of memory cells on a chip roughly the size of a fingernail. With each process node, the semiconductor industry has been shrinking devices every year or two for decades; however, as chips have grown smaller, defining circuit patterns on wafers requires challenging the laws of physics.

While the industry has begun to shift to a new tool that uses extreme ultraviolet light to overcome these technical challenges, Micron has tapped into its proven leading-edge nanomanufacturing and lithography prowess to bypass this still-emergent technology. Doing so involves applying the company's proprietary, advanced multi-patterning techniques and immersion capabilities to pattern these minuscule features with the highest precision. The greater capacity delivered by this reduction will also enable devices with small form factors, such as smartphones and IoT devices, to fit more memory into compact footprints.

To achieve its competitive edge with 1β and 1α, Micron has also aggressively advanced its manufacturing excellence, engineering capabilities and pioneering R&D over the past several years. This accelerated innovation first enabled Micron's unprecedented ramp of its 1α node one year ahead of its competition, which established Micron's leadership in both DRAM and NAND for the first time in the company's history. Over the years, Micron has further invested billions of dollars in transforming its fabs into leading-edge, highly automated, sustainable and AI-driven facilities. This includes investments in Micron's plant in Hiroshima, Japan, which will be mass producing DRAM on 1β.

1-beta lays ubiquitous foundation for a more interconnected, sustainable world
As energy-sapping use cases like machine-to-machine communications, AI and machine learning take flight, power-efficient technologies are an increasingly critical need for businesses, especially those looking to meet strict sustainability targets and reduce operating expenses. Researchers have found that training a single AI model can emit five times the lifetime carbon emissions of an American car, including its manufacture. Further, information and communication technology is already predicted to use 20% of the world's electricity by 2030.

Micron's 1β DRAM node provides a versatile foundation to power advancements in a connected world that needs fast, ubiquitous, energy-efficient memory to fuel digitization, optimization and automation. The high-density, low-power memory fabricated on 1β enables more energy-efficient flow of data between data-hungry smart things, systems and applications, and more intelligence from edge to cloud. Over the next year, the company will begin to ramp the rest of its portfolio on 1β across embedded, data center, client, consumer, industrial and automotive segments, including graphics memory, high-bandwidth memory and more.

Source: Micron

28 Comments on Micron Ships World's Most Advanced DRAM Technology With 1-Beta Node

#1
Valantar
Here's a question for you smart people out there: is it possible to make non-power of two capacity memory dice? I.e. could they make 12Gb chips instead of 8 or 16? With the types of memory capacities and bus widths we're seeing on current cards I feel like that would be extremely useful - give those mid-range GPUs 12GB of VRAM on a suitably sized 256-bit bus, without dumping the extra cost of 16GB of VRAM on them. Or 9GB with a 192-bit bus? The choice between either 1GB or 2GB VRAM chips seems like too big of a jump to me to really make sense.
Posted on Reply
#2
Wirko
ValantarHere's a question for you smart people out there: is it possible to make non-power of two capacity memory dice? I.e. could they make 12Gb chips instead of 8 or 16? With the types of memory capacities and bus widths we're seeing on current cards I feel like that would be extremely useful - give those mid-range GPUs 12GB of VRAM on a suitably sized 256-bit bus, without dumping the extra cost of 16GB of VRAM on them. Or 9GB with a 192-bit bus? The choice between either 1GB or 2GB VRAM chips seems like too big of a jump to me to really make sense.
It shouldn't be hard; it would only be necessary to cut away one fourth of the memory cells at the end of the address space (at the design stage of course, not after manufacturing). The remaining cells would not in any way depend on those that would be missing.
Regarding the economics of doing that, though ... let's hear what other smart people have to say.
Posted on Reply
#3
Mussels
Freshwater Moderator
ValantarHere's a question for you smart people out there: is it possible to make non-power of two capacity memory dice? I.e. could they make 12Gb chips instead of 8 or 16? With the types of memory capacities and bus widths we're seeing on current cards I feel like that would be extremely useful - give those mid-range GPUs 12GB of VRAM on a suitably sized 256-bit bus, without dumping the extra cost of 16GB of VRAM on them. Or 9GB with a 192-bit bus? The choice between either 1GB or 2GB VRAM chips seems like too big of a jump to me to really make sense.
squares don't work that way
Posted on Reply
#4
Valantar
Musselssquares don't work that way
But the die shot is rectangular ;) In all seriousness though, all I can see is a segmented die with a series of repeated memory arrays. What would stand in the way of just having 25% less in one implementation?

@Wirko I think economically this could make sense, especially if one sees these as just as likely to replace a 1GB chip as a 2GB one in terms of sales. If a 9GB GPU is just a bit more expensive than a 6GB one, avoids the "less than 8GB is insufficient" stigma, and is still a bit cheaper than a 12GB one, wouldn't that be a win-win situation? The memory maker gets to sell a die that's smaller than the 2GB one (=cheaper, better yields) but also more attractive and thus better priced than the 1GB variant, and they might well end up selling more units due to a better mix of perceived value and performance for the end products.
Posted on Reply
#5
Mussels
Freshwater Moderator
it's mostly because computers run on binary, two values
key being two

if you mix and match you end up with situations like the GTX 970 with its 3.5GB + 0.5GB at different speeds, or like the latency delays between the separate caches on dual-CCX Ryzen chips


There's math about the address spaces involved here that I don't understand well enough to explain, but basically they'd have to do a total redesign to make it work in other setups
Data is accessed by means of a binary number called a memory address applied to the chip's address pins, which specifies which word in the chip is to be accessed. If the memory address consists of M bits, the number of addresses on the chip is 2^M, each containing an N-bit word. Consequently, the amount of data stored in each chip is N·2^M bits. The memory storage capacity for M address lines is given by 2^M, which is usually a power of two: 2, 4, 8, 16, 32, 64, 128, 256 and 512
Even storage with its weirdness is done the same way, in SSDs and graphics cards

3090 is 12x 4GB memory modules, 384 bit memory bus for 32 bit per 8GB memory chip

It's like asking why you can't slap 9 sticks of memory into your 4-slot motherboard - they've got nothing to connect to, but it's possible to run 4, 3, 2 or 1 sticks of memory, mixing and matching their sizes - but with performance losses for asymmetrical designs since binary CPUs like binary memory layouts
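As a rough illustration of that 2^M address math, here's a minimal Python sketch (made-up word and address widths, not any real chip's organisation):

```python
# Idealised memory chip: N-bit words behind M address bits -> N * 2^M bits total.
def chip_bits(address_bits: int, word_bits: int) -> int:
    return word_bits * 2 ** address_bits

print(chip_bits(30, 16) / 2**30)  # 16.0 -> a 16 Gb die
print(chip_bits(29, 16) / 2**30)  # 8.0  -> an 8 Gb die
# 12 Gb sits between the two: you can't get there by simply adding or removing
# one whole address bit, which is where the "powers of two" intuition comes from.
```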
Posted on Reply
#6
Wirko
Musselsit's possible to run 4, 3, 2 or 1 sticks of memory, mixing and matching their sizes - but with performance losses for asymmetrical designs since binary CPUs like binary memory layouts
With 2 x 8GB plus 2 x 16 GB, you don't lose anything, as long as pairs with same capacity are on different channels. There's no reason why a CPU would feel better with 2^n bytes, it checks the available RAM on booting and its memory management unit (MMU) uses what's available. Even cache sizes are sometimes odder than odd.
Posted on Reply
#7
TheinsanegamerN
WirkoWith 2 x 8GB plus 2 x 16 GB, you don't lose anything, as long as pairs with same capacity are on different channels. There's no reason why a CPU would feel better with 2^n bytes, it checks the available RAM on booting and its memory management unit (MMU) uses what's available. Even cache sizes are sometimes odder than odd.
There *IS* a difference in performance. It's small, but it's there, because in dual-rank dual-channel setups the memory controller bounces back each cycle between channel A and B, so if you have mismatched memory modules you can end up with necessary data only being available on one channel, not evenly distributed, resulting in performance misses if the controller is on the wrong clock when said data is needed.
ValantarHere's a question for you smart people out there: is it possible to make non-power of two capacity memory dice? I.e. could they make 12Gb chips instead of 8 or 16? With the types of memory capacities and bus widths we're seeing on current cards I feel like that would be extremely useful - give those mid-range GPUs 12GB of VRAM on a suitably sized 256-bit bus, without dumping the extra cost of 16GB of VRAM on them. Or 9GB with a 192-bit bus? The choice between either 1GB or 2GB VRAM chips seems like too big of a jump to me to really make sense.
It's possible, samsung ran 1.5GB and 3GB chips for awhile for their phones.
Posted on Reply
#8
Wirko
TheinsanegamerNThere *IS* a difference in performance. It's small, but it's there, because in dual-rank dual-channel setups the memory controller bounces back each cycle between channel A and B, so if you have mismatched memory modules you can end up with necessary data only being available on one channel, not evenly distributed, resulting in performance misses if the controller is on the wrong clock when said data is needed.
That's true if you populate the slots the wrong way, 16+16 in channel A and 8+8 in channel B. Then you get partial dual-channel operation. But is it still true if you do it the right way, by putting 16+8 in channel A and 16+8 in channel B?
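Here's a toy sketch of that difference (a deliberately simplified model of channel interleaving, not how any real memory controller maps addresses):

```python
# Simplified dual-channel model: capacity present on both channels can be
# interleaved (striped); whatever is left over is reachable through one channel only.
def interleaved_and_single(channel_a_gb: int, channel_b_gb: int):
    dual = 2 * min(channel_a_gb, channel_b_gb)
    single = abs(channel_a_gb - channel_b_gb)
    return dual, single

# 16+8 in each channel: 24 GB per channel, everything stays dual-channel.
print(interleaved_and_single(24, 24))  # (48, 0)

# 16+16 in channel A, 8+8 in channel B: the top 16 GB runs single-channel.
print(interleaved_and_single(32, 16))  # (32, 16)
```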
Posted on Reply
#9
Mussels
Freshwater Moderator
WirkoWith 2 x 8GB plus 2 x 16 GB, you don't lose anything, as long as pairs with same capacity are on different channels. There's no reason why a CPU would feel better with 2^n bytes, it checks the available RAM on booting and its memory management unit (MMU) uses what's available. Even cache sizes are sometimes odder than odd.
that's still a binary setup, because each channel has a binary amount

Just like how my 3090 has 24GB and not 32GB

You can have 32-bit memory busses with binary 1/2/4/8 etc gigabit modules at the end, but the entire design has to be made binary

You can disable some (like the 3080 does to get 10GB)
You can mix and match sized memory modules (half 4Gb half 8Gb)
you can get hardware reserved memory (IGP's stealing memory for their own use from the system, etc)

So while the total amounts can get odd, the total must always be divisible by two, because our computer systems are binary
Posted on Reply
#10
Valantar
Musselsit's mostly because computers run on binary, two values
key being two

if you mix and match you end up with situations like the GTX 970 with its 3.5GB + 0.5GB at different speeds, or like the latency delays between the separate caches on dual-CCX Ryzen chips


There's math about the address spaces involved here that I don't understand well enough to explain, but basically they'd have to do a total redesign to make it work in other setups



Even storage with its weirdness is done the same way, in SSD's and graphics cards

3090 is 12x 4GB memory modules, 384 bit memory bus for 32 bit per 8GB memory chip

It's like asking why you can't slap 9 sticks of memory into your 4-slot motherboard - they've got nothing to connect to, but it's possible to run 4, 3, 2 or 1 sticks of memory, mixing and matching their sizes - but with performance losses for asymmetrical designs since binary CPUs like binary memory layouts
Musselsthat's still a binary setup, because each channel has a binary amount

Just like how my 3090 has 24GB and not 32GB

You can have 32-bit memory busses with binary 1/2/4/8 etc gigabit modules at the end, but the entire design has to be made binary

You can disable some (like the 3080 does to get 10GB)
You can mix and match sized memory modules (half 4Gb half 8Gb)
you can get hardware reserved memory (IGP's stealing memory for their own use from the system, etc)

So while the total amounts can get odd, the total must always be divisible by two, because our computer systems are binary
This is kind of a fascinating twist of a misunderstanding - the "problem" you're describing here is the precise reason why I posited this question to begin with. Obviously these theoretical 1.5GB memory chips would still have 32-bit interfaces - otherwise they wouldn't be compatible at all. After all, both 1GB and 2GB memory chips have the same 32-bit interface.

This would mean no GTX 970-like situations - every bit of memory would be accessible at the same speed. Avoiding this is precisely the point. There would be no mixing and matching.

I guess that's the core of my question - whether there's some structural thing tying the bus width to internal blocks that can only scale in integer-GB/power-of-two-Gb capacities (i.e. that you could do 32-bit with 8Gb or 16Gb, but 12Gb would necessitate either a 24-bit or 48-bit bus width, or something like that). This is precisely why I'm asking, as this is well beyond my level of knowledge - but unless there is such a restriction, it would be unproblematic to make 12Gb/1.5GB GDDR chips with 32-bit interfaces. Which would give a lot more capacity options:

Bus width | 8Gb/1GB capacity | 16Gb/2GB capacity (or 2x 8Gb/channel) | Theoretical 12Gb/1.5GB capacity
128-bit   | 4GB              | 8GB                                   | 6GB
192-bit   | 6GB              | 12GB                                  | 9GB
256-bit   | 8GB              | 16GB                                  | 12GB
320-bit   | 10GB             | 20GB                                  | 15GB
384-bit   | 12GB             | 24GB                                  | 18GB


IMO, the capacities in that last column make a lot of sense relative to the bus width and expected budget range of a GPU with that bus width - while the first column is mostly on the low side, and the second column is generally higher than what is necessary. All while avoiding any "half the VRAM is half the speed" situation or anything like that.
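For what it's worth, the arithmetic behind that table is just the number of 32-bit channels times the per-chip capacity; a quick sketch (assuming one chip per 32-bit channel, no clamshell layouts):

```python
# VRAM capacity = (bus width / 32-bit channels) x capacity per chip,
# assuming one chip per channel (no clamshell layouts).
def vram_gb(bus_width_bits: int, chip_gb: float) -> float:
    return (bus_width_bits // 32) * chip_gb

for bus in (128, 192, 256, 320, 384):
    print(bus, vram_gb(bus, 1.0), vram_gb(bus, 2.0), vram_gb(bus, 1.5))
# 128 4.0 8.0 6.0
# 192 6.0 12.0 9.0
# 256 8.0 16.0 12.0
# 320 10.0 20.0 15.0
# 384 12.0 24.0 18.0
```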
Posted on Reply
#11
Mussels
Freshwater Moderator
They'd have to be a combination
You'd need a three-channel memory controller on a CPU, which we know has happened in the past

You're forgetting the RAM has to connect to a memory controller that needs to exist, and since the designs are binary they'll always aim for even numbers - dual and quad channel, with the tri-channel Nehalem being a freak of nature (designed that way due to failed yields on one of them?)
Posted on Reply
#12
Wirko
MusselsThey'd have to be a combination
You'd need a three-channel memory controller on a CPU, which we know has happened in the past

You're forgetting the RAM has to connect to a memory controller that needs to exist, and since the designs are binary they'll always aim for even numbers - dual and quad channel, with the tri-channel Nehalem being a freak of nature (designed that way due to failed yields on one of them?)
Think about cache sizes, sometimes very weird if you're thinking in binary. Alder Lake, 1.25 MiB of L2 per P-core. Sapphire Rapids, 105 MiB of L3 (1.875 per core). The width of internal data buses is still 2^n bits, like 512 bits.

Also, when you say even numbers you probably mean powers of two, right?
Posted on Reply
#13
Valantar
MusselsThey'd have to be a combination
You'd need a three channel memory controller on a CPU, which we know has happened in the past
... what? No. Can you please take a step back and look at this again? 'Cause what you're saying is fundamentally missing the point here.

GPUs have any number of 32-bit memory controllers, each connecting to 1 or 2 memory chips. These combine into n-bit memory buses, with n/32*[memory chip capacity] total VRAM size.

What I'm asking is if you could make memory chips with a different capacity on the same bus width. No change in memory controller layout required - just as you currently have either 1GB or 2GB VRAM chips (like the 24GB RTX 3090 having 24 1GB chips, on the front and back of the card, while the RTX 3090 Ti has 12 2GB chips, only on the front of the card). My question is whether it would be possible to make 1.5GB (still 32-bit bus) memory chips for in-between capacities.

If this is possible, there wouldn't need to be any "combination" of anything. It'd be identical to current layouts except for the capacity of each memory die. There'd be no need to mix die sizes (like the GTX 970, or Xbox Series X) - avoiding that is precisely the point. With 1.5GB memory chips, you'd get an in-between memory amount at full bandwidth for the entire capacity.
MusselsYou're forgetting the RAM has to connect to a memory controller that needs to exist, and since the designs are binary they'll always aim for even numbers - dual and quad channel, with the tri channel nehalem being a freak of nature (designed that way due to failed yields on one of them?)
... did you read the post above at all? Did you see the literal table I made with different bus widths - i.e. multiples of 32-bit GDDR controllers? A 128-bit VRAM bus is a 4-channel VRAM layout. 256-bit is 8-channel. And so on.

Based on those common and existing bus widths, I showed how 1.5GB memory chips could help keep VRAM sizes better suited to GPU power while avoiding mixed die capacity layouts like the GTX 970 (which kills performance for some proportion of the RAM).

It seems that what I'm trying to ask here is entirely bypassing your brain for some reason, as you're not responding to what I'm saying at all.
Posted on Reply
#14
Wirko
They would have to make dies with 3 x 2^n cells. For example, 48k rows instead of 64k rows, and the same number of columns as usual. Also the same number of banks, ranks, dies per package and packages per module as usual.
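As a back-of-the-envelope check, a minimal sketch with a purely hypothetical die organisation (real parts differ in rows, columns, banks and I/O width):

```python
# Hypothetical die organisation: rows x columns x banks x bits per column access.
def die_gbit(rows: int, cols: int = 1024, banks: int = 32, io_bits: int = 8) -> float:
    return rows * cols * banks * io_bits / 2**30

print(die_gbit(64 * 1024))  # 16.0 Gb with 64k rows per bank
print(die_gbit(48 * 1024))  # 12.0 Gb with 48k rows (3 x 2^14), everything else unchanged
```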
Posted on Reply
#15
Valantar
WirkoThey would have to make dies with 3 x 2^n cells. For example, 48k rows instead of 64k rows, and the same number of columns as usual. Also the same number of banks, ranks, dies per package and packages per module as usual.
That's the type of info I was looking for! Is there any reason why this wouldn't work, do you think?
Posted on Reply
#16
Wirko
ValantarThat's the type of info I was looking for! Is there any reason why this wouldn't work, do you think?
I'm quite sure it would work without issues with any capacity, as long as it's a reasonably round number in binary notation, so the MMU can handle that. Like, for example, multiples of 16 megabytes.

Making it come true would not be trivial, but not hard either. Each DRAM package contains several (2^n) stacked dies, each of them divided into several (2^n) banks. All banks in all dies work in parallel to make slow memory fast (for sequential transfers at least). A 256KB chunk of memory that a CPU sees as contiguous space is therefore scattered across all banks on all dies on all packages on both DIMMs (1 DIMM/channel). You can't simply assemble a package with 6 dies instead of 8; you'd have holes in memory space. What you need is a different die with 1/4 fewer rows in each bank. Note that I'm talking about DDRx here; I'm only vaguely familiar with GDDRx, but the organisation of memory can't be totally different.
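A toy model of why dropping whole dies leaves holes while shrinking each die doesn't (assuming simple round-robin interleaving, which is an oversimplification):

```python
# Toy round-robin interleaving: consecutive chunks of the address space land on
# consecutive dies in the package.
DIES_THE_DESIGN_EXPECTS = 8

def die_for_chunk(chunk: int) -> int:
    return chunk % DIES_THE_DESIGN_EXPECTS

# If only 6 of the 8 expected dies are populated, every chunk mapped to die 6 or 7
# has no physical backing: holes scattered through the "contiguous" space.
holes = [c for c in range(16) if die_for_chunk(c) >= 6]
print(holes)  # [6, 7, 14, 15]

# A die with 1/4 fewer rows keeps all 8 dies present, so the interleaving still
# works; the usable space just ends 25% sooner.
```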

Heck, my very first computer had 48 KB of RAM. It needed chips of two different capacities for that.
Posted on Reply
#17
Mussels
Freshwater Moderator
You'd just need the simple task of designing your memory controllers entirely from scratch to support these new fancy combinations and make sure modern operating systems support them

Trivial stuff
Posted on Reply
#18
Valantar
MusselsYou'd just need the simple task of designing your memory controllers entirely from scratch to support these new fancy combinations and make sure modern operating systems support them

Trivial stuff
Why? How would the memory controllers need to be redesigned to support these? If you know something here, maybe explain it? 'Cause all we know so far is that we currently have VRAM in 8Gb and 16Gb packages working on the same controllers, seeming to indicate that they have no problem addressing different amounts per channel. And of course historically we've had all kinds of VRAM packages - 4Gb, 2Gb, 1Gb, etc.

Regular DRAM controllers have no problems running, say, dual-channel setups with unmatched DIMMs, 4+8GB per channel for example. And that's even across separate PCBs! Why would a GDDR controller need significant changes to support a 1536MB package vs a 1024 or 2048MB one?
Posted on Reply
#19
Mussels
Freshwater Moderator
Because they have to be designed exactly to support what they need and nothing more


You don't see memory controllers in a CPU supporting DDR5/4/3/2/1 in a single CPU for the same reason - it costs money in the form of space on the die
Posted on Reply
#20
Valantar
MusselsBecause they have to be designed exactly to support what they need and nothing more


You don't see memory controllers in a CPU supporting DDR5/4/3/2/1 in a single CPU for the same reason - it costs money in the form of space on the die
... But supporting a different memory standard is entirely different from supporting a different capacity of the same standard. Different standards use different signalling and signal processing, requiring bespoke hardware for each, or integrated hardware capable of supporting many different operating modes. Addressing 25% less RAM per channel should require no such changes, perhaps outside of some firmware tweaks for the controller to recognize the new capacity. Current 8Gb and 16Gb packages are both addressed in the same way over the same 32-bit bus after all, so why wouldn't that work for a 12Gb package?
Posted on Reply
#21
Wirko
MusselsYou'd just need the simple task of designing your memory controllers entirely from scratch to support these new fancy combinations and make sure modern operating systems support them

Trivial stuff
At least the operating systems don't have issues with that. I can set available RAM to 11157 MB for my Windows 10 install in VirtualBox and it will run happily.
Posted on Reply
#23
Valantar
The_EnigmaLol. All this arguing about how it can or can't work. Meanwhile, they are already being made

www.servethehome.com/sk-hynix-shows-non-binary-ddr5-capacities-at-intel-innovation-2022/
That's definitely interesting, even if the topology between DDR and GDDR is quite different. But as mentioned above, your average DDR memory controller has no problem running, say, 4+8GB per channel, so there's no reason why that should be a problem on the DDR side at least. Cool to see those actually being made!
Posted on Reply
#24
Mussels
Freshwater Moderator
WirkoAt least the operating systems don't have issues with that. I can set available RAM to 11157 MB for my Windows 10 install in VirtualBox and it will run happily.
Because the primary OS does that mapping

The non-binary DDR5 is because DDR5 is two 32 bit RAM modules together (hence the quad-channel reading in CPU-Z)

That means they can mix 16+32 and dual/quad channel still works as long as you have binary numbers (2/4/8) of each stick

2x 16+32 is fine

1x 16+32 and 1x 32+64 would not work in dual channel
Posted on Reply
#25
Valantar
MusselsThe non-binary DDR5 is because DDR5 is two 32 bit RAM modules together (hence the quad-channel reading in CPU-Z)
But DDR4 also has no problems running two different sized kits in dual channel.

Also, DDR5 sub-channels are AFAIK teamed together on a low enough level that describing them as sub-channels is more accurate than treating them as separate channels.
Mussels2x 16+32 is fine
If so, then why would a GDDR controller not be fine with a similar "1.5x power of two" amount of memory per channel? Even on DDR5 it's not like that would split into 16GB on one sub-channel and 32 on the other - both DIMMs would be split evenly.
Posted on Reply