
Samsung Brings In-memory Processing Power to Wider Range of Applications

btarunr

Editor & Senior Moderator
Staff member
Joined
Oct 9, 2007
Messages
47,233 (7.55/day)
Location
Hyderabad, India
Samsung Electronics, the world leader in advanced memory technology, today showcased its latest advancements in processing-in-memory (PIM) technology at Hot Chips 33, a leading semiconductor conference where the most notable microprocessor and IC innovations are unveiled each year. Samsung's revelations include the first successful integration of its PIM-enabled High Bandwidth Memory (HBM-PIM) into a commercialized accelerator system, and broadened PIM applications that embrace DRAM modules and mobile memory, accelerating the move toward the convergence of memory and logic.

In February, Samsung introduced the industry's first HBM-PIM (Aquabolt-XL), which incorporates an AI processing function into Samsung's HBM2 Aquabolt, to enhance high-speed data processing in supercomputers and AI applications. The HBM-PIM has since been tested in the Xilinx Virtex UltraScale+ (Alveo) AI accelerator, where it delivered an almost 2.5X system performance gain as well as a cut of more than 60% in energy consumption.



"HBM-PIM is the industry's first AI-tailored memory solution being tested in customer AI-accelerator systems, demonstrating tremendous commercial potential," said Nam Sung Kim, senior vice president of DRAM Product & Technology at Samsung Electronics. "Through standardization of the technology, applications will become numerous, expanding into HBM3 for next-generation supercomputers and AI applications, and even into mobile memory for on-device AI as well as for memory modules used in data centers."

"Xilinx has been collaborating with Samsung Electronics to enable high-performance solutions for data center, networking and real-time signal processing applications starting with the Virtex UltraScale+ HBM family, and recently introduced our new and exciting Versal HBM series products," said Arun Varadarajan Rajagopal, senior director, Product Planning at Xilinx, Inc. "We are delighted to continue this collaboration with Samsung as we help to evaluate HBM-PIM systems for their potential to achieve major performance and energy-efficiency gains in AI applications."

DRAM modules powered by PIM
The Acceleration DIMM (AXDIMM) brings processing to the DRAM module itself, minimizing large data movement between the CPU and DRAM to boost the energy efficiency of AI accelerator systems. With an AI engine built inside the buffer chip, the AXDIMM can perform parallel processing of multiple memory ranks (sets of DRAM chips) instead of accessing just one rank at a time, greatly enhancing system performance and efficiency. Since the module can retain its traditional DIMM form factor, the AXDIMM facilitates drop-in replacement without requiring system modifications. Currently being tested on customer servers, the AXDIMM can offer approximately twice the performance in AI-based recommendation applications and a 40% decrease in system-wide energy usage.
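
The data-movement argument above can be illustrated with a toy model. This is only a sketch and not Samsung's implementation: the rank layout, row sizes, and function names are hypothetical, and the "near-memory" step is ordinary Python standing in for an engine inside the buffer chip. It compares how many bytes would cross the CPU-to-DIMM channel when a column-wise sum is done on the host versus per rank.

```python
# Toy model only: compares bytes crossing the CPU<->DIMM channel when a
# column-wise sum is done on the host versus near the memory ranks.
# The rank layout, sizes, and function names are hypothetical.

def column_sum(rows):
    """Element-wise sum of equally sized rows."""
    return [sum(col) for col in zip(*rows)]

def host_side(ranks, bytes_per_value=4):
    """Host reads every row from every rank, then reduces on the CPU."""
    all_rows = [row for rank in ranks for row in rank]
    bytes_moved = sum(len(row) for row in all_rows) * bytes_per_value
    return column_sum(all_rows), bytes_moved

def near_memory(ranks, bytes_per_value=4):
    """Each rank is reduced where it lives; only partial sums reach the host."""
    partials = [column_sum(rank) for rank in ranks]  # conceptually in parallel
    bytes_moved = sum(len(p) for p in partials) * bytes_per_value
    return column_sum(partials), bytes_moved

# Four ranks, each holding 1,000 rows of 16 values (made-up sizes).
ranks = [[[r + i for i in range(16)] for _ in range(1000)] for r in range(4)]
host_result, host_bytes = host_side(ranks)
pim_result, pim_bytes = near_memory(ranks)
print(host_result == pim_result, host_bytes, pim_bytes)
```

Both paths produce the same sums; only the traffic differs. The byte counts here reflect the made-up sizes and say nothing about the performance or energy figures quoted in the announcement.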

"SAP has been continuously collaborating with Samsung on their new and emerging memory technologies to deliver optimal performance on SAP HANA and help database acceleration," said Oliver Rebholz, head of HANA core research & innovation at SAP. "Based on performance projections and potential integration scenarios, we expect significant performance improvements for in-memory database management system (IMDBMS) and higher energy efficiency via disaggregated computing on AXDIMM. SAP is looking to continue its collaboration with Samsung in this area."

Mobile memory that brings AI from data center to device
Samsung's LPDDR5-PIM mobile memory technology can provide independent AI capabilities without data center connectivity. Simulation tests have shown that the LPDDR5-PIM can more than double performance while reducing energy usage by over 60% when used in applications such as voice recognition, translation and chatbots.

Energizing the ecosystem
Samsung plans to expand its AI memory portfolio by working with other industry leaders to complete standardization of the PIM platform in the first half of 2022. The company will also continue to foster a robust PIM ecosystem to ensure wide applicability across the memory market.

 
Joined
Jan 15, 2012
Messages
944 (0.20/day)
Location
Slovenia
Wow. Nice blah blah, I wish to see some RAM disk test scores! :cool:
 
Joined
Jun 3, 2010
Messages
2,540 (0.48/day)
Joined
Aug 6, 2020
Messages
729 (0.46/day)
The closest thing to memristors that has been released is Intel 3D XPoint.


It has much lower latency than flash, but it is still worse than DRAM for access times and cell life. It's also between flash and DRAM on density.
 
Joined
Jun 3, 2010
Messages
2,540 (0.48/day)
HP has been trying to make them into a PIM computer, which is different from a CPU computer in that the instruction is sent to the memory, not to the CPU.
 
Joined
Aug 6, 2020
Messages
729 (0.46/day)
HP has been trying to make them into a PIM computer, which is different from a CPU computer in that the instruction is sent to the memory, not to the CPU.


It's kinda impossible to do without the memristor - it's the only way you can get distributed compute, along with permanent storage attached.

But if 3D XPoint or the memristor is the future, you're going to have to keep separate cache RAM (Optane performance is too slow).
 
Joined
Jun 3, 2010
Messages
2,540 (0.48/day)
It's kinda impossible to do without the memristor - it's the only way you can get distributed compute, along with permanent storage attached.

But if 3D XPoint or the memristor is the future, you're going to have to keep separate cache RAM (Optane performance is too slow).

It is a whole new ballgame. The data has no coherency issues. The programs are racing towards the data. However, there might be issues with integrity, since programs inherently change the operands unless they are saved as a backup first. Crazy architecture...
 
Joined
Jan 3, 2021
Messages
3,486 (2.45/day)
Location
Slovenia
The second illustration is immensely informative. Here's another one that reveals a little bit more:
 
Joined
Aug 6, 2020
Messages
729 (0.46/day)
It is a whole new ballgame. The data has no coherency issues. The programs are racing towards the data. However, there might be issues with integrity, since programs inherently change the operands unless they are saved as a backup first. Crazy architecture...


You're still going to have timing issues (there will always be a delay between accessing different parts of your distributed compute), so you're stuck going with asynchronous compute. But you will still need to figure out how to interconnect all those data lines between DSP blocks.

Coherency is a relatively tame beast by comparison.
 
Joined
Jun 3, 2010
Messages
2,540 (0.48/day)
You're still going to have timing issues (there will always be a delay between accessing different parts of your distributed compute).

Coherency is a relatively tame beast by comparison.

I meant it as a security problem, same as Meltdown.
 