
MemryX Announces Production Availability of the MX3 Edge AI Accelerator

btarunr

Editor & Senior Moderator
Staff member
Joined
Oct 9, 2007
Messages
47,343 (7.51/day)
Location
Hyderabad, India
System Name RBMK-1000
Processor AMD Ryzen 7 5700G
Motherboard ASUS ROG Strix B450-E Gaming
Cooling DeepCool Gammax L240 V2
Memory 2x 8GB G.Skill Sniper X
Video Card(s) Palit GeForce RTX 2080 SUPER GameRock
Storage Western Digital Black NVMe 512GB
Display(s) BenQ 1440p 60 Hz 27-inch
Case Corsair Carbide 100R
Audio Device(s) ASUS SupremeFX S1220A
Power Supply Cooler Master MWE Gold 650W
Mouse ASUS ROG Strix Impact
Keyboard Gamdias Hermes E2
Software Windows 11 Pro
MemryX, a leading innovator in AI accelerators for Edge applications, today announced production availability of the MX3 Edge AI Accelerator. Available today as both a standalone chip and a 4-chip 2280 M.2 module, the MX3 offers industry-leading ease of use, best-in-class performance, and high energy efficiency. "The MemryX team has decades of experience in bringing high-quality, high-volume production silicon and software to market," said Keith Kressin, CEO of MemryX. "After many months of rigorous testing, we are very excited to announce we have reached the production milestone of our Edge AI Accelerator. We have tested the quality, performance, latency, and accuracy on thousands of AI models and are confident customers will be very pleased when choosing MemryX for Edge AI applications."



MemryX listened to customers who have often been frustrated with other Edge AI solutions, before bringing to market the transformative MX3 solution. Key advantages of the MemryX solution include:
  • High FPS - MemryX dataflow and at-memory computing architecture excels at pipelined operation. For example, a single low-power MemryX M.2 card can continuously run one or more AI models on tens of incoming camera streams, which is a game changer for edge applications such as Video Management Systems.
  • High model accuracy with just one click - MemryX automated tools can compile and execute thousands of AI models with high accuracy in a single click. MX3 uses floating-point activations, and the compilation process maintains AI models as trained. This means customers using MX3 do not need to retrain models or use pilot images during feature map quantization to increase accuracy or approximate operators unsupported in silicon.
  • No Model Zoo or hidden model changes - MemryX does not use or require a model zoo, where customer models are modified to fit the target hardware. Instead, original models remain fully intact when compiled and run on MX3. Of course, a customer always has the option to prune, compress, or distill a given model to make desired design trade-offs. But MemryX does not require model changes for efficient and high utilization of hardware.
  • Automated Pre/Post processing - While an AI processor is used for AI workloads, many models contain code designed for CPU pre- and post-processing. Programmers would otherwise have to work out the pre- and post-processing code themselves. Instead, MemryX automatically identifies and packages this code, helping the programmer accelerate application deployment using autocropping.
  • Scalability - A single MX3 can be used, or can be combined with additional MX3 chips, all seamlessly acting as one logical unit connected to the host. This means MX3-based configurations could scale from a single chip supporting AI in an advanced smart camera to a 4-chip Edge PC, to an 8 or even 16-chip Edge Server application, all using the exact same software and host interface without adding any hardware such as PCIe switches.
  • Low Power - Each MX3 uses 0.5-2.0 W, depending on the demands of the AI model and system settings. This enables the MX3 to offer high performance AI computing even in fanless devices for uses such as industrial PCs. An entire 4-chip M.2 module uses less than 1/10th the power of leading mainstream GPUs, while at the same time providing higher Edge AI performance.
  • Broad support - MemryX supports a broad set of x86, ARM, and RISC-V platforms out of the box across multiple operating systems.
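As a rough illustration of the scalability and power figures above, the sketch below multiplies the quoted 0.5-2.0 W per-chip range by the chip count for the configurations mentioned (1, 4, 8, and 16 chips). It assumes power scales linearly with the number of chips and ignores any module or host overhead; the announcement states neither, and the function name is hypothetical.

```python
from __future__ import annotations

# Per-chip power range quoted in the announcement (watts).
CHIP_POWER_MIN_W = 0.5
CHIP_POWER_MAX_W = 2.0


def power_envelope(chips: int) -> tuple[float, float]:
    """Return an estimated (min, max) power draw in watts for `chips` MX3 chips.

    Assumption: power scales linearly with chip count and there is no
    additional module or host overhead. This is an illustration only.
    """
    if chips < 1:
        raise ValueError("chips must be at least 1")
    return chips * CHIP_POWER_MIN_W, chips * CHIP_POWER_MAX_W


if __name__ == "__main__":
    # Configurations named in the announcement: smart camera, M.2 module, edge servers.
    for chips in (1, 4, 8, 16):
        low, high = power_envelope(chips)
        print(f"{chips:>2} chip(s): {low:.1f}-{high:.1f} W")
```

Under these assumptions a 4-chip M.2 module would land between 2 W and 8 W; actual draw depends on the model and system settings, as the announcement notes.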
MemryX has been sampling to a variety of customers for months. Customer applications in development include retail, security, agriculture, auto, robotics and more. With high ease of use and scalability, along with industrial temperature specs, MemryX is poised to become the top choice for customers looking for accelerated Edge AI processing.

"The MX3 enables ASUS to deliver advanced AI analytics at the edge with reduced computing requirements, empowering real-time AI inference for our customers by simply adding the MX3 to their existing IPC devices," said Jessy Li, ASUS IoT Solution Director.

"DYNICS has integrated the MemryX MX3 module into our AI-driven platform and the results are phenomenal. The MX3 provides the computing power we need to run our most demanding AI models in real-time, with minimal power usage, enabling us to deploy AI at scale solving specific industrial opportunities," said Ed Gatt, CEO of DYNICS.

Today, MX3-based M.2 modules can be purchased through WPG Americas. Later in Q4 2024, additional distributors in North America and abroad will provide MemryX solutions, and MemryX will provide a public developer hub with open source software showcasing hundreds of models and end applications.

 
Joined
Jul 16, 2022
Messages
626 (0.69/day)
Thinking out loud - I need to read this over thoroughly. Seeking an answer: how does this work in a workstation? Are there any pros/cons? OR incorrect application?
 
Joined
Oct 17, 2021
Messages
87 (0.07/day)
System Name Nirn
Processor Amd Ryzen 7950X3D
Motherboard MSI MEG ACE X670e
Cooling Noctua NH-D15
Memory 128 GB Kingston DDR5 6000 (running at 4000)
Video Card(s) Radeon RX 7900XTX (24G) + Geforce 4070ti (12G) Physx
Storage SAMSUNG 990 EVO SSD 2TB Gen 5 x2 (OS)+SAMSUNG 980 SSD 1TB PCle 3.0x4 (Primocache) +2X 22TB WD Gold
Display(s) Samsung UN55NU8000 (Freesync)
Case Corsair Graphite Series 780T White
Audio Device(s) Creative Soundblaster AE-7 + Sennheiser GSP600
Power Supply Seasonic PRIME TX-1000 Titanium
Mouse Razer Mamba Elite Wired
Keyboard Razer BlackWidow Chroma v1
VR HMD Oculus Quest 2
Software Windows 10
this says nothing about the performance
 
Joined
Jul 8, 2022
Messages
263 (0.29/day)
Location
USA
Processor i9-11900K
Motherboard Asus ROG Maximus XIII Hero
Cooling Arctic Liquid Freezer II 360
Memory 4x8GB DDR4
Video Card(s) Alienware RTX 3090 OEM
Storage OEM Kioxia 2tb NVMe (OS), 4TB WD Blue HDD (games)
Display(s) LG 27GN950-B
Case Lian Li Lancool II Mesh Performance (black)
Audio Device(s) Logitech Pro X Wireless
Power Supply Corsair RM1000x
Keyboard HyperX Alloy Elite 2
Very cool use of an M.2 slot. The price from WPG Americas is $212 (here) but as cal5582 stated, there’s no performance numbers so who’s to say if it’s a good value?
 
Joined
Oct 17, 2021
Messages
87 (0.07/day)
Yeah, there are also things like the Hailo-8, and those have 26 TOPS, for instance.
 
Joined
Jan 3, 2021
Messages
3,666 (2.50/day)
Location
Slovenia
Processor i5-6600K
Motherboard Asus Z170A
Cooling some cheap Cooler Master Hyper 103 or similar
Memory 16GB DDR4-2400
Video Card(s) IGP
Storage Samsung 850 EVO 250GB
Display(s) 2x Oldell 24" 1920x1200
Case Bitfenix Nova white windowless non-mesh
Audio Device(s) E-mu 1212m PCI
Power Supply Seasonic G-360
Mouse Logitech Marble trackball, never had a mouse
Keyboard Key Tronic KT2000, no Win key because 1994
Software Oldwin
What about memory? No memory chips can be seen, it might be able to use system RAM using HMB (the same method as SSDs use) but they don't tell... the only thing they tell is "~10M parameters stored on-die", presumably on all four dies combined.
 

mxdev

New Member
Joined
Oct 3, 2024
Messages
2 (0.02/day)
Hey, MemryX engineer here! I can address some of the comments:
  1. This is just an announcement that the production version is out for our existing customers. In a couple of weeks we'll have more volume for direct purchase on WPGA plus a couple of other distributors. Also, I'm pretty sure the price will decrease as volume ramps up, since the current inventory is from a small batch.
  2. The SDK & documentation will be open-sourced around the same time, and our docs will include benchmarks across a few hundred models. We don't maintain a "zoo" of tuned models, but instead download -> compile -> run out-of-the-box and update the perf. table. If somebody did want to prune & tune the model (retrain), they could do it and get higher performance but it's not necessary.
  3. MX3 out-of-the-box FPS is typically faster than Hailo-8's tuned model zoo FPS. But tuned vs. tuned or OOTB vs. OOTB, the MX3 is much faster. Admittedly we're still early in creating tuned versions of models, since our focus is on running models as-is.
  4. It's important to note that TOPS is simply the # of MAC units * frequency. True performance comparisons need to consider the utilization of those MAC units when running a real model, not a theoretical max. A 10-TOPS chip at 50% utilization can have higher FPS than a 40-TOPS chip at 10% utilization. Comparing max TOPS alone between different architectures doesn't equate to final performance.
  5. It's 10M weights per chip, 40M total for the M.2 at default precision. Half-precision can go up to 80M for the M.2 but that needs accuracy considerations. Being focused on computer vision (think YOLOs) and not LLMs, the # of weights should give reasonably good coverage. In a future SDK release we'll have ways to run bigger models by swapping chunks to/from the host's memory, but it is always better to run on-chip only.
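Points 4 and 5 above can be sketched with a little arithmetic. The snippet below is only an illustration: it treats "effective TOPS" as peak TOPS times utilization (real FPS also depends on the model and the pipeline, so this is a simplification), and it assumes half precision simply doubles the per-chip weight capacity, which matches the 40M and 80M totals quoted for the 4-chip M.2. All class and function names are hypothetical and are not part of any MemryX SDK.

```python
from dataclasses import dataclass

# Figure from the post: about 10M weights per chip at default precision.
WEIGHTS_PER_CHIP = 10_000_000


@dataclass(frozen=True)
class Accelerator:
    """A hypothetical accelerator described by peak TOPS and MAC utilization."""
    name: str
    peak_tops: float     # theoretical maximum
    utilization: float   # fraction of MAC units kept busy on a real model (0..1)

    def __post_init__(self) -> None:
        if not 0.0 <= self.utilization <= 1.0:
            raise ValueError("utilization must be between 0 and 1")

    def effective_tops(self) -> float:
        # Simplification: useful throughput = peak * utilization.
        return self.peak_tops * self.utilization


def module_capacity(chips: int = 4, half_precision: bool = False) -> int:
    """Weights that fit on-chip across `chips` chips.

    Assumption: half precision doubles the per-chip capacity.
    """
    per_chip = WEIGHTS_PER_CHIP * (2 if half_precision else 1)
    return chips * per_chip


def fits_on_module(model_weights: int, chips: int = 4,
                   half_precision: bool = False) -> bool:
    """True if the model's weights fit entirely on-chip."""
    return model_weights <= module_capacity(chips, half_precision)


if __name__ == "__main__":
    small = Accelerator("10-TOPS chip", peak_tops=10.0, utilization=0.50)
    large = Accelerator("40-TOPS chip", peak_tops=40.0, utilization=0.10)
    for acc in (small, large):
        print(f"{acc.name}: {acc.effective_tops():.1f} effective TOPS")

    # A hypothetical 25M-weight vision model on the 4-chip M.2.
    print("25M weights fit at default precision:", fits_on_module(25_000_000))
```

With the numbers from the post, the 10-TOPS chip at 50% utilization works out to 5 effective TOPS against 4 for the 40-TOPS chip at 10%, which is the comparison being made in point 4.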
 
Joined
Jul 8, 2022
Messages
263 (0.29/day)
Location
USA
Thank you for sharing this info :toast:
 
Joined
Jul 16, 2022
Messages
626 (0.69/day)
I read over the page, and I am trying to figure out if something like this is beneficial. How does this help a system? No software is needed, so how does it integrate into the system?

Thanks for the info.

Do any of your tuned models have anything to do with enhancing for CAD/3D/BIM/Rendering?
 
Joined
Oct 17, 2021
Messages
87 (0.07/day)
man between that and rigging/skinning models it seems like that entire industry got neglected
 

mxdev

New Member
Joined
Oct 3, 2024
Messages
2 (0.02/day)

Ah sorry, by models I meant neural network models, not CAD, etc. If there's a neural net that accelerates/improves these applications though, they might be possible -- integration of our runtime libraries into the applications would likely need us to work with the software vendor though.

The MX3 is designed for real-time computer vision, so applications it could be used in include AI security cameras, factories (defect inspection, etc.), "smart city" (traffic intersection monitoring, license plate readers, etc.), and robotics (self-driving, depth estimation / navigation, etc.). On desktop PCs we can do stuff like video super-resolution or video call filters, or there was a 3D avatar thing Lenovo did at CES 2024 that used the MX3. Not sure on the category for this one, but I've used an Orange Pi + MX3 + XREAL Air to make some homebrew AR glasses too.

So basically our main areas are security, industrial, and robotics.
 

Rob_mc_1

New Member
Joined
Dec 22, 2021
Messages
2 (0.00/day)
Would this work in Windows 11 to enable Copilot+ features, instead of buying a whole new PC?
 