MemryX Announces Production Availability of the MX3 Edge AI Accelerator

btarunr · Oct 3, 2024

MemryX, a leading innovator in AI accelerators for Edge applications, today announced production availability of the MX3 Edge AI Accelerator. Available today both as a standalone chip and as a 4-chip 2280 M.2 module, the MX3 offers industry-leading ease of use, best-in-class performance, and high energy efficiency.

"The MemryX team has decades of experience in bringing high quality and high volume production silicon and software to market," said Keith Kressin, CEO of MemryX. "After many months of rigorous testing, we are very excited to announce we have reached the production milestone of our Edge AI Accelerator. We have tested the quality, performance, latency, and accuracy on 1000s of AI models and are confident customers will be very pleased when choosing MemryX for Edge AI applications."

MemryX listened to customers who have often been frustrated with other Edge AI solutions, before bringing to market the transformative MX3 solution. Key advantages of the MemryX solution include:

High FPS - MemryX dataflow and at-memory computing architecture excels at pipelined operation. For example, a single low power MemryX M.2 card can continuously run one or more AI models on 10s of incoming camera streams, which is a game changer for edge applications such as Video Management Systems.
High model accuracy with just 1-click - MemryX automated tools can compile and execute 1000s of AI models with high accuracy with just 1-click. MX3 uses floating-point activations, and the compilation process maintains AI models as trained. This means customers using MX3 do not need to retrain models or use pilot images during feature map quantization to increase accuracy or approximate operators unsupported in silicon.
No Model Zoo or hidden model changes - MemryX does not use or require a model zoo, where customer models are modified to fit the target hardware. Instead, original models remain fully intact when compiled and run on MX3. Of course, a customer always has the option to prune, compress, or distill a given model to make desired design trade-offs. But MemryX does not require model changes for efficient and high utilization of hardware.
Automated Pre/Post processing - While an AI processor is used for AI workloads, many models contain code designed for CPU pre and post processing. Programmers must then figure out pre & post processing code themselves. Instead, MemryX automatically identifies and packages this code, helping the programmer accelerate application deployment using autocropping.
Scalability - A single MX3 can be used alone or combined with additional MX3 chips, all seamlessly acting as one logical unit connected to the host. This means MX3-based configurations could scale from a single chip supporting AI in an advanced smart camera, to a 4-chip Edge PC, to an 8 or even 16-chip Edge Server application, all using the exact same software and host interface without adding any hardware such as PCIe switches.
Low Power - Each MX3 uses 0.5-2.0 W, depending on the demands of the AI model and system settings. This enables the MX3 to offer high performance AI computing even in fanless devices for uses such as industrial PCs. An entire 4-chip M.2 module uses less than 1/10th the power of leading mainstream GPUs, while at the same time providing higher Edge AI performance.
Broad support - MemryX supports a broad set of x86, ARM, and RISC-V platforms out of the box using multiple OS.
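
To put the Low Power and Scalability figures above side by side, here is a rough sketch of the power envelope for the chip counts the announcement mentions. It assumes power scales linearly with chip count and ignores any board or host overhead, which the announcement does not specify; the function and constant names are made up for illustration.

```python
# Per-chip power range quoted in the announcement, in watts.
CHIP_POWER_MIN_W = 0.5
CHIP_POWER_MAX_W = 2.0


def module_power_range(num_chips: int) -> tuple[float, float]:
    """Return (min, max) watts for num_chips MX3 chips.

    Assumption: power adds up linearly per chip, with no extra
    overhead for the board, regulators, or host interface.
    """
    if num_chips < 1:
        raise ValueError("num_chips must be at least 1")
    return num_chips * CHIP_POWER_MIN_W, num_chips * CHIP_POWER_MAX_W


# Chip counts named in the announcement: smart camera, Edge PC, Edge Server.
for chips in (1, 4, 8, 16):
    low, high = module_power_range(chips)
    print(f"{chips:>2} chips: {low:.1f} to {high:.1f} W")
```

Under that assumption a 4-chip M.2 module lands somewhere between 2 W and 8 W, depending on the model and system settings.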

MemryX has been sampling to a variety of customers for months. Customer applications in development include retail, security, agriculture, auto, robotics and more. With high ease of use and scalability, along with industrial temperature specs, MemryX is poised to become the top choice for customers looking for accelerated Edge AI processing.

"The MX3 enables ASUS to deliver advanced AI analytics at the edge with reduced computing requirements, empowering real-time AI inference for our customers by simply adding the MX3 to their existing IPC devices," said Jessy Li, ASUS IoT Solution Director.

"DYNICS has integrated the MemryX MX3 module into our AI-driven platform and the results are phenomenal. The MX3 provides the computing power we need to run our most demanding AI models in real-time, with minimal power usage, enabling us to deploy AI at scale solving specific industrial opportunities," said Ed Gatt, CEO of DYNICS.

Today, MX3 based M.2 modules can be purchased through WPG Americas. Later in Q4, additional distributors in North America and abroad will provide MemryX solutions. Also, later in Q4 2024, MemryX will provide a public developer hub with open source software showcasing 100s of models and end applications.


HBSound · Oct 3, 2024

Thinking out loud - I need to read this over thoroughly. Seeking an answer: how does this work in a workstation? Are there any pros/cons? OR incorrect application?

cal5582 · Oct 3, 2024

this says nothing about the performance

Canned Noodles · Oct 3, 2024

Very cool use of an M.2 slot. The price from WPG Americas is $212 (here) but as cal5582 stated, there’s no performance numbers so who’s to say if it’s a good value?

cal5582 · Oct 3, 2024

Canned Noodles said:
Very cool use of an M.2 slot. The price from WPG Americas is $212 (here) but as cal5582 stated, there’s no performance numbers so who’s to say if it’s a good value?

yeah theres also things like the Hailo-8 and you know those have 26 tops for instance.

Wirko · Oct 3, 2024

What about memory? No memory chips can be seen, it might be able to use system RAM using HMB (the same method as SSDs use) but they don't tell... the only thing they tell is "~10M parameters stored on-die", presumably on all four dies combined.

mxdev · Oct 3, 2024

Hey, MemryX engineer here! I can address some of the comments:

This is just an announcement that production version is out for our existing customers. In a couple weeks we'll have more volume for direct purchase on WPGA plus a couple other distributors. Also, I'm pretty sure the price will decrease as volume ramps up, since the current inventory is from a small batch.
The SDK & documentation will be open-sourced around the same time, and our docs will include benchmarks across a few hundred models. We don't maintain a "zoo" of tuned models, but instead download -> compile -> run out-of-the-box and update the perf. table. If somebody did want to prune & tune the model (retrain), they could do it and get higher performance but it's not necessary.
MX3 out-of-the-box FPS is typically faster than Hailo-8's tuned model zoo FPS. But tuned vs. tuned or OOTB vs. OOTB, the MX3 is much faster. Admittedly we're still early in creating tuned versions of models, since our focus is on running models as-is.
It's important to note that TOPS is simply the # of MAC units * frequency. True performance comparisons need to consider the utilization of those MAC units when running a real model, not a theoretical max. A 10-TOPS chip at 50% utilization can have higher FPS than a 40-TOPS chip at 10% utilization. Comparing max TOPS alone between different architectures doesn't equate to final performance.
It's 10M weights per chip, 40M total for the M.2 at default precision. Half-precision can go up to 80M for the M.2 but that needs accuracy considerations. Being focused on computer vision (think YOLOs) and not LLMs, the # of weights should give reasonably good coverage. In a future SDK release we'll have ways to run bigger models by swapping chunks to/from the host's memory, but it is always better to run on-chip only.
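
The TOPS-versus-utilization point and the weight-capacity figures above can be sketched in a few lines. This is only back-of-the-envelope arithmetic using the numbers from this post; the helper names are hypothetical and are not part of any MemryX SDK, and treating half precision as exactly doubling capacity is an assumption based on the 40M vs. 80M figures.

```python
# Weights that fit on one MX3 at default precision, per the post above.
WEIGHTS_PER_CHIP = 10_000_000


def effective_tops(peak_tops: float, utilization: float) -> float:
    """Peak TOPS scaled by the fraction of MAC units kept busy (0.0 to 1.0)."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0.0 and 1.0")
    return peak_tops * utilization


def fits_on_chip(num_weights: int, num_chips: int = 4,
                 half_precision: bool = False) -> bool:
    """Check whether a model's weights fit entirely in on-chip storage.

    Assumption: half precision doubles capacity, matching the
    40M vs. 80M figures quoted for the 4-chip M.2 module.
    """
    capacity = WEIGHTS_PER_CHIP * num_chips * (2 if half_precision else 1)
    return num_weights <= capacity


# The example from the post: a 10-TOPS chip at 50% utilization
# versus a 40-TOPS chip at 10% utilization.
print(effective_tops(10, 0.5) > effective_tops(40, 0.1))

# A hypothetical 60M-weight model on the 4-chip M.2 module.
print(fits_on_chip(60_000_000))
print(fits_on_chip(60_000_000, half_precision=True))
```

Effective TOPS only translates into FPS when both chips run the same model, so this compares like for like and says nothing about absolute frame rates.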

Canned Noodles · Oct 4, 2024

mxdev said:
Hey, MemryX engineer here! I can address some of the comments:

This is just an announcement that production version is out for our existing customers. In a couple weeks we'll have more volume for direct purchase on WPGA plus a couple other distributors. Also, I'm pretty sure the price will decrease as volume ramps up, since the current inventory is from a small batch.

The SDK & documentation will be open-sourced around the same time, and our docs will include benchmarks across a few hundred models. We don't maintain a "zoo" of tuned models, but instead download -> compile -> run out-of-the-box and update the perf. table. If somebody did want to prune & tune the model (retrain), they could do it and get higher performance but it's not necessary.

MX3 out-of-the-box FPS is typically faster than Hailo-8's tuned model zoo FPS. But tuned vs. tuned or OOTB vs. OOTB, the MX3 is much faster. Admittedly we're still early in creating tuned versions of models, since our focus is on running models as-is.

It's important to note that TOPS is simply the # of MAC units * frequency. True performance comparisons need to consider the utilization of those MAC units when running a real model, not a theoretical max. A 10-TOPS chip at 50% utilization can have higher FPS than a 40-TOPS chip at 10% utilization. Comparing max TOPS alone between different architectures doesn't equate to final performance.

It's 10M weights per chip, 40M total for the M.2 at default precision. Half-precision can go up to 80M for the M.2 but that needs accuracy considerations. Being focused on computer vision (think YOLOs) and not LLMs, the # of weights should give reasonably good coverage. In a future SDK release we'll have ways to run bigger models by swapping chunks to/from the host's memory, but it is always better to run on-chip only.

Thank you for sharing this info :toast:

HBSound · Oct 4, 2024

I read over the page, and I am trying to figure out if something like this is beneficial. How does this help a system? No software is needed, so how does it integrate into the system?

mxdev said:
Hey, MemryX engineer here! I can address some of the comments:

This is just an announcement that production version is out for our existing customers. In a couple weeks we'll have more volume for direct purchase on WPGA plus a couple other distributors. Also, I'm pretty sure the price will decrease as volume ramps up, since the current inventory is from a small batch.

The SDK & documentation will be open-sourced around the same time, and our docs will include benchmarks across a few hundred models. We don't maintain a "zoo" of tuned models, but instead download -> compile -> run out-of-the-box and update the perf. table. If somebody did want to prune & tune the model (retrain), they could do it and get higher performance but it's not necessary.

MX3 out-of-the-box FPS is typically faster than Hailo-8's tuned model zoo FPS. But tuned vs. tuned or OOTB vs. OOTB, the MX3 is much faster. Admittedly we're still early in creating tuned versions of models, since our focus is on running models as-is.

It's important to note that TOPS is simply the # of MAC units * frequency. True performance comparisons need to consider the utilization of those MAC units when running a real model, not a theoretical max. A 10-TOPS chip at 50% utilization can have higher FPS than a 40-TOPS chip at 10% utilization. Comparing max TOPS alone between different architectures doesn't equate to final performance.

It's 10M weights per chip, 40M total for the M.2 at default precision. Half-precision can go up to 80M for the M.2 but that needs accuracy considerations. Being focused on computer vision (think YOLOs) and not LLMs, the # of weights should give reasonably good coverage. In a future SDK release we'll have ways to run bigger models by swapping chunks to/from the host's memory, but it is always better to run on-chip only.

Thanks for the info.

Do any of your tuned models have anything to do with enhancing for CAD/3D/BIM/Rendering?

cal5582 · Oct 4, 2024

HBSound said:
I read over the page, and I am trying to figure out if something like this is beneficial. How does this help a system? No software is needed, so how does it integrate into the system?

Thanks for the info.

Do any of your tuned models have anything to do with enhancing for CAD/3D/BIM/Rendering?

man between that and rigging/skinning models it seems like that entire industry got neglected

mxdev · Oct 4, 2024

HBSound said:
I read over the page, and I am trying to figure out if something like this is beneficial. How does this help a system? No software is needed, so how does it integrate into the system?

Thanks for the info.

Do any of your tuned models have anything to do with enhancing for CAD/3D/BIM/Rendering?

Ah sorry, by models I meant neural network models, not CAD, etc. If there's a neural net that accelerates/improves these applications though, it might be possible -- integration of our runtime libraries into the applications would likely need us to work with the software vendor though.

The MX3 is designed for real-time computer vision, so applications we could be used in include AI security cameras, factories (defect inspection, etc.), "smart city" (traffic intersection monitoring, license plate readers, etc.), and robotics (self-driving, depth estimation / navigation, etc.). On desktop PCs we can do stuff like video super-resolution or video call filters, or there was a 3D avatar thing Lenovo did at CES 2024 that used the MX3. Not sure on the category for this one, but I've used an Orange Pi + MX3 + XREAL Air to make some homebrew AR glasses too.

So basically our main areas are security, industrial, and robotics.

HBSound · Oct 4, 2024

Thank you!

Rob_mc_1 · Oct 6, 2024

mxdev said:
Ah sorry, by models I meant neural network models, not CAD, etc. If there's a neural net that accelerates/improves these applications though, it might be possible -- integration of our runtime libraries into the applications would likely need us to work with the software vendor though.

The MX3 is designed for real-time computer vision, so applications we could be used in include AI security cameras, factories (defect inspection, etc.), "smart city" (traffic intersection monitoring, license plate readers, etc.), and robotics (self-driving, depth estimation / navigation, etc.). On desktop PCs we can do stuff like video super-resolution or video call filters, or there was a 3D avatar thing Lenovo did at CES 2024 that used the MX3. Not sure on the category for this one, but I've used an Orange Pi + MX3 + XREAL Air to make some homebrew AR glasses too.

So basically our main areas are security, industrial, and robotics.

Would this work in Windows 11 to enable Copilot+ features, instead of buying a whole new PC?
