Thursday, October 3rd 2024

MemryX Announces Production Availability of the MX3 Edge AI Accelerator

MemryX, a leading innovator in AI accelerators for Edge applications, today announced production availability of the MX3 Edge AI Accelerator. Available today both as a standalone chip and as a 4-chip 2280 M.2 module, the MX3 offers industry-leading ease of use, best-in-class performance, and high energy efficiency. "The MemryX team has decades of experience bringing high-quality, high-volume production silicon and software to market," said Keith Kressin, CEO of MemryX. "After many months of rigorous testing, we are very excited to announce we have reached the production milestone of our Edge AI Accelerator. We have tested the quality, performance, latency, and accuracy of thousands of AI models and are confident customers will be very pleased when choosing MemryX for Edge AI applications."
Before bringing the transformative MX3 solution to market, MemryX listened to customers who have often been frustrated with other Edge AI solutions. Key advantages of the MemryX solution include:
  • High FPS - The MemryX dataflow and at-memory computing architecture excels at pipelined operation. For example, a single low-power MemryX M.2 card can continuously run one or more AI models on tens of incoming camera streams, a game-changer for edge applications such as Video Management Systems.
  • High model accuracy with a single click - MemryX automated tools can compile and execute thousands of AI models with high accuracy in just one click. The MX3 uses floating-point activations, and the compilation process maintains AI models as trained. This means customers using the MX3 do not need to retrain models, use pilot images during feature-map quantization to recover accuracy, or approximate operators unsupported in silicon.
  • No model zoo or hidden model changes - MemryX does not use or require a model zoo in which customer models are modified to fit the target hardware. Instead, original models remain fully intact when compiled and run on the MX3. Of course, a customer always has the option to prune, compress, or distill a given model to make desired design trade-offs, but MemryX does not require model changes to achieve efficient, high utilization of the hardware.
  • Automated pre/post processing - While an AI processor handles the AI workload itself, many models contain pre- and post-processing code intended to run on the CPU, which programmers must normally figure out themselves. Instead, MemryX automatically identifies and packages this code through automatic cropping of the model graph, helping the programmer accelerate application deployment.
  • Scalability - A single MX3 can be used on its own or combined with additional MX3 chips, all seamlessly acting as one logical unit connected to the host. This means MX3-based configurations can scale from a single chip supporting AI in an advanced smart camera, to a 4-chip Edge PC, to an 8- or even 16-chip Edge Server application, all using the exact same software and host interface without adding any hardware such as PCIe switches.
  • Low power - Each MX3 uses 0.5-2.0 W, depending on the demands of the AI model and system settings. This enables the MX3 to offer high-performance AI computing even in fanless devices such as industrial PCs. An entire 4-chip M.2 module uses less than one tenth the power of leading mainstream GPUs while providing higher Edge AI performance.
  • Broad support - MemryX supports a broad set of x86, ARM, and RISC-V platforms out of the box, across multiple operating systems.
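The low-power claim above can be sanity-checked with simple arithmetic. The 0.5-2.0 W per-chip range comes from the announcement; the 100 W GPU figure below is a hypothetical comparison point, not a number from the press release:

```python
# Rough power-budget check for a 4-chip MX3 M.2 module.
# Per-chip draw of 0.5-2.0 W is from the announcement;
# the GPU board power is a hypothetical placeholder.
chips_per_module = 4
max_watts_per_chip = 2.0

module_max_watts = chips_per_module * max_watts_per_chip  # 8.0 W worst case

gpu_watts = 100.0  # hypothetical mainstream GPU board power
print(module_max_watts)                   # 8.0
print(module_max_watts < gpu_watts / 10)  # True: under 1/10th of the GPU
```

Even at the top of the per-chip range, the whole module stays within a passively coolable budget, which is what makes the fanless industrial PC use case plausible.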
MemryX has been sampling to a variety of customers for months. Customer applications in development include retail, security, agriculture, automotive, robotics, and more. With high ease of use and scalability, along with industrial temperature specs, MemryX is poised to become the top choice for customers looking for accelerated Edge AI processing.

"The MX3 enables ASUS to deliver advanced AI analytics at the edge with reduced computing requirements, empowering real-time AI inference for our customers by simply adding the MX3 to their existing IPC devices," said Jessy Li, ASUS IoT Solution Director.

"DYNICS has integrated the MemryX MX3 module into our AI-driven platform and the results are phenomenal. The MX3 provides the computing power we need to run our most demanding AI models in real-time, with minimal power usage, enabling us to deploy AI at scale solving specific industrial opportunities," said Ed Gatt, CEO of DYNICS.

Today, MX3-based M.2 modules can be purchased through WPG Americas. Later in Q4, additional distributors in North America and abroad will provide MemryX solutions. Also later in Q4 2024, MemryX will launch a public developer hub with open-source software showcasing hundreds of models and end applications.

12 Comments on MemryX Announces Production Availability of the MX3 Edge AI Accelerator

#1
HBSound
Thinking out loud - I need to read this over thoroughly. Seeking an answer: how does this work in a workstation? Are there any pros/cons, or is it the wrong application?
Posted on Reply
#2
cal5582
this says nothing about the performance
Posted on Reply
#3
Canned Noodles
Very cool use of an M.2 slot. The price from WPG Americas is $212 (here) but as cal5582 stated, there’s no performance numbers so who’s to say if it’s a good value?
Posted on Reply
#4
cal5582
Canned Noodles said: "Very cool use of an M.2 slot. The price from WPG Americas is $212 (here) but as cal5582 stated, there's no performance numbers so who's to say if it's a good value?"
yeah, there's also things like the Hailo-8, and those have 26 TOPS for instance.
Posted on Reply
#5
Wirko
What about memory? No memory chips can be seen; it might be able to use system RAM via HMB (the same method SSDs use), but they don't say... the only thing they tell us is "~10M parameters stored on-die", presumably across all four dies combined.
Posted on Reply
#6
mxdev
Hey, MemryX engineer here! I can address some of the comments:
  1. This is just an announcement that production version is out for our existing customers. In a couple weeks we'll have more volume for direct purchase on WPGA plus a couple other distributors. Also, I'm pretty sure the price will decrease as volume ramps up, since the current inventory is from a small batch.
  2. The SDK & documentation will be open-sourced around the same time, and our docs will include benchmarks across a few hundred models. We don't maintain a "zoo" of tuned models, but instead download -> compile -> run out-of-the-box and update the perf. table. If somebody did want to prune & tune the model (retrain), they could do it and get higher performance but it's not necessary.
  3. MX3 out-of-the-box FPS is typically faster than Hailo-8's tuned model zoo FPS. And comparing tuned vs. tuned or OOTB vs. OOTB, the MX3 is much faster. Admittedly we're still early in creating tuned versions of models, since our focus is on running models as-is.
  4. It's important to note that TOPS is simply the # of MAC units * frequency. True performance comparisons need to consider the utilization of those MAC units when running a real model, not a theoretical max. A 10-TOPS chip at 50% utilization can have higher FPS than a 40-TOPS chip at 10% utilization. Comparing max TOPS alone between different architectures doesn't equate to final performance.
  5. It's 10M weights per chip, 40M total for the M.2 at default precision. Half-precision can go up to 80M for the M.2 but that needs accuracy considerations. Being focused on computer vision (think YOLOs) and not LLMs, the # of weights should give reasonably good coverage. In a future SDK release we'll have ways to run bigger models by swapping chunks to/from the host's memory, but it is always better to run on-chip only.
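The utilization point in item 4 above can be illustrated with a quick calculation. All numbers here are made up for illustration, not measured MX3 or Hailo figures:

```python
# Effective throughput = peak TOPS * real-model MAC utilization.
# Illustrative numbers only; not measured figures for any chip.
def effective_tops(peak_tops: float, utilization: float) -> float:
    return peak_tops * utilization

chip_a = effective_tops(10.0, 0.50)  # 10-TOPS chip at 50% utilization -> 5.0
chip_b = effective_tops(40.0, 0.10)  # 40-TOPS chip at 10% utilization -> 4.0

print(chip_a > chip_b)  # True: the lower-TOPS chip delivers more real work
```

This is why a spec-sheet TOPS comparison alone says little about FPS on an actual model.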
Posted on Reply
#7
Canned Noodles
mxdev said: "Hey, MemryX engineer here! I can address some of the comments: [...]"
Thank you for sharing this info :toast:
Posted on Reply
#8
HBSound
I read over the page, and I am trying to figure out if something like this is beneficial. How does this help a system? No software is needed, so how does it integrate into the system?
mxdev said: "Hey, MemryX engineer here! I can address some of the comments: [...]"
Thanks for the info.

Do any of your tuned models have anything to do with enhancing for CAD/3D/BIM/Rendering?
Posted on Reply
#9
cal5582
HBSound said: "[...] Do any of your tuned models have anything to do with enhancing for CAD/3D/BIM/Rendering?"
man between that and rigging/skinning models it seems like that entire industry got neglected
Posted on Reply
#10
mxdev
HBSound said: "[...] Do any of your tuned models have anything to do with enhancing for CAD/3D/BIM/Rendering?"
Ah sorry, by models I meant neural network models, not CAD, etc. If there's a neural net that accelerates/improves these applications, it might be possible, though integration of our runtime libraries into the applications would likely require us to work with the software vendor.

The MX3 is designed for real-time computer vision, so applications we could be used in include AI security cameras, factories (defect inspection, etc.), "smart city" uses (traffic intersection monitoring, license plate readers, etc.), and robotics (self-driving, depth estimation / navigation, etc.). On desktop PCs we can do things like video super-resolution or video call filters, and there was a 3D avatar demo Lenovo did at CES 2024 that used the MX3. Not sure on the category for this one, but I've used an Orange Pi + MX3 + XREAL Air to make some homebrew AR glasses too.

So basically our main areas are security, industrial, and robotics.
Posted on Reply
#12
Rob_mc_1
mxdev said: "Ah sorry, by models I meant neural network models, not CAD, etc. [...]"
Would this work in Windows 11 to enable Copilot+ features, instead of buying a whole new PC?
Posted on Reply