Wednesday, October 18th 2023

AMD, Arm, Intel, Meta, Microsoft, NVIDIA, and Qualcomm Standardize Next-Generation Narrow Precision Data Formats for AI

Realizing the full potential of next-generation deep learning requires highly efficient AI infrastructure. For a computing platform to be scalable and cost efficient, optimizing every layer of the AI stack, from algorithms to hardware, is essential. Advances in narrow-precision AI data formats and associated optimized algorithms have been pivotal to this journey, allowing the industry to transition from traditional 32-bit floating point precision to presently only 8 bits of precision (i.e. OCP FP8).

Narrower formats allow silicon to execute more AI calculations per clock cycle, which shortens model training and inference times. AI models take up less space, which means they require fewer data fetches from memory and can run with better performance and efficiency. Additionally, fewer bit transfers reduce data movement over the interconnect, which can enhance application performance or cut network costs.
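
As a rough illustration of the footprint argument, the sketch below computes the storage needed for the weights of a hypothetical 7-billion-parameter model at several element widths. The model size and the helper name `model_size_gib` are assumptions made for this example, not figures from the announcement, and real deployments also store scales, activations, and other state.

```python
def model_size_gib(num_params: int, bits_per_element: float) -> float:
    """Return the storage needed for the weights alone, in GiB."""
    return num_params * bits_per_element / 8 / 2**30


# Hypothetical model size, chosen only for illustration.
NUM_PARAMS = 7_000_000_000

for bits in (32, 8, 6, 4):
    size = model_size_gib(NUM_PARAMS, bits)
    print(f"{bits:>2}-bit elements: {size:6.2f} GiB")
```

Each halving of the element width halves the bytes that must be fetched from memory or moved over the interconnect for the same number of parameters.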
Bringing Together Key Industry Leaders to Set the Standard
Earlier this year, AMD, Arm, Intel, Meta, Microsoft, NVIDIA, and Qualcomm Technologies, Inc. formed the Microscaling Formats (MX) Alliance with the goal of creating and standardizing next-generation 6- and 4-bit data types for AI training and inferencing. The key technology that enables sub-8-bit formats to work, referred to as microscaling, builds on years of design space exploration and research. MX enhances the robustness and ease of use of existing 8-bit formats such as FP8 and INT8, thus lowering the barrier to broader adoption of single-digit-bit training and inference.

The initial MX specification introduces four concrete floating-point and integer-based data formats (MXFP8, MXFP6, MXFP4, and MXINT8) that are compatible with current AI stacks, support implementation flexibility across both hardware and software, and enable fine-grained microscaling at the hardware level. Extensive studies demonstrate that MX formats can be easily deployed for many diverse real-world cases such as large language models, computer vision, and recommender systems. MX technology also enables LLM pre-training at 6- and 4-bit precisions without any modifications to conventional training recipes.

Democratizing AI Capabilities
In the evolving landscape of AI, open standards are critical to foster innovation, collaboration, and widespread adoption. These standards offer a unifying framework that enables consistent toolchains, model development, and interoperability across the AI ecosystem. This further empowers developers and organizations to harness the full potential of AI while mitigating the fragmentation and technology constraints that could otherwise stifle progress.

In this spirit, the MX Alliance has released the Microscaling Formats (MX) Specification v1.0 in an open, license-free format through the Open Compute Project Foundation (OCP) to enable and encourage broad industry adoption and provide the foundation for potential future narrow-format innovations. A white paper and emulation libraries have also been published to provide details on the data science approach and select results of MX in action. This inclusivity not only accelerates the pace of AI advancement but also promotes openness, accountability, and the responsible development of AI applications.

"AMD is pleased to be a founding member of the MX Alliance and has been a key contributor to the OCP MX Specification v1.0. This industry collaboration to standardize MX data formats provides an open and sustainable approach to continued AI innovations while providing the AI ecosystem time to prepare for the use of MX data formats in future hardware and software. AMD is committed to driving forward an open AI ecosystem and is happy to contribute our research results on MX data formats to the broader AI community." - Michael Schulte, Sr. Fellow, AMD

"As an industry we have a unique opportunity to collaborate and realize the benefits of AI technology, which will enable new use cases from cloud to edge to endpoint. This requires commitment to standardization for AI training and inference so that developers can focus on innovating where it really matters, and the release of the OCP MX specification is a significant milestone in this journey." - Ian Bratt, Fellow and Senior Director of Technology, Arm

"The OCP MX spec is the result of a fairly broad cross-industry collaboration and represents an important step forward in unifying and standardizing emerging sub-8bit data formats for AI applications. Portability and interoperability of AI models enabled by this should make AI developers very happy. Benefiting AI applications should see higher levels of performance and energy efficiency, with reduced memory needs." - Pradeep Dubey, Senior Fellow and Director of the Parallel Computing Lab, Intel

"To keep pace with the accelerating demands of AI, innovation must happen across every layer of the stack. The OCP MX effort is a significant leap forward in enabling more scalability and efficiency for the most advanced training and inferencing workloads. MX builds upon years of internal work, and now working together with our valued partners, has evolved into an open standard that will benefit the entire AI ecosystem and industry." - Brian Harry, Technical Fellow, Microsoft

"MX formats with a wide spectrum of sub-8-bit support provide efficient training and inference solutions that can be applied to AI models in various domains, from recommendation models with strict accuracy requirements, to the latest large language models that are latency-sensitive and compute intensive. We believe sharing these MX formats with the OCP and broader ML community will lead to more innovation in AI modeling." - Ajit Mathews, Senior Director of Engineering, Meta AI

"The OCP MX specification is a significant step towards accelerating AI training and inference workloads with sub-8-bit data formats. These formats accelerate applications by reducing memory footprint and bandwidth pressure, also allowing for innovation in math operation implementation. The open format specification enables platform interoperability, benefiting the entire industry." - Paulius Micikevicius, Senior Distinguished Engineer, NVIDIA

"The new OCP MX specification will help accelerate the transition to lower-cost, lower-power server-based forms of AI inference. We are passionate about democratizing AI through lower-cost inference and we are glad to join this effort." - Colin Verrilli, Senior Director, Qualcomm Technologies, Inc

About the Open Compute Project Foundation
The Open Compute Project (OCP) is a collaborative Community of hyperscale data center operators, telecom, colocation providers and enterprise IT users, working with the product and solution vendor ecosystem to develop open innovations deployable from the cloud to the edge. The OCP Foundation is responsible for fostering and serving the OCP Community to meet the market and shape the future, taking hyperscale-led innovations to everyone. Meeting the market is accomplished through addressing challenging market obstacles with open specifications, designs and emerging market programs that showcase OCP-recognized IT equipment and data center facility best practices. Shaping the future includes investing in strategic initiatives and programs that prepare the IT ecosystem for major technology changes, such as AI & ML, optics, advanced cooling techniques, composable memory and silicon. OCP Community-developed open innovations strive to benefit all, optimized through the lens of impact, efficiency, scale and sustainability.

Learn more at: www.opencompute.org.

9 Comments on AMD, Arm, Intel, Meta, Microsoft, NVIDIA, and Qualcomm Standardize Next-Generation Narrow Precision Data Formats for AI

#1
unwind-protect
4 bits?

I want to use useful calculations on that data type. Maybe I am not up-to-date with ML. Is this just for inference?
#2
Wirko
unwind-protect wrote:
4 bits?

I want to use useful calculations on that data type. Maybe I am not up-to-date with ML. Is this just for inference?
All these new formats are exponent-only if I'm reading this table right. That's also interesting.
#4
buildbot
unwind-protect wrote:
4 bits?

I want to use useful calculations on that data type. Maybe I am not up-to-date with ML. Is this just for inference?
This is for both training and inference! You end up with a small gap using MX4 compared to FP32, but that might be acceptable for your use case. MX6 is on par with FP32 training.
Wirko wrote:
All these new formats are exponent-only if I'm reading this table right. That's also interesting.
Not exactly - the element data type for MXFP4 for example is 2 exponent bits and 1 mantissa bit. These are grouped into a block of 32 elements, and scaled by an 8 bit exponent. So the effective bits per element for MXFP4 is 4+8/32 = 4.25 bits per element.
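
To make the arithmetic concrete, here is a minimal Python sketch of block scaling for an MXFP4-style block. It is not the official emulation library: the function names are made up for this example, the shared exponent is picked with one common rule (exponent of the largest magnitude in the block minus the largest element exponent), which is an assumption here, and it skips NaN handling and clamping of the shared exponent to the 8-bit range.

```python
import math

BLOCK_SIZE = 32    # elements that share one scale
SCALE_BITS = 8     # shared power-of-two scale (an exponent only)
ELEMENT_BITS = 4   # 1 sign bit + 2 exponent bits + 1 mantissa bit

# Magnitudes representable with 2 exponent bits and 1 mantissa bit.
FP4_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
FP4_EMAX = 2       # exponent of the largest power of two above (4.0 = 2**2)


def effective_bits(element_bits=ELEMENT_BITS, scale_bits=SCALE_BITS,
                   block_size=BLOCK_SIZE):
    """Bits per element once the shared scale is amortized over the block."""
    return element_bits + scale_bits / block_size


def quantize_block(values):
    """Return (shared_exponent, elements) for one block of floats.

    Assumed rule: shared_exponent = floor(log2(max |v|)) - FP4_EMAX.
    Each scaled value is rounded to the nearest FP4 magnitude and
    saturates at 6.0.
    """
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return 0, [0.0] * len(values)
    # frexp gives amax = m * 2**e with 0.5 <= m < 1, so floor(log2) = e - 1.
    _, e = math.frexp(amax)
    shared_exp = (e - 1) - FP4_EMAX
    scale = 2.0 ** shared_exp
    elements = []
    for v in values:
        scaled = abs(v) / scale
        mag = min(FP4_MAGNITUDES, key=lambda m: abs(scaled - m))
        elements.append(math.copysign(mag, v))
    return shared_exp, elements


def dequantize_block(shared_exp, elements):
    """Reconstruct approximate floats from the shared exponent and elements."""
    scale = 2.0 ** shared_exp
    return [x * scale for x in elements]


if __name__ == "__main__":
    block = [12.0, 1.0, -6.0, 0.3]
    exp, elems = quantize_block(block)
    print("effective bits per element:", effective_bits())
    print("shared exponent:", exp)
    print("elements:", elems)
    print("reconstructed:", dequantize_block(exp, elems))
```

The point of the shared exponent is that the 4-bit elements only need to cover the spread of values inside one block, while the scale tracks how large that block is overall.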
#5
Nhonho
The text of this news is long-winded and confusing.
#6
TheoneandonlyMrK
In essence then, standard formats should equate to transferable Algebraic IP across many diverse brands, some of which will still do better than others.

CUDA would be the main loser here IMHO

We consumers can only win from this news.

But damn if AI hasn't become the next 3Dtv, IMHO.
#7
Nhonho
I really hope that people open their eyes and stop making apps in CUDA and only make them in OpenCL and in other open APIs so that the apps can run on any GPU or dedicated chip for AI.
#8
unwind-protect
Nhonho wrote:
I really hope that people open their eyes and stop making apps in CUDA and only make them in OpenCL and in other open APIs so that the apps can run on any GPU or dedicated chip for AI.
Unfortunately CUDA is much more convenient and approachable for programmers new to GPU computing.
#9
buildbot
Nhonho wrote:
The text of this news is long-winded and confusing.
It's really technical, that is fair! If you have any questions I would be happy to try to explain!
TheoneandonlyMrK wrote:
In essence then, standard formats should equate to transferable Algebraic IP across many diverse brands, some of which will still do better than others.

CUDA would be the main loser here IMHO

We consumers can only win from this news.

But damn if AI hasn't become the next 3Dtv, IMHO.
Exactly - standardize the datatypes so that everyone can use the same number format and build hardware that supports it.

CUDA/Nvidia don't lose at all! In my opinion, they gain as much as everyone else, since Nvidia will also support the new, more efficient datatypes and still have great hardware for those types with all of the ease CUDA brings.
Nhonho wrote:
I really hope that people open their eyes and stop making apps in CUDA and only make them in OpenCL and in other open APIs so that the apps can run on any GPU or dedicated chip for AI.
CUDA is the default and has a huge amount of mindshare, but it is slowly happening - PyTorch is at least trying to support other backends with different levels of intermediate compilation to open up new GPUs and dedicated chips. They have quite a few already:
  • torch.backends.cpu
  • torch.backends.cuda
  • torch.backends.cudnn
  • torch.backends.mps
  • torch.backends.mkl
  • torch.backends.mkldnn
  • torch.backends.openmp
  • torch.backends.opt_einsum
  • torch.backends.xeon
unwind-protect wrote:
Unfortunately CUDA is much more convenient and approachable for programmers new to GPU computing.
CUDA is somewhat pleasant to write compared to OpenCL which I have always really disliked, personally at least.