Friday, October 21st 2022
IBM Artificial Intelligence Unit (AIU) Arrives with 23 Billion Transistors
IBM Research has published information about the company's latest development in processors for accelerating artificial intelligence (AI). The latest IBM processor, called the Artificial Intelligence Unit (AIU), tackles the problem of creating an enterprise solution for AI deployment that fits in a PCIe slot. The IBM AIU is a half-height PCIe card with a processor packing 23 billion transistors, manufactured on a 5 nm node (presumably TSMC's). While IBM has not provided many details yet, we know that the AIU builds on the AI accelerator found in the Telum chip, the processor at the heart of the IBM z16 mainframe: the AIU takes Telum's AI engine and scales it up to 32 cores for higher efficiency.
The company has highlighted two main paths for enterprise AI adoption. The first is to embrace lower precision and use approximate computing to drop from 32-bit formats to formats with a quarter as many bits, while still delivering similar results. The other, as IBM touts, is that an "AI chip should be laid out to streamline AI workflows. Because most AI calculations involve matrix and vector multiplication, our chip architecture features a simpler layout than a multi-purpose CPU. The IBM AIU has also been designed to send data directly from one compute engine to the next, creating enormous energy savings."

In the sea of AI accelerators, IBM hopes to differentiate its offering with an enterprise chip that targets more complex problems than current AI chips do. "Deploying AI to classify cats and dogs in photos is a fun academic exercise. But it won't solve the pressing problems we face today. For AI to tackle the complexities of the real world—things like predicting the next Hurricane Ian, or whether we're heading into a recession—we need enterprise-quality, industrial-scale hardware. Our AIU takes us one step closer. We hope to soon share news about its release," says the official IBM release.
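IBM has not disclosed which numeric formats the AIU actually uses, but the "quarter as many bits" idea can be sketched with plain symmetric INT8 quantization of an FP32 matrix multiply. The helper names and tolerances below are illustrative and not taken from IBM's software stack.

```python
# Minimal sketch of the lower-precision path: symmetric INT8 quantization
# of an FP32 matrix multiply, accumulating in INT32 and rescaling back.
# Illustrative only; not IBM's actual AIU numerics.
import numpy as np

def quantize_int8(x: np.ndarray):
    """Map an FP32 tensor to INT8 with a single symmetric scale factor."""
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
a = rng.standard_normal((64, 64)).astype(np.float32)
b = rng.standard_normal((64, 64)).astype(np.float32)

qa, sa = quantize_int8(a)
qb, sb = quantize_int8(b)

# Multiply in low precision, accumulate in INT32, then rescale to FP32.
acc = qa.astype(np.int32) @ qb.astype(np.int32)
approx = acc.astype(np.float32) * (sa * sb)

exact = a @ b
rel_err = np.abs(approx - exact) / (np.abs(exact) + 1e-6)
print(f"median relative error: {np.median(rel_err):.4f}")
print(f"max relative error:    {np.max(rel_err):.4f}")
```

On well-behaved data the median error of such a scheme is small, which is the behaviour the "similar results" claim leans on; the worst-case error is a different story, as the comments below point out.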
Source: IBM Research
15 Comments on IBM Artificial Intelligence Unit (AIU) Arrives with 23 Billion Transistors
So, instead of FP32, we are going to INT8. And >>still deliver the same result<< is just untrue, because there are only a few very specific simulation/calculation scenarios where you would get the same results. You might get nearly the same result most of the time, but never exactly the same all of the time.
Crude, approximate fast-math has its applications for certain jobs. But let's not pretend it is suitable for all "AI" computational tasks, or that it will "deliver the same result."
The error set of fast INT8 math vs. slow FP32/64/80/128 math is like a fractal or Mandelbrot set: it is "beautiful" in its unpredictable deviation, and in some boundary areas that deviation can be ENORMOUS.
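The "boundary areas" point is easy to reproduce with the same kind of toy quantization: when one shared scale factor has to cover an outlier, the small values collapse to zero and their relative error is effectively 100%, even though the large value comes back almost exactly. Purely an illustrative sketch, not a statement about the AIU's actual behaviour.

```python
# One outlier forces the shared INT8 scale to be coarse, so small values
# round to zero: enormous relative error on some elements, ~0% on others.
import numpy as np

x = np.array([0.01, -0.02, 0.015, 0.008, 50.0], dtype=np.float32)

scale = np.max(np.abs(x)) / 127.0          # one scale for the whole tensor
q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
x_hat = q.astype(np.float32) * scale

rel_err = np.abs(x_hat - x) / np.abs(x)
for orig, rec, err in zip(x, x_hat, rel_err):
    print(f"{orig:8.3f} -> {rec:8.3f}   relative error {err:6.1%}")
```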
They're already working together on other 5 nm designs.
Innovation at the Albany Nanotech Complex is often directed towards commercialization, and on that end of the chip lifecycle today the companies also announced that Samsung will manufacture IBM's chips at the 5 nm node.
I'm also curious why it always has to be powers of 2. Formats like INT12 or FP12 would be usable in some cases too.
Edit: Nvidia claims that FP8 can replace FP16 with no loss of accuracy.
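For a rough sense of what those widths mean: the relative spacing between neighbouring representable values is about 2^-(mantissa bits), so FP16 (10 mantissa bits) resolves steps of roughly 0.1%, while Nvidia's published FP8 E4M3 layout (3 mantissa bits) resolves only about 12.5%. The "FP12" entry below is a made-up example format, included only because the comment above asks about it.

```python
# Back-of-the-envelope coarseness of different float widths.
# Relative step between neighbouring values ~ 2**-(mantissa bits).
# FP8 E4M3 follows Nvidia's published layout; FP12 is hypothetical.
formats = {
    "FP32": 23,                   # mantissa bits
    "FP16": 10,
    "FP12 (hypothetical)": 6,
    "FP8 E4M3": 3,
}

for name, mantissa_bits in formats.items():
    print(f"{name:22s} relative step ~ {2.0 ** -mantissa_bits:.5f}")
```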
Silicon Valley is rife with startups making more or less the same kind of chips. It's not that it has to be a power of 2; it has to be a multiple of 8 bits. It's very unusual and problematic to build processors that work on data which isn't a multiple of a byte. In every computing system, one byte is the basic unit of storage and processing, and anything that doesn't match that is problematic to integrate. INT4 still works fine because two INT4s fit in one byte. The x87 extended-precision floating-point format was 80 bits, which wasn't a power of 2 but was a multiple of 8.
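The "two INT4s fit in one byte" point looks like this in practice: a nibble-packing scheme stores one signed 4-bit value in the low half of a byte and one in the high half, keeping the data byte-aligned. A minimal sketch with illustrative names:

```python
# Nibble packing: two signed 4-bit integers (range -8..7) per byte,
# so INT4 data stays byte-aligned for memory systems that address bytes.
def pack_int4_pair(lo: int, hi: int) -> int:
    assert -8 <= lo <= 7 and -8 <= hi <= 7
    return (lo & 0xF) | ((hi & 0xF) << 4)     # low nibble | high nibble

def unpack_int4_pair(byte: int) -> tuple[int, int]:
    def sign_extend(n: int) -> int:
        return n - 16 if n >= 8 else n        # restore the sign bit
    return sign_extend(byte & 0xF), sign_extend((byte >> 4) & 0xF)

packed = pack_int4_pair(-3, 7)
print(f"packed byte: 0x{packed:02X}, unpacked: {unpack_int4_pair(packed)}")
```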
Either way, I'm not interested in anything AI-based, as you know it will get into the wrong hands. It's bad enough with what's available now.
What bad thing is going on that I don't know about?
I am just starting out in this area of AI.