Tuesday, July 30th 2024
Apple Trained its Apple Intelligence Models on Google TPUs, Not NVIDIA GPUs
Apple has disclosed that its newly announced Apple Intelligence features were developed using Google's Tensor Processing Units (TPUs) rather than NVIDIA's widely adopted accelerators such as the H100. This unexpected choice was detailed in an official Apple research paper, shedding light on the company's approach to AI development. The paper outlines how systems equipped with Google's TPUv4 and TPUv5 chips played a crucial role in creating the Apple Foundation Models (AFMs). These models, AFM-server and AFM-on-device, are designed to power the online and offline Apple Intelligence features introduced at WWDC 2024.

For the training of the 6.4-billion-parameter AFM-server, Apple's largest language model, the company used an impressive array of 8,192 TPUv4 chips, provisioned as 8×1024-chip slices. The training process followed a three-stage approach and processed a total of 7.4 trillion tokens. Meanwhile, the more compact 3-billion-parameter AFM-on-device model, optimized for on-device processing, was trained on 2,048 TPUv5p chips.
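The scale figures above can be sanity-checked with simple arithmetic. The chip and token counts come from the article; the tokens-per-chip number is a derived illustration, not a figure Apple reports:

```python
# Figures reported for AFM-server training (from Apple's paper, per the article)
afm_server_chips = 8 * 1024       # 8,192 TPUv4 chips, provisioned as 8 slices of 1,024
afm_server_tokens = 7.4e12        # 7.4 trillion tokens processed in total
afm_on_device_chips = 2048        # TPUv5p chips used for the 3B on-device model

# Derived: average tokens handled per TPUv4 chip over the full training run
tokens_per_chip = afm_server_tokens / afm_server_chips

print(f"AFM-server chips: {afm_server_chips:,}")          # 8,192
print(f"Tokens per TPUv4 chip: {tokens_per_chip:,.0f}")   # ~903 million
```

This is only an average over the whole run; it says nothing about throughput or wall-clock time, which the article does not report.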
Apple's training data came from various sources, including the Applebot web crawler and licensed high-quality datasets. The company also incorporated carefully selected code, math, and public datasets to enhance the models' capabilities. Benchmark results shared in the paper suggest that both AFM-server and AFM-on-device excel in areas such as instruction following, tool use, and writing, positioning Apple as a strong contender in the AI race despite its relatively late entry.

However, Apple's route into the AI market is more complex than that of any other competitor. Given Apple's massive user base and the millions of devices compatible with Apple Intelligence, the AFMs have the potential to change how users interact with their devices, especially for everyday tasks, so refining the models for those tasks is critical before large-scale deployment. Another surprise is the transparency on display from Apple, a company typically known for its secrecy. The AI boom is changing some of Apple's ways, and a look at these inner workings is always interesting.
Source:
via Tom's Hardware
28 Comments on Apple Trained its Apple Intelligence Models on Google TPUs, Not NVIDIA GPUs
Plus they have not forgotten or forgiven NVIDIA for the late-2000s Bumpgate issue.
But regarding the Apple models, I'm shocked that the server version is only twice the size of the on-device version.
When will you people realize that business is business, and grudges are something only forum dwellers hold? There's an obvious logic behind this choice - if anything, Google is far more antagonistic to Apple than NVIDIA will ever be - and the hardware likely operates the same way their in-SoC model does, just at a much larger scale.
www.bloomberg.com/news/articles/2024-05-01/google-s-payments-to-apple-reached-20-billion-in-2022-cue-says
They also kept Apple's business for at least 6-7 years after Bumpgate - if there were truly any bad blood here, you'd never have seen nForce Macs in the early 2010s; they'd have been cut off around 2008. AMD's deal with Apple was likely a combination of favorable pricing and software licensing provisions, considering that Apple had probably been weighing a transition to in-house silicon for at least a few years before the 2020 debut of the M1. Similar situation with Aramco: the Saudi Royal Family is vastly wealthier and more powerful than anyone who might have a legitimate beef with them. Aramco, like any state oil company (you could say the same about Petrobras here in Brazil), is a national treasury in the strictest sense - oil and energy equal money.
But many companies are building their own TPUs - Amazon, MS, and others. It makes the work far more efficient than using a general-purpose GPU.
Other companies might not have a real choice in the matter, but Apple has the cash reserves to avoid putting up with headache partners.
Just like when they cut ZFS from its in-progress integration into macOS upon Oracle's acquisition of Sun Microsystems.
We should all learn from Apple’s management and not work with business partners that are likely to or do cause us headaches.
BTW this is kind of misleading. It's getting reposted everywhere, but if you read the original articles on Reuters/CNBC or the paper Apple published, it's not that they used Google TPUs directly - they used Google Cloud, which offers both Google TPUs and NVIDIA GPUs. The idea making the rounds that it's all Google silicon is just clickbait.