
IBM Announces Granite 4.0 Tiny Preview - an Extremely Compact & Compute-Efficient AI Model
We're excited to present IBM Granite 4.0 Tiny Preview, a preliminary version of the smallest model in the upcoming Granite 4.0 family of language models, to the open source community. Granite 4.0 Tiny Preview is extremely compact and compute-efficient: at FP8 precision, several concurrent sessions performing long-context (128K) tasks can be run on consumer-grade hardware, including GPUs commonly available for under $350 USD. Though the model is only partially trained—it has seen only 2.5T of a planned 15T or more training tokens—it already offers performance rivaling that of IBM Granite 3.3 2B Instruct, despite having fewer active parameters and a roughly 72% reduction in memory requirements. We anticipate that Granite 4.0 Tiny's performance will be on par with that of Granite 3.3 8B Instruct by the time it has completed training and post-training.
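To see why reduced precision matters so much for memory footprint, here is a rough back-of-envelope sketch. The parameter count below is a hypothetical placeholder for illustration only, not an official Granite 4.0 specification; real deployments also need memory for the KV cache and activations on top of the weights.

```python
# Back-of-envelope estimate of weight memory at different precisions.
# The 7e9 parameter count is a HYPOTHETICAL example, not a Granite 4.0 spec.

def weight_memory_gib(n_params: float, bytes_per_param: float) -> float:
    """Approximate memory needed for model weights alone, in GiB."""
    return n_params * bytes_per_param / 2**30

N_PARAMS = 7e9  # hypothetical model size for illustration

fp16_gib = weight_memory_gib(N_PARAMS, 2.0)  # FP16: 2 bytes per parameter
fp8_gib = weight_memory_gib(N_PARAMS, 1.0)   # FP8:  1 byte per parameter

print(f"FP16 weights: ~{fp16_gib:.1f} GiB")
print(f"FP8 weights:  ~{fp8_gib:.1f} GiB")
```

Halving the bytes per parameter halves the weight footprint, which is a large part of why FP8 inference fits comfortably on consumer GPUs with 8-16 GB of VRAM; mixture-of-experts designs with few active parameters reduce compute further still.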
As its name suggests, Granite 4.0 Tiny will be among the smallest offerings in the Granite 4.0 model family. It will be officially released this summer as part of a model lineup that also includes Granite 4.0 Small and Granite 4.0 Medium. Granite 4.0 continues IBM's firm commitment to making efficiency and practicality the cornerstone of its enterprise LLM development. This preliminary version of Granite 4.0 Tiny is now available on Hugging Face—though we do not yet recommend the preview version for enterprise use—under a standard Apache 2.0 license. Our intent is to allow even GPU-poor developers to experiment and tinker with the model on consumer-grade GPUs. The model's novel architecture is pending support in Hugging Face Transformers and vLLM, which we anticipate will be completed shortly for both projects. Official support for running this model locally through platform partners including Ollama and LM Studio is expected in time for the full model release later this summer.