Researchers Unveil Real-Time GPU-Only Pipeline for Fully Procedural Trees
A research team from Coburg University of Applied Sciences and Arts in Germany, working with AMD Germany, has introduced a game-changing approach to procedural tree creation that runs entirely on the GPU, combining speed and flexibility in a way earlier tools could not. Showcased at High-Performance Graphics 2025 in Copenhagen, the pipeline uses DirectX 12 work graphs and mesh nodes to construct detailed tree models on the fly, without any CPU involvement in the generation itself. Artists and developers can tweak more than 150 parameters, covering everything from seasonal leaf color shifts and branch pruning styles to complex animations and automatic level-of-detail adjustments, all in real time. Tested on an AMD Radeon RX 7900 XTX, the system generated unique tree geometries and pushed them into the geometry buffer in just over three milliseconds, and it automatically tunes detail levels to hold a target frame rate, demonstrating a stable 120 FPS under heavy workloads.
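The paper's exact level-of-detail policy isn't spelled out here, but the idea of steering detail toward a frame-time budget can be pictured as a simple feedback loop. The sketch below is purely illustrative; the names (`LodController`, `targetFrameMs`, `detailScale`) and constants are assumptions, not the researchers' code.

```cpp
// Illustrative sketch only: a proportional feedback controller that nudges a
// global detail scale toward a frame-time budget. The paper's actual LOD
// policy is not published here; names and constants are hypothetical.
#include <algorithm>

struct LodController {
    float targetFrameMs = 1000.0f / 120.0f;  // aim for 120 FPS
    float detailScale   = 1.0f;              // 0 = coarsest trees, 1 = full detail
    float gain          = 0.05f;             // how aggressively to react

    // Call once per frame with the measured GPU frame time.
    float update(float measuredFrameMs) {
        // Positive error => frame too slow => lower detail, and vice versa.
        float error = (measuredFrameMs - targetFrameMs) / targetFrameMs;
        detailScale = std::clamp(detailScale - gain * error, 0.05f, 1.0f);
        return detailScale;  // fed into the constants read by the work graph
    }
};
```

In a setup like this, the resulting scalar would simply be one of the per-frame constants the CPU writes; everything downstream that consumes it, from branch recursion depth to leaf counts and mesh-node tessellation, stays on the GPU.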
Wind effects and environmental interactions update seamlessly, and the CPU's only job is to fill a small set of constants (camera matrices, timestamps, and so on) before dispatching a single work graph. There's no need for continuous host-device chatter or asset streaming, which simplifies integration into existing engines. Perhaps the most eye-opening result is how little memory the transient data consumes. A traditional buffer-heavy approach might need tens of gigabytes, but the researchers' demo holds onto just 51 KB of persistent state per frame, a mind-boggling 99.9999% reduction compared to conventional methods. A scratch buffer of up to 1.5 GB is allocated for work-graph execution, though actual usage varies by GPU driver and the memory can be released or reused afterward. Static assets, such as meshes and textures, remain unaffected, leaving future opportunities for neural compression or procedural texturing to further enhance memory savings.
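To give a rough picture of how small that per-frame CPU work is, the sketch below assumes the DirectX 12 work graphs API from the Agility SDK (`SetProgram` and `DispatchGraph` on `ID3D12GraphicsCommandList10`), with the state object, root signature, backing memory, and upload buffer created once at startup. The `FrameConstants` layout, root-parameter index, and root record are assumptions for illustration, not the paper's actual code.

```cpp
// Minimal sketch of the per-frame CPU work under D3D12 work graphs
// (Agility SDK). The state object, backing memory, root signature, and the
// persistently mapped upload buffer are created once at startup; the
// FrameConstants layout and root-parameter index are hypothetical.
#include <cstdint>
#include <cstring>
#include <directx/d3d12.h>

struct FrameConstants {        // assumed layout: camera, time, LOD scale
    float viewProj[16];
    float time;
    float detailScale;
    float padding[2];
};

void recordFrame(ID3D12GraphicsCommandList10*     cmdList,
                 ID3D12RootSignature*             globalRootSig,
                 D3D12_PROGRAM_IDENTIFIER         workGraphId,     // from ID3D12WorkGraphProperties
                 D3D12_GPU_VIRTUAL_ADDRESS_RANGE  backingMemory,   // the ~1.5 GB scratch range
                 void*                            mappedConstants, // CPU pointer to upload buffer
                 D3D12_GPU_VIRTUAL_ADDRESS        constantsGpuVa,  // GPU address of the same buffer
                 const FrameConstants&            constants,
                 bool                             firstDispatch)
{
    // 1. The CPU's only data contribution: a handful of constants.
    std::memcpy(mappedConstants, &constants, sizeof(constants));

    // 2. Expose the constants and bind the work graph with its backing memory.
    cmdList->SetComputeRootSignature(globalRootSig);
    cmdList->SetComputeRootConstantBufferView(0, constantsGpuVa);

    D3D12_SET_PROGRAM_DESC program = {};
    program.Type = D3D12_PROGRAM_TYPE_WORK_GRAPH;
    program.WorkGraph.ProgramIdentifier = workGraphId;
    program.WorkGraph.BackingMemory = backingMemory;
    if (firstDispatch)  // backing memory only needs initializing once
        program.WorkGraph.Flags = D3D12_SET_WORK_GRAPH_FLAG_INITIALIZE;
    cmdList->SetProgram(&program);

    // 3. Launch the entire tree pipeline with a single root record.
    struct RootRecord { std::uint32_t gridSize[3]; } record = { {1, 1, 1} };  // hypothetical record
    D3D12_DISPATCH_GRAPH_DESC dispatch = {};
    dispatch.Mode = D3D12_DISPATCH_MODE_NODE_CPU_INPUT;
    dispatch.NodeCPUInput.EntrypointIndex = 0;
    dispatch.NodeCPUInput.NumRecords = 1;
    dispatch.NodeCPUInput.pRecords = &record;
    dispatch.NodeCPUInput.RecordStrideInBytes = sizeof(record);
    cmdList->DispatchGraph(&dispatch);
}
```

Under these assumptions, the host side really is just a memcpy and a couple of command-list calls per frame; the branching, leaf placement, wind animation, and mesh emission all happen inside the graph's nodes on the GPU.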