Raevenlord
News Editor
Looking back on NVIDIA's GDC presentation, perhaps one of the most interesting aspects covered was the implementation of tile-based rendering on NVIDIA's post-Maxwell architectures. This is an adaptation of an approach typical of mobile GPUs, driven by those chips' specific need for power efficiency - and if you'll "member", "Maxwell" was NVIDIA's first graphics architecture publicly touted for its "mobile first" design.
This approach essentially divides the screen into tiles, and then rasterizes the frame on a per-tile basis. 16×16 and 32×32 pixels are the usual tile sizes, but both Maxwell and Pascal can dynamically assess the required tile size for each frame, changing it on the fly according to the complexity of the scene. The goal is to ensure that the data being processed has a much smaller footprint than that of the full frame - small enough that NVIDIA can keep it in a much smaller pool of memory (essentially, the L2 cache), dynamically filling and flushing the cache as needed until the full frame has been rendered. This means the GPU doesn't have to access larger, slower memory pools as often, which primarily reduces the load on the VRAM subsystem (freeing up memory bandwidth for other tasks) whilst simultaneously accelerating rendering. At the same time, a tile-based approach lends itself pretty well to the nature of GPUs - tiles are independent, easily parallelized units of work, and the GPU can tackle many of them simultaneously, depending on the available resources.
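To make this more concrete, here's a minimal CPU-side sketch of the per-tile idea. To be clear, this is my own illustration, not NVIDIA's actual hardware pipeline: the tile size, the structures, and the crude bounding-box "rasterizer" are all stand-ins.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdint>
#include <vector>

constexpr int kTile = 32;  // one of the usual tile sizes mentioned above

struct Triangle { float x[3], y[3]; uint32_t color; };
struct BBox { int x0, y0, x1, y1; };  // inclusive pixel bounds

static BBox bounds(const Triangle& t) {
    return { static_cast<int>(std::floor(std::min({t.x[0], t.x[1], t.x[2]}))),
             static_cast<int>(std::floor(std::min({t.y[0], t.y[1], t.y[2]}))),
             static_cast<int>(std::ceil (std::max({t.x[0], t.x[1], t.x[2]}))),
             static_cast<int>(std::ceil (std::max({t.y[0], t.y[1], t.y[2]}))) };
}

void renderTiled(const std::vector<Triangle>& tris,
                 std::vector<uint32_t>& framebuffer, int width, int height) {
    for (int ty = 0; ty < height; ty += kTile) {
        for (int tx = 0; tx < width; tx += kTile) {
            // The tile buffer stands in for the small on-chip (L2-resident)
            // working set: every intermediate write lands here, not in VRAM.
            std::array<uint32_t, kTile * kTile> tileBuf{};
            for (const Triangle& t : tris) {
                // Clip the triangle's bounds against this tile. A real
                // rasterizer runs per-pixel coverage tests; for brevity we
                // just splat the color over the clipped bounding box.
                BBox b = bounds(t);
                int x0 = std::max(b.x0, tx), x1 = std::min(b.x1, tx + kTile - 1);
                int y0 = std::max(b.y0, ty), y1 = std::min(b.y1, ty + kTile - 1);
                for (int y = y0; y <= y1; ++y)
                    for (int x = x0; x <= x1; ++x)
                        tileBuf[(y - ty) * kTile + (x - tx)] = t.color;
            }
            // One flush per finished tile: the only traffic that ever has to
            // touch the larger, slower memory pool.
            for (int y = 0; y < kTile && ty + y < height; ++y)
                for (int x = 0; x < kTile && tx + x < width; ++x)
                    framebuffer[(ty + y) * width + (tx + x)] =
                        tileBuf[y * kTile + x];
        }
    }
}
```

The takeaway is the memory traffic pattern: all the churn happens in the small tile buffer, and the big framebuffer is written exactly once per tile.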
Thanks to NVIDIA's public acknowledgement of its use of tile-based rendering starting with the Maxwell architecture, some of Maxwell's design decisions now make much more sense. Below is a screenshot taken from NVIDIA's "5 Things You Should Know About the New Maxwell GPU Architecture". Take a look at the L2 cache size: from Kepler to Maxwell, it increased 8x, from 256 KB to 2048 KB. We can now attribute this gigantic leap to the need for a larger L2 cache to fit the resources required by the tile-based rasterizing process, which allowed NVIDIA the leap in memory performance and power efficiency that Maxwell achieved over its Kepler predecessor. Incidentally, NVIDIA's GP102 chip (which powers the GTX Titan X and the upcoming, recently announced GTX 1080 Ti) doubles that amount of L2 cache again, to a staggering 4096 KB. Whether or not Volta will continue scaling the L2 cache remains to be seen, but I've seen worse bets.
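To put those cache sizes in perspective, here's a quick back-of-envelope calculation. The 8 bytes per pixel (4-byte color plus 4-byte depth, uncompressed) is an assumption on my part - real GPUs compress this data aggressively - but it shows the orders of magnitude involved:

```cpp
#include <cstdio>

int main() {
    const int width = 2560, height = 1440;          // e.g. a 1440p frame
    const int bytesPerPixel = 4 /*color*/ + 4 /*depth*/;  // assumed formats

    long long frameBytes = 1LL * width * height * bytesPerPixel;
    long long tileBytes  = 32LL * 32 * bytesPerPixel;

    printf("full 1440p frame: %lld KB\n", frameBytes / 1024);  // ~28800 KB
    printf("one 32x32 tile:   %lld KB\n", tileBytes / 1024);   // 8 KB
    printf("tiles fitting in Maxwell's 2048 KB L2: %lld\n",
           2048LL * 1024 / tileBytes);                         // 256
    printf("tiles fitting in GP102's   4096 KB L2: %lld\n",
           4096LL * 1024 / tileBytes);                         // 512
}
```

A full 1440p frame runs to roughly 28 MB and would never fit in L2, whereas a single 32×32 tile is a mere 8 KB - so even Maxwell's 2 MB L2 can comfortably hold a batch of tiles in flight.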
An interesting tangent: the Xbox 360's eDRAM and the Xbox One's ESRAM (paired with ATI/AMD-designed GPUs, no less) serve as something of a substitute for the tile-based rasterization process that post-Maxwell NVIDIA GPUs employ, providing a small, fast on-chip memory pool for render targets.
Tile-based rendering seems to have been a key part of NVIDIA's secret sauce in achieving the impressive performance-per-watt ratings of its last two architectures, and it's expected that their approach to this rendering mode will only improve with time. Some differences can already be seen in the tile-based rendering between Maxwell and Pascal, with the former dividing the scene into triangles and the latter breaking it up into squares or vertical rectangles as needed, which means NVIDIA has in fact put some measure of work into the rendering system between these two architectures.
Perhaps we have already seen some seeds of this tile-based approach in AMD's Vega architecture sneak peek, particularly in regards to its next-generation Pixel Engine: the render back-ends are now clients of the L2 cache, replacing the non-coherent memory access of previous architectures, in which the pixel engine wrote directly to the memory controller. This could be AMD's way of tackling the same problem, with the pixel engine's new-generation draw-stream binning rasterizer supposedly helping to conserve clock cycles whilst simultaneously improving on-die cache locality and reducing memory footprint.
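AMD hasn't detailed how the draw-stream binning rasterizer actually works internally, so purely as an illustration of the general "binning" concept (the function, names, and tile-grid layout below are all my own assumptions), here's what sorting a draw stream into per-tile bins might look like:

```cpp
#include <algorithm>
#include <vector>

struct Tri { float minX, minY, maxX, maxY; };  // pre-computed screen bounds

// Sort triangle indices into per-tile lists ("bins") up front, so each tile
// only ever touches the triangles that can actually affect it.
std::vector<std::vector<int>> binTriangles(const std::vector<Tri>& tris,
                                           int width, int height, int tile) {
    int tilesX = (width + tile - 1) / tile;
    int tilesY = (height + tile - 1) / tile;
    std::vector<std::vector<int>> bins(tilesX * tilesY);
    for (int i = 0; i < static_cast<int>(tris.size()); ++i) {
        const Tri& t = tris[i];
        // Skip triangles that are entirely off-screen.
        if (t.maxX < 0 || t.minX >= width || t.maxY < 0 || t.minY >= height)
            continue;
        // Clamp the triangle's bounds to the tile grid and drop its index
        // into every bin it overlaps.
        int tx0 = std::max(0, static_cast<int>(t.minX) / tile);
        int tx1 = std::min(tilesX - 1, static_cast<int>(t.maxX) / tile);
        int ty0 = std::max(0, static_cast<int>(t.minY) / tile);
        int ty1 = std::min(tilesY - 1, static_cast<int>(t.maxY) / tile);
        for (int ty = ty0; ty <= ty1; ++ty)
            for (int tx = tx0; tx <= tx1; ++tx)
                bins[ty * tilesX + tx].push_back(i);
    }
    return bins;  // rasterize each bin with its tile resident in cache
}
```

Once triangles are binned like this, each tile's working set is known up front, which is exactly the property that would let the pixel engine stay inside the L2 cache.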
David Kanter, of Real World Tech, has a pretty interesting YouTube video where he goes into some depth on NVIDIA's tile-based approach, which you can check out if you're interested.
View at TechPowerUp Main Site