As we mentioned earlier, Maxwell was likely meant to be built on the newer 20-nanometer node, but TSMC's delays in getting that node into production may have forced NVIDIA to redesign it for the existing 28 nm process. That puts the architecture at a disadvantage: its transistor counts and power budgets were drawn up for a newer, smaller node, and NVIDIA had to make them work just as well on the existing one. GM204 crams a staggering 5.2 billion transistors into a die measuring 398 mm², and a package that's roughly the same size. That's 2 billion more transistors than the GK104, yet 1.9 billion fewer than the GK110 on which its predecessor, the GTX 780, was built.
At the heart of the Maxwell architecture is a redesigned streaming multiprocessor (SMM), the third-tier subunit of the GPU. Variants of NVIDIA's GeForce GTX products are carved out by setting the number of SMM units at the chip's disposal. The GM204 has a component hierarchy similar to that of the GK104.
The chip begins with a PCI-Express 3.0 x16 bus interface, a 256-bit wide GDDR5 memory interface, and a display controller that supports as many as three Ultra HD displays, or five physical displays in total. This display controller introduces support for HDMI 2.0, which has enough bandwidth to drive Ultra HD displays at 60 Hz refresh rates, and it is ready for 5K (5120x2880, four times the pixels of Quad HD). The 256-bit wide memory interface holds a standard 4 GB of memory, with the memory clocked at 7.00 GHz (effective) on both the GTX 980 and GTX 970, which works out to a memory bandwidth of 224 GB/s. Don't let that worry you, as NVIDIA is implementing a new lossless delta color compression algorithm to make the most of the available memory bandwidth.
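That 224 GB/s figure is simply the bus width multiplied by the per-pin data rate; here is a quick back-of-the-envelope check, a sketch using only the numbers quoted above:

    # Rough memory-bandwidth math for GM204's 256-bit GDDR5 interface
    bus_width_bits = 256        # memory bus width, in bits
    data_rate_gbps = 7.0        # effective GDDR5 data rate per pin, in Gbps

    bandwidth_gb_s = bus_width_bits * data_rate_gbps / 8   # bits -> bytes
    print(f"{bandwidth_gb_s:.0f} GB/s")                    # prints: 224 GB/s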
The GigaThread Engine splits workloads between four graphics processing clusters (GPCs). The L2 cache, which cushions transfers between these GPCs and other components, has been quadrupled in size: the GM204 ships with 2 MB of cache, compared to the 512 KB on the GK104. Each GPC holds four streaming multiprocessors (SMMs) and a common raster engine shared between them. Each SMM holds a third-generation PolyMorph Engine, a component that performs a host of rendering tasks, such as vertex fetch, transform, attribute setup, tessellation, and stream output. The SMM has 128 CUDA cores, the number-crunching components of NVIDIA GPUs, spread across four subdivisions with dedicated warp schedulers, registers, and caches. NVIDIA claims the SMM offers twice the performance per watt of "Kepler" SMX units.
With four such 128-core SMMs per GPC, and four GPCs, the GM204 features 2,048 CUDA cores in all. Other vital specs include 128 texture mapping units (TMUs) and 64 raster-operations units (ROPs). The ROP count is interesting, as it has doubled over the GK104 and is greater than the 48 on the GK110. The third-generation delta color compression tech helps the chip make the most of its 224 GB/s memory bandwidth. The compression is lossless, so textures won't look washed out. With this tech in place, NVIDIA achieved savings of up to 29 percent in memory bandwidth usage, and so 7 Gbps memory "effectively" runs at 9.3 Gbps (although NVIDIA doesn't use "effective" bandwidth in its spec sheets).
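The headline shader count follows directly from that hierarchy; a quick sanity check using the unit counts named above:

    # GM204 CUDA-core total from the hierarchy described above
    gpcs = 4              # graphics processing clusters
    smms_per_gpc = 4      # streaming multiprocessors (SMMs) per GPC
    cores_per_smm = 128   # CUDA cores per SMM

    cuda_cores = gpcs * smms_per_gpc * cores_per_smm
    print(cuda_cores)     # prints: 2048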
GeForce Features
With each new architecture, NVIDIA introduces innovations in the consumer graphics space that go beyond simple feature-level compatibility with new DirectX versions. NVIDIA says the GeForce GTX 980 and GTX 970 are DirectX 12 cards, although Microsoft has yet to finalize the exact feature levels and requirements, and support for OpenGL 4.4 has also been added. On top of these APIs, NVIDIA offers a few new features through its GameWorks SDK that give game developers easy-to-implement visual effects using existing APIs.
According to NVIDIA, the first and most important is VXGI, or real-time voxel global illumination. VXGI adds realism to the way light interacts with different surfaces in a 3D scene. It introduces volume pixels, or voxels, a new 3D graphics component: pixels that carry three-dimensional data, so the way 3D objects interact with light looks more photo-realistic.
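To give a feel for what a voxel representation is, here is a toy occupancy grid; this is only an illustration of the "volume pixel" idea, not NVIDIA's VXGI algorithm, and the random point cloud is a stand-in for real scene geometry:

    import numpy as np

    def voxelize(points, grid_size=32):
        """Bin a point cloud (N x 3, coordinates in [0, 1)) into a coarse
        3D occupancy grid -- the basic 'volume pixel' idea, nothing more."""
        grid = np.zeros((grid_size,) * 3, dtype=bool)
        idx = np.clip((points * grid_size).astype(int), 0, grid_size - 1)
        grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
        return grid

    cloud = np.random.rand(10_000, 3)      # stand-in for scene geometry
    print(voxelize(cloud).sum(), "voxels occupied out of", 32 ** 3)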
No new NVIDIA GPU architecture launch is complete without advancements in post-processing, particularly anti-aliasing. NVIDIA introduced an interesting feature called Dynamic Super Resolution (DSR), which it claims offers "4K-like clarity on a 1080p display". To us, it comes across as a really nice super-sampling AA algorithm with a filter.
Using GeForce Experience, you can enable DSR arbitrarily for 3D apps. The other new algorithm is MFAA (multi-frame sampled AA), which NVIDIA claims offers MSAA-like image quality at roughly a 30 percent lower performance cost. Using GeForce Experience, MFAA can hence be substituted for MSAA, perhaps even arbitrarily.
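DSR boils down to rendering at a higher resolution and filtering the result down to the panel's native resolution. The sketch below shows the basic idea with a plain box filter, which is an assumption for illustration and not NVIDIA's actual DSR downsampling filter:

    import numpy as np

    def dsr_like_downsample(frame, factor=2):
        """Average each factor x factor block of a super-sampled frame
        down to one native-resolution pixel (simple box filter)."""
        h, w, c = frame.shape
        h, w = h - h % factor, w - w % factor          # trim to a multiple of factor
        blocks = frame[:h, :w].reshape(h // factor, factor, w // factor, factor, c)
        return blocks.mean(axis=(1, 3))

    rendered = np.random.rand(2160, 3840, 3)   # "rendered" at 3840x2160
    native = dsr_like_downsample(rendered, 2)  # shown on a 1920x1080 panel
    print(native.shape)                        # prints: (1080, 1920, 3)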
Moving on, NVIDIA introduced VR Direct, a technology designed for the reemerging VR headset market, spurred by the growing interest in Facebook's Oculus Rift headset. VR Direct is an API designed to reduce latency between the headset's input and the resulting change on the display, governed by the principle that head movements are more rapid and unpredictable than pointing and clicking with a mouse.
To meet the need for a realistic hair- or grass-rendering technology with a low performance cost, NVIDIA came up with Turf Effects. NVIDIA PhysX also got a much-needed feature-set update that introduces new gas dynamics and fluid adhesion effects. Epic's Unreal Engine 4 will implement the technology.
GeForce Experience
With its GeForce 320.18 WHQL drivers, NVIDIA released the first stable version of GeForce Experience. The application simplifies the process of configuring a game and is meant for PC gamers who aren't well-versed in all the technobabble required to get a game running at the best possible settings on the hardware available to them. GeForce Experience is aptly named, as it completes the experience of owning a GeForce graphics card; the PC, being the best possible way to play video games, should not be any harder to use than a gaming console.
NVIDIA ShadowPlay
GeForce Experience ShadowPlay is another feature NVIDIA recently debuted. ShadowPlay lets you record gaming footage or stream content in real time, with a minimal performance hit to the game you're playing. The feature is handled by GeForce Experience, which lets you set hot-keys to toggle recording on the fly and configure output location, format, quality, and so on.
Unlike other apps, which record video in lossless AVI formats by tapping into the DirectX pipeline and clogging the system bus, disk, and memory with high bit-rate video streams, ShadowPlay taps into a proprietary path that lets it copy the display output to the GPU's hardware H.264 encoder. This encoder strains neither the CPU nor the GPU's own unified shaders. Since the video stream being saved to a file comes out encoded, its bit-rate is a small fraction of uncompressed AVI's.
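The difference in bit-rate is easy to put in numbers. The calculation below compares raw 1080p60 capture with an H.264 stream; the 50 Mb/s figure is an assumed ballpark for high-quality game capture, not an official ShadowPlay spec:

    # Raw 1080p60 capture versus an encoded H.264 stream
    width, height, fps = 1920, 1080, 60
    bytes_per_pixel = 3                         # 24-bit RGB, uncompressed

    raw_mbps = width * height * bytes_per_pixel * fps * 8 / 1e6   # megabits per second
    h264_mbps = 50                              # assumed high-quality H.264 bit-rate

    print(f"raw: {raw_mbps:.0f} Mb/s, H.264: {h264_mbps} Mb/s, "
          f"ratio ~{raw_mbps / h264_mbps:.0f}x")
    # prints: raw: 2986 Mb/s, H.264: 50 Mb/s, ratio ~60x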
NVIDIA G-Sync
G-Sync is a new technology by NVIDIA that addresses monitor-stuttering issues that were deemed unsolvable. The GeForce GTX 980 features three full-size DisplayPort 1.2 connectors into which you can plug up to three G-Sync-ready monitors for your own fluid-smooth 3D Vision Surround setup.
For archaic reasons, such as having essentially evolved from television sets, PC monitors feature fixed refresh rates: the number of times a display refreshes what it is displaying per second. There's no technical reason why a modern flat-screen display should have a fixed refresh rate. Since it's the display that dictates the refresh rate, it has always been the GPU's job to ensure that the output appears fluid, which it does by deploying technologies such as V-Sync (vertical sync). Output won't appear fluid if the GPU sends out fewer frames per second than the display's refresh rate, and artifacts, such as tearing caused by parts of multiple frames overlapping, will appear if more frames per second are produced.
NVIDIA's solution to the problem is to kill fixed refresh rates on monitors and instead make them synchronize their refresh rates in real time to the frame rates generated by the GPU. Ever wonder why a movie watched in a theater feels fluid at just 24 frames per second while a PC game played at that frame rate doesn't? It's because the monitor mandates that the GPU obey its refresh rate: a 60 Hz panel can't divide evenly into 24 FPS, so frames end up on screen for uneven lengths of time, which we perceive as judder. G-Sync flips that equation and makes the monitor sync its refresh rate to the GPU's frame rate, so games feel more fluid at frame rates well below 60. To make this happen, NVIDIA developed hardware that resides inside the display and communicates with the GPU in real time to coordinate G-Sync.
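A toy model makes the cadence problem concrete. The sketch below is an illustration only, not how the G-Sync module actually works: it shows when frames reach the screen on a fixed 60 Hz, V-Sync'd panel versus a panel that refreshes the moment a frame is ready.

    import math

    def display_times(frame_times_ms, refresh_hz=None):
        """Return when each frame actually appears on screen.
        refresh_hz=None models a variable-refresh panel that refreshes
        whenever a frame is ready; otherwise frames wait (V-Sync'd)
        for the next fixed refresh tick."""
        t, shown = 0.0, []
        for ft in frame_times_ms:
            t += ft                                       # frame finishes rendering
            if refresh_hz is None:
                shown.append(t)                           # displayed immediately
            else:
                tick = 1000.0 / refresh_hz
                shown.append(math.ceil(t / tick) * tick)  # wait for next refresh
        return shown

    frames = [22, 25, 21, 27, 23]                  # render times of a ~40 FPS game, in ms
    print(display_times(frames, refresh_hz=60))    # uneven 16.7/33.3 ms steps -> judder
    print(display_times(frames))                   # matches the render cadence -> smooth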
We witnessed G-Sync with our own eyes at the London event and couldn't believe what we were seeing. Games (playable, so we could tell they weren't recordings) were butter-smooth and extremely fluid. At the demo, NVIDIA showed games running at 40 to 59 FPS in a given scene, and it felt like a constant frame rate throughout. NVIDIA obviously demonstrated cases where G-Sync unleashes its full potential, between 35 and 59 FPS. I am still a bit skeptical because it looks too good to be true, and I definitely look forward to testing G-Sync with my own setup and my own games, mouse and keyboard included, for a complete assessment. One can't make a video recording of a display running G-Sync to show you; you really need to experience it first-hand to buy into the idea. NVIDIA's G-Sync will launch with a $100 price premium on monitors. That's not insignificant, but it could also come down in the future. Also, from what I've seen, G-Sync makes lower frame rates look smoother, which means you no longer need an expensive card to hit 60 FPS; the money saved can go into a G-Sync-enabled monitor instead.