Wednesday, May 11th 2022

NVIDIA Releases Open-Source GPU Kernel Modules

NVIDIA is now publishing Linux GPU kernel modules as open source under a dual GPL/MIT license, starting with the R515 driver release. You can find the source code for these kernel modules in the NVIDIA Open GPU Kernel Modules repo on GitHub. This release is a significant step toward improving the experience of using NVIDIA GPUs on Linux: it enables tighter integration with the OS and lets developers debug, integrate, and contribute back. For Linux distribution providers, the open-source modules increase ease of use.

They also improve the out-of-the-box experience of signing and distributing the NVIDIA GPU driver. Canonical and SUSE are able to immediately package the open kernel modules with the Ubuntu and SUSE Linux Enterprise distributions. Developers can trace into code paths and see how kernel event scheduling interacts with their workload, for faster root-cause debugging. In addition, enterprise software developers can now integrate the driver seamlessly into the customized Linux kernel configured for their project.

This will further help improve NVIDIA GPU driver quality and security through input and reviews from the Linux end-user community. With each new driver release, NVIDIA publishes a snapshot of the source code on GitHub. Community-submitted patches are reviewed and, if approved, integrated into a future driver release.

Supported functionality

The first release of the open GPU kernel modules is R515. Along with the source code, fully built and packaged versions of the drivers are provided.

For data center GPUs in the NVIDIA Turing and NVIDIA Ampere architecture families, this code is production ready. This was made possible by the phased rollout of the GSP driver architecture over the past year, designed to make the transition easy for NVIDIA customers. We focused on testing across a wide variety of workloads to ensure feature and performance parity with the proprietary kernel-mode driver.

In the future, functionality such as HMM (Heterogeneous Memory Management) will be a foundational component for confidential computing on the NVIDIA Hopper architecture.

In this open-source release, support for GeForce and Workstation GPUs is alpha quality. GeForce and Workstation users can use this driver on Turing and NVIDIA Ampere architecture GPUs to run Linux desktops and use features such as multiple displays, G-SYNC, and NVIDIA RTX ray tracing in Vulkan and NVIDIA OptiX. Users can opt in using the kernel module parameter NVreg_EnableUnsupportedGpus, as described in the documentation. More robust and fully featured GeForce and Workstation support will follow in subsequent releases, and the NVIDIA Open Kernel Modules will eventually supplant the closed-source driver.

Customers with Turing and Ampere GPUs can choose which modules to install. Pre-Turing customers will continue to run the closed-source modules.

The open-source kernel-mode driver works with the same firmware and the same user-mode stacks such as CUDA, OpenGL, and Vulkan. However, all components of the driver stack must match versions within a release. For instance, you cannot take a release of the source code, build, and run it with the user-mode stack from a previous or future release.

Refer to the driver README document for instructions on installing the right versions and additional troubleshooting steps.

Installation opt in

The R515 release contains precompiled versions of both the closed-source driver and the open-source kernel modules. These versions are mutually exclusive, and the user can make the choice at install time. The default option ensures that silent installs will pick the optimal path for NVIDIA Volta and older GPUs versus Turing+ GPUs.

Users can build kernel modules from the source code and install them with the relevant user-mode drivers.

Partner ecosystem

NVIDIA has been working with Canonical, Red Hat, and SUSE for better packaging, deployment, and support models for our mutual customers.

Canonical

"The new NVIDIA open-source GPU kernel modules will simplify installs and increase security for Ubuntu users, whether they're AI/ML developers, gamers, or cloud users," commented Cindy Goldberg, VP of Silicon alliances at Canonical. "As the makers of Ubuntu, the most popular Linux-based operating system for developers, we can now provide even better support to developers working at the cutting edge of AI and ML by enabling even closer integration with NVIDIA GPUs on Ubuntu."

In the coming months, the NVIDIA Open GPU kernel modules will make their way into the recently launched Canonical Ubuntu 22.04 LTS.

SUSE

"We at SUSE are excited that NVIDIA is releasing their GPU kernel-mode driver as open source. This is a true milestone for the open-source community and accelerated computing. SUSE is proud to be the first major Linux distribution to deliver this breakthrough with SUSE Linux Enterprise 15 SP4 in June. Together, NVIDIA and SUSE power your GPU-accelerated computing needs across cloud, data center, and edge with a secure software supply chain and excellence in support." — Markus Noga, General Manager, Business Critical Linux at SUSE

Red Hat

"Enterprise open source can spur innovation and improve customers' experience, something that Red Hat has always championed. We applaud NVIDIA's decision to open source its GPU kernel driver. Red Hat has collaborated with NVIDIA for many years, and we are excited to see them take this next step. We look forward to bringing these capabilities to our customers and to improve interoperability with NVIDIA hardware." — Mike McGrath, Vice President, Linux Engineering at Red Hat

Upstream approach

NVIDIA GPU drivers have been designed over the years to share code across operating systems, GPUs, and Jetson SoCs so that we can provide a consistent experience across all our supported platforms. The current codebase does not conform to the Linux kernel design conventions and is not a candidate for Linux upstream.

There are plans to work on an upstream approach with the Linux kernel community and partners such as Canonical, Red Hat, and SUSE.

In the meantime, the published source code serves as a reference to help improve the Nouveau driver. Nouveau can use the same firmware as the NVIDIA driver, which exposes many GPU functions, such as clock management and thermal management, and brings new features to the in-tree Nouveau driver.

Frequently asked questions

Where can I download the R515 driver?

You can download the R515 development driver as part of CUDA Toolkit 11.7, or from the driver downloads page under "Beta" drivers. The R515 data center driver will follow in subsequent releases per our usual cadence.

Can open GPU Kernel Modules be distributed?

Yes. The NVIDIA open kernel modules are licensed under a dual GPL/MIT license, and the terms of those licenses govern distribution and repackaging.

Will the source for user-mode drivers such as CUDA be published?

These changes apply only to the kernel modules; the user-mode components are untouched. The user-mode stack will remain closed source and will continue to be published as pre-built binaries in the driver and the CUDA Toolkit.

Which GPUs are supported by Open GPU Kernel Modules?

Open kernel modules support all Ampere and Turing GPUs. Datacenter GPUs are supported for production, and support for GeForce and Workstation GPUs is alpha quality. Please refer to the Datacenter, NVIDIA RTX, and GeForce product tables for more details (Turing and above have compute capability of 7.5 or greater).
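As a rough illustration, the support matrix above can be expressed as a lookup on compute capability. This is a minimal sketch, assuming you already know the GPU's compute capability and whether it is a data center part; the function name `open_module_support` and its return labels are hypothetical and not part of the driver or its installer. Note that the FAQ only promises Turing and Ampere support in R515, so treating every capability of 7.5 or higher as supported is an assumption for later architectures.

```python
from typing import Optional, Tuple


# Hypothetical helper: the name, the (major, minor) tuple input and the
# string labels are illustrative assumptions, not an NVIDIA tool or API.
def open_module_support(compute_capability: Tuple[int, int],
                        datacenter: bool) -> Optional[str]:
    """Return the R515 open-kernel-module support tier for a GPU, or None
    if the GPU is pre-Turing and must use the closed-source modules."""
    # Turing and newer report compute capability 7.5 or greater;
    # tuple comparison checks major first, then minor.
    if compute_capability < (7, 5):
        return None
    # Data center GPUs are production quality; GeForce and Workstation
    # GPUs are alpha quality and require an explicit opt-in.
    return "production" if datacenter else "alpha"


if __name__ == "__main__":
    print(open_module_support((7, 0), datacenter=True))   # Volta V100 -> None
    print(open_module_support((8, 0), datacenter=True))   # Ampere A100 -> production
    print(open_module_support((8, 6), datacenter=False))  # GeForce RTX 30 series -> alpha
```

Comparing `(major, minor)` tuples keeps the check correct across major versions, so an 8.0 part is not mistaken for something older than 7.5.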

How to report bugs

Problems can be reported through the GitHub repository issue tracker or through our existing end-user support forum. Please report security issues through the channels listed on the GitHub repository security policy.

What is the process for patch submission and SLA/CLA for patches?

We encourage community submissions through pull requests on the GitHub page. The submitted patches will be reviewed and if approved, integrated with possible modifications into a future driver release. Please refer to the NVIDIA driver lifecycle document.

The published source code is a snapshot generated from a shared codebase, so contributions may not be reflected as separate Git commits in the GitHub repo. We are working on a process for acknowledging community contributions. For the same reason, we advise against significantly reformatting the code.

The process for submitting pull requests is described on our GitHub page and such contributions are covered under the Contributor License Agreement.
Source: NVIDIA

35 Comments on NVIDIA Releases Open-Source GPU Kernel Modules

#26
R-T-B
trparky wrote: "You can put some of the blame on the Linux kernel itself, namely in the fact that every time you turn around, they change an internal API, thus breaking shit. Now if only the kernel had a stable set of external APIs that developers could rely on not changing from one week to the next, things would be great, but the Linux kernel community is allergic to this idea. Their answer to that issue is to just put your code in the mainline kernel tree and if things happen to break, we'll fix it for you.

But what if you don't want your code to be open source? Oops, sorry. We don't care about you."
And now that NVIDIA's kernel code is open source, this is really a completely moot point.

Anytime you have a closed binary targeting an actively updated kernel (the Windows kernel seldom updates), you were going to have a similar issue. It's progress vs stagnation at that point.
#27
trparky
R-T-B wrote: "It's progress vs stagnation at that point."
And then you have the old adage... If it ain't broke, don't fix it. If it works, great. Don't touch it!
#28
bug
trparky wrote: "And then you have the old adage... If it ain't broke, don't fix it. If it works, great. Don't touch it!"
You're not a software developer, are you?
#29
trparky
bug wrote: "You're not a software developer, are you?"
No, not professionally. I do tinker a bit and have written my own programs for my own needs but that's about it. The last big program I had that people were actively using has been retired for years.
#30
bug
trparky wrote: "No, not professionally. I do tinker a bit and have written my own programs for my own needs but that's about it. The last big program I had that people were actively using has been retired for years."
Then let me explain why "if it works, don't fix it" doesn't work for software: developers come and go, in a few short years nobody will understand the code, even if it's still working. And working code still needs maintenance, be it for bug/security fixes or dealing with new CPU architectures.
It sucks, I know, but that's just a fact in the software development world. Every single project that I have worked on that was left as it is for 5+ years inevitably ended up needing a complete rewrite.
#31
trparky
bug wrote: "Then let me explain why "if it works, don't fix it" doesn't work for software: developers come and go, in a few short years nobody will understand the code, even if it's still working. And working code still needs maintenance, be it for bug/security fixes or dealing with new CPU architectures.
It sucks, I know, but that's just a fact in the software development world. Every single project that I have worked on that was left as it is for 5+ years inevitably ended up needing a complete rewrite."
But that's why a good developer comments his or her code so that later, when someone new comes along, they know what the hell is going on.

That was one thing that they hammered into our heads back when I was taking beginning programming classes in college. Comment your code! And when you're done commenting, comment some more. Make it damned sure that anyone will be able to look at your code and tell what the hell you did.
#32
bug
trparky wrote: "But that's why a good developer comments his or her code so that later, when someone new comes along, they know what the hell is going on."
That only works to a degree (and trust me, anything that's not open source is usually very poorly documented). For example, it's hard to document business decisions (e.g. we're not going to support more than 16 channels here, because manager X said we don't need that in prod), because that kind of documentation does not belong in the code. And everything that's not in the code gets stale and forgotten.
There are CMMI certifications to work around that, but very few companies bother with those if they're not doing something critical, like software for planes or nuclear power plants.
#33
trparky
No wonder why the same old tired security flaws keep coming back to bite us in the ass. You'd think that at some point we'd have fixed all the damn bugs already.
#34
bug
trparky wrote: "No wonder why the same old tired security flaws keep coming back to bite us in the ass. You'd think that at some point we'd have fixed all the damn bugs already."
There's an additional reason for that. As languages become higher-level, people just don't learn the basics anymore. For example, I work a lot with Java, and Java goes to great lengths to shield the programmer from platform details and memory management. And when Uncle Bob comes out with his "clean code" lecture, what do people understand? Write methods that are no more than a dozen lines long; if they're longer than that, break up the function. As if function calls were free... Today probably half of Java programmers don't understand how a function works anyway.
#35
fibre
trparky wrote: "You can put some of the blame on the Linux kernel itself, namely in the fact that every time you turn around, they change an internal API, thus breaking shit. Now if only the kernel had a stable set of external APIs that developers could rely on not changing from one week to the next, things would be great, but the Linux kernel community is allergic to this idea. Their answer to that issue is to just put your code in the mainline kernel tree and if things happen to break, we'll fix it for you.

But what if you don't want your code to be open source? Oops, sorry. We don't care about you."
When Qualcomm drivers are already compiled, it's not only about API compatibility but mainly about ABI compatibility, which is even stricter. Since the Linux kernel needs to evolve and its architecture is monolithic, it has to break internal APIs while keeping the external API (for userspace) stable. But it can't be that hard to assign two people to maintain drivers, right?