
Khronos Group Releases OpenCL 3.0

AleksandarK

News Editor
Staff member
Joined
Aug 19, 2017
Messages
2,732 (1.01/day)
Today, The Khronos Group, an open consortium of industry-leading companies creating advanced interoperability standards, publicly releases the OpenCL 3.0 Provisional Specifications. OpenCL 3.0 realigns the OpenCL roadmap to enable developer-requested functionality to be broadly deployed by hardware vendors, and it significantly increases deployment flexibility by empowering conformant OpenCL implementations to focus on functionality relevant to their target markets. OpenCL 3.0 also integrates subgroup functionality into the core specification, ships with a new OpenCL C 3.0 language specification, uses a new unified specification format, and introduces extensions for asynchronous data copies to enable a new class of embedded processors. The provisional OpenCL 3.0 specifications enable the developer community to provide feedback on GitHub before the specifications and conformance tests are finalized.



To cater to a widening diversity of OpenCL devices, OpenCL 3.0 makes all functionality beyond version 1.2 optional. All OpenCL 1.2 applications will continue to run unchanged on any OpenCL 3.0 device. All OpenCL 2.X features are coherently defined in the new unified specification, and current OpenCL 2.X implementations that upgrade to OpenCL 3.0 can continue to ship with their existing functionality with full backwards compatibility. All OpenCL 2.X API features can be queried, and OpenCL C 3.0 adds macros for querying optional language features.

"OpenCL is the most pervasive, cross-vendor, open standard for low-level heterogeneous parallel programming—widely used by applications, libraries, engines, and compilers that need to reach the widest range of diverse processors," said Neil Trevett, vice president at NVIDIA, president of the Khronos Group and OpenCL Working Group Chair. "OpenCL 2.X delivers significant functionality, but OpenCL 1.2 has proven itself as the baseline needed by all vendors and markets. OpenCL 3.0 integrates tightly organized optionality into the monolithic 2.2 specification, boosting deployment flexibility that will enable OpenCL to raise the bar on pervasively available functionality in future core specifications."

For C++ kernel development, the OpenCL Working Group has transitioned from the original OpenCL C++ kernel language, defined in OpenCL 2.2, to the 'C++ for OpenCL' community, open-source project supported by Clang. C++ for OpenCL provides compatibility with OpenCL C, enables developers to use most C++17 features in OpenCL kernels, and is compatible with any OpenCL 2.X or OpenCL 3.0 implementation that supports SPIR-V ingestion.

The Extended Asynchronous Copy and Asynchronous Work Group Copy Fence extensions released alongside OpenCL 3.0 enable efficient, ordered DMA transactions as first class citizens in OpenCL—ideal for Scratch Pad Memory based devices, which require fine-grained control over buffer allocation. These extensions are the first of significant upcoming advances in OpenCL to enhance support for embedded processors.

To accompany today's release, the OpenCL Working Group has updated its OpenCL Resource Guide to help computing specialists, developers and researchers of all skill levels effectively harness the power of OpenCL. The OpenCL Working Group will continuously evolve the guide and welcomes any feedback on how it can be improved via GitHub.

OpenCL 3.0 at IWOCL
OpenCL Working Group members will be participating in the Khronos Panel Session at the IWOCL / SYCLcon online conference on April 28 at 4 PM GMT. IWOCL / SYCLcon is the leading forum for high-performance computing specialists working with OpenCL, SYCL, Vulkan and SPIR-V, and registration is free.

Industry Support for OpenCL 3.0
"In recent years there has been an impressive adoption of OpenCL to drive heterogeneous processing systems within many market segments," said Andrew Richards, founder and CEO of Codeplay Software. "This update to OpenCL 3.0 brings important flexibility benefits that will allow many evolving industries, from AI and HPC to automotive, to focus on their specific requirements and embrace open standards. Codeplay is excited to enable hardware vendors to support OpenCL 3.0 and to take advantage of the flexibility provided in its ecosystem of software products."

Mark Butler, vice president of software engineering, Imagination Technologies, says: "With its focus on deployment flexibility, we see OpenCL 3.0 as an excellent step forward in providing critical features for developers, with the ability to add functionality over time. This really is a step forward for the OpenCL ecosystem, allowing developers to write portable applications that depend on widely accepted functionality. Currently shipping GPUs based on the PowerVR Rogue architecture will enjoy a significant feature uplift including SVM, Generic Address Space and Work-group Functions. Upon final release of the specification, Imagination will ship a conformant OpenCL 3.0 implementation with support extending across a wide range of PowerVR GPUs, including our latest offering with IMG A-Series."

"Intel strongly supports cross-architecture standards being driven across the compute ecosystem such as in OpenCL 3.0 and SYCL," said Jeff McVeigh, vice president, Intel Architecture, Graphics and Software. "Standards-based, unified programming models will enable efficiency and unleash creativity for our developers with the upcoming release of our new Xe GPU architecture."

"NVIDIA welcomes OpenCL 3.0's focus on defining a baseline to enable developer-critical functionality to be widely adopted in future versions of the specification," said Anshuman Bhat, compute product manager at NVIDIA. "NVIDIA will ship a conformant OpenCL 3.0 when the specification is finalized and we are working to define the Vulkan interop extension that, together with layered OpenCL implementations, will significantly increase deployment flexibility for OpenCL developers."

"OpenCL 3.0 is an important step forward in the drive to unlock greater performance and innovation across a broadening range of computing platforms and applications," said Balaji Calidas, director of engineering at Qualcomm. "The flexible extension model will help our customers and software partners take full advantage of the tremendous potential available in both our existing and future application processors. We are pleased to have had the opportunity to contribute to this specification and we look forward to supporting the final product."

"Many of our customers want a GPU programming language that runs on all devices, and with growing deployment in edge computing and mobile, this need is increasing," said Vincent Hindriksen, founder and CEO of Stream HPC. "OpenCL is the only solution for accessing diverse silicon acceleration and many key software stacks use OpenCL/SPIR-V as a backend. We are very happy that OpenCL 3.0 will drive even wider industry adoption, as it reassures our customers that their past and future investments in OpenCL are justified."

"OpenCL 3.0 has opened up a new chapter for the OpenCL API which has served as the standard GPGPU API during the past 10 years," said Weijin Dai, executive vice president and GM of Intellectual Property Division at VeriSilicon. "With the streamlined OpenCL 3.0 core feature set, OpenCL 3.0 will enable a whole new class of embedded devices to adopt OpenCL API for GPU Compute and ML/AI processing, and it will also pave the way forward for OpenCL to interop or layer with the Vulkan API. VeriSilicon will deploy OpenCL 3.0 implementations quickly on a broad range of our embedded GPU and VIP products to enable our customers to develop new sets of GPGPU/ML/AI applications with the OpenCL 3.0 API."

About OpenCL
OpenCL (Open Computing Language) is an open, royalty-free standard for cross-platform, parallel programming of diverse, heterogeneous accelerators found in supercomputers, cloud servers, personal computers, mobile devices and embedded platforms. OpenCL greatly improves the speed and responsiveness of a wide spectrum of applications in numerous market categories including professional creative tools, scientific and medical software, vision processing, and neural network training and inferencing.

About Khronos
The Khronos Group is an open, non-profit, member-driven consortium of over 150 industry-leading companies creating advanced, royalty-free, interoperability standards for 3D graphics, augmented and virtual reality, parallel programming, vision acceleration and machine learning. Khronos activities include Vulkan, OpenGL, OpenGL ES, WebGL, SPIR-V, OpenCL, SYCL, OpenVX, NNEF, OpenXR, 3D Commerce, ANARI, and glTF. Khronos members drive the development and evolution of Khronos specifications and are able to accelerate the delivery of cutting-edge platforms and applications through early access to specification drafts and conformance tests.

 
Joined
Aug 8, 2019
Messages
430 (0.22/day)
System Name R2V2 *In Progress
Processor Ryzen 7 2700
Motherboard Asrock X570 Taichi
Cooling W2A... water to air
Memory G.Skill Trident Z3466 B-die
Video Card(s) Radeon VII repaired and resurrected
Storage Adata and Samsung NVME
Display(s) Samsung LCD
Case Some ThermalTake
Audio Device(s) Asus Strix RAID DLX upgraded op amps
Power Supply Seasonic Prime something or other
Software Windows 10 Pro x64
Nvidia actually supporting 3.0? Ohhh...

Wait, you only need to support 1.2 to be conformant, or so the language seems to suggest, as 2.x and 3.0 features are optional.

OpenCL is getting feature levels!

I mean, if NV doesn't just do OpenCL 3.0 at feature level 1.2, and has full support for 3.0 in actual products, I'll be happily surprised.
 
Joined
Jan 8, 2017
Messages
9,606 (3.27/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
Nvidia actually supporting 3.0?

Doubt it. OpenCL 2.0 was in beta for what, 3 years? It never became a thing and never will; it will likely be the same with 3.0.
 

ARF

Joined
Jan 28, 2020
Messages
4,670 (2.56/day)
Location
Ex-usa | slava the trolls
The entire article doesn't mention AMD a single time. So what does AMD think about OpenCL 3.0?
I bet AMD will be the main driving force for upcoming wide support.
 
Joined
Aug 8, 2019
Messages
430 (0.22/day)
System Name R2V2 *In Progress
Processor Ryzen 7 2700
Motherboard Asrock X570 Taichi
Cooling W2A... water to air
Memory G.Skill Trident Z3466 B-die
Video Card(s) Radeon VII repaired and resurrected
Storage Adata and Samsung NVME
Display(s) Samsung LCD
Case Some ThermalTake
Audio Device(s) Asus Strix RAID DLX upgraded op amps
Power Supply Seasonic Prime something or other
Software Windows 10 Pro x64
Doubt it. OpenCL 2.0 was in beta for what, 3 years? It never became a thing and never will; it will likely be the same with 3.0.

I wonder after the supercomputer design wins for AMD. Giving OpenCL a real shot in the arm.

Though the release hints that 2.0 is basically dead, because you only need to support 1.2 and one 3.0 call to get 3.0 support... :laugh:

Actually I see NV supporting OCL 3.0 on the compute cards, while the consumer cores will continue limping along with gimped compute performance and likely support.
 

ARF

Joined
Jan 28, 2020
Messages
4,670 (2.56/day)
Location
Ex-usa | slava the trolls
I wonder after the supercomputer design wins for AMD. Giving OpenCL a real shot in the arm.

Though the release hints that 2.0 is basically dead, because you only need to support 1.2 and one 3.0 call to get 3.0 support... :laugh:

Actually I see NV supporting OCL 3.0 on the compute cards, while the consumer cores will continue limping along with gimped compute performance and likely support.

Latest version is 2.2.
Nvidia has a closed proprietary ecosystem with CUDA, if you haven't forgotten.
 
Joined
Oct 28, 2012
Messages
1,239 (0.28/day)
Processor AMD Ryzen 3700x
Motherboard asus ROG Strix B-350I Gaming
Cooling Deepcool LS520 SE
Memory crucial ballistix 32Gb DDR4
Video Card(s) RTX 3070 FE
Storage WD sn550 1To/WD ssd sata 1To /WD black sn750 1To/Seagate 2To/WD book 4 To back-up
Display(s) LG GL850
Case Dan A4 H2O
Audio Device(s) sennheiser HD58X
Power Supply Corsair SF600
Mouse MX master 3
Keyboard Master Key Mx
Software win 11 pro
Hmm. I thought Apple giving up on OpenCL and promoting Metal was because Nvidia made developing for OpenCL such a pain that devs chose CUDA instead. In the CG industry, Arnold, RenderMan (soon XPU), Octane, and Redshift are the most popular renderers, and all of them either have features that can't work with AMD or can't work on AMD at all. And Apple convinced Redshift and Octane to work with Metal, so I don't think AMD will get competitive in that sector anytime soon :/
 
Joined
Apr 24, 2020
Messages
2,792 (1.61/day)
I wonder after the supercomputer design wins for AMD. Giving OpenCL a real shot in the arm.

I asked some supercomputer guys and they seem to be using OpenACC, OpenMP, and CUDA. They don't seem to be interested in OpenCL. Obviously, this is a sample size of 1, but it's something to think about.

In fact, they were more interested in ROCm / HIP (AMD's somewhat CUDA-compatible layer) than OpenCL.

Nvidia has a closed proprietary ecosystem with CUDA, if you haven't forgotten.

Closed, but highly advanced. Thrust, CUB, Cooperative Groups. Nearly full C++ compatibility on the device side (including support for classes, structures, and shared pointers between host / device).

CUDA is used for a reason: it's way easier to program and optimize than OpenCL. AMD's ROCm / HIP stuff is similarly easier to use than OpenCL in my experience. OpenCL can share pointers with SVM, but with different compilers, there's no guarantee that your classes or structures line up.

CUDA (and AMD's ROCm/HIP) have a further guarantee: the device AND host code go through the same LLVM compiler simultaneously. All alignment and padding between the host and device will be identical and compatible.
 

bug

Joined
May 22, 2015
Messages
13,960 (3.95/day)
Processor Intel i5-12600k
Motherboard Asus H670 TUF
Cooling Arctic Freezer 34
Memory 2x16GB DDR4 3600 G.Skill Ripjaws V
Video Card(s) EVGA GTX 1060 SC
Storage 500GB Samsung 970 EVO, 500GB Samsung 850 EVO, 1TB Crucial MX300 and 2TB Crucial MX500
Display(s) Dell U3219Q + HP ZR24w
Case Raijintek Thetis
Audio Device(s) Audioquest Dragonfly Red :D
Power Supply Seasonic 620W M12
Mouse Logitech G502 Proteus Core
Keyboard G.Skill KM780R
Software Arch Linux + Win10
Well, if they kept it backwards compatible, I expect it will see the same "wide adoption" as its predecessors.

Latest version is 2.2.
Nvidia has a closed proprietary ecosystem with CUDA, if you haven't forgotten.
Only the CUDA implementation is proprietary, the ecosystem is full of open source apps built on top of that.
 
Joined
Aug 8, 2019
Messages
430 (0.22/day)
System Name R2V2 *In Progress
Processor Ryzen 7 2700
Motherboard Asrock X570 Taichi
Cooling W2A... water to air
Memory G.Skill Trident Z3466 B-die
Video Card(s) Radeon VII repaired and resurrected
Storage Adata and Samsung NVME
Display(s) Samsung LCD
Case Some ThermalTake
Audio Device(s) Asus Strix RAID DLX upgraded op amps
Power Supply Seasonic Prime something or other
Software Windows 10 Pro x64
Well, if they kept it backwards compatible, I expect it will see the same "wide adoption" as its predecessors.


Only the CUDA implementation is proprietary, the ecosystem is full of open source apps built on top of that.

Open source apps built on a closed source API.

It's open as long as you agree not to attempt to use it on anything that's not certified by NV. Attempts to make CUDA run on anything else will get you ripped to shreds by NV's lawyers.

Sign away your rights and your life, and you can see what makes CUDA tick. Calling CUDA open is at best misleading and in reality a delusion. :laugh:

Note: You can clean room a solution, like making an IBM PC compatible BIOS, but CUDA is far more complicated and AMD's valiant efforts are still limited to older versions and the compatibility of the translator is limited.
 
Joined
Mar 24, 2012
Messages
533 (0.11/day)
Open source apps built on a closed source API.

It's open as long as you agree not to attempt to use it on anything that's not certified by NV. Attempts to make CUDA run on anything else will get you ripped to shreds by NV's lawyers.

Sign away your rights and your life, and you can see what makes CUDA tick. Calling CUDA open is at best misleading and in reality a delusion. :laugh:

Note: You can clean room a solution, like making an IBM PC compatible BIOS, but CUDA is far more complicated and AMD's valiant efforts are still limited to older versions and the compatibility of the translator is limited.
If that's the case then AMD would be in court right now over their Boltzmann Initiative. Maybe Qualcomm as well.
 
Joined
Dec 22, 2011
Messages
3,890 (0.81/day)
Processor AMD Ryzen 7 3700X
Motherboard MSI MAG B550 TOMAHAWK
Cooling AMD Wraith Prism
Memory Team Group Dark Pro 8Pack Edition 3600Mhz CL16
Video Card(s) NVIDIA GeForce RTX 3080 FE
Storage Kingston A2000 1TB + Seagate HDD workhorse
Display(s) Samsung 50" QN94A Neo QLED
Case Antec 1200
Power Supply Seasonic Focus GX-850
Mouse Razer Deathadder Chroma
Keyboard Logitech UltraX
Software Windows 11
This was always going to turn into an AMD love fest, but their OpenGL support was shit and CUDA is king.
 
Joined
Aug 20, 2007
Messages
21,632 (3.40/day)
Location
Olympia, WA
System Name Pioneer
Processor Ryzen 9 9950X
Motherboard GIGABYTE Aorus Elite X670 AX
Cooling Noctua NH-D15 + A whole lotta Sunon, Phanteks and Corsair Maglev blower fans...
Memory 64GB (2x 32GB) G.Skill Flare X5 @ DDR5-6000 CL30
Video Card(s) XFX RX 7900 XTX Speedster Merc 310
Storage Intel 5800X Optane 800GB boot, +2x Crucial P5 Plus 2TB PCIe 4.0 NVMe SSDs
Display(s) 55" LG 55" B9 OLED 4K Display
Case Thermaltake Core X31
Audio Device(s) TOSLINK->Schiit Modi MB->Asgard 2 DAC Amp->AKG Pro K712 Headphones or HDMI->B9 OLED
Power Supply FSP Hydro Ti Pro 850W
Mouse Logitech G305 Lightspeed Wireless
Keyboard WASD Code v3 with Cherry Green keyswitches + PBT DS keycaps
Software Gentoo Linux x64 / Windows 11 Enterprise IoT 2024
OpenGL support was shit

Was? Did it ever stop?

To my knowledge, the only place that ever got fixed was in Linux, and by open source devs, not AMD.

Calling CUDA open is at best misleading and in reality a delusion.

Few languages are truly open, but the ecosystems are open, which is all he was claiming.
 

bug

Joined
May 22, 2015
Messages
13,960 (3.95/day)
Processor Intel i5-12600k
Motherboard Asus H670 TUF
Cooling Arctic Freezer 34
Memory 2x16GB DDR4 3600 G.Skill Ripjaws V
Video Card(s) EVGA GTX 1060 SC
Storage 500GB Samsung 970 EVO, 500GB Samsung 850 EVO, 1TB Crucial MX300 and 2TB Crucial MX500
Display(s) Dell U3219Q + HP ZR24w
Case Raijintek Thetis
Audio Device(s) Audioquest Dragonfly Red :D
Power Supply Seasonic 620W M12
Mouse Logitech G502 Proteus Core
Keyboard G.Skill KM780R
Software Arch Linux + Win10
Open source apps built on a closed source API.
That right there is your mistake. The API is just the top-most layer of the closed source implementation. In order for others to use your API, the API is usually open (as is the case here).
Besides bringing the SJW side in some, the actual implementation being closed source is of little consequence in this case. It's not like 3rd parties know Nvidia's hardware better than Nvidia so they could improve upon the implementation. Sure, it's nice to be able to browse the sources to better understand how it works and debug. But in this particular case, closed-source is not the end of the world.
I mean, open source is always better. But for compute, the open initiatives are shunned by users, so like it or not, many of the AI things you read about today are made possible by CUDA.
 
Joined
Apr 24, 2020
Messages
2,792 (1.61/day)
Open source apps built on a closed source API.

It's open as long as you agree not to attempt to use it on anything that's not certified by NV. Attempts to make CUDA run on anything else will get you ripped to shreds by NV's lawyers.

Sign away your rights and your life, and you can see what makes CUDA tick. Calling CUDA open is at best misleading and in reality a delusion. :laugh:

Note: You can clean room a solution, like making an IBM PC compatible BIOS, but CUDA is far more complicated and AMD's valiant efforts are still limited to older versions and the compatibility of the translator is limited.

Then use OpenMP 4.5 "target" code.


Open source (Clang / GCC support), single-source compilation, device acceleration.

Code:
#pragma omp target
#pragma omp parallel for private(i)
    for (i=0; i<N; i++) p[i] = v1[i]*v2[i];

"Target" says run this on the GPU. "Parallel For" is an older OpenMP construct, saying that each iteration should be run in parallel. "private(i)" says that the variable "i" is per-thread private. Not sure if the data-transfer over PCIe is fast enough? Then make it CPU-parallel instead:

Code:
#pragma omp parallel for private(i)
    for (i=0; i<N; i++) p[i] = v1[i]*v2[i];

Bam, now the code is CPU parallel. Wait, but you're running on an AMD EPYC with a weird cache-hierarchy, sets of independent L3s across NUMA domains and you want the data to be NUMA-aware, PCIe-aware, and execute on the GPU closest to each individual NUMA node?

Code:
#pragma omp target teams distribute parallel for private(i)
    for (i=0; i<N; i++) p[i] = v1[i]*v2[i];

Yeah. It's that easy.
 