News Posts matching #C++

Intel Releases Quantum Software Development Kit Version 1.0 to Grow Developer Ecosystem

After launching its beta version in September 2022, Intel today released version 1.0 of the Intel Quantum Software Development Kit (SDK). The SDK is a full quantum computer in simulation that can also interface with Intel's quantum hardware, including Intel's Horse Ridge II control chip and Intel's quantum spin qubit chip when it becomes available this year. The kit allows developers to program quantum algorithms in simulation, and it features an intuitive programming interface written in C++ using an industry-standard low-level virtual machine (LLVM) compiler toolchain. As a result, Intel's SDK offers seamless interfacing with C/C++ and Python applications, making it more versatile and customizable.

"The Intel Quantum SDK helps programmers get ready for future large-scale commercial quantum computers. It will not only help developers learn how to create quantum algorithms and applications in simulation, but it will also advance the industry by creating a community of developers that will accelerate the development of applications, so they are ready when Intel's quantum hardware becomes available," said Anne Matsuura, director of Quantum Applications & Architecture, Intel Labs.

Intel Publishes Sorting Library Powered by AVX-512, Offers 10-17x Speed Up

Intel has recently updated its open-source C++ header file library for high-performance SIMD-based sorting to support the AVX-512 SIMD instruction set. Extending the existing AVX2 support, the sorting functions now use 512-bit extensions to offer greater performance. According to Phoronix, NumPy, the Python mathematics library that underpins a lot of software, has updated its codebase to use the AVX-512 sorting functionality, which yields a substantial uplift in performance. The library uses AVX-512 to vectorize quicksort for 16-bit, 32-bit, and 64-bit data types. Benchmarked on an Intel Tiger Lake system, the NumPy sorting saw a 10-17x increase in performance.

Intel engineer Raghuveer Devulapalli contributed the NumPy change, which was merged into the NumPy codebase on Wednesday. Regarding individual data types, the new implementation speeds up 16-bit int sorting by 17x and 32-bit data type sorting by 12-13x, while 64-bit float sorting for random arrays sees a 10x speed-up. Built on the x86-simd-sort code, this speed-up shows the power of AVX-512 and its capability to enhance the performance of various libraries. We hope to see more implementations of AVX-512, as AMD has joined the party by placing AVX-512 processing elements on Zen 4.

AMD WMMA Instruction is Direct Response to NVIDIA Tensor Cores

AMD's RDNA3 graphics IP is just around the corner, and we are hearing more information about the upcoming architecture. Historically, as GPUs advance, it is not unusual for companies to add dedicated hardware blocks to accelerate a specific task. Today, AMD engineers have updated the backend of the LLVM compiler to include a new instruction called Wave Matrix Multiply-Accumulate (WMMA). This instruction will be present on GFX11, which is the RDNA3 GPU architecture. With WMMA, AMD will offer support for processing 16x16x16 size tensors in FP16 and BF16 precision formats. With these instructions, AMD is adding new arrangements to support the processing of matrix multiply-accumulate operations. This is closely mimicking the work NVIDIA is doing with Tensor Cores.

AMD ROCm 5.2 API update lists the use case for this type of instruction, which you can see below:
rocWMMA provides a C++ API to facilitate breaking down matrix multiply accumulate problems into fragments and using them in block-wise operations that are distributed in parallel across GPU wavefronts. The API is a header library of GPU device code, meaning matrix core acceleration may be compiled directly into your kernel device code. This can benefit from compiler optimization in the generation of kernel assembly and does not incur additional overhead costs of linking to external runtime libraries or having to launch separate kernels.

rocWMMA is released as a header library and includes test and sample projects to validate and illustrate example usages of the C++ API. GEMM matrix multiplication is used as primary validation given the heavy precedent for the library. However, the usage portfolio is growing significantly and demonstrates different ways rocWMMA may be consumed.
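
The fragment-based decomposition described above can be illustrated on the CPU. The following is a minimal plain C++ sketch, not rocWMMA code: it assumes a 16x16 tile edge (borrowed from the 16x16x16 shape cited for WMMA), row-major `float` storage, and hypothetical helper names `mma_tile` and `blocked_gemm`. On a GPU, each tile-level call would be handled by a wavefront rather than by a loop iteration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative tile edge, matching the 16x16x16 shape cited for WMMA.
constexpr std::size_t kTile = 16;

// Hypothetical helper: accumulates one kTile x kTile x kTile product,
// C_tile += A_tile * B_tile, on row-major data with leading dimensions
// lda, ldb and ldc.
void mma_tile(const float* a, std::size_t lda,
              const float* b, std::size_t ldb,
              float* c, std::size_t ldc) {
    for (std::size_t i = 0; i < kTile; ++i) {
        for (std::size_t k = 0; k < kTile; ++k) {
            const float a_ik = a[i * lda + k];
            for (std::size_t j = 0; j < kTile; ++j) {
                c[i * ldc + j] += a_ik * b[k * ldb + j];
            }
        }
    }
}

// Hypothetical driver: C (m x n) += A (m x k) * B (k x n), split into tiles.
// In this simplified sketch all dimensions must be multiples of kTile.
void blocked_gemm(const std::vector<float>& a, const std::vector<float>& b,
                  std::vector<float>& c,
                  std::size_t m, std::size_t n, std::size_t k) {
    assert(m % kTile == 0 && n % kTile == 0 && k % kTile == 0);
    assert(a.size() == m * k && b.size() == k * n && c.size() == m * n);
    for (std::size_t i0 = 0; i0 < m; i0 += kTile) {          // tile row of C
        for (std::size_t j0 = 0; j0 < n; j0 += kTile) {      // tile column of C
            for (std::size_t k0 = 0; k0 < k; k0 += kTile) {  // shared dimension
                mma_tile(&a[i0 * k + k0], k,
                         &b[k0 * n + j0], n,
                         &c[i0 * n + j0], n);
            }
        }
    }
}
```

Each output tile of C depends only on one tile row of A and one tile column of B, which is what allows the tiles to be distributed in parallel.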

AMD Releases FidelityFX Super Resolution 2.0 Source Code Through GPUOpen

Today marks a year since gamers could try out AMD FidelityFX Super Resolution technology for themselves with our spatial upscaler - FSR 1. With the introduction of FSR 2, our temporal upscaling solution earlier this year, there are now over 110 games that support FSR. The rate of uptake has been very impressive - FSR is AMD's fastest adopted software gaming technology to date.

So it seems fitting that we should pick this anniversary day to share the source code for FSR 2, opening up the opportunity for every game developer to integrate FSR 2 if they wish, and add their title to the 24 games which have already announced support. As always, the source code is being made available via GPUOpen under the MIT license, and you can now find links to it on our dedicated FSR 2 page.

NVIDIA DLSS Source Code Leaked

The mother of all cyberattacks hit NVIDIA over the weekend, putting out critical driver source-code, the ability to disable LHR for mining, and even insights into future NVIDIA hardware, such as the Blackwell architecture. An anonymous tipster sent us this screenshot showing a list of files they claim are the source-code of DLSS.

The list, which looks credible enough, includes C++ files, headers, and assets that make up DLSS. There is also a super-convenient "Programming Guide" document to help developers make sense of the code and build it correctly. The tipster who sent this screenshot is examining the code to see the inner workings of DLSS, and whether there's any secret sauce. Do note that this is DLSS version 2.2, a reasonably recent version. This code leak could hold the key for the open-source Linux driver community to bring DLSS to the platform, or even let AMD and Intel learn from its design. Stealing intellectual property is a big deal, of course, and NVIDIA's lawyers will probably be busy picking apart every new innovation from their competitors, but ultimately it'll be hard to prove in a court of law.

Intel Releases oneAPI 2022 Toolkits to Developers

Intel today released oneAPI 2022 toolkits. Newly enhanced toolkits expand cross-architecture features to provide developers greater utility and architectural choice to accelerate computing. "I am impressed by the breadth of more than 900 technical improvements that the oneAPI software engineering team has done to accelerate development time and performance for critical application workloads across Intel's client and server CPUs and GPUs. The rich set of oneAPI technologies conforms to key industry standards, with deep technical innovations that enable applications developers to obtain the best possible run-time performance from the cloud to the edge. Multi-language support and cross-architecture performance acceleration are ready today in our oneAPI 2022 release to further enable programmer productivity on Intel platforms," said Greg Lavender, Intel chief technology officer, senior vice president and general manager of the Software and Advanced Technology Group.

New capabilities include the world's first unified compiler implementing C++, SYCL and Fortran, data parallel Python for CPUs and GPUs, advanced accelerator performance modeling and tuning, and performance acceleration for AI and ray tracing visualization workloads. The oneAPI cross-architecture programming model provides developers with tools that aim to improve the productivity and velocity of code development when building cross-architecture applications.

Xilinx Partners with Samsung to Develop SmartSSD CSD

Xilinx, Inc. and Samsung Electronics Co., Ltd. today announced the availability of the Samsung SmartSSD Computational Storage Drive (CSD). Powered by Xilinx FPGAs, the SmartSSD CSD is the industry's first adaptable computational storage platform providing the performance, customization, and scalability required by data-intensive applications.

Xilinx will showcase the SmartSSD CSD and partner solutions at the Flash Memory Summit Virtual Conference and Expo taking place November 10-12. The SmartSSD CSD is a flexible, programmable storage platform that developers can use to create a variety of unique and scalable accelerators that solve a broad range of data center problems. It empowers a new breed of software developers to easily build innovative hardware-accelerated solutions in familiar high-level languages. The SmartSSD CSD accelerates data processing performance by 10x or more for applications such as database management, video processing, artificial intelligence layers, complex search, and virtualization.

Intel Partners with Heidelberg University Computing Center to Establish oneAPI Academic Center of Excellence

Intel and Heidelberg University Computing Center (URZ) today announced that they have established a oneAPI Academic Center of Excellence (CoE) at URZ. The newly established CoE aims to further develop Intel's oneAPI standard and enable it to work on AMD GPUs. This may come as a surprise, but Intel believes that the technology should work on a wide range of processors, no matter the vendor. Heterogeneous hardware programming is the main goal here. In a Twitter thread, an Intel employee specifies that Intel has also been working with Arm and NVIDIA to bring Data-Parallel C++ (DPC++), a core of oneAPI, to those vendors as well. That should bring this universal programming model to every device and adapt it to every platform, which is the goal of heterogeneous programming: whether you need to program a CPU, GPU, or some other ASIC, it is covered by a single API, specifically oneAPI.

URZ's work as a oneAPI CoE will add advanced DPC++ capabilities into hipSYCL, which supports systems based on AMD GPUs, NVIDIA GPUs, and CPUs. New DPC++ extensions are part of the SYCL 2020 provisional specification that brings features such as unified shared memory to hipSYCL and the platforms it supports - furthering the promise of oneAPI application support across system architectures and vendors.

Intel Contributes Advanced oneAPI DPC++ Capabilities to the SYCL 2020 Provisional Spec

Today, The Khronos Group, an open consortium of industry-leading companies creating graphics and compute interoperability standards, announced its SYCL 2020 Provisional Specification, for which Intel has made significant contributions through new programming abstractions. These new capabilities accelerate heterogeneous parallel programming for high-performance computing (HPC), machine learning and compute-intensive applications.

"The SYCL 2020 Provisional Specification marks a significant milestone helping improve time-to-performance in programming heterogeneous computing systems through more productive and familiar C++ programming constructs," said Jeff McVeigh, vice president of Datacenter XPU Products and Solutions at Intel Corporation. "Through active collaboration with The Khronos Group, the new specification includes significant features pioneered in oneAPI's Data Parallel C++, such as unified shared memory, group algorithms and sub-groups that were up-streamed to SYCL 2020. Moving forward, Intel's oneAPI toolkits, which include the SYCL-based Intel oneAPI DPC++ Compiler, will deliver productivity and performance for open, cross-architecture programming."

Khronos Group Releases SYCL 2020 Provisional Specification

Today, The Khronos Group, an open consortium of industry-leading companies creating graphics and compute interoperability standards, announces the ratification and public release of the SYCL 2020 Provisional Specification. SYCL is a standard C++ based heterogeneous parallel programming framework for accelerating High Performance Computing (HPC), machine learning, embedded computing, and compute-intensive desktop applications on a wide range of processor architectures, including CPUs, GPUs, FPGAs, and AI processors. The SYCL 2020 Provisional Specification is publicly available today to enable feedback from developers and implementers before the eventual specification finalization and release of the SYCL 2020 Adopters Program, which will enable implementers to be officially conformant, tentatively expected by the end of the year.

A royalty-free open standard, SYCL 2020 enables significant programmer productivity through an expressive domain-specific language, compact code, and simplified common patterns, such as Class Template Argument Deduction and Deduction Guides, all while preserving significant backwards compatibility with previous versions. SYCL 2020 is based on C++17 and includes new programming abstractions, such as unified shared memory, reductions, group algorithms, and sub-groups to enable high-performance applications across diverse hardware architectures.
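
Class template argument deduction and deduction guides are plain C++17 features, so they can be shown without a SYCL toolchain. The sketch below uses a hypothetical `Buffer` class template and a hypothetical `ctad_demo` function (neither is part of the SYCL API) purely to illustrate how these features shorten declarations.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <type_traits>
#include <vector>

// Hypothetical container, used only to illustrate C++17 deduction.
template <typename T>
class Buffer {
public:
    Buffer(const T* data, std::size_t count) : data_(data, data + count) {}

    template <typename Iter>
    Buffer(Iter first, Iter last) : data_(first, last) {}

    std::size_t size() const { return data_.size(); }
    const T& operator[](std::size_t i) const { return data_[i]; }

private:
    std::vector<T> data_;
};

// Deduction guide: T cannot be deduced from Iter alone, so derive it
// from the iterator's value type.
template <typename Iter>
Buffer(Iter, Iter) -> Buffer<typename std::iterator_traits<Iter>::value_type>;

// Example use: the element type is never spelled out at the call sites.
inline std::size_t ctad_demo() {
    const float raw[] = {1.0f, 2.0f, 3.0f};
    Buffer from_pointer(raw, std::size_t{3});         // implicit guide: Buffer<float>

    std::vector<int> source{4, 5};
    Buffer from_range(source.begin(), source.end());  // explicit guide: Buffer<int>

    static_assert(std::is_same_v<decltype(from_pointer), Buffer<float>>);
    static_assert(std::is_same_v<decltype(from_range), Buffer<int>>);
    assert(from_range[1] == 5);
    return from_pointer.size() + from_range.size();
}
```

Without deduction, both declarations would need the element type written out, for example `Buffer<float> from_pointer(raw, 3);`.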

Khronos Group Releases OpenCL 3.0

Today, The Khronos Group, an open consortium of industry-leading companies creating advanced interoperability standards, publicly releases the OpenCL 3.0 Provisional Specifications. OpenCL 3.0 realigns the OpenCL roadmap to enable developer-requested functionality to be broadly deployed by hardware vendors, and it significantly increases deployment flexibility by empowering conformant OpenCL implementations to focus on functionality relevant to their target markets. OpenCL 3.0 also integrates subgroup functionality into the core specification, ships with a new OpenCL C 3.0 language specification, uses a new unified specification format, and introduces extensions for asynchronous data copies to enable a new class of embedded processors. The provisional OpenCL 3.0 specifications enable the developer community to provide feedback on GitHub before the specifications and conformance tests are finalized.

VUDA is a CUDA-Like Programming Interface for GPU Compute on Vulkan (Open-Source)

GitHub developer jgbit has started an open-source project called VUDA, which takes inspiration from NVIDIA's CUDA API to bring an easily accessible GPU compute interface to the open-source world. VUDA is implemented as a wrapper on top of the highly popular next-gen graphics API Vulkan, which provides low-level access to hardware. VUDA comes as a header-only C++ library, which means it's compatible with all platforms that have a C++ compiler and that support Vulkan.

While the project is still young, its potential is enormous, especially due to its open-source nature (it uses the MIT license). The page on GitHub comes with a (very basic) sample that could be a good starting point for using the library.

Creative Launches Aurora Reactive SDK for Sound BlasterX Products

Creative Technology Ltd today announced that it would be launching the Aurora Reactive SDK. This tool would effectively convert the Aurora Reactive Lighting System found on Sound BlasterX products into an open platform, allowing developers the freedom to customize, animate and synchronize its lighting behavior. The 16.8 million color Aurora Reactive Lighting System is currently found on the Sound BlasterX Katana, Vanguard K08, Siege M04, AE-5, and Kratos S5.

The Aurora Reactive SDK is a system with APIs (Application Programming Interfaces) that allow third party developers to program Creative's Sound BlasterX RGB-enabled hardware. The SDK will come complete with sample codes, an API library, and documentation to enable even novice programmers to get started.

AMD Ryzen-optimized C and C++ Compilers Improve Performance

AMD followed up its Ryzen processor launch with support for the software development ecosystem by releasing special C and C++ compilers that let you make software that can fully take advantage of the "Zen" micro-architecture. The new AOCC 1.0 C/C++ compilers by AMD are based on LLVM Clang, with "Zen" specific patches. AMD claims AOCC offers improved vectorization and better code generation for "Zen" based CPUs. It also includes a "Zen" optimized linker.

Phoronix benchmarked AOCC against other more common compilers such as GCC 6.3, GCC 7.1, GCC 8, LLVM Clang 4.0, and LLVM Clang 5.0 on a machine powered by a Ryzen 7 1700 eight-core processor running Ubuntu 17.04 Linux. It found that AOCC offers higher performance than GCC in most cases, and higher (sometimes only marginally higher) performance than LLVM Clang in some cases.

AMD Open Sources Professional GPU-Optimized Photorealistic Renderer

AMD today announced that its powerful physically-based rendering engine is becoming open source, giving developers access to the source code. As part of GPUOpen, Radeon ProRender (formerly previewed as AMD FireRender) enables creators to bring ideas to life through high-performance applications and workflows enhanced by photorealistic rendering. Alongside Radeon ProRender, developers also have access to Radeon Rays on GPUOpen.com, a high-efficiency, high-performance, heterogeneous ray tracing intersection library for GPU, CPU or APU on virtually any platform. GPUOpen is an AMD initiative designed to assist developers in creating ground-breaking games, professional graphics applications and GPU computing applications with superior performance and lifelike experiences, using no-cost open development tools and software.

Radeon ProRender plugins are available today for many popular 3D content creation applications, including Autodesk 3ds Max, SOLIDWORKS by Dassault Systèmes and Rhino, with Autodesk Maya coming soon. Radeon ProRender works across Windows, OS X and Linux, and supports AMD GPUs, CPUs and APUs as well as those of other vendors.

HSA Announces Publication of New Guide to Heterogeneous System Architecture

The Heterogeneous System Architecture (HSA) Foundation today announced publication of Heterogeneous System Architecture: A New Compute Platform Infrastructure (1st Edition), edited by Dr. Wen-Mei Hwu. The book, published by Elsevier Publishing, offers a practical guide to understanding HSA, a standardized platform design that unlocks the performance and power efficiency of parallel computing engines found in most modern electronic devices.

"Heterogeneous computing is a key enabler of the next generation of compute environments, wherein entire systems will interconnect autonomously and in real time," said HSA Foundation President Dr. John Glossner. "Developers who are skilled in the use of this platform will have the upper hand in terms of design time, IP portability, power efficiency and performance."

To support these developers, the HSA Foundation working groups are rapidly standardizing tools and APIs for debug and profiling, creating guidelines for incorporating IP from multiple vendors into the same SoC, and much more. The Foundation released the v1.0 specification in March, and soon thereafter, companies including AMD, ARM, Imagination Technologies and MediaTek previewed their plans for rolling out the world's first products based on HSA.

App Claims to Blunt Intel's Compiler Edge on AMD Machines

An ominously named app claims to boost the performance of certain apps on AMD processors. Called "Intel Compiler Patcher," this app scans your machine for apps built with Intel C++ compilers and patches them to work better on non-Intel CPU platforms (namely AMD). The idea (a suspicion, rather) is that apps built with Intel C++ compilers put modern AMD CPUs at a performance disadvantage. The following is how the developer describes how the app works:
The compiler or library can make multiple versions of a piece of code, each optimized for a certain processor and instruction set, for example SSE2, SSE3, etc. The system includes a function that detects which type of CPU it is running on and chooses the optimal code path for that CPU. This is called a CPU dispatcher. However, the Intel CPU dispatcher does not only check which instruction set is supported by the CPU, it also checks the vendor ID string. If the vendor string says "GenuineIntel" then it uses the optimal code path. If the CPU is not from Intel then, in most cases, it will run the slowest possible version of the code, even if the CPU is fully compatible with a better version.
We don't have an AMD machine at hand to run the benchmarks ourselves, so we invite AMD CPU users from our community to post their results after using this "patcher" at their own risk.
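
The dispatch behaviour described in the developer's quote can be sketched in a few lines of C++. This is a simplified illustration, not Intel's actual dispatcher: the `CpuInfo` struct, the `CodePath` levels, and both selector functions are hypothetical, and CPU identification is passed in as data rather than read via `CPUID` so the example stays portable.

```cpp
#include <string_view>

// Hypothetical description of a CPU; a real dispatcher would fill this
// in from the CPUID instruction.
struct CpuInfo {
    std::string_view vendor;  // e.g. "GenuineIntel" or "AuthenticAMD"
    bool has_sse2;
    bool has_sse3;
};

// Code paths ordered from slowest to fastest.
enum class CodePath { Generic, Sse2, Sse3 };

// Picks the best path from the supported instruction sets alone.
constexpr CodePath select_by_features(const CpuInfo& cpu) {
    if (cpu.has_sse3) return CodePath::Sse3;
    if (cpu.has_sse2) return CodePath::Sse2;
    return CodePath::Generic;
}

// Models the behaviour described in the quote: the vendor string is
// checked first, and a non-Intel CPU falls back to the slowest path.
// Simplification: the quote says "in most cases"; here it is always.
constexpr CodePath select_by_vendor_then_features(const CpuInfo& cpu) {
    if (cpu.vendor != "GenuineIntel") return CodePath::Generic;
    return select_by_features(cpu);
}
```

For a CPU reporting `AuthenticAMD` with SSE3 support, `select_by_features` returns `CodePath::Sse3` while `select_by_vendor_then_features` returns `CodePath::Generic`, which is the gap the patcher claims to close.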

DOWNLOAD: Intel Compiler Patcher

AMD Launches The "Boltzmann Initiative," Brings NVIDIA CUDA to FirePro

Building on its strategic investments in heterogeneous system architecture (HSA), AMD (NASDAQ: AMD) announced a suite of tools designed to ease development of high-performance, energy efficient heterogeneous computing systems. The "Boltzmann Initiative" leverages HSA's ability to harness both central processing units (CPU) and AMD FirePro graphics processing units (GPU) for maximum compute efficiency through software. The first results of the initiative are featured this week at SC15 and include the Heterogeneous Compute Compiler (HCC); a headless Linux driver and HSA runtime infrastructure for cluster-class, High Performance Computing (HPC); and the Heterogeneous-compute Interface for Portability (HIP) tool for porting CUDA-based applications to a common C++ programming model. The tools are designed to drive application performance across markets ranging from machine learning to molecular dynamics, and from oil and gas to visual effects and computer-generated imaging.

"AMD's Heterogeneous-compute Interface for Portability enables performance portability for the HPC community. The ability to take code that was written for one architecture and transfer it to another architecture without a negative impact on performance is extremely powerful," said Jim Belak, co-lead of the U.S. Department of Energy's Exascale Co-design Center in Extreme Materials and senior computational materials scientist at Lawrence Livermore National Laboratory. "The work AMD is doing to produce a high-performance compiler that sits below high-level programming models enables researchers to concentrate on solving problems and publishing groundbreaking research rather than worrying about hardware-specific optimizations."

AMD Announces Heterogeneous C++ AMP Language for Developers

AMD in collaboration with Microsoft today announced the release of C++ AMP version 1.2 -- an open source C++ compiler which implements version 1.2 of the open specification for C++ AMP, available on both Linux and Windows for the first time. The release represents another step forward toward AMD's goal of supporting cross-platform solutions, multiple programming languages and continued contributions to the open source community. The tool, which leverages Clang and LLVM, accelerates productivity and ease of use for developers wishing to harness the full power of modern heterogeneous platforms spanning servers, PCs and handheld devices.

"AMD has a consistent track record of enriching the developer experience, and we're proud to make the first open source implementation of C++ AMP available to enable greater performance and more power-efficient applications," said Manju Hegde, corporate vice president, Heterogeneous Applications and Solutions, AMD. "The cross-platform release is another step in strengthening AMD's developer solutions, allowing for increased productivity and accelerated applications through shared physical memory across the CPU and GPU on both Linux and Windows."