Overview of GPU APIs

April 23, 2025 - Daniel

How to program GPUs?

Graphics Processing Units (GPUs) are separate devices from the Central Processing Units (CPUs), so accessing them is not as trivial as writing some regular application code. We need to use Application Programming Interfaces (APIs) that communicate with the device driver, which, in turn, is a component interacting with the operating system and our target physical device. This means that all these various components need to be able to speak to each other, translating to compatibility requirements across them.

Assuming everything is set up correctly, there still remains a bewildering number of choices among the various APIs in active use. In this post, we give an overview, some history, and some hints on when to choose one API over another when starting a new project.

Brief history of GPU APIs

GPUs, as their name suggests, were originally designed for graphics tasks. Computer graphics saw rapid development in the 1980s and 1990s, driven on one hand by Computer-Generated Imagery (CGI) gaining popularity in the movie industry and on the other hand by the personal computer revolution bringing computer games to many households.

Thus, it is not surprising that the first standard targeting what are today known as GPUs was a graphics API: OpenGL, developed by Silicon Graphics in the early 1990s. The aim was to accelerate, simplify and standardize the graphics operations needed for professional visualization (2D) and computer graphics (3D). OpenGL aimed to be a cross-platform C API, but as the pioneer it went through vast changes over the years and decades. By the mid-2000s, its state-machine-like approach began to feel archaic. Its web spin-off, WebGL, however, became ubiquitous and was for a long time the way to get hardware acceleration on the web.

Microsoft launched a competitor, Direct3D (part of DirectX), for the Windows platform, catering to the game industry. It quickly grew into a significant and fast-moving API, with the newest games showcasing the latest hardware features of the vendors.

It was not until the middle of the 2000s that hardware-accelerated programmable shading took hold. While it enabled new possibilities in rendering, it also effectively opened a new field by allowing developers to use GPUs for arbitrary computations. This became known as General-Purpose computing on GPUs (GPGPU). NVIDIA was the first vendor to focus on this, with its CUDA API accelerating the general computations needed in scientific simulations and data processing. Thus, from this point on, we need to distinguish compute APIs from graphics APIs: the former are for running more general-purpose programs, the latter are primarily for drawing content on the screen.

However, this distinction started to blur as general computation appeared in graphics APIs too, under the name of compute shaders, enabling more complex and flexible effects. Nevertheless, in the late 2000s and 2010s, the industry saw a rapid rise in distinct compute APIs targeting GPUs and other accelerators. OpenCL, put forward by Apple and standardized by Khronos, was the first competitor to NVIDIA's CUDA and offered support for an exceptionally wide range of devices, from embedded CPUs to high-end GPUs across various vendors. Microsoft briefly attempted to enter this field with C++ AMP, but that did not gain traction. AMD started promoting its own offering, ROCm and HIP. Based on feedback on OpenCL, Khronos started to develop SYCL as a more C++-focused alternative. Later, Intel built upon SYCL when releasing its own API standard, oneAPI. Thus, we see that basically all large vendors have a preferred API to use with their products, and then there are open standards by Khronos that offer solutions across different vendors.

In the meantime, graphics did not stand idle. Older APIs became more and more detached from the architectural changes of actual hardware, and the gap had to be bridged by ever more complex drivers. At some point, there was enough tension to push for new graphics standards. Initially AMD put forward Mantle, which soon turned into the Vulkan API developed by Khronos, while almost simultaneously Microsoft was working on DirectX 12, and just a bit later Apple also developed its own Metal API.

The common theme across these APIs was the low-level approach to graphics programming, mostly focusing on host-side resource and state management. The idea was that giving more control to the developers would simultaneously solve two problems:

The driver can be simpler, as it no longer needs to bridge so many differences between the hardware and the developer-facing programming model.
The performance will be more predictable, as developers get exactly what they write and no longer need to second-guess what the driver might be doing behind the scenes.

The results were somewhat mixed. These APIs were indeed low-level, as is soberingly obvious from the thousand lines of code needed to render a single triangle. Yet they were not low-level enough to closely match the hardware underneath, so some complexity remained in the driver. While developers got more freedom in managing resources and in optimizing and specializing their rendering pipelines, the associated complexity led to a relatively slow, albeit steady, adoption of the new APIs.

On the other end of the technology sphere, the web, there was much less success in making GPU acceleration more widely available. OpenCL's WebCL spin-off failed to gain traction, as did many other attempts. Now, after long groundwork by Google, a W3C standard named WebGPU is being finalized. It brings compute shaders and somewhat low-level access to GPUs not just to the web but to desktop platforms as well.

Which API to choose?

Navigating the ever-changing and comically complicated landscape of just the programming interfaces seems intimidating (and we have not even started talking about any deeper topics about actual GPUs!), but here are some hints that might be helpful.

From the above, a few things might be guessed: you most likely need to decide between graphics and general compute, and likely have to think about the platform and devices you are aiming to support. Let's start with some simple takes:

Do you want to target the web?
If yes, you have either WebGPU (not yet final, not yet supported everywhere) or WebGL (no general compute) at your service.
Do you want to target a specific vendor's devices for general compute?
If yes, use their favourite API (NVIDIA: CUDA, AMD: ROCm/HIP, Intel: oneAPI).
Do you want to target low-latency graphics specifically on Windows?
If yes, go for DirectX 12.
Do you want to target low-latency graphics specifically on Apple products?
If yes, go for Metal.

Then let's get into more detail. If you are interested in real-time graphics applications but would like to be more cross-platform, Vulkan is the best choice, supported natively on Windows, Linux and Android.

If your application is much more about general-purpose compute, but you need to support devices from multiple vendors, you have further choices to make. If the selection is just between NVIDIA and AMD devices, HIP can abstract that for you: AMD modelled its ROCm software stack closely on CUDA, and the same HIP source can be compiled for either the CUDA or the ROCm back-end. If you also need to support Intel devices or more exotic accelerators, and a C API with a separate kernel language is closer to your heart, OpenCL is the way to go. On the other hand, if you are much more interested in modern C++, you can give SYCL a try.
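The hints so far can be condensed into a small decision helper. The following Python sketch is only an illustration of the reasoning in this post: the function name `suggest_gpu_api`, its parameters and the vendor table are hypothetical names invented for this example, and real projects will have constraints that it does not capture.

```python
from typing import List, Optional

# Vendor-preferred compute APIs, as listed in this post.
VENDOR_COMPUTE_API = {"nvidia": "CUDA", "amd": "ROCm/HIP", "intel": "oneAPI"}


def suggest_gpu_api(
    workload: str,                        # "graphics" or "compute"
    web: bool = False,                    # True if the target is the web
    platform: Optional[str] = None,       # "windows", "apple", or None for cross-platform
    vendors: Optional[List[str]] = None,  # compute targets, e.g. ["nvidia", "amd"]
    prefer_modern_cpp: bool = False,      # prefer modern C++ over a C API
) -> str:
    """Return a rough API suggestion following the hints in this post (hypothetical helper)."""
    if workload not in ("graphics", "compute"):
        raise ValueError("workload must be 'graphics' or 'compute'")

    if web:
        # WebGL has no general compute, so compute on the web means WebGPU.
        return "WebGPU" if workload == "compute" else "WebGPU or WebGL"

    if workload == "graphics":
        if platform == "windows":
            return "DirectX 12"
        if platform == "apple":
            return "Metal"
        return "Vulkan"  # cross-platform real-time graphics

    # General-purpose compute from here on.
    targets = {v.lower() for v in (vendors or [])}
    if len(targets) == 1:
        (only,) = targets
        if only in VENDOR_COMPUTE_API:
            return VENDOR_COMPUTE_API[only]  # single vendor: use its preferred API
    if targets and targets <= {"nvidia", "amd"}:
        return "HIP"  # covers both NVIDIA and AMD devices
    # Wider device range: pick by language preference.
    return "SYCL" if prefer_modern_cpp else "OpenCL"


if __name__ == "__main__":
    print(suggest_gpu_api("graphics"))
    print(suggest_gpu_api("compute", vendors=["nvidia", "amd"]))
```

Treat the result as a starting point for discussion rather than a verdict; ecosystem, team experience and deployment targets often matter just as much.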

However, one might also approach the topic from the direction of learning and experimenting. If you are new to graphics, the learning curve of the low-level APIs might be too steep initially. In this case, playing around with the older APIs, like OpenGL or DirectX 11 (on Windows), is still a reasonable start. In the case of OpenGL, beware of the many outdated versions and aim for a tutorial targeting at least version 4.0. If you need a soft introduction to the low-level APIs, WebGPU is recommended.
In the case of compute, if you have an NVIDIA GPU, CUDA has the widest coverage of learning materials, but the other compute APIs mentioned are roughly on the same level of complexity and are also fine choices.
As a general rule, the longer something has been around, the more mature its support and ecosystem become, and thus the availability of learning materials is better.

Still unsure? Feel free to ask us; we can give more tailored advice based on your specific ideas!

GPU APIs and other names are registered trademarks of their respective owners.