GPU Tech Consulting

What do GPU APIs do?

June 17, 2025 - Daniel

What is the role of GPU APIs exactly?

Previously, we talked about GPU APIs, but perhaps a bit more detail on how all the components we use build up a correctly working hardware-software configuration might be helpful. After all, an API is mostly just a plain-text contract (perhaps even less: a promise) that if we use it as explained in the specification, we can get the hardware to do what we would like to see.

So what components are there? Making some simplifications, there are:

  1. the hardware devices themselves,
  2. the vendor-provided drivers that talk to them,
  3. the operating system with its kernel-level driver interfaces,
  4. the API specifications and their user-space implementations (libraries),
  5. and the applications built on top of all of this.

Putting it simply: the API and its implementation expose a set of hardware functionalities into user space that applications can safely use to achieve their desired operations, without compromising the integrity of the hardware, the operating system, or other programs.

We can see that many different components from different sources need to be able to communicate and understand each other for this to work. Starting from the easier end: the hardware vendors develop their hardware and the drivers for it, so we can expect those two components to understand each other. However, a vendor usually has many different devices in their portfolio, and they need to support previously sold hardware for some time. This creates software engineering challenges of its own, like abstracting over different capabilities, limitations or communication protocol versions.

Then, there are the different operating systems that end users would like to use in conjunction with those hardware devices. The driver needs to integrate tightly with these: user programs cannot access the hardware directly (nor should they!), so the driver has to operate close to the OS kernel and respect its driver interfaces.

Then there are the application developers who want to build upon the capabilities of the hardware. This is where libraries, APIs and accidental complexities come into play. For historical reasons, there are many different APIs, each providing a slightly or moderately different abstraction or view of the same hardware, and each targeting different application domains. And worse, all these APIs need to remain simultaneously accessible: once an API is out in the wild, there will be applications depending on it being available and working properly, according to its specification, for decades.

On top of that, there is the security and isolation aspect: users would - naively - expect that a crashing GPU operation does not take down other programs or the entire system. Also, different APIs and applications using the same device are expected to be isolated and should not access each other's data, directly or indirectly. While CPUs have supported these aspects for decades now, GPU systems needed to evolve a long way to approach parity, and one could argue they are still not entirely there.

What do GPU APIs give us?

While we mentioned that different APIs provide a different view of the hardware, on a meta-level, they are actually kind of similar. If one looks through the specifications, there are four large groups of functionalities that are exposed by GPU APIs:

  1. Access to implementation and device information
  2. Data allocation and movement between host and device
  3. Selecting and activating the functionalities that do something with the data on the device
  4. Management functionalities relating to scheduling, executing and synchronizing operations

Let's go through these groups one by one.

1 - Implementation and device information

A single API abstracts over devices with different capabilities, yet we build an application against that API and we (or the user) expect it to run on any device the API supports. So there has to be a point where the abstraction is undone and the currently available features and capabilities are checked against what the particular application needs. As such, explicitly or implicitly, usually at the beginning of an application, queries are made through the API to get version information and the supported features of the underlying implementation and of the hardware device or devices available in the system at hand. Based on this, the application can decide whether it can work in the given hardware-software environment or not.
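
A minimal sketch of such a query, using the CUDA runtime API as one concrete example (other APIs expose equivalent version and capability queries):

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        // Query the version of the implementation we are talking to.
        int runtimeVersion = 0;
        cudaRuntimeGetVersion(&runtimeVersion);

        // Query the devices available in this system and their capabilities.
        int deviceCount = 0;
        if (cudaGetDeviceCount(&deviceCount) != cudaSuccess || deviceCount == 0) {
            std::printf("No usable device found - the application cannot run here.\n");
            return 1;
        }
        for (int i = 0; i < deviceCount; ++i) {
            cudaDeviceProp prop{};
            cudaGetDeviceProperties(&prop, i);
            std::printf("Device %d: %s, compute capability %d.%d, %zu MiB of memory\n",
                        i, prop.name, prop.major, prop.minor,
                        prop.totalGlobalMem >> 20);
        }
        // Based on these answers, the application decides whether it can work here.
        return 0;
    }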

2 - Data

The next set of functionalities is related to data handling. GPU devices have their own memory, so applications should be able to say something about how they want their data to be stored. Usually there is support for storing arbitrary data as buffers and image-like data as textures. Some APIs expose a shared memory space between the host and the device, and some expose further, specialized ways to pass data to the device. Usually there are ways to copy buffers and textures, in whole or in part, to, from, or between devices and the host. There is usually a memset functionality too.
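
A minimal sketch of this group, again in terms of the CUDA runtime API and restricted to plain buffers (textures, shared memory spaces and the more specialized paths are left out):

    #include <vector>
    #include <cuda_runtime.h>

    int main() {
        const size_t n = 1 << 20;
        std::vector<float> host(n, 1.0f);          // data living in host memory

        // Allocate a buffer in device memory and clear it (the "memset" functionality).
        float* device = nullptr;
        cudaMalloc(&device, n * sizeof(float));
        cudaMemset(device, 0, n * sizeof(float));

        // Copy host -> device, then the (possibly transformed) data back device -> host.
        cudaMemcpy(device, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);
        cudaMemcpy(host.data(), device, n * sizeof(float), cudaMemcpyDeviceToHost);

        cudaFree(device);                          // release the device allocation
        return 0;
    }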

3 - Functions on data

The main reason we use these devices is that they do something interesting: they support operations that transform the data we give them. GPUs have fixed and programmable functionalities. Examples of fixed functionality were the traditional graphics workloads before programmable shading, where only a set of fixed parameters could be tuned to control the rendering algorithms. These days we mostly meet fixed functionality via image and video decoding and encoding, where a fixed set of formats is supported and we can choose among them.

Programmable functionalities are more interesting but also more complex. APIs support consuming a specialized program, either as source text or as a binary, that describes a close-to-arbitrary operation to be performed by the device. The necessary scaffolding includes passing these sources and other required information (e.g. version information, feature flags, compilation options) to the API, being able to query the result and the log of the compilation, and finally setting up such a programmable operation for execution, which, especially for graphics applications, involves specifying an intricately complicated pipeline object.
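
A sketch of the compute side of this scaffolding (graphics pipeline objects are considerably more involved), here using NVRTC, CUDA's runtime compiler: source text and options are handed over, and the result and log of the compilation are queried afterwards. The kernel itself is only an illustrative example:

    #include <cstdio>
    #include <vector>
    #include <nvrtc.h>

    int main() {
        // A close-to-arbitrary operation, described as source text.
        const char* src =
            "extern \"C\" __global__ void scale(float* data, float factor) {\n"
            "    data[threadIdx.x] *= factor;\n"
            "}\n";

        nvrtcProgram prog;
        nvrtcCreateProgram(&prog, src, "scale.cu", 0, nullptr, nullptr);

        // Pass compilation options and compile.
        const char* opts[] = { "--std=c++14" };
        nvrtcResult res = nvrtcCompileProgram(prog, 1, opts);

        // Query the result and the log of the compilation.
        size_t logSize = 0;
        nvrtcGetProgramLogSize(prog, &logSize);
        std::vector<char> log(logSize + 1, '\0');
        nvrtcGetProgramLog(prog, log.data());
        std::printf("compilation %s\n%s\n",
                    res == NVRTC_SUCCESS ? "succeeded" : "failed", log.data());

        // On success, the generated PTX could now be retrieved and set up for execution.
        nvrtcDestroyProgram(&prog);
        return 0;
    }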

4 - Scheduling, execution, synchronization

Once we have selected our device, decided that it supports the features required by our application, passed our data to the device and set up the program, all that remains is to actually do the thing we wanted. APIs manage, either implicitly or explicitly, a list, or more generally a directed acyclic graph, of tasks and their dependencies, which we need to somehow express to them. Then, once this task description is submitted, the implementation crunches through it in the background, usually asynchronously. This means that there must also be ways to check whether a particular task or an entire workload has finished and whether there were any execution errors. We might also be interested in some performance counters, like elapsed time or the number of pixels drawn.
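
A sketch of this last group with CUDA streams and events, where a single stream stands in for the simplest possible task list (the kernel and sizes are again only illustrative):

    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void scale(float* data, float factor) {
        data[threadIdx.x] *= factor;
    }

    int main() {
        float* d = nullptr;
        cudaMalloc(&d, 256 * sizeof(float));
        cudaMemset(d, 0, 256 * sizeof(float));

        cudaStream_t stream;
        cudaEvent_t start, stop;
        cudaStreamCreate(&stream);                 // an ordered queue of work
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        // Submit the work; the calls return immediately, execution is asynchronous.
        cudaEventRecord(start, stream);
        scale<<<1, 256, 0, stream>>>(d, 2.0f);
        cudaEventRecord(stop, stream);

        // Poll until the workload has finished, then check whether anything went wrong.
        while (cudaEventQuery(stop) == cudaErrorNotReady) { /* do other useful work */ }
        std::printf("launch status: %s\n", cudaGetErrorString(cudaGetLastError()));

        // A simple performance counter: elapsed time on the device.
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        std::printf("the kernel took %.3f ms\n", ms);

        cudaEventDestroy(start);
        cudaEventDestroy(stop);
        cudaStreamDestroy(stream);
        cudaFree(d);
        return 0;
    }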

This might sound like a lot, but this is just how low-level software engineering is, especially when viewed from a high-level systems perspective. Choosing the right API for the right task and using its exposed functionalities properly can make a significant difference in performance, latency and development time.

© 2020-2025 by Core11 GmbH.