GPU Occupancy

A quick primer on one of our favourite metrics

Sam · 2025-04-11

When AI labs try to understand their GPU usage from an operational perspective, there are a few metrics they might consult: a utilization-based metric such as power or memory utilization, or a scheduler-derived metric such as queue depth and wait times in Slurm or pod-based metrics in Kubernetes.

We think there is a better option, one we don't often see discussed, that is a great representation of actual resource usage: a metric we call GPU Occupancy.

Defining Occupancy

Each GPU has a record of which processes on the machine are currently using the device. This is most commonly seen by running nvidia-smi for a given GPU and checking the Processes listed at the bottom of the output. For a single GPU/node, this is useful for debugging a workload, but across an entire cluster, understanding which GPUs have processes running against them gives you a fantastic overview of current resource allocation. This is the basis of our occupancy metric.
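To make this concrete, here is a minimal sketch of the kind of per-node polling that can back an occupancy metric, using the NVML Python bindings (the nvidia-ml-py package). The sampling interval and output format are assumptions for illustration, not a description of how Clusterfudge collects this data.

    # Which GPUs on this node currently have at least one compute process
    # attached? This is the same information nvidia-smi lists under "Processes".
    import time

    import pynvml  # provided by the nvidia-ml-py package

    def sample_gpu_processes() -> dict[int, int]:
        """Return {gpu_index: number_of_compute_processes} for this node."""
        counts = {}
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            procs = pynvml.nvmlDeviceGetComputeRunningProcesses(handle)
            counts[i] = len(procs)
        return counts

    if __name__ == "__main__":
        pynvml.nvmlInit()
        try:
            while True:
                snapshot = sample_gpu_processes()
                occupied = [gpu for gpu, n in snapshot.items() if n > 0]
                print(f"{time.time():.0f} occupied GPUs: {occupied}")
                time.sleep(60)  # coarse sampling is plenty for occupancy
        finally:
            pynvml.nvmlShutdown()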

We define the occupancy of a single GPU as the percentage of time, over a given timeframe, during which at least one process was associated with that GPU.
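In code, that definition boils down to a fraction of samples. A minimal sketch, assuming you already have one boolean per sampling interval for the GPU in question:

    from typing import Sequence

    def gpu_occupancy(samples: Sequence[bool]) -> float:
        """Occupancy of one GPU over a window, as a percentage.

        Each entry covers one sampling interval: True if at least one
        process was associated with the GPU at that sample.
        """
        if not samples:
            return 0.0
        return 100.0 * sum(samples) / len(samples)

    # e.g. 1-minute samples over an hour, occupied for 45 of them -> 75.0
    print(gpu_occupancy([True] * 45 + [False] * 15))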

We love occupancy so much that it is the default cluster-usage metric we surface in Clusterfudge: it's one of the very first things you see when you open the dashboard.

Monitoring occupancy can reveal a lot about your cluster's day-to-day usage:

  • Consistently low occupancy might suggest researchers are slowed down by toil or cluster instability
  • Protracted periods where occupancy craters may suggest failures aren't being remediated (or even detected) quickly; a simple detection sketch follows this list
  • Consistently high occupancy is a good indicator a cluster is under-resourced; maybe there is a low-occupancy cluster you can reallocate GPUs from?
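As a rough illustration of the second point, here is a sketch that flags protracted low-occupancy stretches in a cluster-wide occupancy series. The thresholds and sample format are assumptions; real alerting would hang off whatever monitoring stack you already run.

    def sustained_low_occupancy(occupancy_series, threshold_pct=50.0, min_samples=12):
        """Return (start, end) index ranges where occupancy stayed below
        threshold_pct for at least min_samples consecutive samples."""
        ranges, start = [], None
        for i, value in enumerate(occupancy_series):
            if value < threshold_pct:
                start = i if start is None else start
            else:
                if start is not None and i - start >= min_samples:
                    ranges.append((start, i - 1))
                start = None
        if start is not None and len(occupancy_series) - start >= min_samples:
            ranges.append((start, len(occupancy_series) - 1))
        return ranges

    # 5-minute samples: a two-hour dip below 50% shows up as a single range.
    series = [92.0] * 24 + [31.0] * 24 + [90.0] * 24
    print(sustained_low_occupancy(series))  # [(24, 47)]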

What else can occupancy show us?

Once you have calculated the occupancy of your GPUs, you can extract some really useful ancillary metrics for your cluster:

Peak occupancy

The peak occupancy is the highest recorded value for cluster-wide occupancy for a given timeframe. This effectively shows you the headroom of your cluster, which is especially useful for tracking reserve nodes you are keeping spare in case of hardware failures (where you'd probably be looking for a peak occupancy of somewhere between 90% and 98%).
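Here is a sketch of how peak occupancy falls out of the per-sample data, assuming cluster-wide occupancy at an instant means the percentage of GPUs with at least one process attached (the snapshot format is made up for illustration):

    def cluster_occupancy_series(cluster_samples):
        """Percentage of GPUs occupied at each sample.

        cluster_samples is a list of snapshots, each a list of booleans
        (one per GPU, True if occupied at that sample).
        """
        return [100.0 * sum(snap) / len(snap) for snap in cluster_samples]

    def peak_occupancy(cluster_samples):
        """Highest cluster-wide occupancy seen in the timeframe."""
        return max(cluster_occupancy_series(cluster_samples))

    # Three samples over an 8-GPU cluster: 6, then 8, then 7 GPUs occupied.
    snapshots = [[True] * 6 + [False] * 2, [True] * 8, [True] * 7 + [False]]
    print(peak_occupancy(snapshots))  # 100.0 -> no headroom at the busiest moment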

Unoccupied GPU Hours

This is the sum of all hours, across all GPUs, where no process was associated with the GPU. It is a fantastic way to derive the dollar amount of wastage accrued from inefficient use of your paid-for resources.
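Turning that into a dollar figure is simple arithmetic. A sketch with a made-up per-GPU-hour rate; plug in your own blended cost:

    def unoccupied_gpu_hours(avg_occupancy_pct, num_gpus, window_hours):
        """GPU-hours with no process attached, given average occupancy (%)."""
        return num_gpus * window_hours * (1 - avg_occupancy_pct / 100.0)

    COST_PER_GPU_HOUR = 2.50  # assumed rate; use your own contract or cloud pricing
    idle_hours = unoccupied_gpu_hours(avg_occupancy_pct=72.0,
                                      num_gpus=512, window_hours=24 * 30)
    print(f"{idle_hours:,.0f} unoccupied GPU-hours "
          f"~ ${idle_hours * COST_PER_GPU_HOUR:,.0f} this month")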

Occupancy of Assigned GPUs

Schedulers like Slurm and Kubernetes allow you to specify how many GPUs your job needs. Researchers may copy-paste a previous job spec or misconfigure their workload to request more GPUs than they actually need. Occupancy gives you a source of truth to compare these requests against, making it possible to spot job configurations where the requested GPUs exceed the actual GPU usage.

A misconfigured job that over-requests GPUs stands out immediately here; a properly configured job should be consistently at 100% occupancy across the GPUs assigned to it.
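Here is a sketch of that comparison with made-up job records; in practice the requested counts come from your scheduler's accounting data and the occupied counts from your occupancy pipeline:

    from dataclasses import dataclass

    @dataclass
    class JobUsage:
        job_id: str
        gpus_requested: int   # what the job asked the scheduler for
        gpus_occupied: int    # GPUs that ever had one of the job's processes attached

    def over_requesting(jobs):
        """Jobs whose granted GPUs exceed the GPUs they actually occupied."""
        return [j for j in jobs if j.gpus_occupied < j.gpus_requested]

    jobs = [
        JobUsage("train-7b", gpus_requested=64, gpus_occupied=64),
        JobUsage("eval-sweep-03", gpus_requested=16, gpus_occupied=4),  # misconfigured
    ]
    for job in over_requesting(jobs):
        print(f"{job.job_id}: requested {job.gpus_requested}, occupied {job.gpus_occupied}")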

Reporting against Occupancy

Occupancy, compared across periods, gives a truer picture of the traffic of workloads on your cluster.
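For example, a tiny sketch of period-over-period reporting, assuming you already have timestamped cluster-occupancy samples (the weekly grouping is just one choice of period):

    from collections import defaultdict
    from datetime import datetime
    from statistics import mean

    def weekly_occupancy(samples):
        """samples: list of (datetime, occupancy_pct) -> {(year, iso_week): mean %}."""
        by_week = defaultdict(list)
        for ts, pct in samples:
            year, week, _ = ts.isocalendar()
            by_week[(year, week)].append(pct)
        return {week: mean(values) for week, values in sorted(by_week.items())}

    samples = [(datetime(2025, 4, 7, 12), 88.0), (datetime(2025, 4, 14, 12), 61.0)]
    print(weekly_occupancy(samples))  # {(2025, 15): 88.0, (2025, 16): 61.0}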

Occupancy vs Utilization

Utilization metrics are a reflection of the underlying workloads: how well optimized they are and what work they are primarily doing (shuffling data around memory vs CPU-bound vs hardcore FLOPping). Jobs that require a GPU don't always peg that GPU (think running evals against a shiny new open-source model, or data-labelling jobs), and many workloads are CPU-, memory-, or network-constrained instead. Most labs have a few jobs like these that can easily distort any reporting that depends predominantly on GPU utilization metrics.

Occupancy is about getting above the workload level and instead thinking holistically about how well you are scheduling the use of your resources, irrespective of how well optimized they are.

For this reason, we'd recommend diving deeper into optimizing utilization only after you've got a good grasp on occupancy: focus on resources that are entirely unused before looking to squeeze more out of in-use GPUs. For the same reasons, we'd recommend making sizing and resource-allocation decisions primarily on occupancy rather than utilization.

Need a hand getting those occupancy numbers?

Struggling to work out who's using what in which cluster? Worried you're (under/over)provisioned, but need some more data to back up that intuition? No idea how much money you're wasting each month on unused GPUs?

Get started with Clusterfudge

Join leading AI labs using Clusterfudge to accelerate their research.