Iterate faster on AI using any hardware
Accelerate AI research
Access GPU workstations
Start GPU workstations in your cluster and access them from anywhere with a simple SSH command.
ssh h100.ssh.clusterfudge.com
Launch Jupyter Notebooks
Run Jupyter Notebooks in your cluster with a single click. Our built-in tunneling eliminates the need for SSH port-forwarding, providing instant web access without any complex setup.
Specify workloads in pure Python
AI runs on Python. Now your GPU cluster does too. Clusterfudge provides a pure-Python API for specifying and launching experiments and multi-node training jobs.
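To illustrate what describing a multi-node training job in plain Python can look like, here is a minimal sketch. It does not use the Clusterfudge API: `JobSpec` and `torchrun_commands` are hypothetical names invented for this example, and the address and script name are placeholders. The only real interface it relies on is PyTorch's `torchrun` launcher and its standard flags.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class JobSpec:
    """Hypothetical job description; not part of the Clusterfudge API."""
    script: str
    nodes: int
    gpus_per_node: int
    master_addr: str
    master_port: int = 29500


def torchrun_commands(spec: JobSpec) -> List[List[str]]:
    """Return one torchrun argument list per node, indexed by node rank."""
    if spec.nodes < 1 or spec.gpus_per_node < 1:
        raise ValueError("nodes and gpus_per_node must be positive")
    return [
        [
            "torchrun",
            f"--nnodes={spec.nodes}",
            f"--nproc_per_node={spec.gpus_per_node}",
            f"--node_rank={rank}",
            f"--master_addr={spec.master_addr}",
            f"--master_port={spec.master_port}",
            spec.script,
        ]
        for rank in range(spec.nodes)
    ]


# Placeholder values for illustration only.
spec = JobSpec(script="train.py", nodes=2, gpus_per_node=8, master_addr="10.0.0.1")
commands = torchrun_commands(spec)
for argv in commands:
    print(" ".join(argv))
```

Keeping the job description as a plain data object means it can be validated, versioned, and generated in loops like any other Python value before anything is launched.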
Track research progress
View our web dashboard anywhere. You and your team can see launches, logs, tracebacks, and hardware metrics — even on mobile.
Reliable foundations for your matrix multiplications
Track hardware health
We've seen it all before. Now you can too. Get email and Slack notifications for common problems:
- ✓ Disk usage (and emergency disk space recovery)
- ✓ GPUs falling off the PCI bus
- ✓ Thermal throttling
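As a rough idea of what the simplest of these checks involves, the sketch below flags a filesystem that is nearly full using only the Python standard library. It is an illustration, not Clusterfudge's implementation: the function name `disk_usage_alert` and the 90% default threshold are assumptions made for this example.

```python
import shutil
from typing import Optional


def disk_usage_alert(path: str = "/", threshold: float = 0.9) -> Optional[str]:
    """Return a warning message if `path` is at or above `threshold` full, else None."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    fraction_used = usage.used / usage.total
    if fraction_used >= threshold:
        return f"{path} is {fraction_used:.0%} full"
    return None


message = disk_usage_alert("/")
if message is not None:
    print(message)
```

A real monitor would run a check like this on a schedule and route the message to email or Slack rather than printing it.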
Maintain environment consistency
Ensure consistent environments across all your clusters and nodes. Check your cluster for differences in CUDA drivers, Python and PyTorch versions, and other Python dependencies.
| | Node 1 | Node 2 | Node 3 | Node 4 |
|---|---|---|---|---|
| CUDA Driver | 11.7 | 11.7 | 11.7 | 11.8 |
| Python Version | 3.9.7 | 3.9.7 | 3.9.7 | 3.9.7 |
| PyTorch Version | 1.10.0 | 1.10.0 | 1.10.1 | 1.10.0 |
| Registered GPUs | 8 | 8 | 8 | 7 |
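The comparison in the table above can be sketched in a few lines of Python: for each property, take the most common value across nodes and report the nodes that deviate from it. This is an illustrative sketch, not how Clusterfudge performs the check; `find_outliers` is a hypothetical helper, and the data is copied from the example table.

```python
from collections import Counter
from typing import Dict, List

# Values copied from the example table above.
nodes: Dict[str, Dict[str, str]] = {
    "Node 1": {"CUDA Driver": "11.7", "Python Version": "3.9.7",
               "PyTorch Version": "1.10.0", "Registered GPUs": "8"},
    "Node 2": {"CUDA Driver": "11.7", "Python Version": "3.9.7",
               "PyTorch Version": "1.10.0", "Registered GPUs": "8"},
    "Node 3": {"CUDA Driver": "11.7", "Python Version": "3.9.7",
               "PyTorch Version": "1.10.1", "Registered GPUs": "8"},
    "Node 4": {"CUDA Driver": "11.8", "Python Version": "3.9.7",
               "PyTorch Version": "1.10.0", "Registered GPUs": "7"},
}


def find_outliers(nodes: Dict[str, Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each property to the nodes whose value differs from the most common one."""
    outliers: Dict[str, List[str]] = {}
    properties = next(iter(nodes.values())).keys()
    for prop in properties:
        values = {name: info[prop] for name, info in nodes.items()}
        majority, _ = Counter(values.values()).most_common(1)[0]
        odd = [name for name, value in values.items() if value != majority]
        if odd:
            outliers[prop] = odd
    return outliers


print(find_outliers(nodes))
# {'CUDA Driver': ['Node 4'], 'PyTorch Version': ['Node 3'], 'Registered GPUs': ['Node 4']}
```

Properties that match on every node, such as the Python version here, are omitted from the result, so an empty dictionary means the cluster is consistent.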
Migrate GPU workloads
Automatically detect unhealthy nodes. Migrate GPU workloads to spare nodes. We use CRIU and CUDA checkpoint to snapshot and restore GPU workloads.
Easy setup
Run Clusterfudge with a single command, zero config, and no dependencies.
Sign up
Sign up to get an API key and a personalised command to run our agent, Fudgelet.
Run Fudgelet
Run Fudgelet on your compute node. It auto-detects the node's GPUs and makes the node available to run workloads.
Launch workloads
Launch notebooks and workstations via the web, or write your own launches using our Python API.
$ curl https://get.clusterfudge.com/run.sh |
API_KEY=<your-api-key> bash