Clusterfudge Run

The comprehensive platform for running your experiments

Run your experiments and monitor metrics, logs, and hardware resources in a unified dashboard. Sync code from your local machine. Debug issues and relaunch jobs from any device.

Everything in one place

Monitor, analyze, and control your AI experiments

Get complete visibility into your experiments and infrastructure. Take action from anywhere, on any device.

Experiment Launching & Monitoring

Monitor your experiments in real time, with graphs for all of your jobs' metrics. Track progress, relaunch, edit, and debug issues quickly.

Code Syncing

Sync local code and run it in-cluster, all via the Clusterfudge VS Code extension. No shared drives or in-cluster GitHub deploy keys needed.

Centralized Logs

Access logs for just your jobs on a dedicated job details page, with powerful search and filtering to quickly find what you need.

Hardware Monitoring

See live GPU utilization and memory usage. Filter by jobs and check availability of resources across your entire infrastructure.

Remote Job Control

Relaunch, stop, or modify jobs from anywhere on any device. No need to SSH into your machines.

MCP Server

Ask Claude to check the status of jobs, hardware resources, and more, directly from your chat window.

A preview of the Launches page of Clusterfudge, showing a list of launches in various states, like running and completed.

Sync code and launch jobs directly from your IDE

Sync local code and run it in-cluster, all via the Clusterfudge VS Code extension. No shared drives or in-cluster GitHub deploy keys needed.

The world's first MCP Server for your cluster

Deploy Clusterfudge across your GPU clusters and unlock a wealth of previously inaccessible information: failed jobs, inefficient workloads, and idle nodes. Our new clusterfudge-mcp server makes all of this data directly accessible to your favorite AI models right in your IDE, so you can interact with your cluster and workloads in new ways.

Ready to take control?

Start monitoring your experiments and resources today. Get insights that help you build better models faster.