05 March 2026

Introducing the Dashboard: Monitor Your GPU Workloads at a Glance

The Dashboard gives users and admins visibility into GPU workloads, utilization, VRAM, temperature, and spend rate, helping them spot idle resources and reduce waste.

Until now, there was no single place to see whether your GPU workloads were running as expected — or whether your organization's GPU budget was being used efficiently. We built the Dashboard to change that.

TL;DR

  • Home Dashboard is now live — see all your active workloads, GPU usage, and spend rate in one place
  • Org Dashboard gives admins org-wide visibility into GPU utilization, team-level spend, and idle workloads
  • Workspace Metrics now shows VRAM usage and GPU temperature

Background

GPU compute is expensive. But knowing how well it's actually being used has always required workarounds.

Individual users had to open each workspace one by one, check logs, or run low-level tools like nvidia-smi to understand what their GPU was doing. Questions like "Is my training job actually computing anything?" or "Am I about to run out of VRAM?" had no easy answer from the platform.
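That manual check typically meant parsing raw nvidia-smi output by hand, once per workspace. A rough sketch of what that looked like (the query fields follow nvidia-smi's `--query-gpu` flag; the sample output and function name are illustrative):

```python
# Parse the CSV output of:
#   nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total,temperature.gpu \
#              --format=csv,noheader,nounits
# This is the kind of low-level check users previously ran by hand in each workspace.
def parse_gpu_stats(csv_output: str) -> list[dict]:
    """Turn nvidia-smi CSV rows into per-GPU stat dicts."""
    stats = []
    for line in csv_output.strip().splitlines():
        util, mem_used, mem_total, temp = [f.strip() for f in line.split(",")]
        stats.append({
            "util_pct": int(util),
            "vram_used_mib": int(mem_used),
            "vram_total_mib": int(mem_total),
            "temp_c": int(temp),
        })
    return stats

# Illustrative sample: one busy GPU, one idle GPU.
sample = "87, 38912, 40960, 71\n3, 1024, 40960, 45"
for gpu in parse_gpu_stats(sample):
    print(gpu)
```

Multiply this by every workspace you own, and the appeal of a single dashboard view becomes obvious.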

For organization admins, the blind spot was even larger. There was no way to see GPU usage at an organizational level — no breakdown by team or user, no sense of how much was being spent in real time, and no signal when resources were sitting idle. Making informed decisions about resource allocation meant relying on manual check-ins or out-of-band tooling.

We heard this repeatedly in interviews with users across academia and industry: a dashboard is table stakes for a paid GPU platform. The Dashboard is our answer to both problems.

What's New

Home Dashboard — For Individual Users

The Home Dashboard is the first thing you see when you log in. It answers the question: "What are my workloads doing right now?"

Summary Cards

At the top of the page, four cards give you a quick health check:

  • Active Workloads — Total number of your workloads excluding terminated ones (across all teams)
  • GPUs in Use — How many GPUs are allocated to your workloads right now
  • GPU Utilization — GPU-weighted average utilization across your running workspaces (1hr avg)
  • Spend Rate — How much you're spending per hour, right now (e.g., $21.85/hr)
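"GPU-weighted average" means each workspace's utilization is weighted by how many GPUs it holds, so an 8-GPU job counts eight times as much as a 1-GPU notebook. A minimal sketch (function and field names are illustrative, not the platform's API):

```python
def gpu_weighted_utilization(workloads: list[dict]) -> float:
    """Average utilization across workloads, weighted by GPU count."""
    total_gpus = sum(w["gpu_count"] for w in workloads)
    if total_gpus == 0:
        return 0.0
    weighted = sum(w["gpu_count"] * w["util_pct"] for w in workloads)
    return weighted / total_gpus

workloads = [
    {"gpu_count": 8, "util_pct": 90.0},  # busy 8-GPU training job
    {"gpu_count": 1, "util_pct": 0.0},   # idle single-GPU notebook
]
print(gpu_weighted_utilization(workloads))  # → 80.0
```

Note that the idle notebook barely moves the number; the card reflects where the GPUs actually are.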

Workloads Table

Below the summary, your workloads are listed in a single table. Use the All teams and All statuses filters at the top to narrow down the list.

Each row shows:

  • Workload Name — Click to go to the workspace detail page. Under-utilized workloads are visually flagged.
  • Status — Running, Stopped, Paused, Failed, etc.
  • Team — Which team this workload belongs to
  • GPU — Type and count (e.g., 1x A100 GPU) or CPU only
  • GPU Utilization (1hr) — Average GPU utilization over the last hour
  • Spend Rate — Cost per hour for this workload

Under-utilization Alert

When a workload has been running with low GPU utilization for 1hr, it's visually flagged in the table.
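The flagging logic can be sketched roughly as follows (the 30% threshold is borrowed from the utilization guidance later in this post; the function is illustrative, not the platform's actual implementation):

```python
from datetime import datetime, timedelta

LOW_UTIL_THRESHOLD = 30.0   # percent (assumed threshold)
WINDOW = timedelta(hours=1)

def is_underutilized(samples: list[tuple[datetime, float]], now: datetime) -> bool:
    """Flag a workload when every utilization sample in the last hour
    is below the threshold (and the window actually has samples)."""
    recent = [u for t, u in samples if now - t <= WINDOW]
    return bool(recent) and all(u < LOW_UTIL_THRESHOLD for u in recent)

# A workload sampled every 5 minutes, stuck at 12% utilization:
now = datetime(2026, 3, 5, 12, 0)
samples = [(now - timedelta(minutes=m), 12.0) for m in range(0, 60, 5)]
print(is_underutilized(samples, now))  # → True
```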

Org Dashboard — For Organization Admins

The Org Dashboard is accessible from Manage > Organization > Dashboard and is only visible to org admins. It answers: "How is our organization using GPU resources, and where is our budget going?"

Idle Workload Banner

If any workloads have had 0% GPU utilization for the past 3 hours, a banner appears at the top of the page.

Click Review to jump to the filtered workload list showing idle workloads only.
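The idle check behind the banner can be sketched like this, assuming hourly utilization samples per workload (names and data shapes are illustrative, not the platform's internals):

```python
IDLE_HOURS = 3  # banner threshold: 0% utilization for 3 consecutive hours

def idle_workloads(workloads: list[dict]) -> list[str]:
    """Names of workloads whose last 3 hourly utilization samples were all 0%."""
    return [
        w["name"] for w in workloads
        if len(w["hourly_util"]) >= IDLE_HOURS
        and all(u == 0 for u in w["hourly_util"][-IDLE_HOURS:])
    ]

workloads = [
    {"name": "bert-finetune", "hourly_util": [88, 0, 0, 0]},    # idle for 3h → flagged
    {"name": "llm-serving",   "hourly_util": [61, 70, 64, 59]}, # busy
]
print(idle_workloads(workloads))  # → ['bert-finetune']
```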

Summary Cards

  • Running Workloads — Total active workloads across the entire organization
  • GPUs in Use — Sum of all GPUs currently allocated
  • Organization GPU Utilization — Weighted average GPU utilization across all workloads (1hr avg, weighted by GPU count)
  • Organization Spend Rate — Total hourly cost for the organization

Spend Trend Chart

The spend trend chart shows your organization's credit usage over time, broken down by team. Each team appears as its own line, alongside an "All teams" aggregate line, so you can see exactly where spend is coming from.

Use the date range picker to zoom into a specific window.

Team Breakdown Table

The Team Breakdown table gives admins a per-team view of GPU usage:

  • Team / User — Team name. Click the > icon to expand and see the per-user breakdown.
  • GPUs in Use — Total GPUs currently allocated to this team (or user)
  • Avg Utilization (1h) — GPU-count-weighted average utilization. Color-coded: 🟢 90%+, ⚪ 30–90%, 🟠 <30%

Click any team row to drill down and see individual user-level utilization within that team.
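The color coding maps directly onto the thresholds in the table; as a tiny helper (the function name is illustrative):

```python
def utilization_badge(avg_util: float) -> str:
    """Map a team's weighted average utilization to the table's color code."""
    if avg_util >= 90:
        return "🟢"  # healthy: the team's GPUs are busy
    if avg_util >= 30:
        return "⚪"  # moderate
    return "🟠"      # under-utilized: candidate for downscaling

print(utilization_badge(94.0))  # → 🟢
print(utilization_badge(12.0))  # → 🟠
```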

Workload Table

The full workload table lets admins filter by team, user, or idle-only status. Each row shows:

  • Workload Name, Team, User
  • GPU — Type and count
  • GPU Utilization (1hr) — Average over the last hour
  • VRAM Usage — Current video memory consumption
  • Runtime — How long this workload has been running

Workspace Metrics — Deeper Visibility

The Workspace Metrics page (accessible from any workspace's detail view → Metrics tab) has been updated with two new charts, an improved time range selector, and contextual banners.

New Charts & Filters

  • VRAM Usage — See how much video memory your workload is consuming over time. Sustained usage above 95% is a signal that an OOM error may be imminent.
  • GPU Temperature — Monitor GPU temperature over time. Temperatures consistently above 85°C can indicate thermal throttling, which silently reduces your compute performance.
  • Time Range Filter — Choose from 1h, 6h, 12h, 1d, or 7d to zoom in or out on your metrics.
  • Layout Toggle — Switch between grid view (2 charts per row) and expanded view (1 chart per row) for easier reading.

Reading the Charts

  • GPU Utilization — Below 30% for an extended period → consider downscaling
  • VRAM Usage — Consistently above 95% → OOM risk; consider upscaling or reducing batch size
  • GPU Temperature — Sustained above 85°C → thermal throttling may be reducing performance
  • CPU Utilization — High CPU with low GPU → possible data loading bottleneck
  • Memory Usage — System RAM usage by your workload
  • Network I/O — Data transferred in and out of the workspace
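Taken together, these rules of thumb amount to a simple diagnostic. A sketch using the thresholds from the list above (the helper and its messages are illustrative, not a platform API):

```python
def diagnose(gpu_util: float, vram_pct: float, temp_c: float, cpu_util: float) -> list[str]:
    """Apply the rules of thumb above and return any warnings."""
    warnings = []
    if vram_pct > 95:
        warnings.append("OOM risk: reduce batch size or scale up")
    if temp_c > 85:
        warnings.append("possible thermal throttling")
    if gpu_util < 30 and cpu_util > 80:
        # GPU starved while CPU is maxed out: input pipeline can't keep up.
        warnings.append("data loading bottleneck: CPU-bound input pipeline")
    elif gpu_util < 30:
        warnings.append("low GPU utilization: consider downscaling")
    return warnings

# A workload nearly out of VRAM whose GPU is starved by data loading:
print(diagnose(gpu_util=12, vram_pct=97, temp_c=70, cpu_util=85))
```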

Contextual Banners

When you arrive at the Metrics page, a banner appears at the top based on the workload's GPU utilization state.

Low utilization (GPU Util < 30%)

Low GPU utilization — GPU is running below 30%. Want to save costs? You can downscale using Pause → Edit.

High utilization (GPU Util > 90%)

Great GPU utilization! Need more power or worried about OOM? You can scale up using Pause → Edit.

Where to Find the Dashboard

  • Home Dashboard: Navigate to Home in the left sidebar after logging in.
  • Org Dashboard: Go to Manage > Organization > Dashboard (visible to org admins only).
  • Workspace Metrics: Open any workspace → click the Metrics tab.

To Conclude

The Dashboard brings together GPU utilization, VRAM, temperature, cost, and team-level usage into surfaces designed for your role — whether you're an engineer debugging a training run or an admin allocating resources across teams.

Ready to take a look? Log in and head to Home in the left sidebar to get started.

We'd love your feedback. If you have questions or suggestions, reach out to us at support@vessl.ai.

Happy training! 🚀

Lidia

Product Manager

Wayne Kim

Product Marketer

Try Dashboard on VESSL Cloud


© 2026 VESSL AI, Inc. All rights reserved.