25 November 2024

Introducing VESSL Storage: Seamlessly Store and Connect Your ML Data on VESSL

We're releasing VESSL Storage. In this post, you'll learn what VESSL Storage is and how to use it to manage your machine learning data.

TL;DR

  • VESSL Storage is now live.
  • Simplify data management on VESSL with a single, unified Volume concept.
  • Learn how to manage your ML workflows effectively with VESSL Storage.

Background

We are thrilled to announce the release of VESSL Storage, our integrated storage system for managing models, logs, datasets, and more across your AI/ML workflows. Previously, navigating the various storage features on VESSL could be challenging. To improve the user experience and streamline data management, we've made significant updates.

Key updates

Our primary goals with this update were:

Preserving data from ephemeral volumes

Ephemeral volumes are temporary storage that Kubernetes provides to a workload, such as a fine-tuning job, for as long as that workload runs. Data in these volumes is lost once the instance stops. To prevent data loss, VESSL Storage now captures and stores this data before the system erases it.
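The idea of capturing data before an ephemeral volume is erased can be illustrated locally. This is a minimal sketch, not VESSL's implementation: a Python `TemporaryDirectory` stands in for the ephemeral volume (it is deleted when its `with` block ends), and the hypothetical helper `preserve_before_teardown` copies its contents to a persistent directory just before that happens.

```python
import shutil
import tempfile
from pathlib import Path


def preserve_before_teardown(ephemeral_dir: Path, persistent_dir: Path) -> list[str]:
    """Copy everything in ephemeral_dir into persistent_dir (hypothetical helper).

    Returns the preserved files as sorted POSIX-style relative paths.
    """
    # dirs_exist_ok lets repeated captures merge into the same destination.
    shutil.copytree(ephemeral_dir, persistent_dir, dirs_exist_ok=True)
    return sorted(
        p.relative_to(persistent_dir).as_posix()
        for p in persistent_dir.rglob("*")
        if p.is_file()
    )


def simulate_job(persistent_dir: Path) -> list[str]:
    # The temporary directory plays the role of an ephemeral volume:
    # it is deleted as soon as this `with` block (the "job") ends.
    with tempfile.TemporaryDirectory() as scratch:
        scratch_path = Path(scratch)
        (scratch_path / "checkpoints").mkdir()
        (scratch_path / "checkpoints" / "step-100.pt").write_text("weights")
        (scratch_path / "train.log").write_text("loss=0.42\n")
        # Capture the data before the ephemeral directory is erased.
        return preserve_before_teardown(scratch_path, persistent_dir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as keep:
        print(simulate_job(Path(keep) / "volume"))  # ['checkpoints/step-100.pt', 'train.log']
```

The copy has to happen inside the job's lifetime; once the temporary directory is cleaned up, there is nothing left to capture.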

Simplified volume backups

Users often need to re-import previously exported data, either to restore a backup or to reuse it in later work. Data exported to storage can now be re-imported easily.

To achieve these goals, we've introduced the following updates:

  • Unified "Volume" concept

We've consolidated artifacts, models, logs, and datasets into a single entity called Volume, so all of these data types are managed the same way.

  • Seamless integration with workloads

Volumes can now be used directly in Run, Workspace, Service, and Pipeline workloads, streamlining your workflow.

  • Enhanced storage options

Users can store Volumes not only in VESSL Storage but also in external storage solutions such as AWS S3, GCP Storage, and on-premises Network File System (NFS).

  • Manage exported volumes in VESSL Storage

In VESSL Storage, users can view and manage exported volumes, including logs, metrics, and model checkpoints from VESSL Run and Workspace.
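To make the unified Volume concept concrete, here is a minimal, hypothetical data model. It is not VESSL's API: the `Volume` class, its fields, and the storage and volume names (`my-s3`, `model-ckpt`) are illustrative. Only the `volume://<storage>/<volume>` reference shape comes from the YAML example in this post. The sketch shows one type with descriptive tags standing in for separate kinds of artifacts, models, logs, and datasets.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    """Hypothetical model: one type for artifacts, models, logs, and datasets."""
    storage: str  # e.g. "vessl-storage" or the name of a connected external storage
    name: str     # volume name within that storage
    tags: list[str] = field(default_factory=list)  # describe contents instead of using separate types

    @property
    def uri(self) -> str:
        # Same shape as the volume references in the YAML example.
        return f"volume://{self.storage}/{self.name}"

    def has_tag(self, tag: str) -> bool:
        return tag in self.tags


volumes = [
    Volume("vessl-storage", "test3", tags=["dataset"]),
    Volume("my-s3", "model-ckpt", tags=["model", "checkpoint"]),
]
models = [v.uri for v in volumes if v.has_tag("model")]
print(models)  # ['volume://my-s3/model-ckpt']
```

Filtering by tag here plays the role that separate artifact, model, and dataset lists would otherwise play, and the storage field is the only thing that changes between VESSL-managed and external backends.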

For more detailed information about these key updates, please visit our changelog.

Walkthrough: How to Use VESSL Storage

In this walkthrough, we'll show you how to use VESSL Storage in your ML workflows.

Prerequisites

Before you begin, ensure you have the following:

  • A VESSL account with credits (new users receive free credits upon sign-up)
  • Data to manage as Volumes (datasets, logs, metrics, etc.)
  • Credentials or connection details for any external storage you plan to connect (for example, AWS, GCS, or on-premises). To register credentials, go to the “Integrations” settings page. For detailed instructions, please refer to this document.

Using VESSL-managed storage

In VESSL Storage, the “VESSL storage” section appears at the top of the page. VESSL-managed storage is enabled by default.

  • Built-in storage support: No need to connect with external storage services.
  • Optimized storage: Because it is VESSL's own solution, it is designed to work seamlessly with VESSL's other features.

Follow these steps to create a new volume in Storage:

  • 1. Click on the “vessl-storage” card.
  • 2. Enter a “Volume name” and add tags that describe its contents.
  • 3. Add files or folders.

Connecting external storage: AWS/GCS

Follow these steps to connect your AWS S3 or GCP Storage account:

  • 1. Sign up or log in to your VESSL account.
  • 2. Navigate to VESSL Storage in the dashboard.
  • 3. Click on the “New external storage” button.
  • 4. Select “Amazon S3” or “GCP storage”, and fill in the required information:
    • Storage name
    • Bucket path
    • Credentials
  • 5. Click the “Test connection” button to verify the connection.
  • 6. Once the connection test succeeds, click the “Integrate” button to connect your external storage to VESSL.
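Before clicking “Test connection”, it can help to sanity-check the bucket path locally. The sketch below assumes the path is written as a URI such as `s3://my-bucket/datasets` or `gs://my-bucket/datasets` (the convention used by the AWS and Google Cloud CLIs); check the integration form for the exact format VESSL expects. `split_bucket_path` is a hypothetical helper that checks syntax only, not credentials or access.

```python
from urllib.parse import urlparse

# URI schemes used by the AWS and Google Cloud CLIs, mapped to the form's options.
SCHEMES = {"s3": "Amazon S3", "gs": "GCP storage"}


def split_bucket_path(path: str) -> tuple[str, str, str]:
    """Return (provider, bucket, prefix) for an s3:// or gs:// path."""
    parsed = urlparse(path)
    if parsed.scheme not in SCHEMES:
        raise ValueError(f"unsupported scheme {parsed.scheme!r} in {path!r}")
    if not parsed.netloc:
        raise ValueError(f"missing bucket name in {path!r}")
    # netloc holds the bucket; the remaining path is an object-key prefix.
    return SCHEMES[parsed.scheme], parsed.netloc, parsed.path.lstrip("/")


print(split_bucket_path("s3://my-bucket/datasets/train"))
```

A path without a scheme, such as `my-bucket/datasets`, is rejected rather than guessed at, since the provider cannot be inferred from it.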

Connecting on-premises storage

For on-premises storage, VESSL supports NFS and host path volumes.

1. In the new external storage setup, select “On-premises”.

2. Fill in the necessary details:

  • Storage name
  • Server IP address
  • Base path
  • Cluster (select the appropriate one)

3. NFS volumes can be mounted directly to a Run, which reduces initialization time and makes them well suited to large datasets.
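The server IP address and base path together identify an NFS export, conventionally written as `server:/path` (the form the `mount` command accepts, with IPv6 addresses in square brackets). The hypothetical helper below validates the two form fields with Python's standard `ipaddress` module and builds that string. It does not contact the server, and VESSL's form may accept other formats.

```python
import ipaddress
import posixpath


def nfs_source(server_ip: str, base_path: str) -> str:
    """Validate NFS fields and return the 'server:/path' mount source."""
    ip = ipaddress.ip_address(server_ip)  # raises ValueError if malformed
    if not base_path.startswith("/"):
        raise ValueError(f"base path must be absolute: {base_path!r}")
    path = posixpath.normpath(base_path)  # drops trailing slashes and '..' segments
    # IPv6 addresses are bracketed so the ':' separator stays unambiguous.
    host = f"[{ip}]" if ip.version == 6 else str(ip)
    return f"{host}:{path}"


print(nfs_source("10.0.0.5", "/exports/datasets/"))
```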

Importing and exporting volumes

Creating a new volume

1. In VESSL Storage, click the storage you set up in the previous steps (for example, AWS, GCP, or on-premises). The example image below uses the VESSL-managed storage.

2. Click on “Create a new Volume”.

3. Upload files or folders directly to your new Volume.

4. Optionally, add tags to your Volume to keep it organized on the Volumes page.
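Before uploading a large folder, you may want to check locally what it contains. The hypothetical helper below (not part of VESSL) lists every file under a folder as a relative path and totals their size in bytes. It assumes the upload keeps the folder's relative layout inside the Volume; confirm this in the Storage UI after uploading.

```python
from pathlib import Path


def upload_manifest(folder: Path) -> tuple[list[str], int]:
    """Return (relative file paths, total size in bytes) for everything under folder."""
    files = [p for p in folder.rglob("*") if p.is_file()]
    # POSIX-style relative paths, sorted so the listing is stable across runs.
    rel_paths = sorted(p.relative_to(folder).as_posix() for p in files)
    total_bytes = sum(p.stat().st_size for p in files)
    return rel_paths, total_bytes
```

For example, `upload_manifest(Path("data"))` returns every file under `data/` along with the combined size, which you can compare against what appears in the Volume after the upload.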

Importing a volume into a Run

1. Navigate to your Run configuration.

2. In the Task > Volumes section, select the volume you've created.

Exporting volumes after workloads

1. After your workload completes, export the volume by selecting the “Export” option on the Run page.

2. Choose the destination storage and click the “Export” button.

Accessing exported volumes

1. First, go to the “Files” tab of a completed Run.

You can view the exported volumes in the “Volumes” folder. Clicking an exported volume's path takes you to the Storage page where that volume is stored.

2. You can also open VESSL Storage to verify that your exported volume is available. For instance, the Volume named “Create your next model -369…” holds the volume exported from the previous Run.

In Workspace

As the image above shows, you can also view exported volumes from a Jupyter session in a Workspace. To try this yourself, use the YAML configuration below:

Before running this session, create a volume named “test3” in VESSL-managed storage, because the configuration imports it.
```yaml
name: test-run
import:
  /input/: volume://vessl-storage/test3
export:
  /output/: volume://vessl-storage
resources:
  cluster: vessl-gcp-oregon
  preset: gpu-l4-small
image: quay.io/vessl-ai/python:3.10-r1
run:
  - command: echo "test3!" > test3.txt
    workdir: /output
interactive:
  max_runtime: 24h
  jupyter:
    idle_timeout: 120m
```
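Because this configuration imports `test3`, the run depends on that volume already existing. A small pre-flight check can catch a missing import before you submit. The sketch below is hypothetical: it mirrors the `import` and `export` sections of the YAML above as a Python dict (to avoid a third-party YAML parser), and `existing_volumes` stands in for whatever volumes you have actually created. It also assumes an import must name a specific volume, while an export may name only the storage, as in the example.

```python
from typing import Optional


def parse_volume_uri(uri: str) -> tuple[str, Optional[str]]:
    """Split 'volume://<storage>[/<volume>]' into (storage, volume or None)."""
    prefix = "volume://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a volume URI: {uri!r}")
    storage, _, volume = uri[len(prefix):].partition("/")
    return storage, (volume or None)


def missing_imports(spec: dict, existing_volumes: set[tuple[str, str]]) -> list[str]:
    """Return mount paths whose imported volume does not exist yet."""
    missing = []
    for mount_path, uri in spec.get("import", {}).items():
        storage, volume = parse_volume_uri(uri)
        # Assumption: an import must reference a specific, existing volume.
        if volume is None or (storage, volume) not in existing_volumes:
            missing.append(mount_path)
    return missing


# Mirrors the import/export sections of the YAML configuration.
spec = {
    "import": {"/input/": "volume://vessl-storage/test3"},
    "export": {"/output/": "volume://vessl-storage"},
}

print(missing_imports(spec, existing_volumes=set()))
print(missing_imports(spec, existing_volumes={("vessl-storage", "test3")}))
```

The first call reports `/input/` as missing; once `test3` exists in `vessl-storage`, the list is empty.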

Once the workload completes, you can view the exported volume in the JupyterLab session.

Back up exported volumes

VESSL Storage supports volume backups. As mentioned earlier, backing up exported volumes is a key part of data management: exported volumes remain in Storage, and you can re-import them into VESSL Run or Workspace. In summary, you can perform the following actions on VESSL:

  • 1. Import volumes
  • 2. Store exported volumes in Storage
  • 3. Re-import the exported volumes into VESSL Run or Workspace
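The import, export, and re-import cycle can be simulated with local directories to show why exported data outlives the workload that produced it. This is a hypothetical sketch, not VESSL's implementation: a plain directory named `storage` stands in for VESSL Storage, and `run_job` copies an imported volume into a temporary workdir, appends a line to a history file, and exports the result back to storage before the workdir disappears.

```python
import shutil
import tempfile
from pathlib import Path


def run_job(storage: Path, import_name: str, export_name: str, step: str) -> None:
    """Simulate one workload: import a volume, do work, export the result."""
    with tempfile.TemporaryDirectory() as scratch:  # ephemeral workdir
        workdir = Path(scratch)
        # 1. Import: copy the stored volume into the workload.
        shutil.copytree(storage / import_name, workdir / "input")
        history = (workdir / "input" / "history.txt").read_text()
        # The "work": append a line describing this step.
        out = workdir / "output"
        out.mkdir()
        (out / "history.txt").write_text(history + step + "\n")
        # 2. Export: store the output as a new volume before teardown.
        shutil.copytree(out, storage / export_name)


with tempfile.TemporaryDirectory() as root:
    storage = Path(root)
    (storage / "dataset").mkdir()
    (storage / "dataset" / "history.txt").write_text("raw\n")
    run_job(storage, "dataset", "run-1", "trained")
    # 3. Re-import: a later workload starts from the exported volume.
    run_job(storage, "run-1", "run-2", "fine-tuned")
    print((storage / "run-2" / "history.txt").read_text())
```

Each exported volume is a separate copy, so `run-1` still holds its own result after `run-2` is created, which is what makes it usable as a backup.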

To Conclude

With the release of VESSL Storage, managing your ML data has never been easier. The unified “Volume” concept, seamless integration with workloads, and enhanced storage options empower you to streamline your AI/ML workflows. Whether you're using cloud storage solutions like AWS S3 or GCP Storage, or on-premises systems, VESSL Storage provides the flexibility and functionality you need.

We invite you to explore these new features and see how they can enhance your data management processes. For any questions or further assistance, please refer to our documentation or contact our support team (support@vessl.ai).

Thank you for choosing VESSL. Happy coding 🙌🏻

Wayne Kim

Technical Communicator

Intae Ryoo

Product Manager

© 2024 VESSL AI, Inc. All rights reserved.