25 November 2024
We're releasing VESSL Storage. Through this blog post, you can learn what VESSL Storage is and how you can leverage it to enhance your machine learning data management.
We are thrilled to announce the release of VESSL Storage. VESSL Storage is our integrated storage system that empowers users to manage models, logs, datasets, and more within their AI/ML workflows. In the past, navigating the various features of VESSL Storage could be challenging. To enhance user experience and streamline data management, we've implemented significant updates.
Our primary goals with this update were:
Ephemeral volumes are temporary storage solutions used in Kubernetes for tasks like fine-tuning jobs. However, data in these volumes is lost once the instance stops. To prevent data loss, VESSL Storage now captures and stores this data before the system erases it.
Users often need to re-import previously exported data for backup or further use. We've addressed this by enabling exported data in storage to be easily re-imported.
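The first goal, capturing data from an ephemeral volume before it is erased, can be pictured with a small local sketch. This is only an illustration of the idea using the Python standard library, not how VESSL Storage is implemented: the temporary directory stands in for the ephemeral volume, and `run_with_export` and `fine_tune` are hypothetical names.

```python
import shutil
import tempfile
from pathlib import Path


def run_with_export(job, export_dir: Path) -> Path:
    """Run `job` in a scratch directory and copy its output out before cleanup."""
    export_dir.mkdir(parents=True, exist_ok=True)
    # The TemporaryDirectory plays the role of the ephemeral volume:
    # everything inside it is deleted when the `with` block exits.
    with tempfile.TemporaryDirectory() as scratch:
        scratch_path = Path(scratch)
        job(scratch_path)
        # Capture the data while the scratch directory still exists.
        shutil.copytree(scratch_path, export_dir, dirs_exist_ok=True)
    return export_dir


def fine_tune(workdir: Path) -> None:
    # Hypothetical job that writes a single checkpoint file.
    (workdir / "checkpoint.txt").write_text("step=100")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as persistent:
        exported = run_with_export(fine_tune, Path(persistent) / "export")
        print(sorted(p.name for p in exported.iterdir()))
```

The copy happens inside the `with` block on purpose: once the block exits, the scratch directory is gone, which mirrors why the export step has to run before the instance stops.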
To achieve these goals, we've introduced the following updates:
We've consolidated artifacts, models, logs, and datasets into a single entity called Volume. This unification simplifies data management and enhances usability.
Volumes can now be effortlessly integrated with Run, Workspace, Service, and Pipeline, streamlining your workflow.
Users can store Volumes not only in VESSL Storage but also in external storage solutions such as AWS S3, GCP Storage, and on-premises Network File System (NFS).
Within VESSL Storage's volumes, users can view and manage exported volumes, including logs, metrics, and model checkpoints from VESSL Run and Workspace.
For more detailed information about these key updates, please visit our changelog.
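One way to picture the unified Volume entity is as a single record type that carries a name, the storage it lives in, and optional tags. The sketch below is a hypothetical model for illustration only, not VESSL's actual data model or SDK. The `volume://<storage>/<name>` form is borrowed from the YAML example later in this post, and the names `checkpoints` and `my-s3-storage` are made up.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass(frozen=True)
class Volume:
    """Hypothetical stand-in for a Volume: one type for datasets, models, logs."""
    name: str
    storage: str  # e.g. "vessl-storage" or the name of an external storage
    tags: FrozenSet[str] = field(default_factory=frozenset)

    def uri(self) -> str:
        # Assumed URI layout, based on the YAML example in this post.
        return f"volume://{self.storage}/{self.name}"


def with_tag(volumes: List[Volume], tag: str) -> List[Volume]:
    """Return only the volumes that carry the given tag."""
    return [v for v in volumes if tag in v.tags]


volumes = [
    Volume("test3", "vessl-storage", frozenset({"dataset"})),
    Volume("checkpoints", "my-s3-storage", frozenset({"model"})),
]
print([v.uri() for v in with_tag(volumes, "dataset")])
```

Because every kind of data shares one type, the same listing and filtering code works whether a volume holds a dataset, a model checkpoint, or logs.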
In this walkthrough, we'll guide you through leveraging VESSL Storage to enhance your ML workflows.
Before you begin, ensure you have the following:
On the VESSL page, you can find the “VESSL storage” section at the top. The VESSL-managed storage is enabled by default in VESSL Storage.
Follow these steps to create a new volume in Storage:
Follow these steps to connect your AWS S3 or GCP Storage account:
VESSL supports NFS and host path volumes for users utilizing on-premises storage solutions.
1. In the new external storage setup, select “On-premises”.
2. Fill in the necessary details:
3. NFS volumes can be directly mounted to a run, reducing initialization time and making them ideal for large datasets.
1. In VESSL Storage, click the storage you created in the previous steps (for example, AWS, GCP, or on-premises). The example image below shows the VESSL-managed storage.
2. Click on “Create a new Volume”.
3. Add files or folders to your new volumes by uploading them directly.
4. Optionally, add tags to your Volume to keep the Volumes page organized.
1. Navigate to your Run configuration.
2. In the Task > Volumes section, select the volume you've created.
1. After your workload is complete, export the volume by selecting the “Export” option in the Run page.
2. Choose the destination storage and click the “Export” button.
1. Go to the “Files” tab in a completed Run first.
You can view the exported volumes in the “Volumes” folder. When you click the exported volume path, you will be taken to the Storage page where the volume is stored.
2. You can also open VESSL Storage to verify that the exported volume is available there. For instance, the Volume named “Create your next model -369…” holds the data exported from your previous Run.
As in the image above, you can view exported volumes from the Jupyter session of a Workspace. Before running this session, you must create a volume named “test3”.
To test your jobs, refer to the YAML configuration below:
name: test-run
import:
  /input/: volume://vessl-storage/test3
export:
  /output/: volume://vessl-storage
resources:
  cluster: vessl-gcp-oregon
  preset: gpu-l4-small
image: quay.io/vessl-ai/python:3.10-r1
run:
  - command: echo "test3!" > test3.txt
    workdir: /output
interactive:
  max_runtime: 24h
  jupyter:
    idle_timeout: 120m
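The import and export entries in the YAML above both use `volume://` URIs. As a rough sketch, they can be read as `volume://<storage>/<volume>`, where the export target names only the storage. That reading is an assumption drawn from this example, not a documented grammar, and `parse_volume_uri` and `VolumeRef` are hypothetical helpers rather than part of any VESSL SDK.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class VolumeRef(NamedTuple):
    storage: str
    volume: Optional[str]  # None when the URI names only the storage


def parse_volume_uri(uri: str) -> VolumeRef:
    """Split a volume:// URI into its storage name and optional volume name."""
    parts = urlsplit(uri)
    if parts.scheme != "volume" or not parts.netloc:
        raise ValueError(f"not a volume:// URI: {uri!r}")
    name = parts.path.strip("/")
    return VolumeRef(storage=parts.netloc, volume=name or None)


# The same mappings as in the YAML above, written as plain dicts.
config = {
    "import": {"/input/": "volume://vessl-storage/test3"},
    "export": {"/output/": "volume://vessl-storage"},
}

for section in ("import", "export"):
    for mount_path, uri in config[section].items():
        print(section, mount_path, parse_volume_uri(uri))
```

A check like this can catch a mistyped scheme or a missing storage name before the run is submitted.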
Once the workspace run has completed, you can view the exported volume in the JupyterLab session.
VESSL Storage supports the backup of volumes. As previously mentioned, backing up exported volumes is a pivotal part of data management. With VESSL Storage, you can smoothly back up exported volumes to VESSL Run or Workspace. To summarize, you can perform the following actions on VESSL:
With the release of VESSL Storage, managing your ML data has never been easier. The unified “Volume” concept, seamless integration with workloads, and enhanced storage options empower you to streamline your AI/ML workflows. Whether you're using cloud storage solutions like AWS S3 or GCP Storage, or on-premises systems, VESSL Storage provides the flexibility and functionality you need.
We invite you to explore these new features and see how they can enhance your data management processes. For any questions or further assistance, please refer to our documentation or contact our support team (support@vessl.ai).
Thank you for choosing VESSL. Happy coding 🙌🏻