With the launch of VESSL Hub, we are starting a new monthly series: a quick rundown of new models uploaded to VESSL Hub. You can try out these models with a single click on VESSL Hub, and later edit the model YAML to bring your own datasets and GPUs. Learn how VESSL AI handles these workloads behind the scenes in our recent blog post↗.
1. Llama 2
- Explore Meta’s launch page — https://ai.meta.com/llama/↗
- Launch Llama 2 fine-tuning on VESSL Hub →↗
Llama 2 is an open-source large language model released by Meta. Compared with the previous generation, Llama 2 was trained on a 40% larger dataset, doubles the context length, and adds grouped-query attention (GQA) to the 70B model to improve inference scalability. Meta has open-sourced the code and model weights for the 7B, 13B, and 70B variants. We’ve already seen multiple iterations and applications built on Llama 2, including Meta’s own Code Llama, a coding assistant fine-tuned on code generation datasets.
On VESSL Hub, we’ve uploaded an example fine-tuning workload for Llama 2 7B. We used a code instruction dataset consisting of prompts made up of instructions, inputs, and outputs. The model was loaded in 8-bit quantization mode and fine-tuned using LoRA (Low-Rank Adaptation).
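If you want to reproduce this setup outside of VESSL Hub, a minimal sketch of 8-bit loading plus LoRA adapters with Hugging Face transformers and peft looks roughly like this; the checkpoint name and LoRA hyperparameters below are illustrative, not the exact values in our Hub example:

```python
# A minimal sketch, assuming transformers, peft, and bitsandbytes are
# installed; checkpoint and hyperparameters are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,   # 8-bit quantization via bitsandbytes
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=8,                 # low-rank adapter dimension
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```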
The example also uses VESSL AI’s Python SDK, vessl.log, to track key metrics like learning rate, training loss, and validation loss. You can view these values on the Plots page. Once fine-tuning is complete, our experiment tracker stores the artifacts of the top-performing runs, and the best model is saved automatically to VESSL AI’s model repository. You can access this model on the Models page.
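For reference, logging with the SDK looks roughly like this; the metric names and the dummy training loop are illustrative:

```python
import vessl

# A minimal sketch of vessl.log; values and names are illustrative.
for step in range(100):
    train_loss = 1.0 / (step + 1)  # placeholder for a real training loss
    vessl.log(payload={"train_loss": train_loss}, step=step)
```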
2. Stable Diffusion
- Read the paper — https://arxiv.org/abs/2112.10752↗
- AUTOMATIC1111’s web UI — https://github.com/AUTOMATIC1111/stable-diffusion-webui↗
- Run AUTOMATIC1111’s Stable Diffusion app on VESSL Hub →↗
Stable Diffusion is a text-to-image model that uses a diffusion process to generate an image from noise through iterative denoising steps. Unlike earlier diffusion models, Stable Diffusion is faster and uses fewer resources because it runs the diffusion process in a lower-dimensional latent space and reconstructs the image in pixel space with an autoencoder’s decoder. In addition, by adding a cross-attention mechanism to the main layers of the U-Net, it can be used not only for noise-to-image generation but also for multi-modal tasks such as text-to-image and layout-to-image.
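If you prefer code to a web UI, text-to-image generation with a latent diffusion model looks roughly like this using Hugging Face diffusers; the checkpoint name and prompt are assumptions, not what the Hub app uses:

```python
import torch
from diffusers import StableDiffusionPipeline

# A minimal sketch; checkpoint and prompt are illustrative.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")  # the denoising loop runs in latent space on GPU

image = pipe("a watercolor painting of a lighthouse at dawn").images[0]
image.save("lighthouse.png")  # decoded back to pixels by the autoencoder
```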
On VESSL Hub, you can try the Stable Diffusion web app built by AUTOMATIC1111. The app automatically loads weights on startup, so you can try some examples as soon as you launch it. You can directly import compatible Stable Diffusion weights from Hugging Face, as specified in our YAML, or load weights from our model repository.
3. Mistral 7B
- Read the paper — https://arxiv.org/abs/2310.06825↗
- The original code — https://github.com/mistralai/mistral-src↗
- Run a Streamlit app for Mistral 7B on VESSL Hub →↗
Mistral 7B is an open-source large language model developed by Mistral AI with an emphasis on high throughput and efficiency. The model adopts grouped-query attention to increase inference speed and reduce memory usage during decoding. It also uses a sliding-window attention mechanism for fast inference over extended contexts at lower compute cost. Despite having fewer parameters, Mistral 7B outperforms Llama 2 13B on all benchmarks and Llama 1 34B on reasoning, mathematics, and code generation benchmarks.
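To make the sliding-window idea concrete, here is a small sketch of the attention mask it implies: each token attends only to itself and a fixed number of preceding tokens. The window size here is illustrative; the paper uses a 4,096-token window.

```python
import torch

# A sliding-window causal mask (window size is illustrative).
seq_len, window = 8, 3
i = torch.arange(seq_len).unsqueeze(1)  # query positions
j = torch.arange(seq_len).unsqueeze(0)  # key positions
mask = (j <= i) & (j > i - window)      # causal and within the window
print(mask.int())
```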
On VESSL Hub, we created an inference web app for Mistral 7B with Streamlit. Try the app with your own prompts.
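A Streamlit inference app along these lines fits in a few dozen lines; the checkpoint name and generation settings below are assumptions, not the exact code behind our Hub app:

```python
# streamlit_app.py — a minimal sketch of a Streamlit inference app;
# checkpoint and generation settings are illustrative.
import streamlit as st
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

@st.cache_resource  # load the model once, reuse across reruns
def load_model():
    name = "mistralai/Mistral-7B-Instruct-v0.1"  # assumed checkpoint
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(
        name, torch_dtype=torch.float16, device_map="auto"
    )
    return tokenizer, model

st.title("Mistral 7B playground")
prompt = st.text_area("Prompt", "Explain grouped-query attention briefly.")

if st.button("Generate"):
    tokenizer, model = load_model()
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=256)
    st.write(tokenizer.decode(output[0], skip_special_tokens=True))
```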
4. SSD-1B
- Read Segmind’s release — https://blog.segmind.com/introducing-segmind-ssd-1b/↗
- Run SSD-1B web app on VESSL Hub →↗
The Segmind Stable Diffusion Model (SSD-1B) is a distilled version of Stable Diffusion XL (SDXL) that is 50% smaller. Segmind achieved this reduction with minimal performance loss by removing transformer blocks in the attention layers and the attention and ResNet layers in the mid-block, and by progressively distilling the U-Net to shorten each stage. As a result, SSD-1B reaches 16.02 iterations per second at batch size 1, 56% faster than SDXL’s 10.26 it/s; at batch size 16, SSD-1B is still about 60% faster (1.21 it/s vs. 0.75 it/s).
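To try SSD-1B outside the app, generation via Hugging Face diffusers looks roughly like this; the checkpoint is the one Segmind published, and the prompt is illustrative:

```python
import torch
from diffusers import StableDiffusionXLPipeline

# A minimal sketch; SSD-1B shares SDXL's pipeline interface.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "segmind/SSD-1B", torch_dtype=torch.float16
)
pipe.to("cuda")

image = pipe("an astronaut riding a horse, oil painting").images[0]
image.save("ssd-1b.png")
```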
We also built a Streamlit app for SSD-1B that you can launch on VESSL Hub. Deploy the app with a single click and start generating images.
5. Whisper V3
- Read OpenAI’s release post — https://openai.com/research/whisper↗
- Read the paper — https://cdn.openai.com/papers/whisper.pdf↗
- The original code — https://github.com/openai/whisper↗
- Try OpenAI's Whisper on VESSL Hub →↗
Whisper is a general-purpose speech recognition model released by OpenAI. Unlike traditional ASR models, it handles multiple languages and tasks with a single model. Whisper is also a multitasking model: it can translate speech from one language to another, identify the language being spoken, detect voice activity in audio, and more. Whisper is trained on a large, diverse dataset of English and non-English audio and text, which makes it robust to different accents, noise, and domains, and usable without additional fine-tuning.
On VESSL Hub, you can test Whisper with the LibriSpeech ASR dataset. When you launch the app, it shows the transcriptions and the ground truths for the first five samples.
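Transcription with OpenAI’s open-source whisper package takes only a few lines; the model size and audio path below are illustrative:

```python
import whisper

# A minimal sketch; "large-v3" and the file path are illustrative.
model = whisper.load_model("large-v3")
result = model.transcribe("sample.flac")  # LibriSpeech ships FLAC audio
print(result["text"])
```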
—
Sanghyuk, ML Engineer
Yong Hee, Growth Manager