80 tokens/s with DBRX in 15 minutes
Getting Started
Crusoe CLI
In this tutorial, we will run DBRX-Instruct on L40S instances provided by Crusoe Cloud using the CLI. First, ensure that you have the CLI installed by following the instructions here, and verify your installation with `crusoe whoami`.
Starting a VM
We'll run DBRX-Instruct on an L40S.8x instance with our batteries-included NVIDIA image. To create the VM using the CLI, run the following command:
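A sketch of the create command follows; the instance type, image, location, and keyfile values below are illustrative assumptions, so check `crusoe compute vms create --help` and the Crusoe docs for the options valid on your account:

```shell
# NOTE: the type, location, image, and keyfile values are placeholder
# assumptions -- substitute the ones from your own account.
crusoe compute vms create \
  --name dbrx-inference \
  --type l40s-48gb.8x \
  --location us-east1-a \
  --image ubuntu22.04-nvidia-pcie-docker \
  --keyfile ~/.ssh/id_ed25519.pub
```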
Wait a few minutes for the VM to be created, then note the public IP. Verify that you are able to access the instance with `ssh ubuntu@<public ip address>`. Then exit the VM and we'll set up a storage disk to hold our massive MoE model. If you didn't note the public IP, simply open the Instances tab in the Crusoe Console and copy the address from there.
Creating and Attaching a Persistent Disk
It's always recommended to create a separate disk to avoid misusing the boot disk (128 GiB), but with LLMs in particular we can run out of storage very quickly. The DBRX-Instruct repo is ~490 GiB, so we'll create a 1 TiB disk for some breathing room. Back on your local machine with the crusoe CLI installed, run the following to create a disk:
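A sketch, assuming a disk name of `dbrx-data` and the same location as the VM (both are assumptions; verify the flags and accepted size format with `crusoe storage disks create --help`):

```shell
# Disk name, size format, and location are illustrative assumptions.
crusoe storage disks create \
  --name dbrx-data \
  --size 1TiB \
  --location us-east1-a
```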
Now, let's attach the disk to our instance with:
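A sketch of the attach step; the exact subcommand and flag syntax here are assumptions, so confirm with `crusoe compute vms attach-disks --help`:

```shell
# VM and disk names match the (assumed) names used above;
# the --disk flag syntax is an assumption -- check the CLI help.
crusoe compute vms attach-disks dbrx-inference \
  --disk name=dbrx-data,mode=read-write
```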
SSH into your instance (`ssh ubuntu@<public ip address>`) and run `lsblk`. The persistent disk will show up as `vd[b-z]`. Now, create the filesystem by running:
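A minimal sketch, assuming `lsblk` showed the new disk as `/dev/vdb`:

```shell
# Replace vdb with the device name shown by lsblk.
sudo mkfs.ext4 /dev/vdb
```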
Create a directory to mount the volume. For this tutorial, we'll run `sudo mkdir /workspace/`. Finally, mount the volume by running:
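For example, assuming the device is `/dev/vdb` and the mount point is `/workspace` (the `chown` is an optional extra so the ubuntu user can write without sudo):

```shell
sudo mount -t ext4 /dev/vdb /workspace
# Optional: let the ubuntu user write to the volume without sudo.
sudo chown ubuntu:ubuntu /workspace
```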
You can verify that the volume was mounted by running `lsblk` again and seeing `/workspace` attached to `vdb` under MOUNTPOINTS.
Clone DBRX-Instruct and DBRX-Instruct-Tokenizer
For simplicity, we will clone the repos for both the instruct model and the tokenizer (as opposed to letting HF handle caching) and provide local paths when loading our resources. Navigate to `/workspace` and run `mkdir models && cd models/`.
DBRX-Instruct is a gated model, so you will need to request permission in order to interact with the model. Please refer to the DBRX-Instruct repo for steps on how to do so.
Git LFS
HuggingFace uses lfs to manage large files, so we'll have to run a couple commands to get it set up:
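On Ubuntu, the setup typically looks like the following (a sketch; package names can vary by distro):

```shell
sudo apt-get update
sudo apt-get install -y git-lfs
# Registers the LFS filters in your git config.
git lfs install
```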
You can verify git lfs is set up with `git lfs --version`.
Clone DBRX-Instruct
Now, clone the repository with `git clone https://huggingface.co/databricks/dbrx-instruct`. NOTE: you will be prompted for your Hugging Face username and password; provide your ACCESS TOKEN when prompted for the password.
This will kick off the download of the entire repo, which is ~490 GiB. Luckily, this is running on our VM at a site with high-speed networking 😁. Even so, download speed can be limited by demand on the host server, so feel free to go grab coffee and come back when the download is done.
Clone DBRX-Instruct-Tokenizer
We'll use the fast tokenizer provided by Xenova, so again navigate to `/workspace/models/` and clone this repository.
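As a sketch, assuming the tokenizer is published at `Xenova/dbrx-instruct-tokenizer` on the Hugging Face Hub (an assumption; use the linked repository if the path differs):

```shell
cd /workspace/models/
# Repo path is an assumption -- substitute the actual tokenizer repo if it differs.
git clone https://huggingface.co/Xenova/dbrx-instruct-tokenizer
```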
Clone this Repo
We'll make a directory to hold code on our boot disk. Run `mkdir ~/dev && cd ~/dev` and clone this repository with `git clone git@github.com:crusoecloud/dbrx_inference_tutorial.git && cd dbrx_inference_tutorial/`.
Peripherals
Before we jump into our inference tutorials, let's install some quality-of-life peripherals. First, run `sudo apt-get update`, then `sudo apt-get install tmux`. We'll often have two or more processes running, so it'll be nice to have multiple windows to monitor each, and tmux is a great solution for session and window management.
To manage dependencies, we'll use `virtualenv`, which can be installed with `sudo apt install python3-virtualenv`.
If you run into issues with storage, `ncdu` is a useful tool for easy navigation of disk usage.
Additionally, I recommend using the Remote - SSH extension with VSCode to connect and interact with remote code (unless you're a vim wizard).
vLLM
The fastest way to get up and running with DBRX-Instruct is vLLM. In a few steps, we'll have a high-performance, OpenAI-API-compatible server up and running. For more details, reference the README in `vLLM/` in this repo.
TGI
To serve DBRX-Instruct through `text-generation-inference` by HuggingFace, refer to the README in `tgi/` in this repo.
Cleaning Up
To delete our VM and the disk, we can simply run the following commands using the CLI:
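Assuming the VM and disk names used earlier in this tutorial (a sketch; the positional-argument syntax is an assumption, so confirm with each subcommand's `--help`):

```shell
# Names match the (assumed) resources created earlier.
crusoe compute vms delete dbrx-inference
crusoe storage disks delete dbrx-data
```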