Getting Started with Weights & Biases for Machine Learning Experimentation

Maciej Kurzawa, April 1, 2025

This guide introduces you to using Weights & Biases (W&B) for tracking machine learning experiments, specifically focusing on fine-tuning a Large Language Model (LLM) with the transformers library.

If your transformers pipeline is already set up, feel free to skip ahead to the Weights & Biases Configuration section.

Initial Setup

Create an Account

Ensure you have a Weights & Biases account. You can sign up at the Weights & Biases website.

Prerequisites

To begin, update your system and install the Python development headers:
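The exact commands depend on your platform; on a Debian/Ubuntu system, for example:

```bash
# Update system packages and install the Python development headers
sudo apt update && sudo apt upgrade -y
sudo apt install -y python3-dev
```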

Next, install the necessary Python libraries. If you've just created your environment, use the following command:
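A plausible set of packages for this guide (versions unpinned; adjust to your setup):

```bash
# Core libraries for fine-tuning with the Hugging Face stack, plus wandb
pip install torch transformers datasets trl peft bitsandbytes accelerate wandb
```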

Importing Modules

For better organization, begin by importing all required modules:
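A sketch of the imports used throughout this guide, assuming the packages installed above:

```python
import os

import torch
import wandb
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from transformers.integrations import WandbCallback
from trl import SFTConfig, SFTTrainer
```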

Model and Dataset Preparation

In this section, we'll prepare the model and dataset for experimentation. While we won't delve into the details of fine-tuning LLMs here, you can refer to our detailed guide on best practices for LLM fine-tuning in our Cookbook on XXX.

Configuring Bits and Bytes

Set up the bitsandbytes configuration:
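One common 4-bit quantization setup; the specific values here are illustrative rather than prescriptive:

```python
# Quantize the base model to 4-bit NF4 to reduce memory usage during fine-tuning
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
```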

Model Preparation

Load and configure your model:
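For example, with a hypothetical model name (substitute whichever checkpoint you are fine-tuning):

```python
model_name = "meta-llama/Llama-3.2-1B"  # hypothetical choice of base model

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,  # apply the 4-bit config from above
    device_map="auto",               # place layers on available devices automatically
)
model.config.use_cache = False  # the KV cache conflicts with gradient checkpointing during training
```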

Dataset Preparation

Prepare your dataset:
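As an example, the following loads an instruction-tuning dataset from the Hugging Face Hub; the dataset name is an assumption, so swap in your own:

```python
# Load the training split of an example instruction-tuning dataset
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")
```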

For simplicity, we'll use a smaller subset of the dataset:
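For instance (the subset size is arbitrary):

```python
# Keep only the first 1,000 examples to speed up experimentation
dataset = dataset.select(range(1000))
```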

Tokenizer Setup

Initialize the tokenizer:
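A minimal sketch, reusing the model name from above:

```python
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # many causal LMs ship without a dedicated pad token
```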

Weights & Biases Configuration

Environment Variables

Set the necessary environment variables to access Weights & Biases and save model artifacts. You can find your API key at https://wandb.ai/authorize. For a complete list of environment variables, refer to the W&B documentation:
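For example, WANDB_PROJECT and WANDB_LOG_MODEL are both recognized by the Hugging Face integration; the project name below is a placeholder:

```python
os.environ["WANDB_PROJECT"] = "llm-finetuning"  # hypothetical project name
os.environ["WANDB_LOG_MODEL"] = "checkpoint"    # upload model checkpoints as W&B artifacts
```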

Logging In

Log in to W&B:
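A single call suffices; it prompts for your API key the first time you run it:

```python
wandb.login()  # prompts for your API key on first use, then caches it locally
```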

Tracking Your First Experiment

Configuring Training Parameters

Set up your training parameters using SFTConfig and configure logging to W&B. Remember to set report_to to wandb:
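A sketch of one possible configuration; every hyperparameter here is illustrative, and run_name is a hypothetical label:

```python
training_args = SFTConfig(
    output_dir="./results",
    num_train_epochs=1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    logging_steps=10,
    report_to="wandb",        # send training and evaluation logs to W&B
    run_name="sft-baseline",  # hypothetical run name shown in the dashboard
)
```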

Initializing the Run

Initialize your W&B run:
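For example (the project and run names are assumptions and should match what you configured above):

```python
run = wandb.init(
    project="llm-finetuning",  # same project as WANDB_PROJECT
    name="sft-baseline",
    config={"model": model_name, "dataset_size": len(dataset)},  # anything worth recording
)
```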

Starting the Training Process

Start the training process:
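A minimal sketch; depending on your trl version, you may also need to pass the tokenizer explicitly (as tokenizer or processing_class):

```python
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()  # metrics stream to the W&B dashboard as training progresses
```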

Finalizing the Run

Remember to finish your W&B run if you're using a notebook:
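In a script the run is closed automatically at exit; in a notebook, close it yourself:

```python
wandb.finish()  # marks the run as finished and flushes any remaining data
```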

Visualizing Results

After training, check your run on the W&B dashboard under your project:

[Image: dashboard with one experiment]

Running multiple experiments in the same project fills your dashboard with more runs and training curves.

[Image: dashboard with several experiments]

To focus on a single run, click its name in the project dashboard.

For detailed information about all of the runs, expand the Runs panel (shortcut: Ctrl+J):

[Image: table with all logged information]

Logging Custom Metrics

By default, Weights & Biases automatically logs a variety of metrics related to training, evaluation, and system performance, including mean token accuracy, training and evaluation losses, GPU power usage, and GPU temperature, among others. However, you can also log your own metrics: subclass WandbCallback and use methods like on_train_end to log metrics with wandb.log(). You can find all of the available methods in the Hugging Face docs.
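A sketch of such a callback; the logged metric is a hypothetical example:

```python
class CustomMetricsCallback(WandbCallback):
    def on_train_end(self, args, state, control, **kwargs):
        super().on_train_end(args, state, control, **kwargs)
        # Hypothetical custom metric logged once at the end of training
        wandb.log({"train/total_steps_completed": state.global_step})

trainer.pop_callback(WandbCallback)         # avoid duplicate logging from the default callback
trainer.add_callback(CustomMetricsCallback)
```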

Organize metrics into categories by prefixing names:
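For example, logging under the existing eval section (the metric name and value are illustrative):

```python
wandb.log({"eval/my_accuracy": 0.87})  # appears alongside the other eval charts
```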

You can also create new categories:
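Any new prefix becomes its own section in the dashboard:

```python
wandb.log({"custom/tokens_per_second": 1520.0})  # creates a new "custom" panel section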

Downloading and Using a Model Artifact

To download and use a model from W&B:
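A sketch using the Artifacts API; the entity, project, and artifact names below are placeholders you should copy from the artifact page in the W&B UI:

```python
import wandb
from transformers import AutoModelForCausalLM, AutoTokenizer

run = wandb.init(project="llm-finetuning")
# Hypothetical artifact path: <entity>/<project>/<artifact-name>:<alias>
artifact = run.use_artifact("your-entity/llm-finetuning/checkpoint-sft-baseline:latest", type="model")
model_dir = artifact.download()  # returns the local directory the files were saved to

model = AutoModelForCausalLM.from_pretrained(model_dir)
tokenizer = AutoTokenizer.from_pretrained(model_dir)
run.finish()
```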

Conclusion

In this guide, we explored the fundamentals of using Weights & Biases for experiment tracking. For more in-depth information and additional features, be sure to visit the Weights & Biases documentation.