OpenAI’s documentation on fine-tuning lays out the basic steps for fine-tuning an OpenAI model, along with various helpful tips. Here is the process I followed to complete my first OpenAI fine-tuning job:

Step 1

The first step is setting the OpenAI API key as an environment variable with the following command. The key comes from the API keys page of the OpenAI platform; a ChatGPT Plus subscription is separate from API access and makes no difference here:

export OPENAI_API_KEY="<YOUR_OPENAI_API_KEY>"
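
To confirm the key is picked up, one quick check is to list the models available to your account. This assumes the legacy openai Python package (v0.x), which provides the CLI used throughout this post:

openai api models.list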

Step 2

The second step is to create a training dataset; I used a CSV file with a prompt column and a completion column. A bare comma delimiter is likely to cause problems, since the prompts and completions themselves contain commas; quoting those fields (standard CSV quoting) avoids the issue, as does writing the data as JSONL, the format the fine-tuning endpoint ultimately consumes. A minimal sketch follows.
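
As an illustration, a CSV with quoted fields might look like this (the verse shown is just an example, not my actual training data):

prompt,completion
"In the beginning God created the heaven and the earth.","1 In the beginning God created the heaven and the earth. (Genesis 1:1, KJV)"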

Step 3

The third step is to use OpenAI’s data-preparation tool to convert the dataset into the JSONL format the fine-tuning endpoint expects:

openai tools fine_tunes.prepare_data -f LOCAL_FILE

The tool made several recommendations for my dataset, and I accepted all of them.
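
The tool writes the result to a new file with a _prepared.jsonl suffix. Its usual recommendations include appending a fixed separator such as \n\n###\n\n to each prompt and starting each completion with a whitespace, so a prepared line looks roughly like this (illustrative values):

{"prompt": "In the beginning God created the heaven and the earth.\n\n###\n\n", "completion": " 1 In the beginning God created the heaven and the earth. (Genesis 1:1, KJV)\n"}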

Step 4

Finally, start the fine-tuning job with the following:

openai api fine_tunes.create -t TRAIN_FILE_ID_OR_PATH -m BASE_MODEL
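
As a concrete sketch, with the prepared file from step 3 and one of the legacy base models (ada, babbage, curie, or davinci), the invocation might look like this (the filename is hypothetical):

openai api fine_tunes.create -t verses_prepared.jsonl -m curie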

The event stream disconnected three or so times, but I resumed it each time with the command the CLI printed. In total, the job took around 2-3 minutes.
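
For reference, the resume command the CLI prints has this shape, where the ID is the one assigned to your job (the ID below is a placeholder):

openai api fine_tunes.follow -i ft-abc123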

Results

I saw changes in the output with only five entries in my CSV file. The prompts were five unformatted Bible verses with no reference or verse numbers, and the completions added in-line verse numbers along with the verse reference and Bible version in parentheses. Given an unformatted verse such as John 3:16, the target output is:

16 For God so loved the world, that he gave his only begotten Son, that whosoever believeth in him should not perish, but have everlasting life. (John 3:16, NIV)

After fine-tuning, it produced:

For God so loved the world, that he gave his only begotten Son, that whosoever believeth in him should not perish, but have everlasting life. (John 3:16, NIV)

Notice the missing verse number at the beginning. A significantly larger training set will be needed to achieve the ideal output.
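
For completeness, once the job finishes, the CLI reports the fine-tuned model’s name, and you can query it like any other completions model (the model name and prompt below are placeholders):

openai api completions.create -m curie:ft-personal-2023-01-01-00-00-00 -p "YOUR_UNFORMATTED_VERSE"

If prepare_data appended a separator to the training prompts, the same separator should be appended to prompts at inference time so the model sees the pattern it was trained on.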