OpenAI’s documentation on fine-tuning lays out the basic steps for fine-tuning an OpenAI model, along with various helpful tips. Here is the process I followed to complete my first OpenAI fine-tuning task:
Step 1
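The first step is to set the OpenAI API key as an environment variable. In my experience, a key from a ChatGPT Plus account worked better than a normal key, although API usage is billed separately from that subscription. Replace the placeholder with your own key in the following command: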
export OPENAI_API_KEY="OPENAI_API_KEY"
Step 2
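The second step is to create a training dataset of prompts and completions. I used a CSV file. A plain comma delimiter is likely to cause problems because the prompts and completions may themselves contain commas, so I found a delimiter such as “/” or “*” that does not appear in the text to be the better choice. Wrapping every field in quotes is another way to avoid the same problem.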
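As a minimal sketch of the quoting approach, the snippet below writes one prompt/completion pair with Python's standard csv module and reads it back to confirm that the commas inside the verse survive. The column names prompt and completion are my assumption about what the data preparation tool expects, and the single row reuses the verse from the Results section purely as an illustration.

```python
import csv
import io

# Illustrative row only: the prompt is the unformatted verse and the
# completion is the target output shown in the Results section.
rows = [
    {
        "prompt": (
            "For God so loved the world, that he gave his only begotten Son, "
            "that whosoever believeth in him should not perish, "
            "but have everlasting life."
        ),
        "completion": (
            "16 For God so loved the world, that he gave his only begotten Son, "
            "that whosoever believeth in him should not perish, "
            "but have everlasting life. (John 3:16, NIV)"
        ),
    },
]


def write_quoted_csv(records, handle):
    """Write records with every field quoted so embedded commas are safe."""
    writer = csv.DictWriter(
        handle,
        fieldnames=["prompt", "completion"],  # assumed column names
        quoting=csv.QUOTE_ALL,
    )
    writer.writeheader()
    writer.writerows(records)


# Write to an in-memory buffer, then parse it again to check the round trip.
buffer = io.StringIO()
write_quoted_csv(rows, buffer)
buffer.seek(0)
round_trip = [dict(record) for record in csv.DictReader(buffer)]
```

To produce a real file, pass a handle opened with open("train.csv", "w", newline="") instead of the in-memory buffer; the file name here is hypothetical.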
Step 3
The third step is to use OpenAI’s tool to properly format the dataset with the command:
openai tools fine_tunes.prepare_data -f LOCAL_FILE
The tool suggested several changes to my dataset, and I accepted all of them.
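To give a rough idea of what accepting those suggestions does, here is a sketch of the kind of formatting I understand the tool to recommend: a fixed separator at the end of each prompt, a leading space on each completion, a fixed stop sequence at the end of each completion, and one JSON object per line. The SEPARATOR and STOP values and the to_jsonl_line helper are my own assumptions for illustration, not the tool's exact output.

```python
import json

SEPARATOR = "\n\n###\n\n"  # assumed marker for the end of a prompt
STOP = " END"              # assumed stop sequence for the end of a completion


def to_jsonl_line(prompt, completion):
    """Format one training example as a single JSON Lines record."""
    record = {
        # The separator tells the model where the prompt stops.
        "prompt": prompt.strip() + SEPARATOR,
        # Leading space plus a stop sequence so generation has a clear end.
        "completion": " " + completion.strip() + STOP,
    }
    # json.dumps escapes newlines, so each record stays on one line.
    return json.dumps(record, ensure_ascii=False)


example_line = to_jsonl_line(
    "For God so loved the world, that he gave his only begotten Son",
    "16 For God so loved the world, that he gave his only begotten Son",
)
```

If a separator and stop sequence like these are used for training, the same separator has to be appended to prompts at inference time, and the stop sequence passed as the stop parameter.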
Step 4
Finally, start the fine-tuning job with the following:
openai api fine_tunes.create -t TRAIN_FILE_ID_OR_PATH -m BASE_MODEL
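The event stream disconnected three or so times, but the job itself kept running, and I easily resumed following it each time by entering the command the CLI printed (openai api fine_tunes.follow -i, followed by the job ID). In total, it took around 2-3 minutes.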
Results
I saw changes in the output with only five entries in my CSV file. For the prompts, I gave it five unformatted Bible verses with no references or verse numbers. The completions added in-line verse numbers, followed by the verse reference and Bible version in parentheses. Given a random Bible verse, the target output is:
16 For God so loved the world, that he gave his only begotten Son, that whosoever believeth in him should not perish, but have everlasting life. (John 3:16, NIV)
After fine-tuning, it produced:
For God so loved the world, that he gave his only begotten Son, that whosoever believeth in him should not perish, but have everlasting life. (John 3:16, NIV)
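Notice the missing verse number at the beginning: the model learned to append the reference and version but not to add the in-line number. Significantly more training examples than my five will be needed to achieve the ideal output.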
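Checking outputs by eye gets tedious once there are more than a handful of examples, so here is a small sketch of how the two formatting requirements could be checked automatically. The check_output helper and both regular expressions are my own assumptions about what counts as a correctly formatted verse, based only on the target output above.

```python
import re

# A verse number is one or more digits followed by whitespace at the start.
LEADING_VERSE_NUMBER = re.compile(r"^\d+\s")
# A reference looks like "(Book chapter:verse, VERSION)" at the very end.
TRAILING_REFERENCE = re.compile(r"\([^()]+\d+:\d+,\s*\w+\)$")


def check_output(text):
    """Report which parts of the target format are present in the text."""
    text = text.strip()
    return {
        "has_verse_number": bool(LEADING_VERSE_NUMBER.match(text)),
        "has_reference": bool(TRAILING_REFERENCE.search(text)),
    }


verse = (
    "For God so loved the world, that he gave his only begotten Son, "
    "that whosoever believeth in him should not perish, "
    "but have everlasting life. (John 3:16, NIV)"
)

target_check = check_output("16 " + verse)  # the target output
actual_check = check_output(verse)          # what the model produced
```

Here target_check reports both parts as present, while actual_check reports has_verse_number as False and has_reference as True, which matches the gap described above.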