I am new to the OpenAI API and I am trying to understand how to use it for fine-tuning.
I checked which fine-tuning files I have available using this request in the VS Code REST Client:
GET https://api.openai.com/v1/files
Authorization: Bearer {{key}}
Here is part of the response:
{
"object": "list",
"data": [
{
"object": "file",
"id": "file-2FLvF7VJSBsEOFSwJAf1XXXX",
"purpose": "fine-tune",
"filename": "mydata.jsonl",
"bytes": 493,
"created_at": 1697459301,
"status": "processed",
"status_details": null
},
{
....
Apparently, I have a file object with id "file-2FLvF7VJSBsEOFSwJAf1XXXX" (I have masked the last four characters) whose "purpose" is "fine-tune". Sounds great, so let's use it in a call.
POST https://api.openai.com/v1/fine_tuning/jobs
Content-Type: application/json
Authorization: Bearer {{key}}
{
"training_file": "file-2FLvF7VJSBsEOFSwJAf1XXXX",
"model": "gpt-3.5-turbo"
}
This is the result:
{
"error": {
"message": "File 'file-2FLvF7VJSBsEOFSwJAf1XXXX' is in prompt-completion format. The model gpt-3.5-turbo-0613 requires data in the chat-completion format.",
"type": "invalid_request_error",
"param": null,
"code": "invalid_file_format"
}
}
So I get an error. The data is indeed in prompt-completion format instead of chat-completion format:
{"prompt": "What is the closest planet to the sun?", "completion": "The closest planet to the sun is Mercury."}
{"prompt": "How many moons does Mars have?", "completion": "Mars has two moons."}
{"prompt": "What is the largest planet in our solar system?", "completion": "The largest planet in our solar system is Jupiter."}
{"prompt": "What is the main component of Saturn's rings?", "completion": "The main component of Saturn's rings is ice particles, with some rocky debris and dust."}
However, it is still not clear to me how to fix this. The chat-completion format is simply not accepted. Here is how I discovered that, using this JSONL file:
{"chat": "Translate the following English text to French: 'Hello, how are you?'", "completion": "Bonjour, comment ça va?"}
{"chat": "What's the capital of France?", "completion": "Paris"}
{"chat": "Solve for x: 2x + 5 = 11", "completion": "x = 3"}
which I tried to upload by running this in my WSL (Ubuntu) console:
curl https://api.openai.com/v1/files -H "Authorization: Bearer [MY_KEY]" -F purpose="fine-tune" -F file="@fineTune.jsonl"
giving this error:
{
"error": {
"message": "Unexpected file format, expected either prompt/completion pairs or chat messages.",
"type": "invalid_request_error",
"param": null,
"code": null
}
}
So how can I possibly solve this?
All I want is to get a call like this:
POST https://api.openai.com/v1/fine_tuning/jobs
Content-Type: application/json
Authorization: Bearer {{key}}
{
"training_file": "file-2FLvF7VJSBsEOFSwJAf1XXXX",
"model": "gpt-3.5-turbo"
}
working… But I don't succeed, so please help me get this working. Please come up with a fully working example.
Answer
Problem
You want to fine-tune a Chat Completions model, but your dataset format is not correct.
Solution
As of today, fine-tuning is available for the following models, as stated in the official OpenAI documentation:
gpt-3.5-turbo-0613 <– Chat Completions model
babbage-002 <– Completions model
davinci-002 <– Completions model
• If you choose a Chat Completions model for fine-tuning, your dataset should be in chat-completion format, as follows:
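For example, here are the question's own solar-system pairs rewritten in chat-completion format (the system message content is just an illustrative placeholder; adjust or omit it as you like):
{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the closest planet to the sun?"}, {"role": "assistant", "content": "The closest planet to the sun is Mercury."}]}
{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "How many moons does Mars have?"}, {"role": "assistant", "content": "Mars has two moons."}]}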
Note: gpt-3.5-turbo is not available for fine-tuning; use gpt-3.5-turbo-0613 instead.
• If you choose a Completions model for fine-tuning, your dataset should be in prompt-completion format, as follows:
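That is exactly the shape already shown in mydata.jsonl in the question, e.g.:
{"prompt": "What is the closest planet to the sun?", "completion": "The closest planet to the sun is Mercury."}
{"prompt": "How many moons does Mars have?", "completion": "Mars has two moons."}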
Also, be careful which OpenAI API endpoint you use.
• If you choose a Chat Completions model for fine-tuning, use the fine-tuned model with the following endpoint:
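https://api.openai.com/v1/chat/completions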
• If you choose a Completions model for fine-tuning, use the fine-tuned model with the following endpoint:
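https://api.openai.com/v1/completions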
After the file is processed, you can create a fine-tuning model.
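Putting it together for the question's case, here is a minimal end-to-end sketch. chatData.jsonl is an assumed filename containing chat-formatted lines like the example above; replace the key placeholders and the file id with your own values. First, upload the file from the WSL console, as in the question:
curl https://api.openai.com/v1/files -H "Authorization: Bearer [MY_KEY]" -F purpose="fine-tune" -F file="@chatData.jsonl"
Then, once the file shows "status": "processed", create the fine-tuning job with the file id returned by the upload (REST Client style, as in the question), using the -0613 model name:
POST https://api.openai.com/v1/fine_tuning/jobs
Content-Type: application/json
Authorization: Bearer {{key}}

{
"training_file": "file-REPLACE_WITH_RETURNED_ID",
"model": "gpt-3.5-turbo-0613"
}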