Ubuntu - How to POST a fine-tuning job to OpenAI API using the correct file format?

Daan
October 17, 2023
340 views
0 votes
2 Answers

I am new to the OpenAI API and am trying to understand how fine-tuning works through the API.

I checked which fine-tuning files I have available using this request in the VS Code REST Client:

GET https://api.openai.com/v1/files
Authorization: Bearer {{key}}

Here is part of the Response:

{
  "object": "list",
  "data": [
    {
      "object": "file",
      "id": "file-2FLvF7VJSBsEOFSwJAf1XXXX",
      "purpose": "fine-tune",
      "filename": "mydata.jsonl",
      "bytes": 493,
      "created_at": 1697459301,
      "status": "processed",
      "status_details": null
    },
    {
....

Apparently, I have a file object with id "file-2FLvF7VJSBsEOFSwJAf1XXXX" (last four characters I want to keep secret) for a "purpose" named "fine-tune". Sounds great, so let’s use that in a call.

POST https://api.openai.com/v1/fine_tuning/jobs
Content-Type: application/json
Authorization: Bearer {{key}}

{
  "training_file": "file-2FLvF7VJSBsEOFSwJAf1XXXX",
  "model": "gpt-3.5-turbo"
}

This is the result:

{
  "error": {
    "message": "File 'file-2FLvF7VJSBsEOFSwJAf1XXXX' is in prompt-completion format. The model gpt-3.5-turbo-0613 requires data in the chat-completion format.",
    "type": "invalid_request_error",
    "param": null,
    "code": "invalid_file_format"
  }
}

I understand the error: that data is indeed in prompt-completion format rather than chat-completion format:

{"prompt": "What is the closest planet to the sun?", "completion": "The closest planet to the sun is Mercury."}
{"prompt": "How many moons does Mars have?", "completion": "Mars has two moons."}
{"prompt": "What is the largest planet in our solar system?", "completion": "The largest planet in our solar system is Jupiter."}
{"prompt": "What is the main component of Saturn's rings?", "completion": "The main component of Saturn's rings is ice particles, with some rocky debris and dust."}

However, it is still not clear to me how to fix this. The chat-completion format seems to be simply not accepted. Here is how I discovered this, using this JSONL file:

{"chat": "Translate the following English text to French: 'Hello, how are you?'", "completion": "Bonjour, comment ça va?"}
{"chat": "What's the capital of France?", "completion": "Paris"}
{"chat": "Solve for x: 2x + 5 = 11", "completion": "x = 3"}

which I tried to upload by running this in my wsl (Ubuntu) console:

curl https://api.openai.com/v1/files   -H "Authorization: Bearer [MY_KEY]"   -F purpose="fine-tune"   -F file="@fineTune.jsonl"

giving this error:

{
  "error": {
    "message": "Unexpected file format, expected either prompt/completion pairs or chat messages.",
    "type": "invalid_request_error",
    "param": null,
    "code": null
  }
}

So how can I possibly solve this?

All I want is to get a call like this:

POST https://api.openai.com/v1/fine_tuning/jobs
Content-Type: application/json
Authorization: Bearer {{key}}

{
  "training_file": "file-2FLvF7VJSBsEOFSwJAf1XXXX",
  "model": "gpt-3.5-turbo"
}

working, but I have not succeeded. Please help me get this working, ideally with a fully working example.

Answers

- RokBenko
- October 17, 2023 at 5:59 pm
- 0 votes
Problem

You want to fine-tune a Chat Completions model, but your dataset format is not correct.

Solution

As of today, fine-tuning is available for the following models, as stated in the official OpenAI documentation:
- gpt-3.5-turbo-0613 <– Chat Completions model
- babbage-002 <– Completions model
- davinci-002 <– Completions model
• If you choose a Chat Completions model for fine-tuning, your dataset should be in chat-completion format, as follows:
```
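{"messages": [{"role": "system", "content": "<system content here>"}, {"role": "user", "content": "<user content here>"}, {"role": "assistant", "content": "<assistant content here>"}]}
{"messages": [{"role": "system", "content": "<system content here>"}, {"role": "user", "content": "<user content here>"}, {"role": "assistant", "content": "<assistant content here>"}]}
{"messages": [{"role": "system", "content": "<system content here>"}, {"role": "user", "content": "<user content here>"}, {"role": "assistant", "content": "<assistant content here>"}]}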
```
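Note: The documentation lists the dated snapshot gpt-3.5-turbo-0613 as the fine-tunable model. Judging by the error message in the question, the API resolved gpt-3.5-turbo to that snapshot, but naming gpt-3.5-turbo-0613 explicitly avoids ambiguity.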

• If you choose a Completions model for fine-tuning, your dataset should be in prompt-completion format, as follows:
```
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
```
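For the file in the question, which is already in prompt-completion format, one way to get to the chat-completion format is a small conversion script. The following is only a minimal sketch: the function names and the optional `system_prompt` parameter are made up for illustration, and it assumes every input line has exactly the `prompt` and `completion` keys shown in the question.

```python
import json


def to_chat_example(record, system_prompt=None):
    """Turn one {"prompt", "completion"} record into a {"messages": [...]} record."""
    messages = []
    if system_prompt:
        # Optional system message; hypothetical parameter, not required by the format.
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": record["prompt"]})
    messages.append({"role": "assistant", "content": record["completion"]})
    return {"messages": messages}


def convert_lines(lines, system_prompt=None):
    """Convert an iterable of JSONL lines, skipping blank ones."""
    converted = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        example = to_chat_example(json.loads(line), system_prompt)
        converted.append(json.dumps(example, ensure_ascii=False))
    return converted


def convert_file(src_path, dst_path, system_prompt=None):
    """Read a prompt-completion JSONL file and write a chat-format JSONL file."""
    with open(src_path, encoding="utf-8") as src:
        converted = convert_lines(src, system_prompt)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(converted) + "\n")


if __name__ == "__main__":
    sample = '{"prompt": "How many moons does Mars have?", "completion": "Mars has two moons."}'
    print(convert_lines([sample])[0])
```

Calling `convert_file("mydata.jsonl", "mydata_chat.jsonl")` would then produce a file that can be uploaded with purpose "fine-tune"; the output filename is just an example.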
Also, be careful which OpenAI API endpoint you use.

• If you choose a Chat Completions model for fine-tuning, use the fine-tuned model with the following endpoint:
```
POST https://api.openai.com/v1/chat/completions
```
• If you choose a Completions model for fine-tuning, use the fine-tuned model with the following endpoint:
```
POST https://api.openai.com/v1/completions
```
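The endpoint rule above can be sketched as a small helper that picks the URL and request body from the model name. This is only an illustration: the helper names are hypothetical, the model sets contain just the three models listed in this answer, and the `ft:<base>:<org>::<id>` shape of a fine-tuned model name is an assumption that should be checked against the name your finished job actually returns.

```python
import json

API_BASE = "https://api.openai.com/v1"

# Base models named in this answer, grouped by API family.
CHAT_BASE_MODELS = {"gpt-3.5-turbo-0613"}
COMPLETION_BASE_MODELS = {"babbage-002", "davinci-002"}


def base_model_of(model):
    """Return the base model of an id assumed to look like 'ft:<base>:<org>::<id>'."""
    parts = model.split(":")
    if parts[0] == "ft" and len(parts) > 1:
        return parts[1]
    return model


def build_request(model, text):
    """Return (url, json_body) for sending `text` to a base or fine-tuned model."""
    base = base_model_of(model)
    if base in CHAT_BASE_MODELS:
        body = {"model": model, "messages": [{"role": "user", "content": text}]}
        return f"{API_BASE}/chat/completions", body
    if base in COMPLETION_BASE_MODELS:
        body = {"model": model, "prompt": text}
        return f"{API_BASE}/completions", body
    raise ValueError(f"unknown base model: {base}")


if __name__ == "__main__":
    # "my-org" and "abc123" are placeholders, not real identifiers.
    url, body = build_request("ft:gpt-3.5-turbo-0613:my-org::abc123", "Hello")
    print(url)
    print(json.dumps(body))
```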

- JeremyFiel
- October 17, 2023 at 6:13 pm
- 0 votes
The conversational chat format is required to fine-tune gpt-3.5-turbo:
```
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What's the capital of France?"}, {"role": "assistant", "content": "Paris, as if everyone doesn't know that already."}]}
```
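Before uploading, it may be worth checking each line locally, since a file with keys such as `chat` and `completion` is rejected at upload. The following is a rough sketch of such a check, not the validation the API itself performs: the function name is hypothetical, and the set of allowed roles is an assumption limited to the three roles shown in the example above.

```python
import json

# Assumption: only the roles that appear in the example above are accepted here.
ALLOWED_ROLES = {"system", "user", "assistant"}


def check_chat_line(line):
    """Return a list of problems found in one JSONL line (empty means it looks fine)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["top-level value is not an object"]
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        return ['missing non-empty "messages" list']

    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i} is not an object")
            continue
        if msg.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i} has unexpected role {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str):
            problems.append(f"message {i} has no string content")
    # A training example needs an assistant reply for the model to learn from.
    if not any(isinstance(m, dict) and m.get("role") == "assistant" for m in messages):
        problems.append("no assistant message to learn from")
    return problems


if __name__ == "__main__":
    line = '{"chat": "Solve for x: 2x + 5 = 11", "completion": "x = 3"}'
    print(check_chat_line(line))
```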
```
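# The "@" prefix tells curl to upload the file's contents rather than the literal path string.
curl -X POST https://api.openai.com/v1/files \
  -H "Authorization: Bearer <token>" \
  -F file="@./file.jsonl" \
  -F purpose="fine-tune"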
```
After the file is processed, you can create a fine-tuning job.
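As a rough sketch of that last step, the same POST request from the question can be built with Python's standard library. The function names here are hypothetical, the default model is the snapshot named in the question's error message, and the file ID must be that of a chat-format file you uploaded yourself.

```python
import json
import urllib.request

JOBS_URL = "https://api.openai.com/v1/fine_tuning/jobs"


def build_job_request(api_key, training_file, model="gpt-3.5-turbo-0613"):
    """Build (but do not send) the POST request that creates a fine-tuning job."""
    body = json.dumps({"training_file": training_file, "model": model}).encode("utf-8")
    return urllib.request.Request(
        JOBS_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


def create_job(api_key, training_file, model="gpt-3.5-turbo-0613"):
    """Send the request and return the parsed JSON response."""
    request = build_job_request(api_key, training_file, model)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example call (needs a real key and the ID of an uploaded chat-format file):
# job = create_job("<your API key>", "<your file ID>")
```

If the request fails, `urlopen` raises `urllib.error.HTTPError`; reading that exception's body should show the same kind of JSON error object as in the question.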