AI Prompt Formatter
Build multi-turn conversations with system prompts and format them for ChatML, Claude, Llama 3, Mistral, Gemini, or the OpenAI messages API.
Used in OpenAI fine-tuning, vLLM, and Mistral workflows
Tip: Click a role badge (user / assistant) to toggle it. The format updates live.
LLM Prompt Format Reference
Each LLM family uses a distinct prompt template that was baked in during instruction fine-tuning. Sending your prompt in the wrong format — even with correct content — will cause the model to produce worse results because the special tokens it expects to see are absent. Below is a quick reference for when to use each format.
| Format | Models | Use when |
|---|---|---|
| ChatML | GPT-4, GPT-3.5, OpenHermes | OpenAI fine-tuning, vLLM inference, Ollama |
| Claude | Claude 1, Claude 2 (legacy) | Anthropic completions API (legacy) |
| Llama 3 | Llama 3, Llama 3.1, Llama 3.2 | Meta Llama 3 family via Groq, Together, Replicate |
| Mistral | Mistral 7B, Mixtral 8x7B, Mistral Nemo | Mistral instruction models via Mistral AI API or self-hosted |
| Gemini | Gemini 1.5 Pro/Flash, Gemini 2.0 | Google AI Studio or Vertex AI Gemini API |
| Raw JSON | Any OpenAI-compatible endpoint | OpenAI, Together AI, Fireworks, Perplexity APIs |
Writing Effective System Prompts
The system prompt sets the model's persona, constraints, and behavioral defaults. It runs on every call, so token efficiency matters at scale. These patterns consistently produce better results across all model families:
Define role before constraints
Start with who the model is ('You are a senior backend engineer') before listing what it cannot do. Role-first prompts produce more coherent personas.
Specify output format explicitly
If you want JSON, markdown tables, or code blocks, say so in the system prompt. Don't rely on user messages to restate formatting requirements every turn.
Calibrate verbosity
Tell the model how long responses should be. 'Be concise, two paragraphs max' or 'Provide exhaustive detail with examples' are both valid. Without guidance, models default to medium verbosity.
Avoid negations where possible
"Always cite sources" outperforms "Don't make things up". Models respond better to positive instructions than prohibition lists.
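As a sketch, the four patterns above can be combined into a single prompt-assembly helper. The function name and parameters here are hypothetical, not part of any library:

```python
# Hypothetical helper that assembles a system prompt following the
# patterns above: role first, explicit output format, calibrated
# verbosity, then positively phrased instructions.
def build_system_prompt(role: str, output_format: str, verbosity: str,
                        instructions: list[str]) -> str:
    lines = [
        f"You are {role}.",              # role before constraints
        f"Respond in {output_format}.",  # explicit output format
        verbosity,                       # verbosity calibration
    ]
    lines.extend(instructions)           # positive instructions, no "don't" lists
    return " ".join(lines)

prompt = build_system_prompt(
    role="a senior backend engineer",
    output_format="markdown with fenced code blocks",
    verbosity="Be concise: two paragraphs max.",
    instructions=["Always cite sources.", "Always state your assumptions."],
)
```

Because the system prompt is sent on every call, keeping this assembled string short pays off at scale.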
Multi-Turn Conversation Design
Multi-turn conversation history lets the model maintain context across exchanges. However, every turn you include adds to the token count. For chat applications, a sliding window strategy — keeping only the last N turns — balances context quality against cost. For agentic workflows, summarizing earlier turns into a compressed memory block before appending new turns is a common optimization.
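A minimal sliding-window sketch, assuming OpenAI-style message dicts (the helper name is hypothetical):

```python
def sliding_window(messages: list[dict], max_turns: int = 4) -> list[dict]:
    """Keep the system message plus only the last `max_turns`
    user/assistant messages -- a simple sliding-window strategy."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    return system + turns[-max_turns:]

# Build a 6-exchange history, then trim it before the next API call.
history = [{"role": "system", "content": "You are terse."}]
for i in range(6):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

trimmed = sliding_window(history, max_turns=4)
```

The system message always survives trimming; only conversational turns age out of the window.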
When building few-shot examples, use the user/assistant turn structure rather than stuffing examples into the system prompt. Models trained with RLHF respond better to examples in the conversation flow because that structure matches their training distribution. Keep few-shot examples short and representative of the output format you want — not the edge cases you fear.
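For example, a sentiment-classification prompt with two few-shot examples expressed as real turns might look like this (the task and labels are illustrative):

```python
few_shot = [
    {"role": "system", "content": "Classify sentiment as POS or NEG. "
                                  "Reply with the label only."},
    # Few-shot examples as genuine user/assistant turns, matching the
    # structure the model saw during RLHF training.
    {"role": "user", "content": "The onboarding flow was painless."},
    {"role": "assistant", "content": "POS"},
    {"role": "user", "content": "Support never answered my ticket."},
    {"role": "assistant", "content": "NEG"},
    # The live query goes last, as an ordinary user turn.
    {"role": "user", "content": "Setup took five minutes and just worked."},
]
```

Note the examples demonstrate the desired output format (a bare label), not edge cases.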
Frequently Asked Questions
What is ChatML and when should I use it?
ChatML (Chat Markup Language) is a prompt format developed by OpenAI and widely adopted for fine-tuning and inference. It uses special tokens — <|im_start|>, <|im_end|> — to delimit system, user, and assistant turns. Use ChatML when fine-tuning models with OpenAI's fine-tuning API, serving models via vLLM with the ChatML template, or working with ChatML-tuned open models such as OpenHermes. It is also a common template in open-source inference stacks including Ollama and LM Studio.
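A minimal rendering sketch, assuming OpenAI-style message dicts (the function name is hypothetical):

```python
def to_chatml(messages: list[dict]) -> str:
    """Render a messages array as a ChatML string, ending with an
    open assistant turn for the model to complete."""
    out = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
           for m in messages]
    out.append("<|im_start|>assistant\n")  # generation prompt
    return "\n".join(out)

chatml = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is ChatML?"},
])
```

The trailing open `<|im_start|>assistant` turn is what cues the model to generate a reply.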
What is the difference between the Claude format and ChatML?
Claude uses a simpler Human/Assistant format without special tokens. Each user turn is prefixed with '\n\nHuman:' and each assistant turn with '\n\nAssistant:'. The system prompt is prepended to the first Human turn rather than as a separate role. This format is used in Anthropic's legacy completions API. The newer Messages API uses a structured JSON format similar to OpenAI's, which is reflected in the Raw JSON output of this tool.
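A sketch of the legacy format, assuming (role, text) tuples for turns (the helper name is hypothetical):

```python
def to_claude_legacy(system: str, turns: list[tuple[str, str]]) -> str:
    """Render the legacy Anthropic Human/Assistant completions format.
    The system prompt is folded into the first Human turn, and the
    string ends with an open Assistant: prefix for the model."""
    parts = []
    for i, (role, text) in enumerate(turns):
        prefix = "\n\nHuman: " if role == "user" else "\n\nAssistant: "
        if i == 0 and system:
            text = f"{system}\n\n{text}"  # system folded into first turn
        parts.append(prefix + text)
    parts.append("\n\nAssistant:")  # open turn cues the completion
    return "".join(parts)

legacy = to_claude_legacy("Be terse.", [("user", "Hi")])
```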
How does Llama 3 prompt formatting differ from Llama 2?
Llama 3 introduced new special tokens: <|begin_of_text|>, <|start_header_id|>, <|end_header_id|>, and <|eot_id|>. This is a major change from the Llama 2 [INST] / [/INST] format (a bracket style that Mistral's instruct models also use, with minor differences). Llama 3 supports an explicit system role as a first-class turn, whereas Llama 2 required embedding system instructions inside the first [INST] block. Using the wrong template for a given model version will produce degraded results because each model was fine-tuned specifically on its own template.
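A rendering sketch using the Llama 3 tokens, again assuming OpenAI-style message dicts (helper name hypothetical):

```python
def to_llama3(messages: list[dict]) -> str:
    """Render messages with Llama 3 special tokens, ending with an
    open assistant header so the model generates the next turn."""
    out = ["<|begin_of_text|>"]
    for m in messages:
        out.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    out.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(out)

rendered = to_llama3([
    {"role": "system", "content": "Be brief."},  # system is a first-class turn
    {"role": "user", "content": "Hi"},
])
```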
What format should I use for Gemini API calls?
The Gemini API uses a JSON format with a 'contents' array where each item has a 'role' ('user' or 'model') and a 'parts' array containing text objects. System instructions are passed separately via the 'systemInstruction' field in the request body. This tool outputs a ready-to-paste JSON object that matches the Gemini generateContent request body structure. Note that Gemini uses 'model' instead of 'assistant' for the AI role.
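A conversion sketch from OpenAI-style messages to a Gemini request body (the function name is hypothetical; the body shape follows the generateContent structure described above):

```python
def to_gemini(messages: list[dict]) -> dict:
    """Convert OpenAI-style messages into a Gemini generateContent
    request body: 'assistant' maps to 'model', and the system
    message moves to the separate 'systemInstruction' field."""
    body: dict = {"contents": []}
    for m in messages:
        if m["role"] == "system":
            body["systemInstruction"] = {"parts": [{"text": m["content"]}]}
            continue
        role = "model" if m["role"] == "assistant" else "user"
        body["contents"].append(
            {"role": role, "parts": [{"text": m["content"]}]}
        )
    return body

body = to_gemini([
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
])
```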
Can I use Raw JSON output directly with the OpenAI API?
Yes. The Raw JSON output matches the 'messages' array format expected by the OpenAI Chat Completions API. Each object has 'role' ('system', 'user', or 'assistant') and 'content' fields. You can copy this array and pass it as the 'messages' parameter in your API call. The same format is compatible with any OpenAI-compatible API, including Together AI, Fireworks AI, Perplexity, and most open-source serving frameworks.
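For instance, a Chat Completions request body can be built directly from that array (the model name here is illustrative; any OpenAI-compatible endpoint accepts the same shape):

```python
import json

messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Summarize ChatML in one sentence."},
]

# Request body for POST /v1/chat/completions on an
# OpenAI-compatible endpoint.
payload = json.dumps({"model": "gpt-4", "messages": messages})
```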