AI Prompt Formatter
Build multi-turn conversations with system prompts and format them for ChatML, Claude, Llama 3, Mistral, Gemini, or the OpenAI messages API.
Used in OpenAI fine-tuning, vLLM, and Mistral workflows
Tip: Click a role badge (user / assistant) to toggle it. The format updates live.
LLM Prompt Format Reference
Each LLM family uses a distinct prompt template that was baked in during instruction fine-tuning. Sending your prompt in the wrong format — even with correct content — will cause the model to produce worse results because the special tokens it expects to see are absent. Below is a quick reference for when to use each format.
| Format | Models | Use when |
|---|---|---|
| ChatML | GPT-4, GPT-3.5, OpenHermes | OpenAI fine-tuning, vLLM inference, Ollama |
| Claude | Claude 1, Claude 2 (legacy) | Anthropic completions API (legacy) |
| Llama 3 | Llama 3, Llama 3.1, Llama 3.2 | Meta Llama 3 family via Groq, Together, Replicate |
| Mistral | Mistral 7B, Mixtral 8x7B, Mistral Nemo | Mistral instruction models via Mistral AI API or self-hosted |
| Gemini | Gemini 1.5 Pro/Flash, Gemini 2.0 | Google AI Studio or Vertex AI Gemini API |
| Raw JSON | Any OpenAI-compatible endpoint | OpenAI, Together AI, Fireworks, Perplexity APIs |
Writing Effective System Prompts
The system prompt sets the model's persona, constraints, and behavioral defaults. It runs on every call, so token efficiency matters at scale. These patterns consistently produce better results across all model families:
Define role before constraints
Start with who the model is ('You are a senior backend engineer') before listing what it cannot do. Role-first prompts produce more coherent personas.
Specify output format explicitly
If you want JSON, markdown tables, or code blocks, say so in the system prompt. Don't rely on user messages to restate formatting requirements every turn.
Calibrate verbosity
Tell the model how long responses should be. 'Be concise, two paragraphs max' or 'Provide exhaustive detail with examples' are both valid. Without guidance, models default to medium verbosity.
Avoid negations where possible
"Always cite sources" outperforms "Don't make things up". Models respond better to positive instructions than prohibition lists.
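As a sketch, the four patterns above can be combined into a single prompt-assembly helper. The function name and parameters here are hypothetical, not part of any library:

```python
# Hypothetical helper that assembles a system prompt following the
# patterns above: role first, explicit output format, calibrated
# verbosity, then positively phrased instructions.
def build_system_prompt(role: str, output_format: str, verbosity: str,
                        instructions: list[str]) -> str:
    lines = [
        f"You are {role}.",              # role before constraints
        f"Respond in {output_format}.",  # explicit output format
        verbosity,                       # verbosity calibration
    ]
    lines.extend(instructions)           # positive instructions, no "don't" lists
    return " ".join(lines)

prompt = build_system_prompt(
    role="a senior backend engineer",
    output_format="markdown with fenced code blocks",
    verbosity="Be concise: two paragraphs max.",
    instructions=["Always cite sources.", "Always state your assumptions."],
)
```

Because the system prompt is sent on every call, keeping this assembled string short pays off at scale.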
Multi-Turn Conversation Design
Multi-turn conversation history lets the model maintain context across exchanges. However, every turn you include adds to the token count. For chat applications, a sliding window strategy — keeping only the last N turns — balances context quality against cost. For agentic workflows, summarizing earlier turns into a compressed memory block before appending new turns is a common optimization.
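A minimal sliding-window sketch, assuming OpenAI-style message dicts (the helper name is hypothetical):

```python
def sliding_window(messages: list[dict], max_turns: int = 4) -> list[dict]:
    """Keep the system message plus only the last `max_turns`
    user/assistant messages -- a simple sliding-window strategy."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    return system + turns[-max_turns:]

# Build a 6-exchange history, then trim it before the next API call.
history = [{"role": "system", "content": "You are terse."}]
for i in range(6):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

trimmed = sliding_window(history, max_turns=4)
```

The system message always survives trimming; only conversational turns age out of the window.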
When building few-shot examples, use the user/assistant turn structure rather than stuffing examples into the system prompt. Models trained with RLHF respond better to examples in the conversation flow because that structure matches their training distribution. Keep few-shot examples short and representative of the output format you want — not the edge cases you fear.
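For example, a sentiment-classification prompt with two few-shot examples expressed as real turns might look like this (the task and labels are illustrative):

```python
few_shot = [
    {"role": "system", "content": "Classify sentiment as POS or NEG. "
                                  "Reply with the label only."},
    # Few-shot examples as genuine user/assistant turns, matching the
    # structure the model saw during RLHF training.
    {"role": "user", "content": "The onboarding flow was painless."},
    {"role": "assistant", "content": "POS"},
    {"role": "user", "content": "Support never answered my ticket."},
    {"role": "assistant", "content": "NEG"},
    # The live query goes last, as an ordinary user turn.
    {"role": "user", "content": "Setup took five minutes and just worked."},
]
```

Note the examples demonstrate the desired output format (a bare label), not edge cases.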
Frequently Asked Questions
What is ChatML and when should I use it?
ChatML (Chat Markup Language) is a prompt format developed by OpenAI and widely adopted for fine-tuning and inference. It uses special tokens — <|im_start|>, <|im_end|> — to delimit system, user, and assistant turns. Use ChatML when fine-tuning models with OpenAI's fine-tuning API, serving models via vLLM with the ChatML template, or working with ChatML-tuned open models such as OpenHermes. It is also a common template in open-source inference stacks including Ollama and LM Studio.
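A minimal rendering sketch, assuming OpenAI-style message dicts (the function name is hypothetical):

```python
def to_chatml(messages: list[dict]) -> str:
    """Render a messages array as a ChatML string, ending with an
    open assistant turn for the model to complete."""
    out = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
           for m in messages]
    out.append("<|im_start|>assistant\n")  # generation prompt
    return "\n".join(out)

chatml = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is ChatML?"},
])
```

The trailing open `<|im_start|>assistant` turn is what cues the model to generate a reply.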
What is the difference between the Claude format and ChatML?
Claude uses a simpler Human/Assistant format without special tokens. Each user turn is prefixed with '\n\nHuman:' and each assistant turn with '\n\nAssistant:'. The system prompt is prepended to the first Human turn rather than as a separate role. This format is used in Anthropic's legacy completions API. The newer Messages API uses a structured JSON format similar to OpenAI's, which is reflected in the Raw JSON output of this tool.
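A sketch of the legacy format, assuming (role, text) tuples for turns (the helper name is hypothetical):

```python
def to_claude_legacy(system: str, turns: list[tuple[str, str]]) -> str:
    """Render the legacy Anthropic Human/Assistant completions format.
    The system prompt is folded into the first Human turn, and the
    string ends with an open Assistant: prefix for the model."""
    parts = []
    for i, (role, text) in enumerate(turns):
        prefix = "\n\nHuman: " if role == "user" else "\n\nAssistant: "
        if i == 0 and system:
            text = f"{system}\n\n{text}"  # system folded into first turn
        parts.append(prefix + text)
    parts.append("\n\nAssistant:")  # open turn cues the completion
    return "".join(parts)

legacy = to_claude_legacy("Be terse.", [("user", "Hi")])
```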
How does Llama 3 prompt formatting differ from Llama 2?
Llama 3 introduced new special tokens: <|begin_of_text|>, <|start_header_id|>, <|end_header_id|>, and <|eot_id|>. This is a major change from the Llama 2 [INST] / [/INST] format (a bracket style that Mistral's instruct models also use, with minor differences). Llama 3 supports an explicit system role as a first-class turn, whereas Llama 2 required embedding system instructions inside the first [INST] block. Using the wrong template for a given model version will produce degraded results because each model was fine-tuned specifically on its own template.
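A rendering sketch using the Llama 3 tokens, again assuming OpenAI-style message dicts (helper name hypothetical):

```python
def to_llama3(messages: list[dict]) -> str:
    """Render messages with Llama 3 special tokens, ending with an
    open assistant header so the model generates the next turn."""
    out = ["<|begin_of_text|>"]
    for m in messages:
        out.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    out.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(out)

rendered = to_llama3([
    {"role": "system", "content": "Be brief."},  # system is a first-class turn
    {"role": "user", "content": "Hi"},
])
```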
What format should I use for Gemini API calls?
The Gemini API uses a JSON format with a 'contents' array where each item has a 'role' ('user' or 'model') and a 'parts' array containing text objects. System instructions are passed separately via the 'systemInstruction' field in the request body. This tool outputs a ready-to-paste JSON object that matches the Gemini generateContent request body structure. Note that Gemini uses 'model' instead of 'assistant' for the AI role.
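A conversion sketch from OpenAI-style messages to a Gemini request body (the function name is hypothetical; the body shape follows the generateContent structure described above):

```python
def to_gemini(messages: list[dict]) -> dict:
    """Convert OpenAI-style messages into a Gemini generateContent
    request body: 'assistant' maps to 'model', and the system
    message moves to the separate 'systemInstruction' field."""
    body: dict = {"contents": []}
    for m in messages:
        if m["role"] == "system":
            body["systemInstruction"] = {"parts": [{"text": m["content"]}]}
            continue
        role = "model" if m["role"] == "assistant" else "user"
        body["contents"].append(
            {"role": role, "parts": [{"text": m["content"]}]}
        )
    return body

body = to_gemini([
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
])
```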
Can I use Raw JSON output directly with the OpenAI API?
Yes. The Raw JSON output matches the 'messages' array format expected by the OpenAI Chat Completions API. Each object has 'role' ('system', 'user', or 'assistant') and 'content' fields. You can copy this array and pass it as the 'messages' parameter in your API call. The same format is compatible with any OpenAI-compatible API, including Together AI, Fireworks AI, Perplexity, and most open-source serving frameworks.
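For instance, a Chat Completions request body can be built directly from that array (the model name here is illustrative; any OpenAI-compatible endpoint accepts the same shape):

```python
import json

messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Summarize ChatML in one sentence."},
]

# Request body for POST /v1/chat/completions on an
# OpenAI-compatible endpoint.
payload = json.dumps({"model": "gpt-4", "messages": messages})
```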