Model Playground PRO
The Model Playground lets you compare two models side by side in real time. Type a prompt once, watch both responses stream in simultaneously, and make an informed decision about which model to use for each agent.
Side-by-Side Comparison
The playground uses a CSS grid two-panel layout. Each panel represents one model, and both panels respond to the same prompt at the same time. This eliminates the guesswork of switching between models — you see exactly how each model handles the same input, making differences in tone, accuracy, verbosity, and speed immediately apparent.
The two panels are rendered side by side on desktop and stacked vertically on mobile, ensuring a usable experience on any screen size.
Panel Layout
Each panel contains three elements:
- Model Selector Dropdown — Choose any model available on your VPS instance. The dropdown is populated dynamically with your configured models. Select a different model for each panel to compare them, or select the same model with different temperature settings to see how randomness affects output.
- Response Area — The model's response appears here as it streams in. Responses are rendered with full Markdown support including headings, code blocks, lists, bold/italic text, and links. The response area scrolls automatically as new content arrives during streaming.
- Response Time Badge — A small badge displayed after the response completes, showing the total time from prompt submission to final token. This gives you an objective latency measurement for each model, helping you balance quality against speed.
Shared Prompt
The prompt textarea sits at the bottom of the playground, below both panels. When you submit a prompt, it is sent to both selected models simultaneously. This shared-prompt design ensures a fair comparison — both models receive the exact same input at the exact same time.
The prompt field supports multi-line input for complex queries. You can paste code snippets, multi-paragraph questions, or structured prompts with instructions and examples.
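The fan-out of one prompt to two models can be sketched with `asyncio` (the `query_model` function is a stand-in for a real API client, not the playground's actual code):

```python
import asyncio

async def query_model(model: str, prompt: str) -> str:
    """Stand-in for a real model call; replace with your API client."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return f"{model} answer to: {prompt}"

async def compare(prompt: str, model_a: str, model_b: str) -> list[str]:
    # gather() schedules both coroutines before awaiting either, so both
    # models receive the identical prompt at effectively the same time.
    return await asyncio.gather(
        query_model(model_a, prompt),
        query_model(model_b, prompt),
    )
```

The key point is that neither request waits for the other: both are in flight concurrently, which is what keeps the comparison fair.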
Streaming Responses
Both models stream their responses in real time using Server-Sent Events (SSE). You see tokens appear in each panel as they are generated, giving you a live view of how each model constructs its answer. Streaming makes the comparison more informative than waiting for complete responses — you can observe differences in how each model structures its thinking and output.
If one model finishes before the other, its response time badge appears immediately while the other model continues streaming. This makes speed differences visually obvious.
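On the wire, an SSE stream is a sequence of `data:` lines separated by blank lines. A minimal parser for the common single-line case (real SSE also handles `event:`, `id:`, `retry:`, and multi-line `data` fields, which this sketch omits):

```python
def parse_sse_data(raw: str) -> list[str]:
    """Extract data payloads from a raw SSE stream (single-line fields only)."""
    payloads = []
    for line in raw.splitlines():
        if line.startswith("data:"):
            payload = line[len("data:"):]
            # Per the SSE spec, strip at most ONE leading space so that
            # whitespace inside streamed tokens is preserved.
            if payload.startswith(" "):
                payload = payload[1:]
            payloads.append(payload)
    return payloads
```

Each payload is appended to the corresponding panel as it arrives, which is why you see the response grow token by token.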
Advanced Settings
Fine-tune each model's behavior with advanced parameters:
Temperature
Adjustable from 0 to 2 using a slider. Temperature controls the randomness of the model's output:
- 0 — Deterministic. The model always picks the most likely next token. Best for factual queries and consistent outputs.
- 0.5–0.7 — Balanced. Good default for conversational agents. Responses are natural without being unpredictable.
- 1.0 — Standard randomness. The model samples from the full probability distribution.
- 1.5–2.0 — High creativity. Useful for brainstorming, creative writing, or exploring diverse responses. May produce less coherent output.
Temperature is set per-panel, so you can compare the same model at temperature 0.3 versus 1.2 to see how randomness affects its answers for your specific use case.
Max Response Length
Set a maximum token limit for each model's response. This is useful for comparing how models handle length constraints — some models produce concise answers naturally, while others need a length cap to avoid verbose output.
Adjusting the max response length also helps you estimate costs and latency for production usage. A shorter max length means faster responses and lower token consumption.
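The cost and latency impact of a length cap is simple arithmetic. A back-of-the-envelope helper (the per-token price and throughput are hypothetical inputs; look up your provider's actual figures):

```python
def estimate_cost_and_latency(max_tokens: int,
                              price_per_1k_tokens: float,
                              tokens_per_second: float) -> tuple[float, float]:
    """Worst-case output cost (currency units) and streaming time (seconds)
    if the model generates up to the configured cap."""
    cost = max_tokens / 1000 * price_per_1k_tokens
    seconds = max_tokens / tokens_per_second
    return cost, seconds
```

For example, halving a 1000-token cap to 500 halves both the worst-case output cost and the worst-case streaming time.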
Save Comparisons
Save any comparison for later review by clicking the save button after both responses complete. A saved comparison captures:
- The prompt text
- Both model selections
- Both complete responses
- Response times for each model
- Temperature and max length settings
- Timestamp of the comparison
Saved comparisons are stored in your account and accessible from the comparison history view. Use them to build a reference library of model behavior across different prompt types, or share them with your team to discuss which model fits best for each agent.
Comparison History
The history view lists all your saved comparisons in reverse chronological order. Each entry shows the prompt preview, the two models compared, and their response times. Click any entry to expand it and view the full prompt and responses.
Use comparison history to track how your model preferences evolve over time, or to revisit past comparisons when onboarding a new team member who needs to understand why a specific model was chosen for an agent.
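The history listing amounts to sorting saved records newest-first and truncating each prompt to a preview. A sketch, assuming records are plain dicts with the fields described above (names are illustrative):

```python
def history_entries(saved: list[dict], preview_len: int = 60) -> list[dict]:
    """Order saved comparisons newest-first and build list-view entries."""
    # ISO-8601 timestamps sort correctly as strings, so a reverse string
    # sort yields reverse chronological order.
    ordered = sorted(saved, key=lambda c: c["saved_at"], reverse=True)
    return [{"preview": c["prompt"][:preview_len],
             "models": c["models"],
             "times_ms": c["response_times_ms"]}
            for c in ordered]
```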
Practical Use Cases
- Model selection for new agents — Before assigning a model in the Agent Builder, test candidates in the playground with domain-specific prompts.
- Temperature tuning — Compare the same model at different temperatures to find the optimal balance between creativity and consistency for your use case.
- Regression testing — When a model provider releases an update, re-run your saved prompts to verify that response quality has not degraded.
- Cost optimization — Compare a larger, more expensive model against a smaller, faster one. If the smaller model produces acceptable quality for your use case, you save on per-token costs.
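For the regression-testing case, a crude automated check is to compare each new response against the saved one. This sketch uses `difflib` similarity as a stand-in for a real quality evaluation; the threshold is an assumption you should tune:

```python
import difflib

def regression_check(saved_response: str, new_response: str,
                     min_similarity: float = 0.8) -> bool:
    """Return True if the new response stays close to the saved baseline.

    Text similarity is a crude proxy for quality; swap in your own
    evaluation (rubric scoring, embeddings, human review) for production.
    """
    ratio = difflib.SequenceMatcher(None, saved_response, new_response).ratio()
    return ratio >= min_similarity
```

Running this over every saved prompt after a provider update flags the comparisons worth re-reviewing by hand.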
Related Documentation
- Pro Features Overview — Full list of Pro capabilities
- Agent Builder — Create agents with your chosen model
- API Access — Use models programmatically via the API
- Analytics — Track model performance in production