Model Playground PRO
The Model Playground lets you compare two models side by side in real time. Type a prompt once, watch both responses stream in simultaneously, and make an informed decision about which model to use for each agent.
Side-by-Side Comparison
The playground uses a CSS grid two-panel layout. Each panel represents one model, and both panels respond to the same prompt at the same time. This eliminates the guesswork of switching between models — you see exactly how each model handles the same input, making differences in tone, accuracy, verbosity, and speed immediately apparent.
The two panels are rendered side by side on desktop and stacked vertically on mobile, ensuring a usable experience on any screen size.
Panel Layout
Each panel contains three elements:
- Model Selector Dropdown — Choose any model available on your VPS instance. The dropdown is populated dynamically with your configured models. Select a different model for each panel to compare them, or select the same model with different temperature settings to see how randomness affects output.
- Response Area — The model's response appears here as it streams in. Responses are rendered with full Markdown support including headings, code blocks, lists, bold/italic text, and links. The response area scrolls automatically as new content arrives during streaming.
- Response Time Badge — A small badge displayed after the response completes, showing the total time from prompt submission to final token. This gives you an objective latency measurement for each model, helping you balance quality against speed.
Shared Prompt
The prompt textarea sits at the bottom of the playground, below both panels. When you submit a prompt, it is sent to both selected models simultaneously. This shared-prompt design ensures a fair comparison — both models receive the exact same input at the exact same time.
The prompt field supports multi-line input for complex queries. You can paste code snippets, multi-paragraph questions, or structured prompts with instructions and examples.
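The fan-out of one prompt to two models can be sketched with `asyncio` (the `query_model` function is a stand-in for a real API client, not the playground's actual code):

```python
import asyncio

async def query_model(model: str, prompt: str) -> str:
    """Stand-in for a real model call; replace with your API client."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return f"{model} answer to: {prompt}"

async def compare(prompt: str, model_a: str, model_b: str) -> list[str]:
    # gather() schedules both coroutines before awaiting either, so both
    # models receive the identical prompt at effectively the same time.
    return await asyncio.gather(
        query_model(model_a, prompt),
        query_model(model_b, prompt),
    )
```

The key point is that neither request waits for the other: both are in flight concurrently, which is what keeps the comparison fair.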
Streaming Responses
Both models stream their responses in real time using Server-Sent Events (SSE). You see tokens appear in each panel as they are generated, giving you a live view of how each model constructs its answer. Streaming makes the comparison more informative than waiting for complete responses — you can observe differences in how each model structures its thinking and output.
If one model finishes before the other, its response time badge appears immediately while the other model continues streaming. This makes speed differences visually obvious.
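On the wire, an SSE stream is a sequence of `data:` lines separated by blank lines. A minimal parser for the common single-line case (real SSE also handles `event:`, `id:`, `retry:`, and multi-line `data` fields, which this sketch omits):

```python
def parse_sse_data(raw: str) -> list[str]:
    """Extract data payloads from a raw SSE stream (single-line fields only)."""
    payloads = []
    for line in raw.splitlines():
        if line.startswith("data:"):
            payload = line[len("data:"):]
            # Per the SSE spec, strip at most ONE leading space so that
            # whitespace inside streamed tokens is preserved.
            if payload.startswith(" "):
                payload = payload[1:]
            payloads.append(payload)
    return payloads
```

Each payload is appended to the corresponding panel as it arrives, which is why you see the response grow token by token.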
Advanced Settings
Fine-tune each model's behavior with advanced parameters:
Temperature
Adjustable from 0 to 2 using a slider. Temperature controls the randomness of the model's output:
- 0 — Deterministic. The model always picks the most likely next token. Best for factual queries and consistent outputs.
- 0.5–0.7 — Balanced. Good default for conversational agents. Responses are natural without being unpredictable.
- 1.0 — Standard randomness. The model samples from the full probability distribution.
- 1.5–2.0 — High creativity. Useful for brainstorming, creative writing, or exploring diverse responses. May produce less coherent output.
Temperature is set per-panel, so you can compare the same model at temperature 0.3 versus 1.2 to see how randomness affects its answers for your specific use case.
Max Response Length
Set a maximum token limit for each model's response. This is useful for comparing how models handle length constraints — some models produce concise answers naturally, while others need a length cap to avoid verbose output.
Adjusting the max response length also helps you estimate costs and latency for production usage. A shorter max length means faster responses and lower token consumption.
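The cost and latency impact of a length cap is simple arithmetic. A back-of-the-envelope helper (the per-token price and throughput are hypothetical inputs; look up your provider's actual figures):

```python
def estimate_cost_and_latency(max_tokens: int,
                              price_per_1k_tokens: float,
                              tokens_per_second: float) -> tuple[float, float]:
    """Worst-case output cost (currency units) and streaming time (seconds)
    if the model generates up to the configured cap."""
    cost = max_tokens / 1000 * price_per_1k_tokens
    seconds = max_tokens / tokens_per_second
    return cost, seconds
```

For example, halving a 1000-token cap to 500 halves both the worst-case output cost and the worst-case streaming time.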
Save Comparisons
Save any comparison for later review by clicking the save button after both responses complete. A saved comparison captures:
- The prompt text
- Both model selections
- Both complete responses
- Response times for each model
- Temperature and max length settings
- Timestamp of the comparison
Saved comparisons are stored in your account and accessible from the comparison history view. Use them to build a reference library of model behavior across different prompt types, or share them with your team to discuss which model fits best for each agent.
Comparison History
The history view lists all your saved comparisons in reverse chronological order. Each entry shows the prompt preview, the two models compared, and their response times. Click any entry to expand it and view the full prompt and responses.
Use comparison history to track how your model preferences evolve over time, or to revisit past comparisons when onboarding a new team member who needs to understand why a specific model was chosen for an agent.
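The history listing amounts to sorting saved records newest-first and truncating each prompt to a preview. A sketch, assuming records are plain dicts with the fields described above (names are illustrative):

```python
def history_entries(saved: list[dict], preview_len: int = 60) -> list[dict]:
    """Order saved comparisons newest-first and build list-view entries."""
    # ISO-8601 timestamps sort correctly as strings, so a reverse string
    # sort yields reverse chronological order.
    ordered = sorted(saved, key=lambda c: c["saved_at"], reverse=True)
    return [{"preview": c["prompt"][:preview_len],
             "models": c["models"],
             "times_ms": c["response_times_ms"]}
            for c in ordered]
```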
Practical Use Cases
- Model selection for new agents — Before assigning a model in the Agent Builder, test candidates in the playground with domain-specific prompts.
- Temperature tuning — Compare the same model at different temperatures to find the optimal balance between creativity and consistency for your use case.
- Regression testing — When a model provider releases an update, re-run your saved prompts to verify that response quality has not degraded.
- Cost optimization — Compare a larger, more expensive model against a smaller, faster one. If the smaller model produces acceptable quality for your use case, you save on per-token costs.
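For the regression-testing case, a crude automated check is to compare each new response against the saved one. This sketch uses `difflib` similarity as a stand-in for a real quality evaluation; the threshold is an assumption you should tune:

```python
import difflib

def regression_check(saved_response: str, new_response: str,
                     min_similarity: float = 0.8) -> bool:
    """Return True if the new response stays close to the saved baseline.

    Text similarity is a crude proxy for quality; swap in your own
    evaluation (rubric scoring, embeddings, human review) for production.
    """
    ratio = difflib.SequenceMatcher(None, saved_response, new_response).ratio()
    return ratio >= min_similarity
```

Running this over every saved prompt after a provider update flags the comparisons worth re-reviewing by hand.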
Related Documentation
- Pro Features Overview — Full list of Pro capabilities
- Agent Builder — Create agents with your chosen model
- API Access — Use models programmatically via the API
- Analytics — Track model performance in production