Documentation

Complete guide to using Conforma AI, the LLM Benchmark & Analytics Platform

Get started with Conforma AI in 5 minutes

1. Navigate to Providers and add your API keys for OpenRouter, Anthropic, OpenAI, or Google.
2. Go to Tasks and create evaluation tasks from input/output pairs, or import them from a CSV file.
3. Navigate to Benchmarks → New, select the tasks and models, then run the benchmark.
4. Review the outcome in Results, including charts, metrics, and model comparisons.

Start with a small set of tasks (5-10) to test your setup before scaling to hundreds of benchmarks.

Tasks: Evaluation scenarios, each pairing an input prompt with its expected output. A task can be reused across multiple benchmarks.
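Because each task is a prompt paired with an expected output, a task set maps naturally onto a two-column CSV file. The Python sketch below builds such a file and parses it back, which can be handy for checking a file before importing it. The column names `input` and `expected_output`, the `Task` class, and both helper functions are hypothetical; check the Tasks import screen for the headers Conforma AI actually expects.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Task:
    """One evaluation scenario: an input prompt and the output we expect back."""
    prompt: str
    expected_output: str


# Hypothetical column names; match them to the format the Tasks import expects.
FIELDNAMES = ["input", "expected_output"]


def tasks_to_csv(tasks: list[Task]) -> str:
    """Serialize tasks to CSV text; the csv module quotes fields containing commas."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    for task in tasks:
        writer.writerow({"input": task.prompt, "expected_output": task.expected_output})
    return buffer.getvalue()


def tasks_from_csv(text: str) -> list[Task]:
    """Parse CSV text back into tasks, e.g. to sanity-check a file before importing it."""
    reader = csv.DictReader(io.StringIO(text))
    return [Task(row["input"], row["expected_output"]) for row in reader]


tasks = [
    Task("What is 2 + 2?", "4"),
    Task("List three primary colors, separated by commas.", "red, yellow, blue"),
]
csv_text = tasks_to_csv(tasks)
print(csv_text)
```

Using the `csv` module rather than joining strings by hand keeps prompts that contain commas or quotes intact.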

Benchmarks: Collections of tasks run against the LLM models you select. Model responses are compared against the expected outputs using similarity metrics.
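To illustrate how a response can be scored against an expected output (not necessarily the metric Conforma AI uses), the sketch below computes two common measures: a character-level ratio from Python's `difflib.SequenceMatcher` and a token-set Jaccard overlap. The function names, model labels, and responses are hypothetical.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace so formatting doesn't dominate."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def sequence_similarity(expected: str, actual: str) -> float:
    """Character-level similarity in [0, 1] from difflib's matching blocks (order-sensitive)."""
    return SequenceMatcher(None, normalize(expected), normalize(actual)).ratio()


def jaccard_similarity(expected: str, actual: str) -> float:
    """Token-set overlap in [0, 1]: len(A ∩ B) / len(A ∪ B) (order-insensitive)."""
    a, b = set(normalize(expected).split()), set(normalize(actual).split())
    if not a and not b:
        return 1.0  # treat two empty responses as identical
    return len(a & b) / len(a | b)


expected = "The capital of France is Paris."
responses = {  # hypothetical model names and outputs
    "model-a": "The capital of France is Paris.",
    "model-b": "Paris is the capital of France.",
    "model-c": "I am not sure.",
}
for model, answer in responses.items():
    seq = sequence_similarity(expected, answer)
    jac = jaccard_similarity(expected, answer)
    print(f"{model}: sequence={seq:.2f} jaccard={jac:.2f}")
```

Jaccard ignores word order, so the reordered answer from `model-b` gets a perfect Jaccard score while still scoring below 1.0 on the sequence ratio. Looking at more than one metric can help tell paraphrases apart from wrong answers.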

Providers: LLM API integrations (OpenRouter, Anthropic, OpenAI, Google). Configure a provider once and use it across all benchmarks.

Results: Detailed metrics, including similarity scores, response times, token usage, and cost analysis.
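Cost analysis comes down to arithmetic over token counts. For each response, multiply the input and output tokens by the provider's per-token prices and add the two. The sketch below groups responses by model and reports averages and totals. The `RunResult` fields, model names, and per-million-token prices are all hypothetical placeholders, not Conforma AI's data model or any provider's real pricing.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunResult:
    """One model's response to one task (illustrative fields, not Conforma AI's schema)."""
    model: str
    similarity: float   # 0.0 to 1.0
    latency_s: float    # response time in seconds
    input_tokens: int
    output_tokens: int


# Hypothetical prices in USD per 1M tokens as (input, output); use your provider's real rates.
PRICES_PER_MILLION = {
    "model-a": (3.00, 15.00),
    "model-b": (0.50, 1.50),
}


def cost_usd(r: RunResult) -> float:
    """Cost of one response: tokens times per-token price, summed over input and output."""
    price_in, price_out = PRICES_PER_MILLION[r.model]
    return (r.input_tokens * price_in + r.output_tokens * price_out) / 1_000_000


def summarize(results: list[RunResult]) -> dict[str, dict[str, float]]:
    """Group results by model and compute averages and totals for side-by-side comparison."""
    by_model: dict[str, list[RunResult]] = {}
    for r in results:
        by_model.setdefault(r.model, []).append(r)
    return {
        model: {
            "mean_similarity": mean(r.similarity for r in rs),
            "mean_latency_s": mean(r.latency_s for r in rs),
            "total_tokens": sum(r.input_tokens + r.output_tokens for r in rs),
            "total_cost_usd": sum(cost_usd(r) for r in rs),
        }
        for model, rs in by_model.items()
    }


results = [
    RunResult("model-a", 0.90, 1.2, 1000, 500),
    RunResult("model-a", 0.70, 0.8, 2000, 300),
    RunResult("model-b", 0.60, 0.4, 1000, 500),
]
for model, stats in summarize(results).items():
    print(model, {k: round(v, 4) for k, v in stats.items()})
```

Storing token counts separately from cost makes it easy to re-price a finished benchmark if provider rates change.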