# Evaluation

## Overview

Evaluation enables systematic testing of your chatbots. Define test cases, run them automatically, and track performance metrics over time to ensure quality and catch regressions.

## Key Concepts

| Term | Definition |
| ---------------- | --------------------------------------------------------- |
| Test Cases | Defined input/output pairs that verify chatbot behavior |
| Coverage | Percentage of test cases passing (75% threshold for Pass) |
| Regressions | Tests that previously passed but now fail |
| Improvements | Tests that previously failed but now pass |

## Project Cards

Evaluation projects display:

- Project name and description
- Metrics grid: Total Tests, Coverage %, Regressions, Improvements
- Pass/Fail badge (based on the 75% coverage threshold)
- Last run information
- Action buttons

## Creating an Evaluation Project

1) Click New Evaluation Project
2) Enter a name and description
3) Select the chatbot to evaluate
4) Click Create

New projects start with no test cases.

## Managing Test Cases

In your project, navigate to Test Cases:

1) Add test cases with:
   - Input: The question or prompt
   - Expected Output: What the chatbot should respond with, or the key elements the response must contain
2) Include diverse cases:
   - Common questions
   - Edge cases
   - Known problem areas
   - Happy path scenarios

The more scenarios your test cases span, the more of the chatbot's behavior each run checks.

### Test Case Examples

| Input | Expected Output |
| ------------------------------- | -------------------------------- |
| "What are your business hours?" | Contains "9 AM" and "5 PM" |
| "How do I reset my password?" | Contains "settings" or "account" |
| "What's the return policy?" | Contains "30 days" |

## Running Evaluations

### Manual Runs

1) Click Run Tests
2) Monitor progress
3) View results in Run History

### Scheduled Runs

1) Click Schedule
2) Select frequency:
   - Every minute (for testing)
   - Hourly
   - Daily
   - Weekly
3) Scheduled runs appear in the Jobs section

## Analyzing Results

The Run History tab shows all past runs:

- Click any run for detailed results
- See which tests passed and failed
- Compare actual responses to expected outputs
- Track metrics over time

### Using Results to Improve

1) Identify patterns in failures
2) Measure improvement from configuration changes
3) Catch regressions before users notice them
4) Track trends over time

## Additional Features

### Create RAG Data

Generate synthetic test data to expand your test suite. This is useful when you need more test cases but don't have real user data.

### Quick Links (Sidebar)

| Link | Purpose |
| ------------------------ | ----------------------------------------------- |
| User Feedback Logs | Real user feedback (inspiration for test cases) |
| Agent Suggestions | AI-recommended improvements |
| Create from Feedback | Turn user feedback directly into test cases |

## Evaluation Workflow

```
Create Project → Add Test Cases → Run Tests → Analyze → Improve → Repeat
```
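The Run Tests and Analyze steps of this loop can be sketched in Python. This is a hypothetical illustration, not the platform's implementation or API. The `EvalCase` structure, the keyword-based `passes` check, the helper names, and counting exactly 75% as a Pass are all assumptions. The sketch scores each test case with a "Contains" check modeled on the Test Case Examples table. It then computes coverage against the 75% threshold and compares the run with a previous one to list regressions and improvements.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical sketch: the platform runs these steps for you, and none of
# these names come from its API.

PASS_THRESHOLD = 75.0  # coverage (%) for a Pass badge; ">=" is an assumption


@dataclass
class EvalCase:
    """One input plus the keywords its response must contain (hypothetical)."""
    prompt: str
    all_of: List[str] = field(default_factory=list)  # every keyword must appear
    any_of: List[str] = field(default_factory=list)  # at least one must appear


def passes(case: EvalCase, response: str) -> bool:
    """Case-insensitive 'Contains' check, like the Test Case Examples table."""
    text = response.lower()
    if not all(k.lower() in text for k in case.all_of):
        return False
    if case.any_of and not any(k.lower() in text for k in case.any_of):
        return False
    return True


def run_tests(cases: List[EvalCase], ask: Callable[[str], str]) -> Dict[str, bool]:
    """Send each prompt to the chatbot and record pass/fail, keyed by prompt."""
    return {case.prompt: passes(case, ask(case.prompt)) for case in cases}


def coverage(results: Dict[str, bool]) -> float:
    """Percentage of test cases passing (0.0 for an empty run)."""
    if not results:
        return 0.0
    return 100.0 * sum(results.values()) / len(results)


def badge(cov: float) -> str:
    """Map a coverage percentage to the Pass/Fail badge."""
    return "Pass" if cov >= PASS_THRESHOLD else "Fail"


def compare_runs(
    previous: Dict[str, bool], current: Dict[str, bool]
) -> Tuple[List[str], List[str]]:
    """Regressions passed before but fail now; improvements did the reverse."""
    shared = previous.keys() & current.keys()  # skip cases added or removed since
    regressions = sorted(p for p in shared if previous[p] and not current[p])
    improvements = sorted(p for p in shared if not previous[p] and current[p])
    return regressions, improvements


if __name__ == "__main__":
    cases = [
        EvalCase("What are your business hours?", all_of=["9 AM", "5 PM"]),
        EvalCase("How do I reset my password?", any_of=["settings", "account"]),
        EvalCase("What's the return policy?", all_of=["30 days"]),
    ]
    # Stand-in for the chatbot under test: canned answers keyed by prompt.
    canned = {
        "What are your business hours?": "We're open 9 AM to 5 PM, Monday to Friday.",
        "How do I reset my password?": "Go to Settings > Security and choose Reset.",
        "What's the return policy?": "Returns are accepted within 14 days.",
    }
    # Made-up results from an earlier run, used only to demonstrate the comparison.
    baseline = {
        "What are your business hours?": False,
        "How do I reset my password?": True,
        "What's the return policy?": True,
    }
    current = run_tests(cases, lambda prompt: canned[prompt])
    cov = coverage(current)
    regressions, improvements = compare_runs(baseline, current)
    print(f"Total Tests: {len(current)}")          # Total Tests: 3
    print(f"Coverage: {cov:.1f}% ({badge(cov)})")  # Coverage: 66.7% (Fail)
    print("Regressions:", regressions)             # Regressions: ["What's the return policy?"]
    print("Improvements:", improvements)           # Improvements: ['What are your business hours?']
```

In this sketch, `compare_runs` only looks at test cases present in both runs, so a newly added case is never reported as a regression. That is a design choice of the example, not documented platform behavior. Keyword matching is also the simplest possible check, and the platform may judge responses differently.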
1) Create an evaluation project for your chatbot
2) Add initial test cases (start with 10-20)
3) Run a baseline evaluation
4) Make improvements to the chatbot
5) Run again and compare the results with the baseline
6) Add more test cases based on your findings
7) Schedule regular runs

---

**Related:** [Chatbots](./05-chatbots.md)