Hallucination Guardrail
Prevent and detect AI hallucinations in your CrewAI tasks
Overview
The Hallucination Guardrail is an enterprise feature that validates AI-generated content to ensure it’s grounded in facts and doesn’t contain hallucinations. It analyzes task outputs against reference context and provides detailed feedback when potentially hallucinated content is detected.
What are Hallucinations?
AI hallucinations occur when language models generate content that appears plausible but is factually incorrect or not supported by the provided context. The Hallucination Guardrail helps prevent these issues by:
- Comparing outputs against reference context
- Evaluating faithfulness to source material
- Providing detailed feedback on problematic content
- Supporting custom thresholds for validation strictness
Basic Usage
Setting Up the Guardrail
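A minimal sketch of the setup is shown below. The import path and the `context`/`llm` constructor arguments are assumptions based on the current CrewAI API; verify them against your installed version:

```python
from crewai import LLM
from crewai.tasks.hallucination_guardrail import HallucinationGuardrail

# Build the guardrail with the reference context the output must stay faithful to
guardrail = HallucinationGuardrail(
    context="AI helps with various tasks including analysis and generation.",
    llm=LLM(model="gpt-4o-mini"),  # model used for the faithfulness evaluation
)
```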
Adding to Tasks
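Assuming the standard `Task` API, the guardrail is attached through the `guardrail` parameter (the agent shown is a hypothetical placeholder):

```python
from crewai import Task

task = Task(
    description="Summarize the research findings for the product team",
    expected_output="A concise, factual summary grounded in the provided context",
    agent=research_agent,  # an Agent defined elsewhere (hypothetical)
    guardrail=guardrail,   # the HallucinationGuardrail from the setup step
)
```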
Advanced Configuration
Custom Threshold Validation
For stricter validation, you can set a custom faithfulness threshold (0-10 scale):
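A sketch of threshold-based configuration, assuming the parameter is named `threshold`:

```python
strict_guardrail = HallucinationGuardrail(
    context="Quantum computing uses qubits, which can exist in superposition states.",
    llm=LLM(model="gpt-4o-mini"),
    threshold=8.0,  # output must score at least 8/10 on faithfulness to pass
)
```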
Including Tool Response Context
When your task uses tools, you can include tool responses for more accurate validation:
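A sketch of this configuration, assuming the parameter is named `tool_response` (the weather data is purely illustrative):

```python
guardrail_with_tools = HallucinationGuardrail(
    context="Current weather conditions for the requested city.",
    llm=LLM(model="gpt-4o-mini"),
    tool_response="Weather API returned: temperature 22°C, humidity 65%, clear skies.",
)
```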
How It Works
Validation Process
1. Context Analysis: The guardrail compares the task output against the provided reference context
2. Faithfulness Scoring: An internal evaluator assigns a faithfulness score (0-10)
3. Verdict Determination: The evaluator determines whether the content is faithful or contains hallucinations
4. Threshold Checking: If a custom threshold is set, the score is validated against it
5. Feedback Generation: Detailed reasons are provided when validation fails
Validation Logic
- Default Mode: Uses verdict-based validation (FAITHFUL vs HALLUCINATED)
- Threshold Mode: Requires the faithfulness score to meet or exceed the specified threshold (see the sketch after this list)
- Error Handling: Gracefully handles evaluation errors and provides informative feedback
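The decision rule can be pictured as follows. This is a conceptual sketch of the behavior described above, not the library's internal code:

```python
# Conceptual sketch of the validation decision; not the internal implementation
def passes_guardrail(verdict: str, score: float, threshold: float | None) -> bool:
    if threshold is not None:
        # Threshold mode: the faithfulness score must meet or exceed the threshold
        return score >= threshold
    # Default mode: rely on the verdict alone
    return verdict == "FAITHFUL"
```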
Guardrail Results
The guardrail returns structured results indicating validation status:
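A hypothetical illustration of that shape, assuming the guardrail can be invoked directly on a `TaskOutput` and returns an object exposing `valid` and `feedback`; in practice the task system usually invokes it for you:

```python
# Hypothetical direct invocation for illustration only
result = guardrail(task_output)  # task_output: the TaskOutput being validated

if not result.valid:
    # feedback includes the faithfulness score, the verdict, and the specific reasons
    print(result.feedback)
```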
Result Properties
- valid: Boolean indicating whether the output passed validation
- feedback: Detailed explanation when validation fails, including:
  - Faithfulness score
  - Verdict classification
  - Specific reasons for failure
Integration with Task System
Automatic Validation
When a guardrail is added to a task, it automatically validates the output before the task is marked as complete:
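A minimal sketch of the end-to-end flow, reusing the `guardrail` from the setup step (agent and task details are illustrative):

```python
from crewai import Agent, Crew, Task

writer = Agent(
    role="Technical Writer",
    goal="Produce summaries that stay faithful to the source material",
    backstory="An experienced writer who never strays from the provided facts.",
)

summary_task = Task(
    description="Summarize the product documentation",
    expected_output="A factual summary of the documentation",
    agent=writer,
    guardrail=guardrail,  # validated automatically before the task is marked complete
)

crew = Crew(agents=[writer], tasks=[summary_task])
result = crew.kickoff()  # on validation failure, the feedback is surfaced so the task can be retried
```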
Event Tracking
The guardrail integrates with CrewAI’s event system to provide observability (a listener sketch follows the list):
- Validation Started: When guardrail evaluation begins
- Validation Completed: When evaluation finishes with results
- Validation Failed: When technical errors occur during evaluation
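If you want to react to these events yourself, a listener along the following lines should work. The event class names and import paths are assumptions to verify against the events exposed by your CrewAI version:

```python
from crewai.utilities.events.base_event_listener import BaseEventListener
# Assumed event class names; check crewai.utilities.events in your installed
# version for the actual guardrail event types.
from crewai.utilities.events import LLMGuardrailStartedEvent, LLMGuardrailCompletedEvent


class GuardrailObservabilityListener(BaseEventListener):
    def setup_listeners(self, crewai_event_bus):
        @crewai_event_bus.on(LLMGuardrailStartedEvent)
        def on_guardrail_started(source, event):
            print("Guardrail validation started")

        @crewai_event_bus.on(LLMGuardrailCompletedEvent)
        def on_guardrail_completed(source, event):
            print("Guardrail validation completed")
```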
Best Practices
Context Guidelines
Provide Comprehensive Context
Include all relevant factual information that the AI should base its output on:
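For example (the company facts below are purely illustrative; substitute your own source material):

```python
guardrail = HallucinationGuardrail(
    context="""
    Acme Corp was founded in 2015 and is headquartered in Berlin.
    Its flagship product, the S-100 industrial sensor, launched in 2021.
    The company reported 40M EUR in revenue for 2023.
    """,
    llm=LLM(model="gpt-4o-mini"),
)
```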
Keep Context Relevant
Only include information directly related to the task to avoid confusion:
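A sketch of the difference, again with illustrative facts:

```python
# Focused: only the facts this task actually needs
focused_context = "The S-100 sensor launched in 2021 and operates from -40°C to 85°C."

# Too broad: unrelated details dilute the evaluation
unfocused_context = (
    "The S-100 sensor launched in 2021. Acme also sponsors a charity marathon, "
    "its cafeteria serves vegan options, and the CEO enjoys sailing."
)

guardrail = HallucinationGuardrail(context=focused_context, llm=LLM(model="gpt-4o-mini"))
```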
Update Context Regularly
Ensure your reference context reflects current, accurate information.
Threshold Selection
Start with Default Validation
Begin without custom thresholds to understand baseline performance.
Adjust Based on Requirements
- High-stakes content: Use threshold 8-10 for maximum accuracy
- General content: Use threshold 6-7 for balanced validation
- Creative content: Use threshold 4-5 or default verdict-based validation (see the sketch after this list)
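As a sketch of how those guideline ranges translate into configuration (the `*_context` variables and `llm` are hypothetical placeholders assumed to be defined elsewhere):

```python
legal_guardrail = HallucinationGuardrail(context=legal_context, llm=llm, threshold=9.0)  # high-stakes
blog_guardrail = HallucinationGuardrail(context=blog_context, llm=llm, threshold=6.0)    # general
story_guardrail = HallucinationGuardrail(context=story_context, llm=llm)                  # creative, verdict-based
```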
Monitor and Iterate
Track validation results and adjust thresholds based on false positives/negatives.
Performance Considerations
Impact on Execution Time
- Validation Overhead: Each guardrail adds ~1-3 seconds per task
- LLM Efficiency: Choose efficient models for evaluation (e.g., gpt-4o-mini)
Cost Optimization
- Model Selection: Use smaller, efficient models for guardrail evaluation
- Context Size: Keep reference context concise but comprehensive
- Caching: Consider caching validation results for repeated content
Troubleshooting
Need Help?
Contact our support team for assistance with hallucination guardrail configuration or troubleshooting.