Overview

The Hallucination Guardrail is an enterprise feature that validates AI-generated content to ensure it is grounded in the provided reference context rather than hallucinated. It analyzes task outputs against that context and returns detailed feedback when potentially hallucinated content is detected.

What are Hallucinations?

AI hallucinations occur when language models generate content that appears plausible but is factually incorrect or not supported by the provided context. The Hallucination Guardrail helps prevent these issues by:

  • Comparing outputs against reference context
  • Evaluating faithfulness to source material
  • Providing detailed feedback on problematic content
  • Supporting custom thresholds for validation strictness
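
For example, consider an output checked against a short reference context (the values below are hypothetical and reuse the company example from later in this guide):

# Illustrative example (hypothetical values, not actual guardrail output)
# Reference context: "Company XYZ was founded in 2020 and has 150 employees."
#
# Output A: "Company XYZ, founded in 2020, employs 150 people."
#   -> supported by the context, so it would be judged FAITHFUL
#
# Output B: "Company XYZ, founded in 2015, employs over 1,000 people."
#   -> contradicts the context, so it would be flagged as HALLUCINATED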

Basic Usage

Setting Up the Guardrail

from crewai.tasks.hallucination_guardrail import HallucinationGuardrail
from crewai import LLM

# Initialize the guardrail with reference context
guardrail = HallucinationGuardrail(
    context="AI helps with various tasks including analysis and generation.",
    llm=LLM(model="gpt-4o-mini")
)

Adding to Tasks

from crewai import Task

# Create your task with the guardrail
task = Task(
    description="Write a summary about AI capabilities",
    expected_output="A factual summary based on the provided context",
    agent=my_agent,
    guardrail=guardrail  # Add the guardrail to validate output
)
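
Once the guardrail is attached, no extra setup is required: it runs whenever the task is executed, for example as part of a crew. This sketch assumes my_agent and task are defined as above:

from crewai import Crew

# Assemble and run the crew; the guardrail validates the task output automatically
crew = Crew(agents=[my_agent], tasks=[task])
result = crew.kickoff()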

Advanced Configuration

Custom Threshold Validation

For stricter validation, you can set a custom faithfulness threshold (0-10 scale):

# Strict guardrail requiring high faithfulness score
strict_guardrail = HallucinationGuardrail(
    context="Quantum computing uses qubits that exist in superposition states.",
    llm=LLM(model="gpt-4o-mini"),
    threshold=8.0  # Requires score >= 8 to pass validation
)

Including Tool Response Context

When your task uses tools, you can include tool responses for more accurate validation:

# Guardrail with tool response context
weather_guardrail = HallucinationGuardrail(
    context="Current weather information for the requested location",
    llm=LLM(model="gpt-4o-mini"),
    tool_response="Weather API returned: Temperature 22°C, Humidity 65%, Clear skies"
)

How It Works

Validation Process

  1. Context Analysis: The guardrail compares task output against the provided reference context
  2. Faithfulness Scoring: Uses an internal evaluator to assign a faithfulness score (0-10)
  3. Verdict Determination: Determines if content is faithful or contains hallucinations
  4. Threshold Checking: If a custom threshold is set, validates against that score
  5. Feedback Generation: Provides detailed reasons when validation fails

Validation Logic

  • Default Mode: Uses verdict-based validation (FAITHFUL vs HALLUCINATED)
  • Threshold Mode: Requires faithfulness score to meet or exceed the specified threshold
  • Error Handling: Gracefully handles evaluation errors and provides informative feedback
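
The decision logic described above can be summarized in a short sketch. The function and variable names here are illustrative and are not the guardrail's internal implementation:

def is_valid(verdict: str, score: float, threshold: float | None) -> bool:
    """Illustrative decision logic only, not the library's internal code."""
    if threshold is not None:
        # Threshold mode: the faithfulness score must meet or exceed the threshold
        return score >= threshold
    # Default mode: rely on the verdict alone
    return verdict == "FAITHFUL"

# Hypothetical examples
is_valid("FAITHFUL", 7.5, None)      # True  - passes default verdict-based validation
is_valid("FAITHFUL", 7.5, 8.0)       # False - score is below the custom threshold of 8
is_valid("HALLUCINATED", 3.0, None)  # False - verdict indicates hallucination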

Guardrail Results

The guardrail returns structured results indicating validation status:

# Example of guardrail result structure
{
    "valid": False,
    "feedback": "Content appears to be hallucinated (score: 4.2/10, verdict: HALLUCINATED). The output contains information not supported by the provided context."
}

Result Properties

  • valid: Boolean indicating whether the output passed validation
  • feedback: Detailed explanation when validation fails, including:
    • Faithfulness score
    • Verdict classification
    • Specific reasons for failure

Integration with Task System

Automatic Validation

When a guardrail is added to a task, it automatically validates the output before the task is marked as complete:

# Task output validation flow (simplified illustration, not the library's exact internals)
task_output = agent.execute_task(task)
validation_result = guardrail(task_output)

if validation_result.valid:
    # Task completes successfully
    return task_output
else:
    # Task fails with validation feedback
    raise ValidationError(validation_result.feedback)

Event Tracking

The guardrail integrates with CrewAI’s event system to provide observability:

  • Validation Started: When guardrail evaluation begins
  • Validation Completed: When evaluation finishes with results
  • Validation Failed: When technical errors occur during evaluation
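
If you want to observe these events programmatically, you can register a custom listener on CrewAI's event bus. The sketch below follows the general BaseEventListener pattern; the guardrail event class names used here are an assumption, so verify the exact names in the crewai.utilities.events module of your installed version:

from crewai.utilities.events.base_event_listener import BaseEventListener
# NOTE: these event class names are assumptions - check crewai.utilities.events
from crewai.utilities.events import LLMGuardrailStartedEvent, LLMGuardrailCompletedEvent

class GuardrailObserver(BaseEventListener):
    def setup_listeners(self, crewai_event_bus):
        @crewai_event_bus.on(LLMGuardrailStartedEvent)
        def on_started(source, event):
            print("Guardrail validation started")

        @crewai_event_bus.on(LLMGuardrailCompletedEvent)
        def on_completed(source, event):
            print("Guardrail validation completed")

# Instantiating the listener registers it with the event bus
guardrail_observer = GuardrailObserver()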

Best Practices

Context Guidelines

1. Provide Comprehensive Context

Include all relevant factual information that the AI should base its output on:

context = """
Company XYZ was founded in 2020 and specializes in renewable energy solutions.
They have 150 employees and generated $50M revenue in 2023.
Their main products include solar panels and wind turbines.
"""

2. Keep Context Relevant

Only include information directly related to the task to avoid confusion:

# Good: Focused context
context = "The current weather in New York is 18°C with light rain."

# Avoid: Unrelated information
context = "The weather is 18°C. The city has 8 million people. Traffic is heavy."

3. Update Context Regularly

Ensure your reference context reflects current, accurate information.

Threshold Selection

1. Start with Default Validation

Begin without custom thresholds to understand baseline performance.

2. Adjust Based on Requirements

  • High-stakes content: Use threshold 8-10 for maximum accuracy
  • General content: Use threshold 6-7 for balanced validation
  • Creative content: Use threshold 4-5 or default verdict-based validation

3. Monitor and Iterate

Track validation results and adjust thresholds based on false positives/negatives.
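
One lightweight way to monitor results is to record each guardrail decision alongside a human judgment and periodically count false positives and negatives. The record structure below is hypothetical:

# Each record pairs the guardrail's decision with a human label (hypothetical data)
records = [
    {"guardrail_valid": True,  "human_says_faithful": True},
    {"guardrail_valid": False, "human_says_faithful": True},   # false positive (over-strict)
    {"guardrail_valid": True,  "human_says_faithful": False},  # false negative (missed hallucination)
]

false_positives = sum(1 for r in records if not r["guardrail_valid"] and r["human_says_faithful"])
false_negatives = sum(1 for r in records if r["guardrail_valid"] and not r["human_says_faithful"])

# Many false positives suggest lowering the threshold; many false negatives suggest raising it
print(f"False positives: {false_positives}, False negatives: {false_negatives}")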

Performance Considerations

Impact on Execution Time

  • Validation Overhead: Each guardrail adds ~1-3 seconds per task
  • LLM Efficiency: Choose efficient models for evaluation (e.g., gpt-4o-mini)

Cost Optimization

  • Model Selection: Use smaller, efficient models for guardrail evaluation
  • Context Size: Keep reference context concise but comprehensive
  • Caching: Consider caching validation results for repeated content
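
Caching is not built into the guardrail itself. If your pipeline revalidates identical outputs, a thin wrapper can avoid repeated evaluation calls. This sketch assumes the guardrail is callable with a task output (as in the validation flow above) and that the output text is available on a raw attribute:

# Minimal in-memory cache keyed on the output text (raw attribute assumed)
_validation_cache = {}

def cached_validate(guardrail, task_output):
    key = task_output.raw
    if key not in _validation_cache:
        _validation_cache[key] = guardrail(task_output)
    return _validation_cache[key]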

Troubleshooting

Need Help?

Contact our support team for assistance with hallucination guardrail configuration or troubleshooting.