Patronus AI Evaluation
Overview
Patronus AI provides comprehensive evaluation and monitoring capabilities for CrewAI agents, enabling you to assess model outputs, agent behaviors, and overall system performance. This integration allows you to implement continuous evaluation workflows that help maintain quality and reliability in production environments.
Key Features
- Automated Evaluation: Real-time assessment of agent outputs and behaviors
- Custom Criteria: Define specific evaluation criteria tailored to your use cases
- Performance Monitoring: Track agent performance metrics over time
- Quality Assurance: Ensure consistent output quality across different scenarios
- Safety & Compliance: Monitor for potential issues and policy violations
Evaluation Tools
Patronus provides three main evaluation tools for different use cases:
- PatronusEvalTool: Allows agents to select the most appropriate evaluator and criteria for the evaluation task.
- PatronusPredefinedCriteriaEvalTool: Uses predefined evaluator and criteria specified by the user.
- PatronusLocalEvaluatorTool: Uses custom function evaluators defined by the user.
Installation
To use these tools, you need to install the Patronus package.
Steps to Get Started
To effectively use the Patronus evaluation tools, follow these steps:
- Install Patronus: Install the Patronus package.
- Set Up API Key: Set your Patronus API key as an environment variable.
- Choose the Right Tool: Select the appropriate Patronus evaluation tool based on your needs.
- Configure the Tool: Configure the tool with the necessary parameters.
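The first two steps above can be sketched as follows. The PyPI package name `patronus` and the `PATRONUS_API_KEY` variable name are assumptions based on the Patronus Python SDK's conventions; check your Patronus dashboard for the actual key.

```shell
# Install the Patronus SDK (assumed PyPI package name).
pip install patronus

# Expose your Patronus API key to the evaluation tools
# (assumed environment variable name).
export PATRONUS_API_KEY="your-patronus-api-key"
```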
Examples
Using PatronusEvalTool
The following example demonstrates how to use the PatronusEvalTool, which allows agents to select the most appropriate evaluator and criteria:
Using PatronusPredefinedCriteriaEvalTool
The following example demonstrates how to use the PatronusPredefinedCriteriaEvalTool, which uses predefined evaluator and criteria:
Using PatronusLocalEvaluatorTool
The following example demonstrates how to use the PatronusLocalEvaluatorTool, which uses custom function evaluators:
Parameters
PatronusEvalTool
The PatronusEvalTool does not require any parameters during initialization. It automatically fetches available evaluators and criteria from the Patronus API.
PatronusPredefinedCriteriaEvalTool
The PatronusPredefinedCriteriaEvalTool accepts the following parameters during initialization:
- evaluators: Required. A list of dictionaries containing the evaluator and criteria to use, for example: `[{"evaluator": "judge", "criteria": "contains-code"}]`.
PatronusLocalEvaluatorTool
The PatronusLocalEvaluatorTool accepts the following parameters during initialization:
- patronus_client: Required. The Patronus client instance.
- evaluator: Optional. The name of the registered local evaluator to use. Default is an empty string.
- evaluated_model_gold_answer: Optional. The gold answer to use for evaluation. Default is an empty string.
Usage
When using the Patronus evaluation tools, you provide the model input, output, and context, and the tool returns the evaluation results from the Patronus API. For the PatronusEvalTool and PatronusPredefinedCriteriaEvalTool, the following parameters are required when calling the tool:
- evaluated_model_input: The agent’s task description in simple text.
- evaluated_model_output: The agent’s output for the task.
- evaluated_model_retrieved_context: The agent’s context.
For the PatronusLocalEvaluatorTool, the same parameters are required, but the evaluator and gold answer are specified during initialization.