Learn how to enable and use multimodal capabilities in your agents for processing images and other non-text content within the CrewAI framework.
To enable multimodal capabilities, set the `multimodal` parameter to `True` when initializing your agent:
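A minimal sketch of that setup might look like the following. The role, goal, and backstory strings are illustrative placeholders, and `build_agent` is a hypothetical helper name; the import sits inside the function so the snippet loads even where `crewai` is not installed.

```python
# Keyword arguments for a multimodal agent. The role, goal, and backstory
# values are illustrative placeholders chosen for this sketch.
AGENT_KWARGS = {
    "role": "Image Analyst",
    "goal": "Analyze and extract insights from images",
    "backstory": "An expert in visual content interpretation",
    "multimodal": True,  # the flag this section describes
}


def build_agent():
    """Create the agent; requires the crewai package to be installed."""
    # Imported lazily so this sketch can be loaded without crewai present.
    from crewai import Agent

    return Agent(**AGENT_KWARGS)
```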
When you set `multimodal=True`, the agent is automatically configured with the necessary tools for handling non-text content, including the `AddImageTool`.
A multimodal agent comes pre-configured with the `AddImageTool`, which allows it to process images. You don’t need to add this tool manually; it’s included automatically when you enable multimodal capabilities.
Here’s a complete example showing how to use a multimodal agent to analyze an image:
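The sketch below shows one way such an example could be put together, assuming the standard `Agent`, `Task`, and `Crew` classes. The image URL, the role and goal strings, and the helper names `build_crew` and `analyze` are placeholders introduced for illustration.

```python
IMAGE_URL = "https://example.com/product.jpg"  # placeholder; point this at a real image


def build_crew(image_url: str = IMAGE_URL):
    """Assemble a one-agent crew that analyzes a single image."""
    # Imported here so the sketch can be loaded without crewai installed.
    from crewai import Agent, Crew, Task

    analyst = Agent(
        role="Product Analyst",
        goal="Analyze product images and describe them in detail",
        backstory="An expert in visual product analysis",
        multimodal=True,  # enables image handling via AddImageTool
    )
    task = Task(
        description=f"Analyze the product image at {image_url} and describe it in detail.",
        expected_output="A detailed description of the product image",
        agent=analyst,
    )
    return Crew(agents=[analyst], tasks=[task])


def analyze(image_url: str = IMAGE_URL):
    """Run the crew; needs crewai installed and an LLM API key configured."""
    return build_crew(image_url).kickoff()
```

Calling `analyze()` starts the crew and returns its output. Nothing runs at import time, so the agent and task definitions can be inspected or adjusted before any model call is made.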
The `AddImageTool` is automatically configured with the following schema:
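The schema definition itself is not reproduced in this section. As a rough stand-in, the sketch below mirrors its likely shape with a plain dataclass, assuming an `image_url` field plus the optional `action` field mentioned on this page. `AddImageInput` is a hypothetical name used only here and is not the CrewAI class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AddImageInput:
    """Illustrative stand-in for the tool's input schema (not the CrewAI class)."""

    image_url: str  # assumed field: URL or local path of the image
    action: Optional[str] = None  # optional context or question about the image


# With an action, the analysis is steered toward a specific question.
focused = AddImageInput(
    image_url="https://example.com/product.jpg",
    action="Describe any visible defects",
)

# Without an action, the field stays None and the analysis is left general.
general = AddImageInput(image_url="https://example.com/product.jpg")
```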
Use the `action` parameter for focused analysis.