TavilyExtractorTool
The `TavilyExtractorTool` allows CrewAI agents to extract structured content from web pages using the Tavily API. It can process a single URL or a list of URLs, and provides options for controlling the extraction depth and including images.
Installation
To use the `TavilyExtractorTool`, you need to install the `tavily-python` library:
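A typical setup might look like the following. This is a sketch: it assumes the tool ships with the `crewai-tools` package and that the Tavily client reads its API key from the `TAVILY_API_KEY` environment variable.

```shell
pip install tavily-python crewai-tools

# Tavily requires an API key; the client is assumed to read it from
# the environment:
export TAVILY_API_KEY="your-api-key"
```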
Example Usage
Here’s how to initialize and use the `TavilyExtractorTool` within a CrewAI agent:
Configuration Options
The `TavilyExtractorTool` accepts the following arguments:

- `urls` (Union[List[str], str]): Required. A single URL string or a list of URL strings to extract data from.
- `include_images` (Optional[bool]): Whether to include images in the extraction results. Defaults to `False`.
- `extract_depth` (Literal["basic", "advanced"]): The depth of extraction. Use `"basic"` for faster, surface-level extraction or `"advanced"` for more comprehensive extraction. Defaults to `"basic"`.
- `timeout` (int): The maximum time in seconds to wait for the extraction request to complete. Defaults to `60`.
Advanced Usage
Multiple URLs with Advanced Extraction
Tool Parameters
You can customize the tool’s behavior by setting parameters during initialization.

Features
- Single or Multiple URLs: Extract content from one URL or process multiple URLs in a single request
- Configurable Depth: Choose between basic (fast) and advanced (comprehensive) extraction modes
- Image Support: Optionally include images in the extraction results
- Structured Output: Returns well-formatted JSON containing the extracted content
- Error Handling: Robust handling of network timeouts and extraction errors
Response Format
The tool returns a JSON string representing the structured data extracted from the provided URL(s). The exact structure depends on the content of the pages and the `extract_depth` used.
Common response elements include:
- Title: The page title
- Content: Main text content of the page
- Images: Image URLs and metadata (when `include_images=True`)
- Metadata: Additional page information like author, description, etc.
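Because the tool returns a JSON string, it can be parsed with the standard library. The field names in this sketch are illustrative only; the real keys depend on the Tavily API response and the `extract_depth` used.

```python
import json

# Illustrative response string; actual field names may differ.
raw = json.dumps({
    "results": [
        {
            "url": "https://example.com",
            "title": "Example Domain",
            "content": "This domain is for use in illustrative examples...",
        }
    ]
})

data = json.loads(raw)
for page in data.get("results", []):
    print(f"{page['title']}: {page['url']}")
    # → Example Domain: https://example.com
```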
Use Cases
- Content Analysis: Extract and analyze content from competitor websites
- Research: Gather structured data from multiple sources for analysis
- Content Migration: Extract content from existing websites for migration
- Monitoring: Regular extraction of content for change detection
- Data Collection: Systematic extraction of information from web sources