Overview

The StagehandTool integrates the Stagehand framework with CrewAI, enabling agents to interact with websites and automate browser tasks using natural language instructions.

Overview

Stagehand is a powerful browser automation framework built by Browserbase that allows AI agents to:

  • Navigate to websites
  • Click buttons, links, and other elements
  • Fill in forms
  • Extract data from web pages
  • Observe and identify elements
  • Perform complex workflows

The StagehandTool wraps the Stagehand Python SDK to provide CrewAI agents with browser control capabilities through three core primitives:

  1. Act: Perform actions like clicking, typing, or navigating
  2. Extract: Extract structured data from web pages
  3. Observe: Identify and analyze elements on the page

Prerequisites

Before using this tool, ensure you have:

  1. A Browserbase account with API key and project ID
  2. An API key for an LLM (OpenAI or Anthropic Claude)
  3. The Stagehand Python SDK installed

Install the required dependency:

pip install stagehand-py

Usage

Basic Implementation

The StagehandTool can be implemented in two ways:

The context manager approach is recommended as it ensures proper cleanup of resources even if exceptions occur.

from crewai import Agent, Task, Crew
from crewai_tools import StagehandTool
from stagehand.schemas import AvailableModel

# Initialize the tool with your API keys using a context manager
with StagehandTool(
    api_key="your-browserbase-api-key",
    project_id="your-browserbase-project-id",
    model_api_key="your-llm-api-key",  # OpenAI or Anthropic API key
    model_name=AvailableModel.CLAUDE_3_7_SONNET_LATEST,  # Optional: specify which model to use
) as stagehand_tool:
    # Create an agent with the tool
    researcher = Agent(
        role="Web Researcher",
        goal="Find and summarize information from websites",
        backstory="I'm an expert at finding information online.",
        verbose=True,
        tools=[stagehand_tool],
    )

    # Create a task that uses the tool
    research_task = Task(
        description="Go to https://www.example.com and tell me what you see on the homepage.",
        agent=researcher,
    )

    # Run the crew
    crew = Crew(
        agents=[researcher],
        tasks=[research_task],
        verbose=True,
    )

    result = crew.kickoff()
    print(result)

2. Manual Resource Management

from crewai import Agent, Task, Crew
from crewai_tools import StagehandTool
from stagehand.schemas import AvailableModel

# Initialize the tool with your API keys
stagehand_tool = StagehandTool(
    api_key="your-browserbase-api-key",
    project_id="your-browserbase-project-id",
    model_api_key="your-llm-api-key",
    model_name=AvailableModel.CLAUDE_3_7_SONNET_LATEST,
)

try:
    # Create an agent with the tool
    researcher = Agent(
        role="Web Researcher",
        goal="Find and summarize information from websites",
        backstory="I'm an expert at finding information online.",
        verbose=True,
        tools=[stagehand_tool],
    )

    # Create a task that uses the tool
    research_task = Task(
        description="Go to https://www.example.com and tell me what you see on the homepage.",
        agent=researcher,
    )

    # Run the crew
    crew = Crew(
        agents=[researcher],
        tasks=[research_task],
        verbose=True,
    )

    result = crew.kickoff()
    print(result)
finally:
    # Explicitly clean up resources
    stagehand_tool.close()

Command Types

The StagehandTool supports three different command types for specific web automation tasks:

1. Act Command

The act command type (default) enables webpage interactions like clicking buttons, filling forms, and navigation.

# Perform an action (default behavior)
result = stagehand_tool.run(
    instruction="Click the login button", 
    url="https://example.com",
    command_type="act"  # Default, so can be omitted
)

# Fill out a form
result = stagehand_tool.run(
    instruction="Fill the contact form with name 'John Doe', email 'john@example.com', and message 'Hello world'", 
    url="https://example.com/contact"
)

2. Extract Command

The extract command type retrieves structured data from webpages.

# Extract all product information
result = stagehand_tool.run(
    instruction="Extract all product names, prices, and descriptions", 
    url="https://example.com/products",
    command_type="extract"
)

# Extract specific information with a selector
result = stagehand_tool.run(
    instruction="Extract the main article title and content", 
    url="https://example.com/blog/article",
    command_type="extract",
    selector=".article-container"  # Optional CSS selector
)

3. Observe Command

The observe command type identifies and analyzes webpage elements.

# Find interactive elements
result = stagehand_tool.run(
    instruction="Find all interactive elements in the navigation menu", 
    url="https://example.com",
    command_type="observe"
)

# Identify form fields
result = stagehand_tool.run(
    instruction="Identify all the input fields in the registration form", 
    url="https://example.com/register",
    command_type="observe",
    selector="#registration-form"
)

Configuration Options

Customize the StagehandTool behavior with these parameters:

stagehand_tool = StagehandTool(
    api_key="your-browserbase-api-key",
    project_id="your-browserbase-project-id",
    model_api_key="your-llm-api-key",
    model_name=AvailableModel.CLAUDE_3_7_SONNET_LATEST,
    dom_settle_timeout_ms=5000,  # Wait longer for DOM to settle
    headless=True,  # Run browser in headless mode
    self_heal=True,  # Attempt to recover from errors
    wait_for_captcha_solves=True,  # Wait for CAPTCHA solving
    verbose=1,  # Control logging verbosity (0-3)
)

Best Practices

  1. Be Specific: Provide detailed instructions for better results
  2. Choose Appropriate Command Type: Select the right command type for your task
  3. Use Selectors: Leverage CSS selectors to improve accuracy
  4. Break Down Complex Tasks: Split complex workflows into multiple tool calls
  5. Implement Error Handling: Add error handling for potential issues

Troubleshooting

Common issues and solutions:

  • Session Issues: Verify API keys for both Browserbase and LLM provider
  • Element Not Found: Increase dom_settle_timeout_ms for slower pages
  • Action Failures: Use observe to identify correct elements first
  • Incomplete Data: Refine instructions or provide specific selectors

Additional Resources

For questions about the CrewAI integration: