HyperbrowserLoadTool

Description

The HyperbrowserLoadTool enables web scraping and crawling using Hyperbrowser, a platform for running and scaling headless browsers. This tool allows you to scrape a single page or crawl an entire site, returning the content in properly formatted markdown or HTML.

Key Features:

  • Instant Scalability - Spin up hundreds of browser sessions in seconds without infrastructure headaches
  • Simple Integration - Works seamlessly with popular tools like Puppeteer and Playwright
  • Powerful APIs - Easy-to-use APIs for scraping and crawling any site
  • Bypass Anti-Bot Measures - Built-in stealth mode, ad blocking, automatic CAPTCHA solving, and rotating proxies

Installation

To use this tool, you need to install the Hyperbrowser SDK:

uv add hyperbrowser

Steps to Get Started

To effectively use the HyperbrowserLoadTool, follow these steps:

  1. Sign Up: Head to Hyperbrowser to sign up and generate an API key.
  2. API Key: Set the HYPERBROWSER_API_KEY environment variable or pass it directly to the tool constructor.
  3. Install SDK: Install the Hyperbrowser SDK using the command above.

Example

The following example demonstrates how to initialize the tool and attach it to an agent that scrapes websites:

Code
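from crewai import Agent
from crewai.project import CrewBase, agent
from crewai_tools import HyperbrowserLoadTool

# Initialize the tool with your API key, or omit api_key
# to read it from the HYPERBROWSER_API_KEY environment variable
tool = HyperbrowserLoadTool(api_key="your_api_key")

# The @agent decorator belongs inside a crew class; the class name is illustrative
@CrewBase
class ResearchCrew:
    # YAML file that defines the "web_researcher" agent
    agents_config = "config/agents.yaml"

    # Define an agent that uses the tool
    @agent
    def web_researcher(self) -> Agent:
        '''
        This agent uses the HyperbrowserLoadTool to scrape websites
        and extract information.
        '''
        return Agent(
            config=self.agents_config["web_researcher"],
            tools=[tool]
        )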

Parameters

The HyperbrowserLoadTool accepts the following parameters:

Constructor Parameters

  • api_key: Optional. Your Hyperbrowser API key. If not provided, it will be read from the HYPERBROWSER_API_KEY environment variable.

Run Parameters

  • url: Required. The website URL to scrape or crawl.
  • operation: Optional. The operation to perform on the website. Either "scrape" or "crawl". Default is "scrape".
  • params: Optional. Additional parameters for the scrape or crawl operation.
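
As a minimal sketch of how the run parameters fit together, the hypothetical helper below validates the operation value and assembles keyword arguments with the documented defaults. The name build_run_args is an assumption made for illustration and is not part of the tool. The option names accepted inside params are defined by Hyperbrowser and are not shown here.

```python
from typing import Any, Dict, Optional

VALID_OPERATIONS = ("scrape", "crawl")


def build_run_args(
    url: str,
    operation: str = "scrape",
    params: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble the url/operation/params arguments described above.

    Hypothetical helper; it only mirrors the documented defaults.
    """
    if not url:
        raise ValueError("url is required")
    if operation not in VALID_OPERATIONS:
        raise ValueError(f"operation must be one of {VALID_OPERATIONS}")
    return {"url": url, "operation": operation, "params": params or {}}


# Scrape is the default operation; params defaults to an empty dict.
scrape_args = build_run_args("https://example.com")
crawl_args = build_run_args("https://example.com", operation="crawl")

# With a configured tool, the arguments could then be passed along, e.g.:
# result = tool.run(**crawl_args)
```

Validating the operation up front surfaces a typo such as "crawls" before any browser session is started.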

Supported Parameters

For detailed information on all supported parameters, refer to the Hyperbrowser documentation for the scrape and crawl operations.

Return Format

The tool returns content in the following format:

  • For scrape operations: The content of the page in markdown or HTML format.
  • For crawl operations: The content of each page separated by dividers, including the URL of each page.
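
To make the shape of a crawl result concrete, the sketch below joins per-page content into one string with a divider and the page URL ahead of each section. The divider string and the "Url:" and "Content:" labels are assumptions chosen for illustration. The tool's actual separator and labels may differ, so inspect real output before parsing it.

```python
from typing import Iterable, Tuple

DIVIDER = "-" * 50  # assumed divider; the tool's actual separator may differ


def format_crawl_result(pages: Iterable[Tuple[str, str]]) -> str:
    """Join (url, content) pairs into one string, one divided section per page."""
    sections = []
    for url, content in pages:
        sections.append(f"{DIVIDER}\nUrl: {url}\nContent:\n{content}")
    return "\n".join(sections)


example = format_crawl_result([
    ("https://example.com/", "# Home"),
    ("https://example.com/about", "# About"),
])
```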

Conclusion

The HyperbrowserLoadTool lets agents scrape single pages and crawl whole sites while Hyperbrowser handles anti-bot measures, CAPTCHAs, and proxy rotation. Content comes back as markdown or HTML, ready for an agent to process.