`SpiderTool`

Description

Spider is the fastest open-source scraper and crawler that returns LLM-ready data. It converts any website into pure HTML, markdown, metadata, or text, and lets you crawl with custom, AI-driven actions.

Installation

To use the SpiderTool, install the Spider SDK (`spider-client`) along with the `crewai[tools]` package:

pip install spider-client 'crewai[tools]'

Example

This example shows how to use the SpiderTool to let your agent scrape and crawl websites. The data returned from the Spider API is already LLM-ready, so no additional cleaning is needed.

Code

from crewai import Agent, Crew, Task
from crewai_tools import SpiderTool

def main():
    spider_tool = SpiderTool()

    searcher = Agent(
        role="Web Research Expert",
        goal="Find related information from specific URLs",
        backstory="An expert web researcher that uses the web extremely well",
        tools=[spider_tool],
        verbose=True,
    )

    return_metadata = Task(
        description="Scrape https://spider.cloud with a limit of 1 and enable metadata",
        expected_output="Metadata and 10 word summary of spider.cloud",
        agent=searcher
    )

    crew = Crew(
        agents=[searcher],
        tasks=[
            return_metadata,
        ],
        verbose=True
    )

    crew.kickoff()

if __name__ == "__main__":
    main()
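
The example above constructs `SpiderTool()` without an explicit key, so it relies on the `SPIDER_API_KEY` environment variable described in the Arguments table. The lookup order can be sketched as follows. This is only an illustration of the documented behavior, not the tool's actual implementation, and `resolve_spider_api_key` is a hypothetical helper name:

```python
import os


def resolve_spider_api_key(explicit_key=None, env=None):
    """Return the explicit key if given, else SPIDER_API_KEY from the environment.

    Hypothetical helper: it mirrors the lookup order described in the
    Arguments table and is not part of crewai_tools or spider-client.
    """
    # Allow a custom mapping to be injected so the function is easy to test.
    env = os.environ if env is None else env
    key = explicit_key or env.get("SPIDER_API_KEY")
    if not key:
        raise RuntimeError(
            "No Spider API key found: pass api_key or set SPIDER_API_KEY"
        )
    return key


if __name__ == "__main__":
    try:
        resolve_spider_api_key()
        print("Spider API key found")
    except RuntimeError as err:
        print(err)
```

Running a check like this before `crew.kickoff()` can surface a missing key early rather than partway through an agent run.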

Arguments

Argument	Type	Description
api_key	`string`	Specifies Spider API key. If not specified, it looks for `SPIDER_API_KEY` in environment variables.
params	`object`	Optional parameters for the request. Defaults to `{"return_format": "markdown"}` to optimize content for LLMs.
request	`string`	Type of request to perform (`http`, `chrome`, `smart`). `smart` defaults to HTTP, switching to JavaScript rendering if needed.
limit	`int`	Max pages to crawl per website. Set to `0` or omit for unlimited.
depth	`int`	Max crawl depth. Set to `0` for no limit.
cache	`bool`	Enables HTTP caching to speed up repeated runs. Default is `true`.
budget	`object`	Sets path-based limits for crawled pages, e.g., `{"*":1}` for root page only.
locale	`string`	Locale for the request, e.g., `en-US`.
cookies	`string`	HTTP cookies for the request.
stealth	`bool`	Enables stealth mode for Chrome requests to avoid detection. Default is `true`.
headers	`object`	HTTP headers as a map of key-value pairs for all requests.
metadata	`bool`	Stores metadata about pages and content, aiding AI interoperability. Defaults to `false`.
viewport	`object`	Sets Chrome viewport dimensions. Default is `800x600`.
encoding	`string`	Specifies encoding type, e.g., `UTF-8`, `SHIFT_JIS`.
subdomains	`bool`	Includes subdomains in the crawl. Default is `false`.
user_agent	`string`	Custom HTTP user agent. Defaults to a random agent.
store_data	`bool`	Enables data storage for the request. Overrides `storageless` when set. Default is `false`.
gpt_config	`object`	Lets AI generate crawl actions. Pass an array for `"prompt"` to chain multiple steps.
fingerprint	`bool`	Enables advanced fingerprinting for Chrome.
storageless	`bool`	Prevents all data storage, including AI embeddings. Default is `false`.
readability	`bool`	Pre-processes content for reading via Mozilla’s Readability. Improves content for LLMs.
return_format	`string`	Format to return data: `markdown`, `raw`, `text`, `html2text`. Use `raw` for default page format.
proxy_enabled	`bool`	Enables high-performance proxies to avoid network-level blocking.
query_selector	`string`	CSS query selector for content extraction from markup.
full_resources	`bool`	Downloads all resources linked to the website.
request_timeout	`int`	Timeout in seconds for requests (5-60). Default is `30`.
run_in_background	`bool`	Runs the request in the background, useful for data storage and triggering dashboard crawls. No effect if `storageless` is set.
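
To make the table above more concrete, here is a minimal sketch of assembling a request-parameter dictionary from a few of the listed arguments and checking the documented constraints before use. The helper `build_spider_params` and its validation rules are assumptions for illustration, taken only from the descriptions in the table. They are not part of `crewai_tools` or `spider-client`:

```python
# Allowed values taken from the Arguments table above.
ALLOWED_REQUEST_TYPES = {"http", "chrome", "smart"}
ALLOWED_RETURN_FORMATS = {"markdown", "raw", "text", "html2text"}


def build_spider_params(**overrides):
    """Build a params dict, starting from the documented default.

    Hypothetical helper: validates only the constraints stated in the table.
    """
    # Documented default: markdown output, which suits LLM consumption.
    params = {"return_format": "markdown"}
    params.update(overrides)

    if params["return_format"] not in ALLOWED_RETURN_FORMATS:
        raise ValueError(f"unsupported return_format: {params['return_format']!r}")

    request = params.get("request")
    if request is not None and request not in ALLOWED_REQUEST_TYPES:
        raise ValueError(f"unsupported request type: {request!r}")

    # The table gives a 5-60 second range for request_timeout.
    timeout = params.get("request_timeout")
    if timeout is not None and not 5 <= timeout <= 60:
        raise ValueError("request_timeout must be between 5 and 60 seconds")

    # limit and depth use 0 to mean "no limit", so negatives make no sense.
    for name in ("limit", "depth"):
        value = params.get(name)
        if value is not None and value < 0:
            raise ValueError(f"{name} must be 0 (unlimited) or a positive integer")

    return params


if __name__ == "__main__":
    # Scrape a single page with metadata, letting Spider pick HTTP or Chrome.
    print(build_spider_params(request="smart", limit=1, metadata=True))
```

A dictionary built this way could then be supplied as the `params` argument listed in the table. Whether your installed version of `SpiderTool` accepts it under that exact name is worth confirming against the package source.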
