MultiOnTool

Description

The MultiOnTool is designed to wrap MultiOn’s web browsing capabilities, enabling CrewAI agents to control web browsers using natural language instructions. This tool facilitates seamless web browsing, making it an essential asset for projects requiring dynamic web data interaction and automation of web-based tasks.

Installation

To use this tool, you need to install the MultiOn package:

uv add multion

You’ll also need to install the MultiOn browser extension and enable API usage.

Steps to Get Started

To effectively use the MultiOnTool, follow these steps:

  1. Install CrewAI: Ensure that the crewai[tools] package is installed in your Python environment.
  2. Install and use MultiOn: Follow MultiOn documentation for installing the MultiOn Browser Extension.
  3. Enable API Usage: Click on the MultiOn extension in the extensions folder of your browser (not the hovering MultiOn icon on the web page) to open the extension configurations. Click the API Enabled toggle to enable the API.

Example

The following example demonstrates how to initialize the tool and execute a web browsing task:

Code
from crewai import Agent, Task, Crew
from crewai_tools import MultiOnTool

# Initialize the tool
multion_tool = MultiOnTool(api_key="YOUR_MULTION_API_KEY", local=False)

# Define an agent that uses the tool
browser_agent = Agent(
    role="Browser Agent",
    goal="Control web browsers using natural language",
    backstory="An expert browsing agent.",
    tools=[multion_tool],
    verbose=True,
)

# Example task to search and summarize news
browse_task = Task(
    description="Summarize the top 3 trending AI News headlines",
    expected_output="A summary of the top 3 trending AI News headlines",
    agent=browser_agent,
)

# Create and run the crew
crew = Crew(agents=[browser_agent], tasks=[browse_task])
result = crew.kickoff()

Parameters

The MultiOnTool accepts the following parameters during initialization:

  • api_key: Optional. Specifies the MultiOn API key. If not provided, it will look for the MULTION_API_KEY environment variable.
  • local: Optional. Set to True to run the agent locally on your browser. Make sure the MultiOn browser extension is installed and API Enabled is checked. Default is False.
  • max_steps: Optional. Sets the maximum number of steps the MultiOn agent can take for a command. Default is 3.

Usage

When using the MultiOnTool, the agent will provide natural language instructions that the tool translates into web browsing actions. The tool returns the results of the browsing session along with a status.

Code
# Example of using the tool with an agent
browser_agent = Agent(
    role="Web Browser Agent",
    goal="Search for and summarize information from the web",
    backstory="An expert at finding and extracting information from websites.",
    tools=[multion_tool],
    verbose=True,
)

# Create a task for the agent
search_task = Task(
    description="Search for the latest AI news on TechCrunch and summarize the top 3 headlines",
    expected_output="A summary of the top 3 AI news headlines from TechCrunch",
    agent=browser_agent,
)

# Run the task
crew = Crew(agents=[browser_agent], tasks=[search_task])
result = crew.kickoff()

If the status returned is CONTINUE, the agent should be instructed to reissue the same instruction to continue execution.

Implementation Details

The MultiOnTool is implemented as a subclass of BaseTool from CrewAI. It wraps the MultiOn client to provide web browsing capabilities:

Code
class MultiOnTool(BaseTool):
    """Tool to wrap MultiOn Browse Capabilities."""

    name: str = "Multion Browse Tool"
    description: str = """Multion gives the ability for LLMs to control web browsers using natural language instructions.
            If the status is 'CONTINUE', reissue the same instruction to continue execution
        """
    
    # Implementation details...
    
    def _run(self, cmd: str, *args: Any, **kwargs: Any) -> str:
        """
        Run the Multion client with the given command.
        
        Args:
            cmd (str): The detailed and specific natural language instruction for web browsing
            *args (Any): Additional arguments to pass to the Multion client
            **kwargs (Any): Additional keyword arguments to pass to the Multion client
        """
        # Implementation details...

Conclusion

The MultiOnTool provides a powerful way to integrate web browsing capabilities into CrewAI agents. By enabling agents to interact with websites through natural language instructions, it opens up a wide range of possibilities for web-based tasks, from data collection and research to automated interactions with web services.