Knowledge in CrewAI is a powerful system that lets AI agents access and use external information sources while performing their tasks.
Think of it as giving your agents a reference library they can consult as they work.
Key benefits of using Knowledge:
Enhance agents with domain-specific information
For file-based Knowledge Sources, make sure to place your files in a knowledge directory at the root of your project.
Also, use paths relative to the knowledge directory when creating the source.
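To illustrate the convention, a hypothetical helper (not part of CrewAI) that resolves a source path the way file-based knowledge sources expect it, relative to the project's knowledge directory rather than the current working directory:

```python
import os

KNOWLEDGE_DIR = "knowledge"  # CrewAI's default directory name

def resolve_knowledge_path(relative_path: str, project_root: str = ".") -> str:
    """Resolve a source path relative to <project_root>/knowledge,
    not the current working directory."""
    return os.path.join(project_root, KNOWLEDGE_DIR, relative_path)

# "user_preference.txt" resolves to "./knowledge/user_preference.txt"
path = resolve_knowledge_path("user_preference.txt")
```

So a source created with `file_paths=["user_preference.txt"]` is expected to live at `./knowledge/user_preference.txt`.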
```python
from crewai import Agent, Task, Crew, Process, LLM
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

# Create a knowledge source
content = "Users name is John. He is 30 years old and lives in San Francisco."
string_source = StringKnowledgeSource(content=content)

# Create an LLM with a temperature of 0 to ensure deterministic outputs
llm = LLM(model="gpt-4o-mini", temperature=0)

# Create an agent with the knowledge store
agent = Agent(
    role="About User",
    goal="You know everything about the user.",
    backstory="You are a master at understanding people and their preferences.",
    verbose=True,
    allow_delegation=False,
    llm=llm,
)

task = Task(
    description="Answer the following questions about the user: {question}",
    expected_output="An answer to the question.",
    agent=agent,
)

crew = Crew(
    agents=[agent],
    tasks=[task],
    verbose=True,
    process=Process.sequential,
    knowledge_sources=[string_source],  # Enable knowledge by adding the sources here
)

result = crew.kickoff(
    inputs={"question": "What city does John live in and how old is he?"}
)
```
You need to install docling for the following example to work: `uv add docling`
```python
from crewai import LLM, Agent, Crew, Process, Task
from crewai.knowledge.source.crew_docling_source import CrewDoclingSource

# Create a knowledge source from web content
content_source = CrewDoclingSource(
    file_paths=[
        "https://lilianweng.github.io/posts/2024-11-28-reward-hacking",
        "https://lilianweng.github.io/posts/2024-07-07-hallucination",
    ],
)

# Create an LLM with a temperature of 0 to ensure deterministic outputs
llm = LLM(model="gpt-4o-mini", temperature=0)

# Create an agent with the knowledge store
agent = Agent(
    role="About papers",
    goal="You know everything about the papers.",
    backstory="You are a master at understanding papers and their content.",
    verbose=True,
    allow_delegation=False,
    llm=llm,
)

task = Task(
    description="Answer the following questions about the papers: {question}",
    expected_output="An answer to the question.",
    agent=agent,
)

crew = Crew(
    agents=[agent],
    tasks=[task],
    verbose=True,
    process=Process.sequential,
    knowledge_sources=[content_source],
)

result = crew.kickoff(
    inputs={"question": "What is the reward hacking paper about? Be sure to provide sources."}
)
```
```python
from crewai.knowledge.source.json_knowledge_source import JSONKnowledgeSource

json_source = JSONKnowledgeSource(
    file_paths=["data.json"]
)
```
Make sure to create the ./knowledge folder. All source files (e.g. .txt, .pdf, .xlsx, .json) should be placed in this folder for centralized management.
Understanding Knowledge Levels: CrewAI supports knowledge at both the agent and the crew level. This section clarifies exactly how each works, when they are initialized, and addresses common misconceptions about dependencies.
```python
from crewai import Agent, Task, Crew
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

# Agent-specific knowledge
agent_knowledge = StringKnowledgeSource(
    content="Agent-specific information that only this agent needs"
)

agent = Agent(
    role="Specialist",
    goal="Use specialized knowledge",
    backstory="Expert with specific knowledge",
    knowledge_sources=[agent_knowledge],
    embedder={  # Agent can have its own embedder
        "provider": "openai",
        "config": {"model": "text-embedding-3-small"}
    }
)

task = Task(
    description="Answer using your specialized knowledge",
    agent=agent,
    expected_output="Answer based on agent knowledge"
)

# No crew knowledge needed
crew = Crew(agents=[agent], tasks=[task])
result = crew.kickoff()  # Works perfectly
```
Example 3: Multiple Agents with Different Knowledge
```python
# Different knowledge for different agents
sales_knowledge = StringKnowledgeSource(content="Sales procedures and pricing")
tech_knowledge = StringKnowledgeSource(content="Technical documentation")
support_knowledge = StringKnowledgeSource(content="Support procedures")

sales_agent = Agent(
    role="Sales Representative",
    knowledge_sources=[sales_knowledge],
    embedder={"provider": "openai", "config": {"model": "text-embedding-3-small"}}
)

tech_agent = Agent(
    role="Technical Expert",
    knowledge_sources=[tech_knowledge],
    embedder={"provider": "ollama", "config": {"model": "mxbai-embed-large"}}
)

support_agent = Agent(
    role="Support Specialist",
    knowledge_sources=[support_knowledge]  # Will use crew embedder as fallback
)

crew = Crew(
    agents=[sales_agent, tech_agent, support_agent],
    tasks=[...],
    embedder={  # Fallback embedder for agents without their own
        "provider": "google",
        "config": {"model": "text-embedding-004"}
    }
)

# Each agent gets only their specific knowledge
# Each can use different embedding providers
```
Unlike retrieval from a vector database using a tool, agents preloaded with knowledge don't need a retrieval persona or task.
Simply add the relevant knowledge sources your agent or crew needs to function.
Knowledge sources can be added at the agent or crew level.
Crew-level knowledge sources will be used by all agents in the crew.
Agent-level knowledge sources will be used only by the specific agent preloaded with them.
results_limit: the number of relevant documents to return. Defaults to 3.
score_threshold: the minimum score for a document to be considered relevant. Defaults to 0.35.
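The interplay between the two parameters can be sketched in plain Python (illustrative only; CrewAI's actual retrieval runs against a ChromaDB vector store):

```python
def filter_results(scored_docs, results_limit=3, score_threshold=0.35):
    """Keep documents scoring at or above the threshold, best first,
    capped at results_limit entries."""
    relevant = [d for d in scored_docs if d[1] >= score_threshold]
    relevant.sort(key=lambda d: d[1], reverse=True)
    return relevant[:results_limit]

docs = [("doc_a", 0.9), ("doc_b", 0.2), ("doc_c", 0.5), ("doc_d", 0.4), ("doc_e", 0.8)]
top = filter_results(docs)  # doc_b falls below the threshold; best three of the rest remain
```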
A list of knowledge sources that provide content to be stored and queried. Can include PDF, CSV, Excel, JSON, text files, or string content.
Custom storage configuration for managing how knowledge is stored and retrieved. If not provided, a default storage will be created.
Understanding Knowledge Storage: CrewAI automatically stores knowledge sources in platform-specific directories, using ChromaDB for vector storage. Understanding these locations and defaults helps with production deployments, debugging, and storage management.
```python
import os
from crewai import Crew

# Set custom storage location for all CrewAI data
os.environ["CREWAI_STORAGE_DIR"] = "./my_project_storage"

# All knowledge will now be stored in ./my_project_storage/knowledge/
crew = Crew(
    agents=[...],
    tasks=[...],
    knowledge_sources=[...]
)
```
Option 3: Project-Specific Knowledge Storage
```python
import os
from pathlib import Path

# Store knowledge in project directory
project_root = Path(__file__).parent
knowledge_dir = project_root / "knowledge_storage"
os.environ["CREWAI_STORAGE_DIR"] = str(knowledge_dir)

# Now all knowledge will be stored in your project directory
```
Default Embedding Provider: CrewAI defaults to OpenAI embeddings (text-embedding-3-small) for knowledge storage, even when you use a different LLM provider. You can easily customize this to match your setup.
```python
from crewai import Agent, Crew, LLM
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

# When using Claude as your LLM...
agent = Agent(
    role="Researcher",
    goal="Research topics",
    backstory="Expert researcher",
    llm=LLM(provider="anthropic", model="claude-3-sonnet")  # Using Claude
)

# CrewAI will still use OpenAI embeddings by default for knowledge
# This ensures consistency but may not match your LLM provider preference
knowledge_source = StringKnowledgeSource(content="Research data...")

crew = Crew(
    agents=[agent],
    tasks=[...],
    knowledge_sources=[knowledge_source]  # Default: Uses OpenAI embeddings even with Claude LLM
)
```
Make sure to deploy the embedding model on the Azure platform first.
Then use the following configuration:
```python
agent = Agent(
    role="Researcher",
    goal="Research topics",
    backstory="Expert researcher",
    knowledge_sources=[knowledge_source],
    embedder={
        "provider": "azure",
        "config": {
            "api_key": "your-azure-api-key",
            "model": "text-embedding-ada-002",  # change to the model you are using and have deployed in Azure
            "api_base": "https://your-azure-endpoint.openai.azure.com/",
            "api_version": "2024-02-01"
        }
    }
)
```
CrewAI implements an intelligent query rewriting mechanism to optimize knowledge retrieval. When an agent needs to search the knowledge sources, the raw task prompt is automatically transformed into a more effective search query.
```python
# Original task prompt
task_prompt = "Answer the following questions about the user's favorite movies: What movie did John watch last week? Format your answer in JSON."

# Behind the scenes, this might be rewritten as:
rewritten_query = "What movies did John watch last week?"
```
The rewritten query focuses on the core information need and drops irrelevant instructions about output formatting.
This mechanism is fully automatic and requires no configuration from users. The agent's LLM performs the query rewriting, so using a more capable LLM can improve the quality of the rewritten queries.
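A toy approximation of the idea (the real rewriting is done by the agent's LLM, not by pattern matching; this sketch only illustrates what kind of text gets stripped):

```python
import re

def toy_rewrite(task_prompt: str) -> str:
    """Crudely drop output-formatting instructions so the remaining
    text focuses on the information need."""
    sentences = re.split(r"(?<=[.?!])\s+", task_prompt.strip())
    keep = [s for s in sentences if not re.search(r"\b(format|JSON|markdown)\b", s, re.I)]
    return " ".join(keep)

prompt = "What movie did John watch last week? Format your answer in JSON."
query = toy_rewrite(prompt)  # the formatting sentence is removed
```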
CrewAI emits events during the knowledge retrieval process that you can listen to using the event system. These events let you monitor, debug, and analyze how knowledge is being retrieved and used by your agents.
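The listener pattern behind this can be sketched generically (the bus, event name, and payload below are illustrative, not CrewAI's actual event API; consult the events documentation for the real class and event names):

```python
class EventBus:
    """Minimal publish/subscribe bus showing how retrieval events
    could be observed for monitoring or debugging."""
    def __init__(self):
        self._handlers = {}

    def on(self, event_name):
        def register(fn):
            self._handlers.setdefault(event_name, []).append(fn)
            return fn
        return register

    def emit(self, event_name, payload):
        for fn in self._handlers.get(event_name, []):
            fn(payload)

bus = EventBus()
log = []

@bus.on("knowledge_retrieval_completed")
def record(payload):
    # A real listener might log timings or inspect retrieved chunks
    log.append(payload["query"])

bus.emit("knowledge_retrieval_completed", {"query": "What city does John live in?"})
```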
CrewAI lets you create custom knowledge sources for any type of data by extending the BaseKnowledgeSource class. Let's build a practical example that fetches and processes space news articles.
Space News Knowledge Source Example
```python
from crewai import Agent, Task, Crew, Process, LLM
from crewai.knowledge.source.base_knowledge_source import BaseKnowledgeSource
import requests
from datetime import datetime
from typing import Dict, Any
from pydantic import BaseModel, Field

class SpaceNewsKnowledgeSource(BaseKnowledgeSource):
    """Knowledge source that fetches data from Space News API."""

    api_endpoint: str = Field(description="API endpoint URL")
    limit: int = Field(default=10, description="Number of articles to fetch")

    def load_content(self) -> Dict[Any, str]:
        """Fetch and format space news articles."""
        try:
            response = requests.get(
                f"{self.api_endpoint}?limit={self.limit}"
            )
            response.raise_for_status()

            data = response.json()
            articles = data.get('results', [])

            formatted_data = self.validate_content(articles)
            return {self.api_endpoint: formatted_data}
        except Exception as e:
            raise ValueError(f"Failed to fetch space news: {str(e)}")

    def validate_content(self, articles: list) -> str:
        """Format articles into readable text."""
        formatted = "Space News Articles:\n\n"
        for article in articles:
            formatted += f"""
                Title: {article['title']}
                Published: {article['published_at']}
                Summary: {article['summary']}
                News Site: {article['news_site']}
                URL: {article['url']}
                -------------------"""
        return formatted

    def add(self) -> None:
        """Process and store the articles."""
        content = self.load_content()
        for _, text in content.items():
            chunks = self._chunk_text(text)
            self.chunks.extend(chunks)
        self._save_documents()

# Create knowledge source
recent_news = SpaceNewsKnowledgeSource(
    api_endpoint="https://api.spaceflightnewsapi.net/v4/articles",
    limit=10,
)

# Create specialized agent
space_analyst = Agent(
    role="Space News Analyst",
    goal="Answer questions about space news accurately and comprehensively",
    backstory="""You are a space industry analyst with expertise in space exploration,
    satellite technology, and space industry trends. You excel at answering questions
    about space news and providing detailed, accurate information.""",
    knowledge_sources=[recent_news],
    llm=LLM(model="gpt-4", temperature=0.0)
)

# Create task that handles user questions
analysis_task = Task(
    description="Answer this question about space news: {user_question}",
    expected_output="A detailed answer based on the recent space news articles",
    agent=space_analyst
)

# Create and run the crew
crew = Crew(
    agents=[space_analyst],
    tasks=[analysis_task],
    verbose=True,
    process=Process.sequential
)

# Example usage
result = crew.kickoff(
    inputs={"user_question": "What are the latest developments in space exploration?"}
)
```
```python
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

# Create a test knowledge source
test_source = StringKnowledgeSource(
    content="Test knowledge content for debugging",
    chunk_size=100,  # Small chunks for testing
    chunk_overlap=20
)

# Check chunking behavior
print(f"Original content length: {len(test_source.content)}")
print(f"Chunk size: {test_source.chunk_size}")
print(f"Chunk overlap: {test_source.chunk_overlap}")

# Process and inspect chunks
test_source.add()
print(f"Number of chunks created: {len(test_source.chunks)}")
for i, chunk in enumerate(test_source.chunks[:3]):  # Show first 3 chunks
    print(f"Chunk {i+1}: {chunk[:50]}...")
```
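Conceptually, sliding-window chunking with overlap works like this (a simplified character-based sketch, not CrewAI's actual chunker, which may split on other boundaries):

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list:
    """Split text into windows of chunk_size characters; each window
    starts chunk_size - chunk_overlap characters after the previous one."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2)
# windows start at offsets 0, 2, 4, 6, 8
```

The overlap means consecutive chunks share trailing/leading characters, which helps retrieval when a relevant passage straddles a chunk boundary.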
```python
# Ensure files are in the correct location
from crewai.utilities.constants import KNOWLEDGE_DIRECTORY
import os

knowledge_dir = KNOWLEDGE_DIRECTORY  # Usually "knowledge"
file_path = os.path.join(knowledge_dir, "your_file.pdf")

if not os.path.exists(file_path):
    print(f"File not found: {file_path}")
    print(f"Current working directory: {os.getcwd()}")
    print(f"Expected knowledge directory: {os.path.abspath(knowledge_dir)}")
```
“Embedding dimension mismatch” errors:
```python
# This happens when switching embedding providers
# Reset knowledge storage to clear old embeddings
crew.reset_memories(command_type='knowledge')

# Or use consistent embedding providers
crew = Crew(
    agents=[...],
    tasks=[...],
    knowledge_sources=[...],
    embedder={"provider": "openai", "config": {"model": "text-embedding-3-small"}}
)
```