Knowledge
What is knowledge in CrewAI and how to use it.
Overview
Knowledge in CrewAI is a system that lets AI agents access and use external information sources during their tasks. Think of it as giving your agents a reference library they can consult while working.
Key benefits of using Knowledge:
- Enhance agents with domain-specific information
- Support decisions with real-world data
- Maintain context across conversations
- Ground responses in factual information
Quickstart Examples
For file-based Knowledge Sources, make sure to place your files in a knowledge directory at the root of your project.
Also, use relative paths from the knowledge directory when creating the source.
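The layout convention above can be sketched with the standard library alone. The helper name resolve_knowledge_file is hypothetical and is not part of CrewAI; it only illustrates how a path written relative to the knowledge directory maps onto the project root.

```python
from pathlib import Path

KNOWLEDGE_DIR_NAME = "knowledge"  # directory name taken from the text above


def resolve_knowledge_file(project_root: Path, relative_path: str) -> Path:
    """Map a path written relative to knowledge/ onto the project root.

    Hypothetical helper for illustration; not a CrewAI function.
    """
    relative = Path(relative_path)
    if relative.is_absolute():
        # The convention described above expects relative paths only.
        raise ValueError("use a path relative to the knowledge directory")
    return project_root / KNOWLEDGE_DIR_NAME / relative
```

Under this sketch, a source created with the path docs/handbook.txt would refer to knowledge/docs/handbook.txt under the project root.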
Basic String Knowledge Example
Web Content Knowledge Example
You need to install docling for the following example to work: uv add docling
Supported Knowledge Sources
CrewAI supports various types of knowledge sources out of the box:
Text Sources
- Raw strings
- Text files (.txt)
- PDF documents
Structured Data
- CSV files
- Excel spreadsheets
- JSON documents
Text File Knowledge Source
PDF Knowledge Source
CSV Knowledge Source
Excel Knowledge Source
JSON Knowledge Source
Please ensure that you create the ./knowledge folder. All source files (e.g., .txt, .pdf, .xlsx, .json) should be placed in this folder for centralized management.
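As a rough aid for keeping that folder organized, the sketch below creates the knowledge folder if it is missing and groups the files it contains by the source type named in this section. The function name and the extension-to-type mapping are illustrative assumptions; CrewAI does not necessarily pick a source class by extension.

```python
from pathlib import Path

# Extension -> source type, using the source names listed in this section.
SOURCE_TYPE_BY_SUFFIX = {
    ".txt": "Text File Knowledge Source",
    ".pdf": "PDF Knowledge Source",
    ".csv": "CSV Knowledge Source",
    ".xlsx": "Excel Knowledge Source",
    ".json": "JSON Knowledge Source",
}


def classify_knowledge_files(knowledge_dir: Path) -> dict[str, list[str]]:
    """Create knowledge_dir if needed and group its files by source type.

    Hypothetical helper; paths are returned relative to knowledge_dir.
    """
    knowledge_dir.mkdir(parents=True, exist_ok=True)
    grouped: dict[str, list[str]] = {}
    for path in sorted(knowledge_dir.rglob("*")):
        source_type = SOURCE_TYPE_BY_SUFFIX.get(path.suffix.lower())
        if path.is_file() and source_type is not None:
            relative = path.relative_to(knowledge_dir).as_posix()
            grouped.setdefault(source_type, []).append(relative)
    return grouped
```

Files with other extensions are ignored, which makes it easy to spot material that no built-in source type in this list would pick up.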
Agent vs Crew Knowledge: Complete Guide
Understanding Knowledge Levels: CrewAI supports knowledge at both agent and crew levels. This section clarifies exactly how each works, when they’re initialized, and addresses common misconceptions about dependencies.
How Knowledge Initialization Actually Works
Here’s exactly what happens when you use knowledge:
Agent-Level Knowledge (Independent)
What Happens During crew.kickoff()
When you call crew.kickoff(), here’s the exact sequence:
Storage Independence
Each knowledge level uses independent storage collections:
Complete Working Examples
Example 1: Agent-Only Knowledge
Example 2: Both Agent and Crew Knowledge
Example 3: Multiple Agents with Different Knowledge
Unlike retrieval from a vector database using a tool, agents preloaded with knowledge will not need a retrieval persona or task. Simply add the relevant knowledge sources your agent or crew needs to function.
Knowledge sources can be added at the agent or crew level. Crew level knowledge sources will be used by all agents in the crew. Agent level knowledge sources will be used by the specific agent that is preloaded with the knowledge.
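The scoping rule in the paragraph above can be modeled in a few lines. ToyAgent, ToyCrew, and sources_available_to are stand-ins invented for this sketch; they are not the crewai Agent and Crew classes, and the code says nothing about how CrewAI stores or queries the sources internally.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAgent:
    """Stand-in for an agent; not the crewai Agent class."""
    role: str
    knowledge_sources: list[str] = field(default_factory=list)


@dataclass
class ToyCrew:
    """Stand-in for a crew; not the crewai Crew class."""
    agents: list[ToyAgent]
    knowledge_sources: list[str] = field(default_factory=list)


def sources_available_to(agent: ToyAgent, crew: ToyCrew) -> list[str]:
    """Crew-level sources reach every agent; agent-level ones stay with their agent.

    The ordering (crew first, then agent) is arbitrary in this sketch.
    """
    return crew.knowledge_sources + agent.knowledge_sources


sales = ToyAgent("Sales", ["pricing sheet"])
support = ToyAgent("Support")
crew = ToyCrew([sales, support], ["company handbook"])

sales_view = sources_available_to(sales, crew)
support_view = sources_available_to(support, crew)
```

Here the sales agent sees both the handbook and the pricing sheet, while the support agent sees only the handbook.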
Knowledge Configuration
You can set knowledge configuration options for the crew or for an individual agent.
results_limit: the number of relevant documents to return. Default is 3.
score_threshold: the minimum score for a document to be considered relevant. Default is 0.35.
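To make the interaction of the two settings concrete, here is a small sketch of threshold-then-limit filtering. It assumes that a higher score means a more relevant document and that a score equal to the threshold is kept; neither detail is stated above, and ScoredDocument and select_relevant are hypothetical names, not CrewAI APIs.

```python
from typing import NamedTuple


class ScoredDocument(NamedTuple):
    text: str
    score: float  # assumed: higher means more relevant


def select_relevant(
    candidates: list[ScoredDocument],
    results_limit: int = 3,         # default stated above
    score_threshold: float = 0.35,  # default stated above
) -> list[ScoredDocument]:
    """Keep documents scoring at least score_threshold, best first, capped at results_limit."""
    kept = [doc for doc in candidates if doc.score >= score_threshold]
    kept.sort(key=lambda doc: doc.score, reverse=True)
    return kept[:results_limit]


candidates = [
    ScoredDocument("a", 0.90),
    ScoredDocument("b", 0.20),
    ScoredDocument("c", 0.50),
    ScoredDocument("d", 0.36),
    ScoredDocument("e", 0.70),
]
top = select_relevant(candidates)
```

Raising score_threshold trades recall for precision, while results_limit bounds how much retrieved text is passed to the agent.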
Supported Knowledge Parameters
List of knowledge sources that provide content to be stored and queried. Can include PDF, CSV, Excel, JSON, text files, or string content.
Name of the collection where the knowledge will be stored. Used to identify different sets of knowledge. Defaults to “knowledge” if not provided.
Custom storage configuration for managing how the knowledge is stored and retrieved. If not provided, a default storage will be created.
Knowledge Storage Transparency
Understanding Knowledge Storage: CrewAI automatically stores knowledge sources in platform-specific directories using ChromaDB for vector storage. Understanding these locations and defaults helps with production deployments, debugging, and storage management.
Where CrewAI Stores Knowledge Files
By default, CrewAI uses the same storage system as memory, storing knowledge in platform-specific directories:
Default Storage Locations by Platform
macOS:
Linux:
Windows:
Finding Your Knowledge Storage Location
To see exactly where CrewAI is storing your knowledge files:
Controlling Knowledge Storage Locations
Option 1: Environment Variable (Recommended)
Option 2: Custom Knowledge Storage
Option 3: Project-Specific Knowledge Storage
Default Embedding Provider Behavior
Default Embedding Provider: CrewAI defaults to OpenAI embeddings (text-embedding-3-small) for knowledge storage, even when using different LLM providers. You can easily customize this to match your setup.
Understanding Default Behavior
Customizing Knowledge Embedding Providers
Advanced Features
Query Rewriting
CrewAI implements an intelligent query rewriting mechanism to optimize knowledge retrieval. When an agent needs to search through knowledge sources, the raw task prompt is automatically transformed into a more effective search query.
How Query Rewriting Works
- When an agent executes a task with knowledge sources available, the _get_knowledge_search_query method is triggered
- The agent’s LLM is used to transform the original task prompt into an optimized search query
- This optimized query is then used to retrieve relevant information from knowledge sources
Benefits of Query Rewriting
Improved Retrieval Accuracy
By focusing on key concepts and removing irrelevant content, query rewriting helps retrieve more relevant information.
Context Awareness
The rewritten queries are designed to be more specific and context-aware for vector database retrieval.
Example
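The following is an illustrative sketch of the rewriting step, not CrewAI's implementation. The instruction text, the rewrite_query function, the fallback to the raw prompt, and the canned fake_llm are all assumptions made so the example runs offline; the real prompt and method body are not shown in this document.

```python
from typing import Callable

# Hypothetical instruction text; CrewAI's actual prompt is not reproduced here.
REWRITE_INSTRUCTION = (
    "Rewrite the task below as a short search query for a vector database. "
    "Keep the key concepts and drop instructions about output format.\n\nTask: "
)


def rewrite_query(task_prompt: str, llm: Callable[[str], str]) -> str:
    """Ask the LLM for a search query; fall back to the raw prompt if it returns nothing."""
    rewritten = llm(REWRITE_INSTRUCTION + task_prompt).strip()
    return rewritten or task_prompt


def fake_llm(prompt: str) -> str:
    """Canned stand-in for a real model call, so the sketch runs offline."""
    return "  What movie did John watch last week?  "


task_prompt = (
    "Answer the following question about the user's favorite movies: "
    "What movie did John watch last week? Format your answer in JSON."
)
search_query = rewrite_query(task_prompt, fake_llm)
print(search_query)
```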
The rewritten query is more focused on the core information need and removes irrelevant instructions about output formatting.
This mechanism is fully automatic and requires no configuration from users. The agent’s LLM is used to perform the query rewriting, so using a more capable LLM can improve the quality of rewritten queries.
Knowledge Events
CrewAI emits events during the knowledge retrieval process that you can listen for using the event system. These events allow you to monitor, debug, and analyze how knowledge is being retrieved and used by your agents.
Available Knowledge Events
- KnowledgeRetrievalStartedEvent: Emitted when an agent starts retrieving knowledge from sources
- KnowledgeRetrievalCompletedEvent: Emitted when knowledge retrieval is completed, including the query used and the retrieved content
- KnowledgeQueryStartedEvent: Emitted when a query to knowledge sources begins
- KnowledgeQueryCompletedEvent: Emitted when a query completes successfully
- KnowledgeQueryFailedEvent: Emitted when a query to knowledge sources fails
- KnowledgeSearchQueryFailedEvent: Emitted when a search query fails
Example: Monitoring Knowledge Retrieval
For more information on using events, see the Event Listeners documentation.
Custom Knowledge Sources
CrewAI allows you to create custom knowledge sources for any type of data by extending the BaseKnowledgeSource class. Let’s create a practical example that fetches and processes space news articles.
Space News Knowledge Source Example
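The sketch below shows the general shape of the work such a source does: fetch articles, flatten them to text, and split the text into overlapping chunks. It deliberately does not extend BaseKnowledgeSource, so that it runs without CrewAI installed; the class SpaceNewsSketch, its method names, the article fields title and summary, and the chunk sizes are all assumptions for illustration. Check BaseKnowledgeSource in your installed version for the methods a real subclass must implement.

```python
from typing import Callable


def chunk_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into fixed-size windows that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


class SpaceNewsSketch:
    """Local stand-in for a custom source; does NOT extend BaseKnowledgeSource."""

    def __init__(
        self,
        fetch_articles: Callable[[], list[dict]],
        chunk_size: int = 200,
        overlap: int = 20,
    ) -> None:
        self.fetch_articles = fetch_articles  # injected, so no network call is needed
        self.chunk_size = chunk_size
        self.overlap = overlap
        self.chunks: list[str] = []

    def load_content(self) -> str:
        """Fetch articles and flatten them into one text blob."""
        articles = self.fetch_articles()
        return "\n\n".join(
            f"Title: {a['title']}\nSummary: {a['summary']}" for a in articles
        )

    def add(self) -> None:
        """Chunk the loaded text; a real source would then embed and store the chunks."""
        text = self.load_content()
        self.chunks.extend(chunk_text(text, self.chunk_size, self.overlap))
```

Injecting the fetch function keeps the network call out of the class, which makes the chunking logic easy to test with canned articles.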
Debugging and Troubleshooting
Debugging Knowledge Issues
Check Agent Knowledge Initialization
Verify Knowledge Storage Locations
Test Knowledge Retrieval
Inspect Knowledge Collections
Check Knowledge Processing
Common Knowledge Storage Issues
“File not found” errors:
“Embedding dimension mismatch” errors:
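In vector stores generally, this kind of error tends to appear when a collection was populated with one embedding model and is later written to or queried with a model that produces vectors of a different length. The check below is a generic sketch of that condition; the function name is hypothetical, and it is not the check that ChromaDB or CrewAI performs internally.

```python
def check_embedding_dimension(stored_dim: int, new_vector: list[float]) -> None:
    """Raise if a new embedding cannot go into a collection built with stored_dim."""
    if len(new_vector) != stored_dim:
        raise ValueError(
            f"embedding dimension mismatch: collection has {stored_dim}, "
            f"new vector has {len(new_vector)}"
        )
```

If the embedding model was changed on purpose, the existing collection usually has to be rebuilt with the new model rather than extended.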
“ChromaDB permission denied” errors:
Knowledge not persisting between runs:
Knowledge Reset Commands
Clearing Knowledge
If you need to clear the knowledge stored in CrewAI, you can use the crewai reset-memories command with the --knowledge option.
This is useful when you’ve updated your knowledge sources and want to ensure that the agents are using the most recent information.