Knowledge
What is knowledge in CrewAI and how to use it.
What is Knowledge?
Knowledge in CrewAI is a powerful system that allows AI agents to access and utilize external information sources during their tasks. Think of it as giving your agents a reference library they can consult while working.
Key benefits of using Knowledge:
- Enhance agents with domain-specific information
- Support decisions with real-world data
- Maintain context across conversations
- Ground responses in factual information
Supported Knowledge Sources
CrewAI supports various types of knowledge sources out of the box:
Text Sources
- Raw strings
- Text files (.txt)
- PDF documents
Structured Data
- CSV files
- Excel spreadsheets
- JSON documents
Supported Knowledge Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| sources | List[BaseKnowledgeSource] | Yes | List of knowledge sources that provide content to be stored and queried. Can include PDF, CSV, Excel, JSON, text files, or string content. |
| collection_name | str | No | Name of the collection where the knowledge will be stored. Used to identify different sets of knowledge. Defaults to “knowledge” if not provided. |
| storage | Optional[KnowledgeStorage] | No | Custom storage configuration for managing how the knowledge is stored and retrieved. If not provided, a default storage will be created. |
Quickstart Example
For file-based knowledge sources, place your files in a directory named knowledge at the root of your project.
Also, use paths relative to the knowledge directory when creating the source.
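To make the path convention concrete, here is a small standard-library sketch of how a relative path maps onto the knowledge directory. The helper resolve_knowledge_path and the example file name are hypothetical and only illustrate the layout; they are not part of CrewAI.

```python
from pathlib import Path

# Directory name taken from the convention described above.
KNOWLEDGE_DIR = Path("knowledge")


def resolve_knowledge_path(relative_path: str, project_root: Path = Path(".")) -> Path:
    """Map a path given relative to the knowledge directory onto the project tree.

    Hypothetical helper for illustration only.
    """
    candidate = Path(relative_path)
    if candidate.is_absolute():
        # The convention asks for paths relative to the knowledge directory.
        raise ValueError("use a path relative to the knowledge directory")
    return project_root / KNOWLEDGE_DIR / candidate
```

With a project root of my_project, the relative path reports/q1.pdf corresponds to my_project/knowledge/reports/q1.pdf, so the source itself would be created with reports/q1.pdf.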
Here’s an example using string-based knowledge:
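A minimal sketch of that setup follows. It assumes the crewai package is installed, that StringKnowledgeSource is importable from crewai.knowledge.source.string_knowledge_source, and that an LLM is already configured for the agent; the role, goal, and task wording are made up for illustration. The imports sit inside the function so the definition itself loads without CrewAI present.

```python
def build_user_crew(content: str):
    """Build a one-agent crew that can consult `content` as knowledge."""
    # Imported lazily; both import paths are assumptions about the installed
    # crewai version and may differ between releases.
    from crewai import Agent, Crew, Process, Task
    from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

    # Wrap the raw string in a knowledge source.
    string_source = StringKnowledgeSource(content=content)

    agent = Agent(
        role="About User",
        goal="You know everything about the user.",
        backstory="You are a master at understanding people and their preferences.",
        allow_delegation=False,
    )
    task = Task(
        description="Answer the following questions about the user: {question}",
        expected_output="An answer to the question.",
        agent=agent,
    )
    # Crew-level knowledge: the source is passed through knowledge_sources.
    return Crew(
        agents=[agent],
        tasks=[task],
        process=Process.sequential,
        knowledge_sources=[string_source],
    )
```

A crew built this way would be started with something like build_user_crew("The user's name is John and he lives in San Francisco.").kickoff(inputs={"question": "Where does John live?"}), where the content string and question are placeholder data.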
Here’s another example with the CrewDoclingSource. The CrewDoclingSource is versatile and can handle multiple file formats, including TXT, PDF, DOCX, HTML, and more.
You need to install docling for the following example to work: uv add docling
More Examples
Here are examples of how to use different types of knowledge sources:
Text File Knowledge Source
PDF Knowledge Source
CSV Knowledge Source
Excel Knowledge Source
JSON Knowledge Source
Knowledge Configuration
Chunking Configuration
Knowledge sources automatically chunk content for better processing. You can configure chunking behavior in your knowledge sources:
The chunking configuration helps in:
- Breaking down large documents into manageable pieces
- Maintaining context through chunk overlap
- Optimizing retrieval accuracy
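The effect of chunk size and overlap can be seen in a small standalone sketch. It is a plain character-based splitter written for illustration, not necessarily how CrewAI chunks internally; the parameter names chunk_size and chunk_overlap and their default values are assumptions.

```python
def chunk_text(text: str, chunk_size: int = 4000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that share an overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    # Each window starts `step` characters after the previous one, so
    # consecutive chunks repeat the last `chunk_overlap` characters.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

For example, chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1) returns ["abcd", "defg", "ghij", "j"]: each chunk begins with the last character of the previous one, which is how overlap preserves context across chunk boundaries.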
Embeddings Configuration
You can also configure the embedder for the knowledge store.
This is useful if you want to use a different embedder for the knowledge store than the one used for the agents.
The embedder parameter supports various embedding model providers, including:
- openai: OpenAI’s embedding models
- google: Google’s text embedding models
- azure: Azure OpenAI embeddings
- ollama: Local embeddings with Ollama
- vertexai: Google Cloud VertexAI embeddings
- cohere: Cohere’s embedding models
- voyageai: VoyageAI’s embedding models
- bedrock: AWS Bedrock embeddings
- huggingface: Hugging Face models
- watson: IBM Watson embeddings
Here’s an example of how to configure the embedder for the knowledge store using Google’s text-embedding-004 model:
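A sketch of what such a configuration might look like is shown below. The provider/config key layout and the GEMINI_API_KEY environment variable name are assumptions, so check them against the CrewAI version you are using.

```python
import os

# Assumed layout: a provider name plus a provider-specific config dict.
embedder_config = {
    "provider": "google",
    "config": {
        # Model name as given in the text; some client versions may expect
        # a "models/" prefix.
        "model": "text-embedding-004",
        # Hypothetical environment variable holding the API key.
        "api_key": os.environ.get("GEMINI_API_KEY", ""),
    },
}
```

The dictionary would then be supplied as the embedder parameter when the crew is created.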
Clearing Knowledge
If you need to clear the knowledge stored in CrewAI, you can use the crewai reset-memories command with the --knowledge option.
This is useful when you’ve updated your knowledge sources and want to ensure that the agents are using the most recent information.
Agent-Specific Knowledge
While knowledge can be provided at the crew level using crew.knowledge_sources, individual agents can also have their own knowledge sources using the knowledge_sources parameter:
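A minimal sketch of an agent with its own knowledge source, under the same assumptions as any CrewAI snippet here: crewai is installed and StringKnowledgeSource lives at the import path shown. The role, goal, and backstory strings are invented for illustration.

```python
def build_specialist_agent(product_notes: str):
    """Create an agent that carries its own knowledge source."""
    # Imported lazily; the import paths are assumptions about the installed
    # crewai version.
    from crewai import Agent
    from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

    product_source = StringKnowledgeSource(content=product_notes)
    return Agent(
        role="Product Specialist",
        goal="Answer questions about the product accurately.",
        backstory="You know the product documentation in depth.",
        # Agent-level knowledge, separate from any crew-level sources.
        knowledge_sources=[product_source],
    )
```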
Benefits of agent-specific knowledge:
- Give agents specialized information for their roles
- Maintain separation of concerns between agents
- Combine with crew-level knowledge for layered information access
Custom Knowledge Sources
CrewAI allows you to create custom knowledge sources for any type of data by extending the BaseKnowledgeSource class. Let’s create a practical example that fetches and processes space news articles.
Space News Knowledge Source Example
Key Components Explained
- Custom Knowledge Source (SpaceNewsKnowledgeSource):
  - Extends BaseKnowledgeSource for integration with CrewAI
  - Configurable API endpoint and article limit
  - Implements three key methods:
    - load_content(): Fetches articles from the API
    - _format_articles(): Structures the articles into readable text
    - add(): Processes and stores the content
- Agent Configuration:
  - Specialized role as a Space News Analyst
  - Uses the knowledge source to access space news
- Task Setup:
  - Takes a user question as input through {user_question}
  - Designed to provide detailed answers based on the knowledge source
- Crew Orchestration:
  - Manages the workflow between agent and task
  - Handles input/output through the kickoff method
This example demonstrates how to:
- Create a custom knowledge source that fetches real-time data
- Process and format external data for AI consumption
- Use the knowledge source to answer specific user questions
- Integrate everything seamlessly with CrewAI’s agent system
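The fetch and format steps listed under Key Components can be sketched with the standard library alone, independent of CrewAI. The function names below, the "results" key in the response, and the article field names (title, published_at, summary, url) are assumptions about the API payload, not the original SpaceNewsKnowledgeSource code.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_articles(api_endpoint: str, limit: int = 10, timeout: float = 10.0) -> list[dict]:
    """Fetch up to `limit` articles; assumes a JSON body with a "results" list."""
    url = f"{api_endpoint}?{urlencode({'limit': limit})}"
    with urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    return payload.get("results", [])


def format_articles(articles: list[dict]) -> str:
    """Turn article dicts into plain text suitable for chunking and storage."""
    lines = ["Space News Articles:"]
    for article in articles:
        lines.append("")  # blank line between articles
        lines.append(f"Title: {article.get('title', 'n/a')}")
        lines.append(f"Published: {article.get('published_at', 'n/a')}")
        lines.append(f"Summary: {article.get('summary', 'n/a')}")
        lines.append(f"URL: {article.get('url', 'n/a')}")
    return "\n".join(lines)
```

In a custom source, load_content() would play the role of fetch_articles followed by format_articles, and add() would chunk the resulting text and hand it to storage.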
About the Spaceflight News API
The example uses the Spaceflight News API, which:
- Provides free access to space-related news articles
- Requires no authentication
- Returns structured data about space news
- Supports pagination and filtering
You can customize the API query by modifying the endpoint URL:
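One way to build such a URL is sketched below. The base URL and the search and limit query parameters are assumptions based on the public v4 API and are not confirmed by this page; build_query_url is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed base endpoint of the Spaceflight News API (v4).
BASE_ENDPOINT = "https://api.spaceflightnewsapi.net/v4/articles"


def build_query_url(base_url: str, **params) -> str:
    """Append query parameters to the endpoint, skipping any that are None."""
    filtered = {key: value for key, value in params.items() if value is not None}
    if not filtered:
        return base_url
    return f"{base_url}?{urlencode(filtered)}"
```

For instance, build_query_url(BASE_ENDPOINT, search="NASA", limit=20) yields the base endpoint followed by ?search=NASA&limit=20, which could then be passed as the knowledge source's API endpoint.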