Integrating RAG with Open WebUI¶
This document provides comprehensive instructions for integrating Obelisk's RAG system with Open WebUI, an advanced interface for interacting with large language models.
Overview¶
Open WebUI can leverage Obelisk's RAG capabilities to provide context-aware responses based on your documentation. The integration works through an OpenAI-compatible API provided by the Obelisk RAG service, allowing Open WebUI to seamlessly retrieve and use relevant documentation chunks when answering questions.
Integration Architecture¶
graph LR
User([User]) --> OpenWebUI
OpenWebUI --> RAG_API[Obelisk RAG API]
OpenWebUI --> Ollama
RAG_API --> VectorDB[(Vector Database)]
RAG_API --> Ollama[Ollama]
VectorDB --- Docs[(Documentation)]
style OpenWebUI fill:#f9f,stroke:#333,stroke-width:2px
style RAG_API fill:#bbf,stroke:#333,stroke-width:2px
style Ollama fill:#bfb,stroke:#333,stroke-width:2px
Configuration¶
Docker Compose Configuration¶
To integrate Open WebUI with the RAG service, configure the following environment variables in your docker-compose.yaml file:
services:
  open-webui:
    # ... other configuration ...
    environment:
      # ... other environment variables ...
      # RAG configuration
      - RAG_ENABLED=true
      - RAG_SERVICE_TYPE=custom
      - RAG_SERVICE_URL=http://obelisk-rag:8000/v1
      - "RAG_TEMPLATE=You are a helpful assistant. Use the following pieces of retrieved context to answer the user's question. If you don't know the answer, just say that you don't know.\n\nContext:\n{{context}}\n\nUser question: {{query}}"
    depends_on:
      - ollama
      - obelisk-rag
Environment Variables¶
Variable | Description | Example Value |
---|---|---|
RAG_ENABLED | Enable RAG functionality | true |
RAG_SERVICE_TYPE | Type of RAG service | custom |
RAG_SERVICE_URL | URL to the RAG API endpoint | http://obelisk-rag:8000/v1 |
RAG_TEMPLATE | Prompt template for context injection | See example below |
Prompt Template¶
The RAG_TEMPLATE defines how retrieved context and user queries are presented to the language model. Here's the default template:
You are a helpful assistant. Use the following pieces of retrieved context to answer the user's question. If you don't know the answer, just say that you don't know.
Context:
{{context}}
User question: {{query}}
This template includes two placeholders:
- {{context}} - where retrieved document chunks will be inserted
- {{query}} - where the user's original question will be inserted
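For illustration, after substitution the prompt sent to the model looks like this (the bracketed text stands in for real content, and exactly how multiple chunks are joined inside {{context}} depends on the RAG service):
You are a helpful assistant. Use the following pieces of retrieved context to answer the user's question. If you don't know the answer, just say that you don't know.
Context:
[text of chunk 1]
[text of chunk 2]
[text of chunk 3]
User question: [the user's original question]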
How It Works¶
When a user asks a question through Open WebUI, the following process occurs:
1. Open WebUI sends the query to the Obelisk RAG API using the OpenAI-compatible endpoint
2. The RAG service:
   - Embeds the query into a vector representation
   - Searches the vector database for similar document chunks
   - Retrieves the most relevant document chunks
   - Formats these chunks as context using the RAG_TEMPLATE
3. The formatted query with context is sent to the language model (via Ollama)
4. The language model generates a response using the provided context
5. The response is returned to Open WebUI and displayed to the user
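Because the RAG service speaks an OpenAI-compatible API, you can also exercise this flow directly from the command line. The sketch below assumes the standard /v1/chat/completions route and the host port mapping used in the full example later in this document (8001 on the host); the model name and question are illustrative:
curl http://localhost:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "How do I configure the vault directory?"}]}'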
Customizing the RAG Experience¶
Adjusting Retrieved Context¶
To modify the number of context chunks retrieved for each query, adjust the RETRIEVE_TOP_K environment variable in the obelisk-rag service:
services:
  obelisk-rag:
    # ... other configuration ...
    environment:
      # ... other environment variables ...
      - RETRIEVE_TOP_K=5 # Retrieve 5 chunks instead of the default 3
Higher values provide more context but may dilute relevance. Lower values are more focused but might miss important information.
Customizing the Prompt Template¶
You can customize the RAG_TEMPLATE to change how the assistant uses retrieved context:
- 'RAG_TEMPLATE=You are an expert on Obelisk documentation. When answering, cite your sources using [Document Name] notation. Use these documentation excerpts to answer:\n\n{{context}}\n\nQuestion: {{query}}'
Effective customizations include:
- Adjusting the assistant's persona
- Requesting specific citation formats
- Changing the response style (brief, detailed, technical, simplified)
- Adding instructions for handling unanswerable questions
Selecting Different Models¶
Configure the language model used for generating responses by modifying the OLLAMA_MODEL environment variable:
services:
  obelisk-rag:
    # ... other configuration ...
    environment:
      # ... other environment variables ...
      - OLLAMA_MODEL=llama3 # Or other models like mistral, phi-3-mini, etc.
Similarly, you can change the embedding model with EMBEDDING_MODEL:
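Following the same compose pattern as the snippets above (mxbai-embed-large is the embedding model used in the full example below):
services:
  obelisk-rag:
    # ... other configuration ...
    environment:
      # ... other environment variables ...
      - EMBEDDING_MODEL=mxbai-embed-large # Or another embedding model available in Ollama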
Testing the Integration¶
To verify the RAG integration is working correctly:
1. Ensure all services are running with docker-compose ps
2. Open the Open WebUI interface at http://localhost:8080
3. Log in with your credentials
4. Select a model with RAG capabilities
5. Ask a question related to your documentation (for example, a question about a topic covered in your vault)
6. The response should include information from your documentation
Troubleshooting¶
Common Issues¶
Issue | Possible Cause | Solution |
---|---|---|
No context retrieved | Documents not indexed | Run docker exec obelisk-rag python -m obelisk.rag.cli index |
Incorrect responses | Irrelevant context | Increase context quality by improving document chunking or adjust RETRIEVE_TOP_K |
RAG not being used | Configuration issue | Check RAG_ENABLED is true and RAG_SERVICE_URL is correct |
Slow responses | Large document corpus | Optimize embeddings or reduce context size |
Checking RAG Status¶
To verify the RAG system status:
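The exact command is not reproduced here; assuming the RAG CLI offers a stats subcommand alongside the index subcommand referenced above (an assumption, not confirmed), a check might look like:
docker exec obelisk-rag python -m obelisk.rag.cli stats  # hypothetical subcommand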
This will return information about indexed documents and the current configuration.
Viewing RAG Logs¶
To view the RAG service logs for diagnostic information:
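For the Compose-based deployment described in this document, one option is:
docker-compose logs -f obelisk-rag  # follow the RAG service logs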
Advanced Configuration¶
Using Multiple RAG Models¶
Open WebUI allows creating multiple models with different RAG configurations. To set up specialized models:
- In Open WebUI, go to "Models"
- Click "Create New Model"
- Select an Ollama model as the base
- Enable "Use with RAG"
- Configure RAG settings specific to this model
- Save the model
This lets you maintain specialized models for different documentation sections or query types.
Hybrid Search¶
While not implemented in the current version, future updates may support hybrid search combining vector similarity with keyword matching for improved retrieval quality.
Example: Full Integration¶
Below is a complete example of docker-compose configuration for integrating Open WebUI with the Obelisk RAG system:
services:
  open-webui:
    container_name: open-webui
    image: ghcr.io/open-webui/open-webui:main
    environment:
      - MODEL_DOWNLOAD_DIR=/models
      - OLLAMA_API_BASE_URL=http://ollama:11434
      - OLLAMA_URL=http://ollama:11434
      - LOG_LEVEL=debug
      # RAG configuration
      - RAG_ENABLED=true
      - RAG_SERVICE_TYPE=custom
      - RAG_SERVICE_URL=http://obelisk-rag:8000/v1
      - "RAG_TEMPLATE=You are a helpful assistant. Use the following pieces of retrieved context to answer the user's question. If you don't know the answer, just say that you don't know.\n\nContext:\n{{context}}\n\nUser question: {{query}}"
    volumes:
      - data:/data
      - models:/models
      - open-webui:/config
    ports:
      - "8080:8080"
    depends_on:
      - ollama
      - obelisk-rag
    networks:
      - ollama-net

  ollama:
    container_name: ollama
    image: ollama/ollama:latest
    # ... ollama configuration ...

  obelisk-rag:
    container_name: obelisk-rag
    build:
      context: .
      dockerfile: Dockerfile.rag
    volumes:
      - rag-data:/app/data
      - rag-vault:/app/vault
    ports:
      - "8001:8000"
    environment:
      - VAULT_DIR=/app/vault
      - CHROMA_DIR=/app/data/chroma_db
      - OLLAMA_URL=http://ollama:11434
      - OLLAMA_MODEL=llama3
      - EMBEDDING_MODEL=mxbai-embed-large
      - RETRIEVE_TOP_K=3
      - API_HOST=0.0.0.0
      - API_PORT=8000
      - LOG_LEVEL=INFO
    depends_on:
      - ollama
    networks:
      - ollama-net
Further Reading¶
- Using RAG - Complete documentation on Obelisk's RAG capabilities
- Ollama Integration - Details on Ollama integration
- Open WebUI Documentation - Official Open WebUI documentation