OpenAI Model Integration¶
This document covers integrating OpenAI models with Obelisk's RAG system.
Supported Models¶
Obelisk RAG supports the following OpenAI models for embedding and completion tasks:
Model Datasheets¶
| Feature | GPT-4o | GPT-4.1 | text-embedding-3-large |
|---|---|---|---|
| Description | Fast, intelligent, flexible GPT model | Flagship GPT model for complex tasks | Most capable embedding model |
| Use in Obelisk | Completion generation | Completion generation | Document and query embedding |
| Pricing (per 1M tokens) | $2.50 input / $10.00 output | $2.00 input / $8.00 output | $0.13 |
| Context Window | 128,000 tokens | 1,047,576 tokens | N/A |
| Max Output Tokens | 16,384 | 32,768 | N/A |
| Knowledge Cutoff | Sep 30, 2023 | May 31, 2024 | N/A |
| Dimensions | N/A | N/A | 3,072 |
| Rate Limits (TPM) | Tier 1: 30,000<br>Tier 2: 450,000 | Tier 1: 30,000<br>Tier 2: 450,000 | Tier 1: 1,000,000<br>Tier 2: 1,000,000 |
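To make the pricing row concrete, here is a small sketch that turns the per-1M-token rates from the table into a dollar estimate. The `PRICES` table and `cost_usd` helper are illustrative only, not part of Obelisk:

```python
# Per-1M-token prices (USD) copied from the table above.
PRICES = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "gpt-4.1": {"input": 2.00, "output": 8.00},
    "text-embedding-3-large": {"input": 0.13, "output": 0.00},
}

def cost_usd(model: str, input_tokens: int, output_tokens: int = 0) -> float:
    """Estimate the API cost of a single request in USD."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000
```

For example, embedding a 10M-token corpus with text-embedding-3-large costs about $1.30, while a single gpt-4o call with 1,000 input and 500 output tokens costs well under a cent.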
Configuration¶
When an OpenAI API key is detected in the environment, Obelisk RAG can optionally use these models instead of local Ollama models. This provides enhanced performance and capabilities, and is especially useful for development environments without GPU access.
Environment Variables¶
To enable OpenAI models, set the following environment variables:
```shell
# Required
OPENAI_API_KEY=your_openai_api_key

# Optional
OPENAI_ORG_ID=your_org_id                      # For enterprise users
USE_OPENAI=true                                # Force using OpenAI (defaults to true when a key exists)
OPENAI_EMBEDDING_MODEL=text-embedding-3-large  # Default embedding model
OPENAI_COMPLETION_MODEL=gpt-4o                 # Default completion model
EMBEDDING_PROVIDER=openai                      # Embedding provider (ollama or openai)
COMPLETION_PROVIDER=openai                     # Completion provider (ollama or openai)
```
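The interaction between these variables can be sketched as follows. The `select_providers` helper is hypothetical, not Obelisk's actual code, and the resolution order shown (explicit `*_PROVIDER` settings win, otherwise OpenAI is used when a key is present and `USE_OPENAI` is not disabled) is an assumption based on the descriptions above:

```python
def select_providers(env: dict) -> dict:
    """Hypothetical sketch of provider resolution: explicit *_PROVIDER
    settings win; otherwise OpenAI is the default whenever an API key is
    present and USE_OPENAI is not explicitly set to false."""
    has_key = bool(env.get("OPENAI_API_KEY"))
    use_openai = has_key and env.get("USE_OPENAI", "true").lower() == "true"
    default = "openai" if use_openai else "ollama"
    return {
        "embedding": env.get("EMBEDDING_PROVIDER", default),
        "completion": env.get("COMPLETION_PROVIDER", default),
    }
```

With no key set, both providers fall back to `ollama`; with a key and no overrides, both resolve to `openai`.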
Docker Compose Usage¶
When using docker-compose, pass the OpenAI API key to the relevant service through its environment.
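A minimal sketch of that passthrough is shown below; the service name `obelisk-rag` and the exact variable set are assumptions, so match them to your actual compose file:

```yaml
# docker-compose.yml (fragment) -- service name is illustrative
services:
  obelisk-rag:
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - EMBEDDING_PROVIDER=openai
      - COMPLETION_PROVIDER=openai
```

With this in place, the key can be supplied at startup from the host environment or a `.env` file alongside the compose file.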
Technical Implementation¶
The OpenAI integration is implemented through several components:
1. LiteLLM Configuration¶
LiteLLM acts as a middleware layer that can route requests to either OpenAI or Ollama models. The configuration in litellm-config.yaml includes:
- OpenAI model definitions (gpt-4o, text-embedding-3-large)
- Model aliases for simpler referencing
- Fallback mechanisms to ensure graceful degradation to Ollama models when needed
- Prioritized embedding model list
```yaml
# Model routing with fallbacks
fallbacks: [
  {
    "model": "openai/gpt-4o",
    "fallback_model": "ollama/llama3"
  },
  {
    "model": "openai/text-embedding-3-large",
    "fallback_model": "ollama/mxbai-embed-large"
  }
]
```
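For reference, the model definitions and aliases mentioned above typically look like this in LiteLLM's proxy config format. This is a sketch; the actual entries in the repository's file may differ:

```yaml
model_list:
  - model_name: gpt-4o                      # alias clients reference
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
  - model_name: text-embedding-3-large
    litellm_params:
      model: openai/text-embedding-3-large
      api_key: os.environ/OPENAI_API_KEY
```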
2. Initialization Process¶
During container initialization:
- The `generate-tokens.sh` script checks for an OpenAI API key and adds it to the shared tokens file
- The `configure-services.sh` script creates configurations for LiteLLM and Obelisk-RAG based on the presence of the API key
- Models are automatically prioritized based on availability
3. Fallback Mechanism¶
If the OpenAI API is unavailable or rate-limited:
- LiteLLM will automatically route requests to the specified fallback model (Ollama)
- This ensures system reliability even when external API services are unavailable
- No code changes are needed as the fallback is handled at the routing layer
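The behavior can be pictured as a try/except at the routing layer. The `route_with_fallback` helper below is a simplified illustration; the real routing lives inside LiteLLM, which is exactly why client code needs no changes:

```python
def route_with_fallback(call, primary: str, fallback: str):
    """Illustrative sketch of routing-layer fallback: try the primary
    model, and transparently retry with the fallback model if the
    provider fails (unavailable, rate-limited, etc.)."""
    try:
        return call(primary)
    except Exception:
        return call(fallback)
```

A caller that asks for `openai/gpt-4o` simply receives an answer from `ollama/llama3` when the OpenAI call raises, without ever seeing the switch.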
Testing the Integration¶
1. Start the Obelisk stack with your OpenAI API key
2. Access the OpenWebUI interface at http://localhost:8080
3. Create a new chat and select LiteLLM as the provider
4. In the model dropdown, you should see OpenAI models (gpt-4o)
5. Test embedding functionality by creating a new RAG collection and uploading a document
Troubleshooting¶
API Key Issues¶
If your OpenAI API key is invalid or has insufficient permissions:
- Check the logs in the `litellm` container for API-related errors
- Verify the key works using our testing script
- Ensure your OpenAI account has billing enabled for API access
Fallback Issues¶
If fallback to Ollama models isn't working:
- Verify Ollama models are downloaded and available
- Check the fallback configuration in `config/litellm_config.yaml`
- Examine the logs for routing-related errors
Limitations¶
- Embedding caching is not yet implemented (planned for future releases)
- Rate limiting for OpenAI API is not currently enforced
- Cost optimization features are not yet available