Query Pipeline¶
The query pipeline is a critical component of the Obelisk RAG system, processing user questions and retrieving relevant content to enhance AI responses.
Query Flow Architecture¶
The query pipeline follows these steps:
sequenceDiagram
    participant User
    participant QueryProcessor
    participant VectorDB
    participant PromptEngine
    participant Ollama

    User->>QueryProcessor: Ask question
    QueryProcessor->>QueryProcessor: Preprocess query
    QueryProcessor->>VectorDB: Generate embedding
    VectorDB->>QueryProcessor: Return similar chunks
    QueryProcessor->>PromptEngine: Pass query + chunks
    PromptEngine->>Ollama: Send enhanced prompt
    Ollama->>User: Return augmented response
Query Preprocessing¶
Before retrieval, queries undergo several preprocessing steps:
- Query expansion: Enhance queries with related terms
- Intent recognition: Identify the type of question
- Metadata extraction: Extract filters from the query
- Language detection: Handle multilingual queries
- Query rewriting: Optimize for retrieval performance
Example implementation:
# Future implementation example (helpers such as clean_text are placeholders)
def preprocess_query(query_text):
    """Preprocess user query for optimal retrieval."""
    # Clean and normalize text
    cleaned_query = clean_text(query_text)

    # Extract potential filters
    filters = extract_metadata_filters(cleaned_query)

    # Expand query with related terms
    expanded_query = query_expansion(cleaned_query)

    return {
        "original_query": query_text,
        "processed_query": expanded_query,
        "filters": filters,
        "detected_intent": detect_intent(cleaned_query),
    }
Retrieval Strategies¶
The pipeline will implement multiple retrieval strategies:
1. Dense Retrieval¶
Using vector similarity to find relevant content:
- Embedding space: Convert query to the same embedding space as documents
- Similarity metrics: Cosine similarity, dot product, or Euclidean distance
- Top-k retrieval: Return the k most similar chunks
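A minimal sketch of top-k retrieval by cosine similarity, assuming NumPy and pre-computed embeddings (the function and array names are illustrative, not part of the current API):

import numpy as np

def top_k_chunks(query_vector, chunk_vectors, k=5):
    """Return indices of the k chunks most similar to the query (cosine)."""
    # Normalize so the dot product equals cosine similarity
    q = query_vector / np.linalg.norm(query_vector)
    m = chunk_vectors / np.linalg.norm(chunk_vectors, axis=1, keepdims=True)
    scores = m @ q
    # Indices of the k highest scores, best first
    return np.argsort(scores)[::-1][:k]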
2. Hybrid Search¶
Combining multiple search techniques:
- BM25 keyword search: Traditional information retrieval
- Dense vector search: Semantic similarity
- Fusion methods: Reciprocal rank fusion or weighted combinations
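Reciprocal rank fusion is attractive here because it merges rankings without tuning score scales across BM25 and vector search. A minimal sketch (document IDs and rankings are hypothetical inputs):

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse multiple ranked lists of document IDs via RRF."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

fused = reciprocal_rank_fusion([
    ["doc3", "doc1", "doc7"],   # BM25 ranking
    ["doc1", "doc9", "doc3"],   # dense vector ranking
])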
3. Multi-stage Retrieval¶
A multi-stage process for better results:
- Initial retrieval: Get a larger set of potentially relevant chunks
- Re-ranking: Apply more complex models to re-rank results
- Diversity optimization: Ensure varied context
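A sketch of the retrieve-then-re-rank pattern; vector_db.search and rerank_score (e.g. a cross-encoder) are assumed interfaces, not part of the current API:

def multi_stage_retrieve(query, vector_db, initial_k=50, final_k=5):
    """Retrieve a broad candidate set, then re-rank with a stronger model."""
    # Stage 1: cheap dense retrieval over the whole collection
    candidates = vector_db.search(query, k=initial_k)
    # Stage 2: expensive scoring over the small candidate set only
    scored = [(rerank_score(query, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:final_k]]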
Context Assembly¶
Retrieved chunks are assembled into a coherent context:
- Chunk sorting: Order by relevance and document structure
- Deduplication: Remove redundant information
- Context limitation: Fit within model context window
- Metadata inclusion: Add source information
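A simplified sketch of context assembly, using a character budget as a stand-in for a real token count (the chunk dictionary shape is assumed):

def assemble_context(chunks, max_chars=8000):
    """Build a single context string from retrieved chunks."""
    seen = set()
    parts = []
    used = 0
    for chunk in chunks:  # chunks assumed sorted by relevance
        if chunk["text"] in seen:
            continue  # deduplicate exact repeats
        block = f"[Source: {chunk['source']}]\n{chunk['text']}"
        if used + len(block) > max_chars:
            break  # stay within the model's context window
        seen.add(chunk["text"])
        parts.append(block)
        used += len(block)
    return "\n\n".join(parts)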
Prompt Engineering¶
Crafting effective prompts is essential for quality responses:
Basic RAG Prompt Template¶
You are an assistant for the Obelisk documentation.
Answer the question based ONLY on the following context:
{retrieved_context}
Question: {user_question}
Answer:
Advanced RAG Prompt Template¶
You are an assistant for the Obelisk documentation system.
Use ONLY the following retrieved documentation to answer the user's question.
If the information is not in the retrieved docs, acknowledge that and suggest where they might find the information.
Retrieved documentation:
{retrieved_chunks}
User question: {user_question}
Respond in a helpful, concise manner. Include code examples if relevant.
Always cite your sources using the document names provided in the retrieved chunks.
Customizing Prompts¶
The Obelisk RAG system allows for prompt template customization to better fit specific use cases:
Template Variables¶
You can use the following variables in your prompt templates:
- {user_question}: The original question asked by the user
- {retrieved_context}: The full context assembled from retrieved chunks
- {retrieved_chunks}: An array of individual content chunks with metadata
- {chunk_count}: The number of chunks retrieved
- {confidence_score}: The confidence score of the retrieval
Custom Prompt Configuration¶
Prompt templates can be customized through environment variables or the configuration API:
# Set a custom prompt template
export OBELISK_PROMPT_TEMPLATE="You are an Obelisk expert. Use the following information to answer the question:\n\n{retrieved_context}\n\nQuestion: {user_question}\n\nAnswer:"
# Or using the config API
obelisk-rag config --set "prompt_template=You are an Obelisk expert. Use the following information to answer the question:\n\n{retrieved_context}\n\nQuestion: {user_question}\n\nAnswer:"
Advanced Prompt Engineering Techniques¶
For optimal RAG performance, consider these prompt engineering practices:
- Clear instructions: Include specific instructions on how to use the context
- Context formatting: Format the context for better readability by the model
- Response formatting: Specify desired response format (bullets, paragraphs, etc.)
- Source attribution: Instruct the model to cite sources from the retrieved chunks
- Fallback handling: Guide how to respond when information is not in the context
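Context formatting and source attribution can both be handled at the point where the prompt is built. A sketch against the basic template variables above (the chunk structure is assumed):

def build_prompt(template, question, chunks):
    """Fill a prompt template with formatted, source-attributed context."""
    # Prefix each chunk with its source so the model can cite it
    formatted = "\n\n".join(
        f"### {c['source']}\n{c['text']}" for c in chunks
    )
    return template.format(
        retrieved_context=formatted,
        user_question=question,
    )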
Response Generation¶
The final step involves:
- Model invocation: Send the assembled prompt to Ollama
- Parameter optimization: Adjust temperature, top_p, etc.
- Citation tracking: Maintain source references
- Response validation: Ensure factuality and relevance
- Fallback strategies: Handle cases with no relevant context
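A minimal sketch of model invocation using Ollama's /api/generate endpoint; the model name and parameter values are illustrative:

import requests

def generate_response(prompt, model="llama3", temperature=0.2, top_p=0.9):
    """Send the assembled prompt to a local Ollama instance."""
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": model,
            "prompt": prompt,
            "stream": False,
            "options": {"temperature": temperature, "top_p": top_p},
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]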
Measuring Effectiveness¶
The RAG pipeline will include evaluation metrics:
- Retrieval precision/recall: Measure retrieval quality
- Answer relevance: Assess response relevance
- Factual accuracy: Verify factual correctness
- Citation accuracy: Check if sources are properly cited
- User satisfaction: Collect user feedback
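Retrieval precision and recall, for instance, can be computed per query against a labeled set of relevant chunks. A minimal sketch:

def precision_recall_at_k(retrieved_ids, relevant_ids, k):
    """Compute precision@k and recall@k for one query."""
    top_k = set(retrieved_ids[:k])
    hits = len(top_k & set(relevant_ids))
    precision = hits / k  # standard convention: divide by k
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall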