Agentic RAG

Multi-agent retrieval-augmented generation system with 9 reasoning strategies, powered by Oracle AI Database and local Ollama LLMs. A complete RAG pipeline from document ingestion to intelligent query synthesis.

oracle-ai-developer-hub/apps/agentic_rag
Active Development
Default: gemma3:270m via Ollama
Dashboard generated 2026-02-27
Vector Collections
0
Oracle AI Vector Search
Reasoning Strategies
0
x2 modes (ensemble)
UI Interfaces
0
Gradio, WebUI, API, CLI
A2A Methods
0
JSON-RPC 2.0
Event Tables
0
Full observability
OpenAI-compat Models
0
via /v1/models
02

Architecture Map

%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#1a2040', 'primaryTextColor': '#c8d6e5', 'primaryBorderColor': '#4fc3f7', 'lineColor': '#4fc3f7', 'secondaryColor': '#161b33', 'tertiaryColor': '#111529', 'edgeLabelBackground': '#0c0f1d', 'clusterBkg': '#111529', 'clusterBorder': '#1e2a4a', 'nodeTextColor': '#c8d6e5' }}}%% graph TB subgraph INPUT["INPUT LAYER"] direction LR PDF["PDF Documents
Docling"] WEB["Web Pages
Trafilatura"] CODE["Code Repos
Gitingest"] end subgraph PROCESSING["PROCESSING PIPELINE"] direction LR SPLIT["OracleTextSplitter
Recursive chunking"] EMBED["OracleEmbeddings
all-MiniLM-L6-v2"] VS["OracleVS
Vector store"] end subgraph STORAGE["ORACLE AI DATABASE"] direction LR C1["PDFCOLLECTION"] C2["CODECOLLECTION"] C3["WEBCOLLECTION"] C4["LOCALCOLLECTION"] ET["6 Event Tables
API | MODEL | DOC
INGEST | QUERY | A2A"] end subgraph INTELLIGENCE["INTELLIGENCE LAYER"] AGENT["LocalRAGAgent
Core orchestrator"] FACTORY["Agent Factory"] PLAN["Planner Agent"] RES["Researcher Agent"] REAS["Reasoner Agent"] SYN["Synthesizer Agent"] ENSEMBLE["RAGReasoningEnsemble
9 strategies x 2 modes"] end subgraph OUTPUT["OUTPUT INTERFACES"] direction LR GR["Gradio UI
:7860"] OW["Open WebUI
18 models"] FA["FastAPI
OpenAI-compat"] CLI["CLI
Terminal"] A2A["A2A Handler
JSON-RPC 2.0"] end PDF --> SPLIT WEB --> SPLIT CODE --> SPLIT SPLIT --> EMBED --> VS VS --> C1 VS --> C2 VS --> C3 VS --> C4 AGENT --> FACTORY FACTORY --> PLAN FACTORY --> RES FACTORY --> REAS FACTORY --> SYN AGENT --> ENSEMBLE C1 & C2 & C3 & C4 --> AGENT AGENT --> ET AGENT --> GR AGENT --> OW AGENT --> FA AGENT --> CLI AGENT --> A2A style INPUT fill:#111529,stroke:#4fc3f7,stroke-width:1px style PROCESSING fill:#111529,stroke:#7c4dff,stroke-width:1px style STORAGE fill:#111529,stroke:#ef5350,stroke-width:1px style INTELLIGENCE fill:#111529,stroke:#66bb6a,stroke-width:1px style OUTPUT fill:#111529,stroke:#ffa726,stroke-width:1px
03

Dependency Graph

%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#1a2040', 'primaryTextColor': '#c8d6e5', 'primaryBorderColor': '#4fc3f7', 'lineColor': '#253358', 'secondaryColor': '#161b33', 'tertiaryColor': '#111529', 'nodeTextColor': '#c8d6e5' }}}%% graph LR APP(("Agentic
RAG")) subgraph CORE["CORE"] LC["langchain-oracledb"] ODB["oracledb"] AR["agent-reasoning"] OL["ollama"] end subgraph INGEST["INGESTION"] DOC["docling"] TRA["trafilatura"] GIT["gitingest"] end subgraph IFACE["INTERFACES"] FAPI["fastapi"] GRAD["gradio"] OWUI["open-webui"] end subgraph FALLBACK["FALLBACKS"] CHR["chromadb"] ST["sentence-transformers"] end APP --> LC & ODB & AR & OL APP --> DOC & TRA & GIT APP --> FAPI & GRAD & OWUI APP -.-> CHR & ST style APP fill:#4fc3f7,stroke:#4fc3f7,color:#0c0f1d,font-weight:bold style CORE fill:#111529,stroke:#4fc3f7,stroke-width:1px style INGEST fill:#111529,stroke:#66bb6a,stroke-width:1px style IFACE fill:#111529,stroke:#7c4dff,stroke-width:1px style FALLBACK fill:#111529,stroke:#ffa726,stroke-width:1px,stroke-dasharray:5 5
LC

langchain-oracledb

Oracle AI Vector Search integration for LangChain. Provides OracleVS, OracleEmbeddings, OracleTextSplitter.

core database
DB

oracledb

Python driver for Oracle Database. Thick/thin client for connection pooling and session management.

database
AR

agent-reasoning

9 reasoning strategies library: CoT, ToT, ReAct, MCTS, R1, Beam Search, Self-Consistency, PRM, Meta-Reasoning.

core
OL

ollama

Local LLM inference server. Default model: gemma3:270m. No cloud dependency.

core
DL

docling

PDF document processing with structure extraction. Handles tables, headers, and metadata.

ingestion
TF

trafilatura

Web content extraction. Cleans HTML to readable text with metadata preservation.

ingestion
GI

gitingest

Code repository ingestion. Processes Git repos into document chunks for RAG indexing.

ingestion
FA

fastapi

High-performance API framework. Serves OpenAI-compatible endpoints and A2A protocol.

interface
GR

gradio

Interactive web UI with tabbed interface for querying, uploading documents, and monitoring.

interface
WU

open-webui

Full-featured chat UI. Connects via OpenAI-compatible API exposing 18 reasoning models.

interface
CR

chromadb

Fallback vector store when Oracle DB is unavailable. Same interface abstraction.

fallback
ST

sentence-transformers

Fallback embedding model. Used when OracleEmbeddings is not configured.

fallback
04

API Surface

Method Endpoint Description Protocol
POST /upload/pdf Upload and process PDF documents via Docling. Chunks, embeds, and stores in PDFCOLLECTION. REST
POST /query Execute a RAG query with optional reasoning strategy and collection targeting. REST
POST /a2a Agent-to-Agent protocol endpoint. Accepts JSON-RPC 2.0 messages for inter-agent communication. A2A / JSON-RPC
GET /agent_card Returns the agent capability card describing skills, supported methods, and metadata. A2A
POST /v1/chat/completions OpenAI-compatible chat completions. Routes to reasoning models via model name. OpenAI
GET /v1/models Lists all 18 available reasoning models. Each maps to a strategy + mode combination. OpenAI
GET /events/statistics Aggregated event monitoring. Counts, average response times, error rates across all tables. REST
GET /events/{type} Event details filtered by type: all, a2a, api, model, document, query. REST
POST /sync/embeddings Synchronize document embeddings from Open WebUI into Oracle Vector Store. REST
05

Event Logging System

Six Oracle Database tables provide full observability across every layer of the system. Every operation is tracked with timestamps, durations, and contextual metadata.

API_EVENTS
Tracks all HTTP endpoint invocations. Captures request method, path, status code, response time, and client metadata.
endpoint method status_code response_ms client_ip timestamp
MODEL_EVENTS
Records every LLM invocation. Stores model name, prompt, response, token counts, and generation latency.
model_name prompt response tokens_in tokens_out latency_ms
DOCUMENT_EVENTS
Logs document processing operations: uploads, parsing, chunking. Tracks source type, chunk count, and processing time.
doc_type filename chunk_count collection process_ms
INGEST_EVENTS
Detailed ingestion tracking per chunk. Records embedding generation, vector storage, and metadata association.
chunk_id collection embed_ms store_ms metadata
QUERY_EVENTS
Captures user queries end-to-end. Stores the query text, retrieved context, reasoning strategy used, and final response.
query_text strategy context_docs response total_ms
A2A_EVENTS
Agent-to-Agent protocol logging. Records JSON-RPC method calls, task IDs, agent identifiers, and message payloads.
rpc_method task_id agent_id payload status
06

Deployment Matrix

Local
Direct Python execution. Fastest iteration cycle. Requires Oracle DB and Ollama running locally.
# Default: Gradio UI on :7860
$ python run_app.py

# Specific interface modes
$ python run_app.py --gradio
$ python run_app.py --openwebui
$ python run_app.py --api-only
Docker
Containerized deployment with GPU passthrough. Uses host networking for Oracle DB and Ollama access.
# Build with host network (pip)
$ docker build --network=host \
  -t agentic-rag .

# Run with GPU support
$ docker run -d --gpus all \
  --network=host agentic-rag
Kubernetes
Production-grade orchestration with HuggingFace token injection for gated model access.
# Deploy full stack
$ cd k8s
$ ./deploy.sh

# With HF token for gated models
$ ./deploy.sh \
  --hf-token TOKEN
07

Open WebUI Integration

Three custom Open WebUI functions bridge the chat interface with Oracle AI Database. Together they enable transparent RAG augmentation across every conversation.

oracle_rag_filter.py
Inlet/outlet filter that intercepts every message, queries Oracle DB for relevant context, and injects it into the prompt before it reaches the LLM.
Transparent context injection
Works with any model selection
Inlet: augment prompt / Outlet: log response
oracle_rag_pipe.py
Manifold pipe that creates 6 dedicated Oracle RAG model entries in the model selector. Each maps to a different reasoning strategy.
6 virtual models in model picker
Strategy-specific behavior per model
Streaming response support
oracle_document_sync.py
Manual sync action. Triggers bulk document synchronization from Open WebUI's knowledge base into Oracle Vector Store.
On-demand sync trigger
Deduplication by document hash
Progress reporting via UI
Auto-Persistence Features
Attached webpages are automatically chunked and stored in WEBCOLLECTION with full source metadata.
Uploaded URLs are processed through WebProcessor (Trafilatura) and persisted to WEBCOLLECTION with title, URL, extraction date, and content hash.
08

Notable Code Patterns

Metadata Monkeypatch
Dual-format metadata handling for LangChain compatibility. Document metadata can arrive as either a Python dict or a JSON-encoded string depending on the source. The monkeypatch normalizes both formats transparently.
# Normalize metadata: dict or JSON string
def _get_metadata(doc):
    meta = doc.metadata
    if isinstance(meta, str):
        try:
            meta = json.loads(meta)
        except json.JSONDecodeError:
            meta = {"raw": meta}
    return meta
Vector Store Abstraction
ChromaDB and Oracle share the same interface contract. A factory function returns the appropriate implementation based on configuration, allowing seamless fallback without changing any query code.
# Unified vector store interface
def get_vector_store(config):
    if config.use_oracle:
        return OracleVS(
            client=config.connection,
            embedding=config.embeddings,
            table_name=config.collection
        )
    return Chroma(collection_name=config.collection)
Agent Factory Pattern
Chain-of-Thought agent creation via factory. Each agent type (Planner, Researcher, Reasoner, Synthesizer) is instantiated with a role-specific system prompt and tool configuration, then composed into a pipeline.
# Factory creates role-specific agents
class AgentFactory:
    def create(self, role, llm, tools):
        prompt = self.prompts[role]
        agent = create_react_agent(
            llm=llm,
            tools=tools,
            prompt=prompt
        )
        return AgentExecutor(agent=agent)
Similarity Score Normalization
Oracle Vector Search returns raw distance values. The normalization formula converts distance to a 0-1 similarity score where 1.0 means identical. This enables consistent threshold-based filtering across different distance metrics.
# Distance to similarity conversion
# score = 1 / (1 + distance)
# distance=0 -> score=1.0 (identical)
# distance=inf -> score~0.0 (unrelated)

def normalize_score(distance):
    return 1.0 / (1.0 + distance)