agentic_rag / Technical Explainer

From Document to Answer

A six-stage journey through ingestion, vectorization, and AI-reasoned retrieval — all powered by Oracle Database and nine reasoning strategies.

6 Stages 9 Strategies 4 Collections

Stage One

Document Ingestion

Every journey begins with raw information. A user uploads a PDF, provides a URL, or points to a GitHub repository. Three specialized processors handle each input type, normalizing wildly different formats into clean Markdown.
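Routing each input to its processor might look like the sketch below. The function name `detect_input_type`, the `PROCESSORS` mapping, and the classification rules (a `.pdf` suffix, a `github.com` host, a local directory) are assumptions for illustration. The document does not say how agentic_rag detects input types.

```python
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical mapping from input type to the processor classes named in the text
PROCESSORS = {"pdf": "PDFProcessor", "web": "WebProcessor", "repo": "RepoProcessor"}


def detect_input_type(source: str) -> str:
    """Classify a source as 'pdf', 'web', or 'repo' (illustrative rules only)."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        # Treat GitHub URLs as repositories, every other URL as a web page
        if parsed.netloc.lower() in ("github.com", "www.github.com"):
            return "repo"
        return "web"
    path = Path(source)
    if path.suffix.lower() == ".pdf":
        return "pdf"
    if path.is_dir():
        return "repo"  # a local checkout of a repository
    raise ValueError(f"Unsupported input: {source!r}")
```

For example, `PROCESSORS[detect_input_type("https://github.com/org/repo")]` yields `"RepoProcessor"`.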

PDFProcessor

Uses Docling to parse complex layouts. Runs DocumentConverter then pipes output through export_to_markdown().

WebProcessor

Uses Trafilatura to cleanly extract article text via fetch_url() + extract(). Also captures author, date, tags, and categories via extract_metadata().

RepoProcessor

Uses Gitingest to crawl repositories, preserving file structure and extracting code with context-aware boundaries.
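To illustrate what "preserving file structure" can mean, here is a stdlib-only sketch that walks a local checkout, emits a directory tree, and appends each text file under a heading with its path. It is not Gitingest, whose real output format differs; `repo_to_markdown`, the skipped directories, and the suffix list are hypothetical.

```python
import os
from pathlib import Path

# Hypothetical filters; a fuller crawler might also honor .gitignore
SKIP_DIRS = {".git", "node_modules", "__pycache__"}
TEXT_SUFFIXES = {".py", ".md", ".txt", ".js", ".ts", ".toml", ".yaml", ".yml", ".json"}


def repo_to_markdown(root: str) -> str:
    """Return a Markdown digest: a directory tree followed by each text file."""
    root_path = Path(root)
    tree_lines, sections = [], []
    for dirpath, dirnames, filenames in os.walk(root_path):
        # Prune and sort in place so os.walk skips ignored dirs in a stable order
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        rel_dir = Path(dirpath).relative_to(root_path)
        depth = len(rel_dir.parts)
        if depth:
            tree_lines.append("  " * (depth - 1) + rel_dir.name + "/")
        for name in sorted(filenames):
            tree_lines.append("  " * depth + name)
            if Path(name).suffix.lower() in TEXT_SUFFIXES:
                rel_file = (rel_dir / name).as_posix()
                body = (root_path / rel_dir / name).read_text(encoding="utf-8", errors="replace")
                sections.append(f"## {rel_file}\n\n{body.rstrip()}")
    return "# Directory structure\n\n" + "\n".join(tree_lines) + "\n\n" + "\n\n".join(sections)
```

Keeping the tree separate from the file bodies lets a later chunk still carry its path, which is what makes code retrievable by location as well as content.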

# Each processor returns normalized Markdown
from docling.document_converter import DocumentConverter
import trafilatura

# PDF: Docling parses the layout, then exports Markdown
converter = DocumentConverter()
result = converter.convert("report.pdf")
markdown = result.document.export_to_markdown()

# Web extraction with metadata
downloaded = trafilatura.fetch_url("https://example.com/article")
if not downloaded:  # fetch_url returns None when the download fails
    raise RuntimeError("Could not download the page")
text = trafilatura.extract(downloaded)
meta = trafilatura.extract_metadata(downloaded)
# meta.author, meta.date, meta.tags, meta.categories
%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#2a2522', 'primaryTextColor': '#f5e6d3', 'primaryBorderColor': '#e8913a', 'lineColor': '#d4763c', 'secondaryColor': '#221e1b', 'tertiaryColor': '#1a1614', 'edgeLabelBackground': '#1a1614', 'clusterBkg': '#221e1b', 'clusterBorder': '#c75b39', 'nodeTextColor': '#f5e6d3' }}}%%
graph TD
    A["fa:fa-file-pdf PDF Upload"] -->|Docling| D["DocumentConverter"]
    B["fa:fa-globe URL / Web Page"] -->|Trafilatura| E["fetch_url + extract"]
    C["fa:fa-code GitHub Repo"] -->|Gitingest| F["Code + Structure"]
    D --> G["Normalized Markdown"]
    E --> G
    F --> G
    G --> H["Ready for Chunking"]
    style A fill:#2a2522,stroke:#e8913a,color:#f5e6d3
    style B fill:#2a2522,stroke:#e8913a,color:#f5e6d3
    style C fill:#2a2522,stroke:#e8913a,color:#f5e6d3
    style D fill:#221e1b,stroke:#d4763c,color:#f5e6d3
    style E fill:#221e1b,stroke:#d4763c,color:#f5e6d3
    style F fill:#221e1b,stroke:#d4763c,color:#f5e6d3
    style G fill:#2a2522,stroke:#c75b39,color:#f5e6d3
    style H fill:#2a2522,stroke:#e8913a,color:#e8913a
Fig 1.1 — Three input paths converge into normalized Markdown
Stage Two

Chunking & Normalization

Raw Markdown is still too large for embedding models. OracleTextSplitter runs server-side inside Oracle Database itself, splitting text into overlapping chunks while normalizing whitespace, punctuation (smart quotes and dashes), and wide characters.

Each chunk is enriched with metadata: its source document, a unique document_id, and a sequential chunk_index for reassembly.
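The splitting and metadata steps can be approximated in pure Python, which is handy for testing without a database. This sketch counts whitespace-separated words rather than replicating Oracle's splitter exactly; `Chunk` and `split_with_overlap` are hypothetical names, and the 512/64 defaults simply mirror the values used in this stage.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


# Hypothetical record type mirroring the chunk metadata described above
@dataclass
class Chunk:
    text: str
    source: str
    document_id: str
    chunk_index: int  # position within the document, used for reassembly


def split_with_overlap(text: str, source: str, max_words: int = 512,
                       overlap: int = 64, document_id: Optional[str] = None) -> list[Chunk]:
    """Sliding-window split: each chunk repeats the last `overlap` words of the previous one."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must satisfy 0 <= overlap < max_words")
    words = text.split()
    if not words:
        return []
    doc_id = document_id or uuid.uuid4().hex[:8]  # illustrative short id scheme
    step = max_words - overlap
    chunks = []
    for index, start in enumerate(range(0, len(words), step)):
        chunks.append(Chunk(" ".join(words[start:start + max_words]), source, doc_id, index))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

With 1,000 words and the defaults, the step is 512 - 64 = 448, so the windows start at words 0, 448, and 896, giving three chunks.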

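from langchain_community.document_loaders.oracleai import OracleTextSplitter

# params are passed to DBMS_VECTOR_CHAIN.UTL_TO_CHUNKS, which runs inside the database
splitter = OracleTextSplitter(
    conn=oracle_connection,
    params={
        "max": 512,          # max chunk size; unit set by "by" (default: words)
        "overlap": 64,       # amount of preceding text repeated in each chunk
        "normalize": "all",  # whitespace, punctuation, and wide-char normalization
    },
)
chunks = splitter.split_text(markdown)  # returns a list of chunk strings
# source, document_id, and chunk_index are attached when chunks become Documents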
Before — Raw Text
The   quick brown fox jumped
over the    lazy dog.  Unicode
chars like \u2018smart quotes\u2019
and em\u2014dashes are   scattered
throughout   the    document
alongside     irregular
whitespace and  line   breaks...
After — Normalized Chunk
{
  "text": "The quick brown fox jumped
over the lazy dog. Unicode chars
like 'smart quotes' and em-dashes
are scattered throughout the
document alongside irregular
whitespace and line breaks...",
  "source": "report.pdf",
  "document_id": "d7f3a1b2",
  "chunk_index": 14
}
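The before/after transformation above can be approximated in Python. `normalize_text` is a hypothetical helper, not Oracle's normalizer: it folds wide characters with Unicode NFKC, maps smart quotes and dashes to ASCII, and collapses every whitespace run to a single space. Oracle's own whitespace rules may differ, for example around blank lines.

```python
import re
import unicodedata

# Hypothetical mapping of typographic punctuation to ASCII (illustrative subset)
PUNCT_TO_ASCII = str.maketrans({
    "\u2018": "'", "\u2019": "'",  # left/right single quotation marks
    "\u201c": '"', "\u201d": '"',  # left/right double quotation marks
    "\u2013": "-", "\u2014": "-",  # en dash, em dash
})


def normalize_text(text: str) -> str:
    text = unicodedata.normalize("NFKC", text)  # full-width letters/digits -> ASCII forms
    text = text.translate(PUNCT_TO_ASCII)
    return re.sub(r"\s+", " ", text).strip()  # collapse spaces, tabs, and newlines
```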
512 max tokens per chunk · 64 overlap tokens · server-side splitting in Oracle DB
Stage Three

Embedding & Vectorization

OracleEmbeddings generates 384-dimensional vectors entirely inside the database — no external API calls. The model ALL_MINILM_L12_V2 converts each text chunk into a dense numerical representation.

Vectors are stored alongside their source text in OracleVS tables, organized into four separate collections for precise retrieval scoping.
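The layout of one table per collection, with each row keeping its text, metadata, and 384-dimension vector together, can be modeled in memory. `InMemoryCollections` and `Row` are a hypothetical toy for explanation and tests, not the OracleVS API; real storage and indexing happen in Oracle Database.

```python
from dataclasses import dataclass, field

EMBEDDING_DIM = 384  # output size of ALL_MINILM_L12_V2
COLLECTIONS = ("PDF_COLLECTION", "WEB_COLLECTION", "REPO_COLLECTION", "GENERAL_COLLECTION")


@dataclass
class Row:
    text: str
    metadata: dict
    embedding: list


# Hypothetical stand-in for four OracleVS tables
@dataclass
class InMemoryCollections:
    # One list of rows per collection, mirroring one table per collection
    tables: dict = field(default_factory=lambda: {name: [] for name in COLLECTIONS})

    def add(self, collection: str, text: str, metadata: dict, embedding: list) -> None:
        if collection not in self.tables:
            raise KeyError(f"Unknown collection: {collection}")
        if len(embedding) != EMBEDDING_DIM:
            raise ValueError(f"Expected {EMBEDDING_DIM} dimensions, got {len(embedding)}")
        self.tables[collection].append(Row(text, metadata, list(embedding)))

    def rows(self, collection: str) -> list:
        """Rows of a single collection: a search scoped here never mixes collections."""
        return self.tables[collection]
```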

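from langchain_community.embeddings.oracleai import OracleEmbeddings
from langchain_community.vectorstores.oraclevs import OracleVS
from langchain_community.vectorstores.utils import DistanceStrategy
from langchain_core.documents import Document

# Embedding model runs IN-DATABASE (the ONNX model must already be loaded into Oracle)
embeddings = OracleEmbeddings(
    conn=oracle_connection,
    params={"provider": "database", "model": "ALL_MINILM_L12_V2"},
)  # no proxy argument: no external calls are made

# Wrap the Stage Two chunk strings as Documents carrying their metadata
document_id = "d7f3a1b2"  # example id, as in the chunk shown in Stage Two
docs = [
    Document(
        page_content=text,
        metadata={"source": "report.pdf", "document_id": document_id, "chunk_index": i},
    )
    for i, text in enumerate(chunks)
]

# Store in the appropriate collection
vector_store = OracleVS.from_documents(
    documents=docs,
    embedding=embeddings,
    client=oracle_connection,
    table_name="PDF_COLLECTION",
    distance_strategy=DistanceStrategy.EUCLIDEAN_DISTANCE,
)

"The quick brown fox jumped over..." (text chunk) → ALL_MINILM_L12_V2 (in-database model) → [0.023, -0.417, ..., 0.891] × 384 (384-dim vector) → OracleVS table (stored with metadata)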
Zero external dependencies. Both embedding generation and vector storage happen inside Oracle Database. No network round-trips, no API keys, no cold starts.
Collection          | Source type             | Typical use
PDF_COLLECTION      | Uploaded PDF documents  | Reports, papers, manuals
WEB_COLLECTION      | Crawled web pages       | Articles, documentation
REPO_COLLECTION     | GitHub repositories     | Source code, READMEs
GENERAL_COLLECTION  | Mixed / unclassified    | Cross-source queries
384 vector dimensions · 4 separate collections · 0 external API calls
Stage Four

Querying with RAG

A user asks a question through Gradio, Open WebUI, the REST API, or directly from the CLI. The system selects the appropriate collection based on context, then performs similarity search.
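The document does not specify how the collection is chosen "based on context". One plausible approach is a keyword heuristic, sketched below. `select_collection` and its regular expressions are assumptions; a production system might instead let an LLM classify the query or search several collections.

```python
import re

# Hypothetical keyword heuristics; not agentic_rag's actual routing logic
CODE_HINTS = re.compile(r"\.(py|js|ts|java|go|rs)\b|\bfunction\b|\bclass\b|\brepo(sitory)?\b", re.I)
PDF_HINTS = re.compile(r"\.pdf\b|\breport\b|\bpaper\b|\bmanual\b", re.I)
WEB_HINTS = re.compile(r"https?://|\barticle\b|\bblog\b|\bwebsite\b", re.I)


def select_collection(query: str) -> str:
    """Pick a collection from surface cues in the query, falling back to GENERAL."""
    if CODE_HINTS.search(query):
        return "REPO_COLLECTION"
    if PDF_HINTS.search(query):
        return "PDF_COLLECTION"
    if WEB_HINTS.search(query):
        return "WEB_COLLECTION"
    return "GENERAL_COLLECTION"
```

Checking code cues first is a design choice: a question that mentions both a file like `auth_module.py` and a report is routed to the repository collection.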

OracleVS.similarity_search_with_score() finds the top-K nearest chunks and returns each one paired with its Euclidean distance from the query (lower means closer). That distance is then converted into a similarity score:

score = 1 / (1 + euclidean_distance)

A distance of 0 gives a score of 1.0, a near-perfect match; as the distance grows, the score falls toward 0.0, meaning the chunk is semantically distant from the query.

# User query arrives from any interface
query = "How does the authentication system handle token refresh?"

# Retrieve the top-K chunks from the selected collection;
# OracleVS returns (Document, euclidean_distance) pairs
results = vector_store.similarity_search_with_score(query=query, k=5)

for doc, distance in results:
    score = 1 / (1 + distance)  # convert distance to similarity
    print(f"Score: {score:.4f} | Source: {doc.metadata['source']}")
# Example output (illustrative):
# Score: 0.8721 | Source: auth_module.py
# Score: 0.8134 | Source: README.md
# Score: 0.7456 | Source: api_docs.pdf
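The distance-to-score conversion and top-K ranking can be reproduced in plain Python to make the formula concrete. `distance_to_score` and `top_k` are illustrative helpers, not part of OracleVS; `math.dist` computes the Euclidean distance between two vectors.

```python
import math


# Hypothetical helpers illustrating score = 1 / (1 + euclidean_distance)
def distance_to_score(distance: float) -> float:
    """Map a distance of 0 to 1.0; larger distances approach 0.0."""
    return 1.0 / (1.0 + distance)


def top_k(query_vec, rows, k=5):
    """rows: iterable of (text, vector) pairs. Returns (text, score) pairs, best first."""
    scored = [(text, distance_to_score(math.dist(query_vec, vec))) for text, vec in rows]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

For example, a distance of 0.1467 yields 1 / 1.1467 ≈ 0.8721. Because the transform is strictly decreasing, ranking by score gives the same order as ranking by distance; the score only makes results easier to read.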
%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#2a2522', 'primaryTextColor': '#f5e6d3', 'primaryBorderColor': '#e8913a', 'lineColor': '#d4763c', 'secondaryColor': '#221e1b', 'tertiaryColor': '#1a1614', 'edgeLabelBackground': '#1a1614', 'nodeTextColor': '#f5e6d3' }}}%%
graph LR
    Q["User Query"] --> S{"Select<br/>Collection"}
    S -->|PDF context| P["PDF_COLLECTION"]
    S -->|Web context| W["WEB_COLLECTION"]
    S -->|Code context| R["REPO_COLLECTION"]
    S -->|Mixed| G["GENERAL_COLLECTION"]
    P --> VS["similarity_search_with_score()"]
    W --> VS
    R --> VS
    G --> VS
    VS --> C["Top-K Chunks<br/>+ Scores"]
    C --> LLM["Context for LLM"]
    style Q fill:#2a2522,stroke:#e8913a,color:#e8913a
    style S fill:#221e1b,stroke:#d4763c,color:#f5e6d3
    style P fill:#2a2522,stroke:#c75b39,color:#f5e6d3
    style W fill:#2a2522,stroke:#c75b39,color:#f5e6d3
    style R fill:#2a2522,stroke:#c75b39,color:#f5e6d3
    style G fill:#2a2522,stroke:#c75b39,color:#f5e6d3
    style VS fill:#221e1b,stroke:#e8913a,color:#f5e6d3
    style C fill:#2a2522,stroke:#d4763c,color:#f5e6d3
    style LLM fill:#2a2522,stroke:#e8913a,color:#e8913a
Fig 4.1 — Query routing and similarity search flow
5 top-K retrieved · 0.87 typical best match score · 4 interface options
Stage Five

Reasoning Strategy Selection

The user selects a reasoning strategy directly or, in Open WebUI, by picking it from the "model" dropdown. With 9 base strategies available in both standalone and RAG-augmented modes, the system offers 18 total configurations.

Standalone mode uses only the LLM's parametric knowledge. RAG mode augments the query with retrieved context from Stage 4. Same strategy, different information budget.
01 standard: Direct response. Single-pass generation with no intermediate reasoning.
02 cot: Chain of Thought. Step-by-step reasoning, showing the work before the answer.
03 tot: Tree of Thoughts. BFS exploration with scoring and pruning. Width=2, depth=3.
04 react: Reason+Act loop with tools: calculate, web_search, search.
05 self_reflection: Draft, Critique, Refine loop. Iterates up to 5 times until satisfied.
06 consistency: 5 parallel samples with majority voting for robust consensus.
07 decomposed: Break into sub-tasks, solve independently, synthesize into a unified answer.
08 least_to_most: Easy-to-hard sub-questions with accumulating context at each step.
09 recursive: Python code generation with recursive LLM calls. Max 8 steps deep.
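The consistency strategy's sample-and-vote step can be sketched with the standard library. `self_consistency` is a hypothetical helper and `sample` stands in for an LLM call (in practice sampled with nonzero temperature so answers can differ); threads provide the parallelism, and answers are compared as exact strings after stripping whitespace.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


# Hypothetical helper illustrating majority voting over parallel samples
def self_consistency(prompt: str, sample: Callable[[str], str], n: int = 5) -> str:
    """Draw n answers in parallel and return the majority answer."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        # pool.map keeps results in submission order
        answers = [a.strip() for a in pool.map(sample, [prompt] * n)]
    # most_common orders ties by first occurrence, so ties go to the earliest sample
    return Counter(answers).most_common(1)[0][0]
```

Exact-string voting only works when answers are short and canonical (a number, a label); free-form answers would first need an extraction or clustering step.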
%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#2a2522', 'primaryTextColor': '#f5e6d3', 'primaryBorderColor': '#e8913a', 'lineColor': '#d4763c', 'secondaryColor': '#221e1b', 'tertiaryColor': '#1a1614', 'edgeLabelBackground': '#1a1614', 'nodeTextColor': '#f5e6d3' }}}%%
graph TD
    U["User / Open WebUI"] --> M{"Strategy<br/>Selection"}
    M --> SA["Standalone Mode"]
    M --> RA["RAG-Augmented Mode"]
    SA --> S1["9 Base Strategies"]
    RA --> S2["9 Base Strategies<br/>+ Retrieved Context"]
    S1 --> OUT["18 Total Configurations"]
    S2 --> OUT
    style U fill:#2a2522,stroke:#e8913a,color:#e8913a
    style M fill:#221e1b,stroke:#d4763c,color:#f5e6d3
    style SA fill:#2a2522,stroke:#c75b39,color:#f5e6d3
    style RA fill:#2a2522,stroke:#c75b39,color:#f5e6d3
    style S1 fill:#221e1b,stroke:#d4763c,color:#f5e6d3
    style S2 fill:#221e1b,stroke:#d4763c,color:#f5e6d3
    style OUT fill:#2a2522,stroke:#e8913a,color:#e8913a
Fig 5.1 — 9 strategies × 2 modes = 18 configurations
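The tot strategy (BFS with scoring and pruning, width 2, depth 3) can be sketched as a breadth-first search that keeps only the best `width` partial solutions per level. `propose` and `score` stand in for LLM calls that suggest next thoughts and rate partial solutions; `tree_of_thoughts` and its signature are hypothetical.

```python
from typing import Callable


# Hypothetical sketch of breadth-first Tree-of-Thoughts search with pruning
def tree_of_thoughts(problem: str,
                     propose: Callable[[str, str], list],
                     score: Callable[[str, str], float],
                     width: int = 2, depth: int = 3) -> str:
    """Expand every kept state each level, then prune to the `width` best."""
    frontier = [""]  # each state is the reasoning accumulated so far
    for _ in range(depth):
        candidates = [state + step for state in frontier for step in propose(problem, state)]
        if not candidates:
            break  # nothing left to expand
        candidates.sort(key=lambda s: score(problem, s), reverse=True)
        frontier = candidates[:width]  # prune to the beam width
    return max(frontier, key=lambda s: score(problem, s))
```

With width 2 and depth 3, each level scores the children of at most two states, so the LLM cost grows linearly with depth instead of exponentially.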
9 base strategies · 2 modes per strategy · 18 total configurations
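The 9 × 2 arithmetic is easy to check by enumerating every pairing. The strategy identifiers come from the list in this stage; the `strategy:mode` label format is an assumption, not necessarily how agentic_rag names its Open WebUI models.

```python
from itertools import product

STRATEGIES = ["standard", "cot", "tot", "react", "self_reflection",
              "consistency", "decomposed", "least_to_most", "recursive"]
MODES = ["standalone", "rag"]

# Hypothetical label format: one entry per (strategy, mode) pair
CONFIGURATIONS = [f"{strategy}:{mode}" for strategy, mode in product(STRATEGIES, MODES)]
```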
Stage Six

Answer Generation

RAGReasoningEnsemble weaves the retrieved context into the user's query and hands it to the selected reasoning strategy. For Chain-of-Thought in agentic_rag, four specialized agents collaborate in a pipeline:

Planner decomposes the task
Researcher gathers evidence
Reasoner analyzes & infers
Synthesizer composes answer
from agentic_rag.ensemble import RAGReasoningEnsemble

ensemble = RAGReasoningEnsemble(
    strategy="cot",
    vector_store=vector_store,
    llm=ollama_model,
    top_k=5,
)
result = ensemble.process(query="How does token refresh work?")

# Result structure
print(result.answer)           # The final answer
print(result.reasoning_steps)  # Step-by-step trace
print(result.sources)          # Source documents used
print(result.execution_trace)  # Full agent execution log
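How retrieved context might be woven into the query is sketched below. The prompt wording, the numbered-citation format, and `build_augmented_prompt` itself are hypothetical; RAGReasoningEnsemble's actual template is not shown in this document.

```python
# Hypothetical prompt builder; not RAGReasoningEnsemble's real template
def build_augmented_prompt(query: str, chunks: list, max_chunks: int = 5) -> str:
    """chunks: (source, text, score) tuples, e.g. from the Stage Four similarity search."""
    ranked = sorted(chunks, key=lambda c: c[2], reverse=True)[:max_chunks]
    context = "\n\n".join(
        f"[{i}] (source: {source}, score: {score:.2f})\n{text}"
        for i, (source, text, score) in enumerate(ranked, start=1)
    )
    return (
        "Answer the question using only the context below. Cite sources as [n].\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )
```

Numbering the chunks gives the model a compact way to cite evidence, which is what lets a response list its sources alongside the answer.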

Every interaction is logged to Oracle Database across multiple event tables, creating a complete audit trail for observability and debugging:

API_EVENTS · MODEL_EVENTS · QUERY_EVENTS · RETRIEVAL_EVENTS · STRATEGY_EVENTS · AGENT_EVENTS · ERROR_EVENTS
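A minimal version of per-category event logging is shown below, using the standard library's sqlite3 as a stand-in for Oracle Database so it runs anywhere. The table names come from the list above; the columns (`ts`, and `payload` as JSON text) and the helper names are assumptions, not agentic_rag's schema.

```python
import json
import sqlite3
import time

EVENT_TABLES = ("API_EVENTS", "MODEL_EVENTS", "QUERY_EVENTS", "RETRIEVAL_EVENTS",
                "STRATEGY_EVENTS", "AGENT_EVENTS", "ERROR_EVENTS")


# Hypothetical schema: one table per event category
def create_event_tables(conn: sqlite3.Connection) -> None:
    for table in EVENT_TABLES:
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "ts REAL NOT NULL, "
            "payload TEXT NOT NULL)"
        )
    conn.commit()


def log_event(conn: sqlite3.Connection, table: str, payload: dict) -> None:
    # Table names cannot be bound as SQL parameters, so whitelist them instead
    if table not in EVENT_TABLES:
        raise ValueError(f"Unknown event table: {table}")
    conn.execute(f"INSERT INTO {table} (ts, payload) VALUES (?, ?)",
                 (time.time(), json.dumps(payload)))
    conn.commit()
```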
%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#2a2522', 'primaryTextColor': '#f5e6d3', 'primaryBorderColor': '#e8913a', 'lineColor': '#d4763c', 'secondaryColor': '#221e1b', 'tertiaryColor': '#1a1614', 'edgeLabelBackground': '#1a1614', 'nodeTextColor': '#f5e6d3' }}}%%
graph TD
    Q["User Query + Context"] --> ENS["RAGReasoningEnsemble"]
    ENS --> AUG["Augment query<br/>with retrieved chunks"]
    AUG --> STR{"Selected Strategy"}
    STR -->|cot| PLAN["Planner Agent"]
    PLAN --> RES["Researcher Agent"]
    RES --> REA["Reasoner Agent"]
    REA --> SYN["Synthesizer Agent"]
    STR -->|tot| TOT["BFS Tree<br/>Exploration"]
    STR -->|other| OTH["Strategy-specific<br/>Pipeline"]
    SYN --> ANS["Final Answer"]
    TOT --> ANS
    OTH --> ANS
    ANS --> LOG["Oracle Event Logging"]
    style Q fill:#2a2522,stroke:#e8913a,color:#e8913a
    style ENS fill:#221e1b,stroke:#d4763c,color:#f5e6d3
    style AUG fill:#2a2522,stroke:#c75b39,color:#f5e6d3
    style STR fill:#221e1b,stroke:#e8913a,color:#f5e6d3
    style PLAN fill:#2a2522,stroke:#d4763c,color:#f5e6d3
    style RES fill:#2a2522,stroke:#d4763c,color:#f5e6d3
    style REA fill:#2a2522,stroke:#d4763c,color:#f5e6d3
    style SYN fill:#2a2522,stroke:#d4763c,color:#f5e6d3
    style TOT fill:#2a2522,stroke:#c75b39,color:#f5e6d3
    style OTH fill:#2a2522,stroke:#c75b39,color:#f5e6d3
    style ANS fill:#2a2522,stroke:#e8913a,color:#e8913a
    style LOG fill:#221e1b,stroke:#c75b39,color:#f5e6d3
Fig 6.1 — Answer generation with RAGReasoningEnsemble
// Example response structure
{
  "answer": "The authentication system handles token refresh through a background daemon that monitors JWT expiry timestamps. When a token reaches 80% of its TTL, the refresh cycle initiates...",
  "reasoning_steps": [
    "Identified 3 relevant code modules in REPO_COLLECTION",
    "Cross-referenced with API documentation in WEB_COLLECTION",
    "Synthesized token lifecycle from auth_handler.py (score: 0.89)"
  ],
  "sources": [
    { "file": "auth_handler.py", "chunk": 7, "score": 0.8912 },
    { "file": "api_docs.pdf", "chunk": 23, "score": 0.8134 }
  ],
  "execution_trace": {
    "strategy": "cot",
    "mode": "rag",
    "agents_invoked": ["planner", "researcher", "reasoner", "synthesizer"],
    "total_llm_calls": 6,
    "elapsed_ms": 4821
  }
}
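The Planner → Researcher → Reasoner → Synthesizer hand-off can be sketched as four sequential LLM calls, each consuming the previous agent's output. `run_cot_agents` and its prompts are hypothetical, and `llm` is any function mapping a prompt to text. This sketch makes exactly one call per agent, whereas the example trace above reports more calls in total.

```python
from typing import Callable

AGENT_ROLES = ("planner", "researcher", "reasoner", "synthesizer")


# Hypothetical four-agent pipeline; prompts are illustrative only
def run_cot_agents(query: str, context: str, llm: Callable[[str], str]) -> dict:
    """Chain four role-specific prompts; each agent sees the previous agent's output."""
    trace = []
    plan = llm(f"[planner] Break this question into steps:\n{query}")
    trace.append(("planner", plan))
    evidence = llm(f"[researcher] Gather evidence for this plan from the context.\n"
                   f"Plan:\n{plan}\n\nContext:\n{context}")
    trace.append(("researcher", evidence))
    analysis = llm(f"[reasoner] Reason step by step over this evidence:\n{evidence}")
    trace.append(("reasoner", analysis))
    answer = llm(f"[synthesizer] Answer '{query}' using this analysis:\n{analysis}")
    trace.append(("synthesizer", answer))
    return {
        "answer": answer,
        "reasoning_steps": [output for _, output in trace],
        "agents_invoked": [role for role, _ in trace],
    }
```

Only the Researcher sees the retrieved context here; that is one possible division of labor, chosen so that later agents work from curated evidence rather than raw chunks.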