From Document to Answer — The Agentic RAG Journey

1

Stage One

Document Ingestion

Every journey begins with raw information. A user uploads a PDF, provides a URL, or points to a GitHub repository. Three specialized processors, one per input type, normalize these wildly different formats into clean Markdown.

PDFProcessor

Uses Docling to parse complex layouts. Runs DocumentConverter, then exports the parsed document with export_to_markdown().

WebProcessor

Uses Trafilatura to cleanly extract article text via fetch_url() + extract(). Also captures author, date, tags, and categories via extract_metadata().

RepoProcessor

Uses Gitingest to crawl repositories, preserving file structure and extracting code with context-aware boundaries.

# Each processor returns normalized Markdown
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("report.pdf")
markdown = result.document.export_to_markdown()

# Web extraction with metadata
import trafilatura

downloaded = trafilatura.fetch_url("https://example.com/article")
text = trafilatura.extract(downloaded)
meta = trafilatura.extract_metadata(downloaded)
# meta.author, meta.date, meta.tags, meta.categories

%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#2a2522', 'primaryTextColor': '#f5e6d3', 'primaryBorderColor': '#e8913a', 'lineColor': '#d4763c', 'secondaryColor': '#221e1b', 'tertiaryColor': '#1a1614', 'edgeLabelBackground': '#1a1614', 'clusterBkg': '#221e1b', 'clusterBorder': '#c75b39', 'nodeTextColor': '#f5e6d3' }}}%%
graph TD
    A["fa:fa-file-pdf PDF Upload"] -->|Docling| D["DocumentConverter"]
    B["fa:fa-globe URL / Web Page"] -->|Trafilatura| E["fetch_url + extract"]
    C["fa:fa-code GitHub Repo"] -->|Gitingest| F["Code + Structure"]
    D --> G["Normalized Markdown"]
    E --> G
    F --> G
    G --> H["Ready for Chunking"]
    style A fill:#2a2522,stroke:#e8913a,color:#f5e6d3
    style B fill:#2a2522,stroke:#e8913a,color:#f5e6d3
    style C fill:#2a2522,stroke:#e8913a,color:#f5e6d3
    style D fill:#221e1b,stroke:#d4763c,color:#f5e6d3
    style E fill:#221e1b,stroke:#d4763c,color:#f5e6d3
    style F fill:#221e1b,stroke:#d4763c,color:#f5e6d3
    style G fill:#2a2522,stroke:#c75b39,color:#f5e6d3
    style H fill:#2a2522,stroke:#e8913a,color:#e8913a

Fig 1.1 — Three input paths converge into normalized Markdown
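The routing in Fig 1.1 can be sketched as a small dispatcher. This is a minimal illustration, not the project's actual code: the function `detect_source_type` and its rules (a `.pdf` suffix, a `github.com` host) are assumptions made for the example, and only the processor names come from the text above.

```python
from urllib.parse import urlparse


def detect_source_type(source: str) -> str:
    """Classify an input as 'pdf', 'web', or 'repo' (hypothetical rules)."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        # Assumption: GitHub URLs are repositories, every other URL is a web page.
        host = parsed.netloc.lower()
        if host == "github.com" or host.endswith(".github.com"):
            return "repo"
        return "web"
    if source.lower().endswith(".pdf"):
        return "pdf"
    raise ValueError(f"Unsupported source: {source!r}")


# Processor names come from the text; the mapping itself is illustrative.
PROCESSORS = {"pdf": "PDFProcessor", "web": "WebProcessor", "repo": "RepoProcessor"}

for src in ("report.pdf", "https://example.com/article", "https://github.com/org/repo"):
    print(src, "->", PROCESSORS[detect_source_type(src)])
```

Under these rules a URL that happens to point at a PDF is treated as a web page; a real dispatcher would probably also inspect the response content type.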

2

Stage Two

Chunking & Normalization

Raw Markdown is still too large for embedding models. OracleTextSplitter runs server-side inside Oracle Database itself, splitting text into overlapping chunks while normalizing whitespace and Unicode.

Each chunk is enriched with metadata: its source document, a unique document_id, and a sequential chunk_index for reassembly.

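from langchain_community.document_loaders.oracleai import OracleTextSplitter

# OracleTextSplitter takes its settings as a params dict.
# Sizes are measured in the unit chosen by the "by" setting (words by default);
# token-based limits need "by": "vocabulary" with a vocabulary loaded in the database.
splitter = OracleTextSplitter(
    conn=oracle_connection,
    params={
        "max": 512,
        "overlap": 64,
        "normalize": "all",  # whitespace + unicode normalization
    },
)

chunks = splitter.split_text(markdown)
# split_text() returns plain strings; each chunk is then enriched with
# { source, document_id, chunk_index }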

Before — Raw Text

The   quick brown fox jumped
over the    lazy dog.  Unicode
chars like \u2018smart quotes\u2019
and em\u2014dashes are   scattered
throughout   the    document
alongside     irregular
whitespace and  line   breaks...

After — Normalized Chunk

{
  "text": "The quick brown fox jumped over the lazy dog. Unicode chars like 'smart quotes' and em-dashes are scattered throughout the document alongside irregular whitespace and line breaks...",
  "source": "report.pdf",
  "document_id": "d7f3a1b2",
  "chunk_index": 14
}
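To make the before/after transformation concrete, here is a client-side sketch of normalization, overlapping splitting, and metadata enrichment. It is not how OracleTextSplitter works internally, since that runs inside the database; the helper names (`normalize`, `split_with_overlap`, `enrich`), the punctuation mapping, and the word-based window are all assumptions chosen for illustration.

```python
import re
import unicodedata
import uuid
from typing import Dict, List, Optional

# Assumed mapping for the typographic characters in the example above.
_PUNCTUATION = str.maketrans({"\u2018": "'", "\u2019": "'", "\u2014": "-"})


def normalize(text: str) -> str:
    """Fold Unicode variants and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text).translate(_PUNCTUATION)
    return re.sub(r"\s+", " ", text).strip()


def split_with_overlap(text: str, max_words: int = 512, overlap: int = 64) -> List[str]:
    """Slide a window of max_words over the text, stepping by max_words - overlap."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be non-negative and smaller than max_words")
    words = normalize(text).split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks


def enrich(chunks: List[str], source: str, document_id: Optional[str] = None) -> List[Dict]:
    """Attach source, document_id, and a sequential chunk_index to each chunk."""
    document_id = document_id or uuid.uuid4().hex[:8]
    return [
        {"text": chunk, "source": source, "document_id": document_id, "chunk_index": index}
        for index, chunk in enumerate(chunks)
    ]


raw = "The   quick brown fox jumped\nover the    lazy dog."
for record in enrich(split_with_overlap(raw, max_words=6, overlap=2), "report.pdf"):
    print(record["chunk_index"], record["text"])
```

Because consecutive windows share `overlap` words, a sentence that straddles a chunk boundary still appears intact in at least one chunk, and `chunk_index` lets the original order be reassembled later.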

512 max tokens per chunk
64 overlap tokens
Server-side splitting in Oracle DB

3

Stage Three

Embedding & Vectorization

OracleEmbeddings generates 384-dimensional vectors entirely inside the database — no external API calls. The model ALL_MINILM_L12_V2 converts each text chunk into a dense numerical representation.

Vectors are stored alongside their source text in OracleVS tables, organized into four separate collections for precise retrieval scoping.

from langchain_community.embeddings import OracleEmbeddings
from langchain_community.vectorstores import OracleVS
from langchain_community.vectorstores.utils import DistanceStrategy

# Embedding model runs IN-DATABASE
embeddings = OracleEmbeddings(
    conn=oracle_connection,
    # provider "database" uses a model loaded into Oracle; no external calls needed
    params={"provider": "database", "model": "ALL_MINILM_L12_V2"},
)

# Store in the appropriate collection
# (documents must be LangChain Document objects carrying the chunk metadata)
vector_store = OracleVS.from_documents(
    documents=chunks,
    embedding=embeddings,
    client=oracle_connection,
    table_name="PDF_COLLECTION",
    distance_strategy=DistanceStrategy.EUCLIDEAN_DISTANCE,
)

"The quick brown fox jumped over..." (text chunk) → ALL_MINILM_L12_V2 (in-database model) → [0.023, -0.417, ..., 0.891] × 384 (384-dim vector) → OracleVS Table (stored with metadata)

Zero external dependencies. Both embedding generation and vector storage happen inside Oracle Database. No network round-trips, no API keys, no cold starts.

Collection	Source Type	Typical Use
PDF_COLLECTION	Uploaded PDF documents	Reports, papers, manuals
WEB_COLLECTION	Crawled web pages	Articles, documentation
REPO_COLLECTION	GitHub repositories	Source code, READMEs
GENERAL_COLLECTION	Mixed / unclassified	Cross-source queries
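The table above amounts to a lookup from source type to collection name. A minimal sketch follows; the collection names come from the table, while the source-type keys and the fallback to GENERAL_COLLECTION for anything unrecognized are assumptions.

```python
# Collection names are taken from the table; the keys are hypothetical labels.
COLLECTION_BY_SOURCE = {
    "pdf": "PDF_COLLECTION",
    "web": "WEB_COLLECTION",
    "repo": "REPO_COLLECTION",
}


def collection_for(source_type: str) -> str:
    """Return the collection for a source type, defaulting to the mixed collection."""
    return COLLECTION_BY_SOURCE.get(source_type.lower(), "GENERAL_COLLECTION")


print(collection_for("pdf"))      # PDF_COLLECTION
print(collection_for("unknown"))  # GENERAL_COLLECTION
```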

384 vector dimensions
4 separate collections
0 external API calls

4

Stage Four

Querying with RAG

A user asks a question through Gradio, Open WebUI, the REST API, or directly from the CLI. The system selects the appropriate collection based on context, then performs similarity search.

OracleVS.similarity_search_with_score() finds the top-K most relevant chunks using Euclidean distance. The similarity score is computed as:

score = 1 / (1 + euclidean_distance)

Scores closer to 1.0 indicate near-perfect matches; scores closer to 0.0 mean the chunk is semantically distant from the query.
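The formula above is easy to check by hand. The sketch below computes a Euclidean distance between two toy vectors and converts it to a score; the function names are illustrative, and real embeddings would have 384 components rather than two.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity_score(distance: float) -> float:
    """Map a distance in [0, inf) to a score in (0, 1]."""
    if distance < 0:
        raise ValueError("distance cannot be negative")
    return 1.0 / (1.0 + distance)


d = euclidean_distance([0.0, 0.0], [3.0, 4.0])  # 3-4-5 triangle, so d = 5.0
print(similarity_score(d))     # 1 / (1 + 5)
print(similarity_score(0.0))   # identical vectors give 1.0
```

The mapping is monotonic, so ranking by score is the same as ranking by ascending distance; the score only rescales the result to a bounded range.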

# User query arrives from any interface
query = "How does the authentication system handle token refresh?"

# Select collection and retrieve top-K chunks
results = vector_store.similarity_search_with_score(
    query=query,
    k=5
)

# Each result: (Document, distance). OracleVS is assumed here to return the
# raw distance, which is converted with the formula above.
for doc, distance in results:
    score = 1 / (1 + distance)
    print(f"Score: {score:.4f} | Source: {doc.metadata['source']}")
    # Score: 0.8721 | Source: auth_module.py
    # Score: 0.8134 | Source: README.md
    # Score: 0.7456 | Source: api_docs.pdf

%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#2a2522', 'primaryTextColor': '#f5e6d3', 'primaryBorderColor': '#e8913a', 'lineColor': '#d4763c', 'secondaryColor': '#221e1b', 'tertiaryColor': '#1a1614', 'edgeLabelBackground': '#1a1614', 'nodeTextColor': '#f5e6d3' }}}%%
graph LR
    Q["User Query"] --> S{"Select<br/>Collection"}
    S -->|PDF context| P["PDF_COLLECTION"]
    S -->|Web context| W["WEB_COLLECTION"]
    S -->|Code context| R["REPO_COLLECTION"]
    S -->|Mixed| G["GENERAL_COLLECTION"]
    P --> VS["similarity_search_with_score()"]
    W --> VS
    R --> VS
    G --> VS
    VS --> C["Top-K Chunks<br/>+ Scores"]
    C --> LLM["Context for LLM"]
    style Q fill:#2a2522,stroke:#e8913a,color:#e8913a
    style S fill:#221e1b,stroke:#d4763c,color:#f5e6d3
    style P fill:#2a2522,stroke:#c75b39,color:#f5e6d3
    style W fill:#2a2522,stroke:#c75b39,color:#f5e6d3
    style R fill:#2a2522,stroke:#c75b39,color:#f5e6d3
    style G fill:#2a2522,stroke:#c75b39,color:#f5e6d3
    style VS fill:#221e1b,stroke:#e8913a,color:#f5e6d3
    style C fill:#2a2522,stroke:#d4763c,color:#f5e6d3
    style LLM fill:#2a2522,stroke:#e8913a,color:#e8913a

Fig 4.1 — Query routing and similarity search flow

5 top-K retrieved
0.87 typical best match score
4 interface options

5

Stage Five

Reasoning Strategy Selection

The user selects a reasoning strategy — or it is chosen automatically via the Open WebUI "model" dropdown. With 9 base strategies available in both standalone and RAG-augmented modes, the system offers 18 total configurations.

Standalone mode uses only the LLM's parametric knowledge. RAG mode augments the query with retrieved context from Stage 4. Same strategy, different information budget.
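The difference between the two modes comes down to what goes into the prompt. The sketch below shows one plausible way to build it; `build_prompt` and the prompt wording are assumptions, not the project's actual template.

```python
from typing import Optional, Sequence


def build_prompt(query: str, context_chunks: Optional[Sequence[str]] = None) -> str:
    """Return a standalone prompt, or a RAG prompt when chunks are supplied."""
    if not context_chunks:
        # Standalone mode: the model sees only the question.
        return f"Question: {query}\nAnswer:"
    # RAG mode: number the retrieved chunks so the answer can refer to them.
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(context_chunks, start=1)
    )
    return (
        "Use the context below to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


print(build_prompt("How does token refresh work?"))
print(build_prompt("How does token refresh work?", ["Tokens expire after a TTL."]))
```

The reasoning strategy receives whichever prompt is built, which is why the same strategy can run unchanged in either mode.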

01 standard: Direct response. Single-pass generation with no intermediate reasoning.
02 cot: Chain of Thought. Step-by-step reasoning, showing the work before the answer.
03 tot: Tree of Thoughts. BFS exploration with scoring and pruning. Width=2, depth=3.
04 react: Reason+Act loop with tools: calculate, web_search, search.
05 self_reflection: Draft, Critique, Refine loop. Iterates up to 5 times until satisfied.
06 consistency: 5 parallel samples with majority voting for robust consensus.
07 decomposed: Break into sub-tasks, solve independently, synthesize into a unified answer.
08 least_to_most: Easy-to-hard sub-questions with accumulating context at each step.
09 recursive: Python code generation with recursive LLM calls. Max 8 steps deep.
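As one concrete example from the list above, the consistency strategy (5 samples with majority voting) can be sketched in a few lines. This is an illustration only: `sample_fn` is a hypothetical stand-in for a call to the LLM, the samples are drawn sequentially rather than in parallel, and answers are compared after simple lowercasing.

```python
from collections import Counter
from typing import Callable


def self_consistency(sample_fn: Callable[[str], str], query: str, n_samples: int = 5) -> str:
    """Draw n_samples answers and return the most frequent one."""
    answers = [sample_fn(query).strip().lower() for _ in range(n_samples)]
    # most_common(1) yields the top answer; ties go to the answer seen first.
    winner, _count = Counter(answers).most_common(1)[0]
    return winner


# Canned answers stand in for five independent LLM samples.
_canned = iter(["42", "41", "42", "42", "40"])
print(self_consistency(lambda q: next(_canned), "What is 6 x 7?"))  # prints 42
```

Exact-match voting works for short factual answers; free-form answers would need some normalization or clustering step before they can be counted.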

%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#2a2522', 'primaryTextColor': '#f5e6d3', 'primaryBorderColor': '#e8913a', 'lineColor': '#d4763c', 'secondaryColor': '#221e1b', 'tertiaryColor': '#1a1614', 'edgeLabelBackground': '#1a1614', 'nodeTextColor': '#f5e6d3' }}}%%
graph TD
    U["User / Open WebUI"] --> M{"Strategy<br/>Selection"}
    M --> SA["Standalone Mode"]
    M --> RA["RAG-Augmented Mode"]
    SA --> S1["9 Base Strategies"]
    RA --> S2["9 Base Strategies<br/>+ Retrieved Context"]
    S1 --> OUT["18 Total Configurations"]
    S2 --> OUT
    style U fill:#2a2522,stroke:#e8913a,color:#e8913a
    style M fill:#221e1b,stroke:#d4763c,color:#f5e6d3
    style SA fill:#2a2522,stroke:#c75b39,color:#f5e6d3
    style RA fill:#2a2522,stroke:#c75b39,color:#f5e6d3
    style S1 fill:#221e1b,stroke:#d4763c,color:#f5e6d3
    style S2 fill:#221e1b,stroke:#d4763c,color:#f5e6d3
    style OUT fill:#2a2522,stroke:#e8913a,color:#e8913a

Fig 5.1 — 9 strategies × 2 modes = 18 configurations
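The count in Fig 5.1 is a plain Cartesian product of strategies and modes. In the sketch below the strategy identifiers are the ones listed in this stage; the mode labels and the dictionary shape are assumptions made for illustration.

```python
from itertools import product

BASE_STRATEGIES = [
    "standard", "cot", "tot", "react", "self_reflection",
    "consistency", "decomposed", "least_to_most", "recursive",
]
MODES = ["standalone", "rag"]  # assumed labels for the two modes

# Every strategy paired with every mode.
CONFIGURATIONS = [
    {"strategy": strategy, "mode": mode}
    for strategy, mode in product(BASE_STRATEGIES, MODES)
]

print(len(CONFIGURATIONS))  # 18
```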

9 base strategies
2 modes per strategy
18 total configurations

6

Stage Six

Answer Generation

RAGReasoningEnsemble weaves the retrieved context into the user's query and hands it to the selected reasoning strategy. For Chain-of-Thought in agentic_rag, four specialized agents collaborate in a pipeline:

Planner decomposes the task → Researcher gathers evidence → Reasoner analyzes & infers → Synthesizer composes answer
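The four-agent hand-off can be pictured as a chain of functions that each read and extend a shared state. The sketch below uses stub agents in place of LLM calls; `run_pipeline`, the state keys, and the stub bodies are all hypothetical and only illustrate the control flow.

```python
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]
Agent = Callable[[State], State]


def run_pipeline(agents: List[Tuple[str, Agent]], query: str) -> State:
    """Pass the state through each agent in order, recording who ran."""
    state: State = {"query": query, "trace": []}
    for name, agent in agents:
        state = agent(state)
        state["trace"].append(name)
    return state


# Stub agents: a real system would call the LLM or the vector store here.
def planner(state: State) -> State:
    return {**state, "plan": [f"Understand: {state['query']}"]}


def researcher(state: State) -> State:
    return {**state, "evidence": ["<retrieved chunk>"]}


def reasoner(state: State) -> State:
    return {**state, "conclusion": f"Based on {len(state['evidence'])} chunk(s)"}


def synthesizer(state: State) -> State:
    return {**state, "answer": state["conclusion"]}


result = run_pipeline(
    [("planner", planner), ("researcher", researcher),
     ("reasoner", reasoner), ("synthesizer", synthesizer)],
    "How does token refresh work?",
)
print(result["trace"])
print(result["answer"])
```

Keeping each agent a pure function of the state makes the order explicit and gives a natural place to record an execution trace.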

from agentic_rag.ensemble import RAGReasoningEnsemble

ensemble = RAGReasoningEnsemble(
    strategy="cot",
    vector_store=vector_store,
    llm=ollama_model,
    top_k=5
)

result = ensemble.process(query="How does token refresh work?")

# Result structure
print(result.answer)           # The final answer
print(result.reasoning_steps)  # Step-by-step trace
print(result.sources)          # Source documents used
print(result.execution_trace)  # Full agent execution log

Every interaction is logged to Oracle Database across multiple event tables, creating a complete audit trail for observability and debugging:

API_EVENTS, MODEL_EVENTS, QUERY_EVENTS, RETRIEVAL_EVENTS, STRATEGY_EVENTS, AGENT_EVENTS, ERROR_EVENTS
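To show the shape of such an audit trail, the sketch below writes events into one table per event type. It deliberately swaps Oracle Database for an in-memory SQLite database so that it runs anywhere; the table names come from the list above, but the column layout and the `log_event` helper are assumptions, not the project's schema.

```python
import json
import sqlite3
import time

EVENT_TABLES = (
    "API_EVENTS", "MODEL_EVENTS", "QUERY_EVENTS", "RETRIEVAL_EVENTS",
    "STRATEGY_EVENTS", "AGENT_EVENTS", "ERROR_EVENTS",
)


def init_event_tables(conn: sqlite3.Connection) -> None:
    """Create one table per event type (hypothetical three-column schema)."""
    for table in EVENT_TABLES:
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "created_at REAL NOT NULL, "
            "payload TEXT NOT NULL)"
        )


def log_event(conn: sqlite3.Connection, table: str, payload: dict) -> None:
    """Insert a JSON payload; table names are checked against a fixed allowlist."""
    if table not in EVENT_TABLES:
        raise ValueError(f"Unknown event table: {table}")
    conn.execute(
        f"INSERT INTO {table} (created_at, payload) VALUES (?, ?)",
        (time.time(), json.dumps(payload)),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
init_event_tables(conn)
log_event(conn, "QUERY_EVENTS", {"query": "How does token refresh work?", "strategy": "cot"})
print(conn.execute("SELECT COUNT(*) FROM QUERY_EVENTS").fetchone()[0])  # 1
```

Table names cannot be bound as SQL parameters, which is why the helper validates them against the allowlist before interpolating them into the statement.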

%%{init: {'theme': 'dark', 'themeVariables': { 'primaryColor': '#2a2522', 'primaryTextColor': '#f5e6d3', 'primaryBorderColor': '#e8913a', 'lineColor': '#d4763c', 'secondaryColor': '#221e1b', 'tertiaryColor': '#1a1614', 'edgeLabelBackground': '#1a1614', 'nodeTextColor': '#f5e6d3' }}}%%
graph TD
    Q["User Query + Context"] --> ENS["RAGReasoningEnsemble"]
    ENS --> AUG["Augment query<br/>with retrieved chunks"]
    AUG --> STR{"Selected Strategy"}
    STR -->|cot| PLAN["Planner Agent"]
    PLAN --> RES["Researcher Agent"]
    RES --> REA["Reasoner Agent"]
    REA --> SYN["Synthesizer Agent"]
    STR -->|tot| TOT["BFS Tree<br/>Exploration"]
    STR -->|other| OTH["Strategy-specific<br/>Pipeline"]
    SYN --> ANS["Final Answer"]
    TOT --> ANS
    OTH --> ANS
    ANS --> LOG["Oracle Event Logging"]
    style Q fill:#2a2522,stroke:#e8913a,color:#e8913a
    style ENS fill:#221e1b,stroke:#d4763c,color:#f5e6d3
    style AUG fill:#2a2522,stroke:#c75b39,color:#f5e6d3
    style STR fill:#221e1b,stroke:#e8913a,color:#f5e6d3
    style PLAN fill:#2a2522,stroke:#d4763c,color:#f5e6d3
    style RES fill:#2a2522,stroke:#d4763c,color:#f5e6d3
    style REA fill:#2a2522,stroke:#d4763c,color:#f5e6d3
    style SYN fill:#2a2522,stroke:#d4763c,color:#f5e6d3
    style TOT fill:#2a2522,stroke:#c75b39,color:#f5e6d3
    style OTH fill:#2a2522,stroke:#c75b39,color:#f5e6d3
    style ANS fill:#2a2522,stroke:#e8913a,color:#e8913a
    style LOG fill:#221e1b,stroke:#c75b39,color:#f5e6d3

Fig 6.1 — Answer generation with RAGReasoningEnsemble

// Example response structure
{
  "answer": "The authentication system handles token refresh through a background daemon that monitors JWT expiry timestamps. When a token reaches 80% of its TTL, the refresh cycle initiates...",
  "reasoning_steps": [
    "Identified 3 relevant code modules in REPO_COLLECTION",
    "Cross-referenced with API documentation in WEB_COLLECTION",
    "Synthesized token lifecycle from auth_handler.py (score: 0.89)"
  ],
  "sources": [
    { "file": "auth_handler.py", "chunk": 7, "score": 0.8912 },
    { "file": "api_docs.pdf", "chunk": 23, "score": 0.8134 }
  ],
  "execution_trace": {
    "strategy": "cot",
    "mode": "rag",
    "agents_invoked": ["planner", "researcher", "reasoner", "synthesizer"],
    "total_llm_calls": 6,
    "elapsed_ms": 4821
  }
}
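A response shaped like the example above is easy to load into a typed container on the client side. The sketch below is one way to do that; the `RAGResult` dataclass and its helper methods are hypothetical and are not part of agentic_rag, and only the field names are taken from the example.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class RAGResult:
    """Client-side view of a response; field names follow the example above."""
    answer: str
    reasoning_steps: List[str] = field(default_factory=list)
    sources: List[Dict[str, Any]] = field(default_factory=list)
    execution_trace: Dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_json(cls, raw: str) -> "RAGResult":
        data = json.loads(raw)
        return cls(
            answer=data["answer"],
            reasoning_steps=data.get("reasoning_steps", []),
            sources=data.get("sources", []),
            execution_trace=data.get("execution_trace", {}),
        )

    def best_source(self) -> Optional[Dict[str, Any]]:
        """Return the highest-scoring source, or None when there are none."""
        return max(self.sources, key=lambda s: s["score"], default=None)


raw = json.dumps({
    "answer": "Tokens are refreshed before they expire.",
    "sources": [
        {"file": "auth_handler.py", "chunk": 7, "score": 0.8912},
        {"file": "api_docs.pdf", "chunk": 23, "score": 0.8134},
    ],
    "execution_trace": {"strategy": "cot", "mode": "rag"},
})
parsed = RAGResult.from_json(raw)
print(parsed.best_source()["file"])  # auth_handler.py
```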