A six-stage journey through ingestion, vectorization, and AI-reasoned retrieval — all powered by Oracle Database and nine reasoning strategies.
Every journey begins with raw information. A user uploads a PDF, provides a URL, or points to a GitHub repository. Three specialized processors handle each input type, normalizing wildly different formats into clean Markdown.
The PDF processor uses Docling to parse complex layouts: it runs DocumentConverter, then pipes the output through export_to_markdown().
The web processor uses Trafilatura to extract clean article text via fetch_url() and extract(). It also captures author, date, tags, and categories via extract_metadata().
The repository processor uses Gitingest to crawl repositories, preserving file structure and extracting code with context-aware boundaries.
# Each processor returns normalized Markdown
from docling.document_converter import DocumentConverter
converter = DocumentConverter()
result = converter.convert("report.pdf")
markdown = result.document.export_to_markdown()
# Web extraction with metadata
import trafilatura
downloaded = trafilatura.fetch_url("https://example.com/article")
text = trafilatura.extract(downloaded)
meta = trafilatura.extract_metadata(downloaded)
# meta.author, meta.date, meta.tags, meta.categories
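Routing an input to one of the three processors can be sketched as a small dispatcher. This is a minimal illustration rather than the project's own code: `classify_source`, its three return labels, and the heuristics (a `.pdf` suffix, a `github.com` host) are all assumptions made for the example.

```python
from urllib.parse import urlparse


def classify_source(source: str) -> str:
    """Return 'pdf', 'repo', or 'web' for a user-supplied path or URL.

    Hypothetical helper: the rules below are assumed heuristics.
    """
    parsed = urlparse(source)
    # A .pdf suffix wins, whether the source is a local path or a URL.
    if parsed.path.lower().endswith(".pdf"):
        return "pdf"
    host = (parsed.hostname or "").lower()
    if host == "github.com" or host.endswith(".github.com"):
        return "repo"
    if parsed.scheme in ("http", "https"):
        return "web"
    raise ValueError(f"Unsupported source: {source!r}")
```

Each label would then select the matching processor (Docling, Gitingest, or Trafilatura).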
Raw Markdown is still too large for embedding models. OracleTextSplitter runs server-side inside Oracle Database itself, splitting text into overlapping chunks while normalizing whitespace and Unicode.
Each chunk is enriched with metadata: its source document, a unique document_id, and a sequential chunk_index for reassembly.
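from langchain_community.document_loaders.oracleai import OracleTextSplitter
# Splitter options are passed as a params dict and applied server-side
splitter_params = {
    "max": 512,
    "overlap": 64,
    "normalize": "all",  # whitespace + unicode normalization
}
splitter = OracleTextSplitter(conn=oracle_connection, params=splitter_params)
chunks = splitter.split_text(markdown)  # returns a list of chunk strings
# The pipeline then wraps each chunk as: { text, source, document_id, chunk_index }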
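To make "overlapping chunks" concrete, here is a client-side sketch of a sliding window over words. It is not how OracleTextSplitter is implemented (that work happens inside the database); `split_with_overlap` is a hypothetical helper that only illustrates how the maximum size and overlap interact.

```python
def split_with_overlap(words: list[str], max_size: int, overlap: int) -> list[list[str]]:
    """Slide a window of max_size words, stepping by max_size - overlap."""
    if not 0 <= overlap < max_size:
        raise ValueError("overlap must be >= 0 and smaller than max_size")
    step = max_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + max_size])
        # Stop once the current window has reached the end of the text.
        if start + max_size >= len(words):
            break
    return chunks
```

With a maximum of 512 and an overlap of 64, each new chunk would start 448 words after the previous one, so the last 64 words of one chunk reappear at the start of the next.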
The quick brown fox jumped over the lazy dog. Unicode chars like \u2018smart quotes\u2019 and em\u2014dashes are scattered throughout the document alongside irregular whitespace and line breaks...
{
  "text": "The quick brown fox jumped over the lazy dog. Unicode chars like 'smart quotes' and em-dashes are scattered throughout the document alongside irregular whitespace and line breaks...",
  "source": "report.pdf",
  "document_id": "d7f3a1b2",
  "chunk_index": 14
}
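The enrichment step that produces records like the one above can be sketched in plain Python. The helper names and the ID scheme (first eight hex characters of a SHA-1 over the source name) are assumptions for illustration; the source does not say how document_id is actually generated.

```python
import hashlib


def enrich_chunks(chunks: list[str], source: str) -> list[dict]:
    """Attach source, document_id, and chunk_index to each chunk string."""
    # Hypothetical ID scheme: a short hash of the source name.
    document_id = hashlib.sha1(source.encode("utf-8")).hexdigest()[:8]
    return [
        {"text": text, "source": source, "document_id": document_id, "chunk_index": i}
        for i, text in enumerate(chunks)
    ]


def chunks_in_order(records: list[dict], document_id: str) -> list[str]:
    """Return one document's chunk texts sorted by chunk_index."""
    own = [r for r in records if r["document_id"] == document_id]
    return [r["text"] for r in sorted(own, key=lambda r: r["chunk_index"])]
```

Because chunks overlap, `chunks_in_order` returns an ordered list instead of a joined string; naive concatenation would repeat the overlapping text.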
OracleEmbeddings generates 384-dimensional vectors entirely inside the database — no external API calls. The model ALL_MINILM_L12_V2 converts each text chunk into a dense numerical representation.
Vectors are stored alongside their source text in OracleVS tables, organized into four separate collections for precise retrieval scoping.
from langchain_community.embeddings.oracleai import OracleEmbeddings
from langchain_community.vectorstores import OracleVS
from langchain_community.vectorstores.utils import DistanceStrategy
# Embedding model runs IN-DATABASE; options are passed as a params dict
embeddings = OracleEmbeddings(
    conn=oracle_connection,
    params={"provider": "database", "model": "ALL_MINILM_L12_V2"},
    proxy=None,  # no external calls needed
)
# Store in the appropriate collection
vector_store = OracleVS.from_documents(
    documents=chunks,  # Document objects carrying chunk text and metadata
    embedding=embeddings,
    client=oracle_connection,
    table_name="PDF_COLLECTION",
    distance_strategy=DistanceStrategy.EUCLIDEAN_DISTANCE,
)
| Collection | Source Type | Typical Use |
|---|---|---|
| PDF_COLLECTION | Uploaded PDF documents | Reports, papers, manuals |
| WEB_COLLECTION | Crawled web pages | Articles, documentation |
| REPO_COLLECTION | GitHub repositories | Source code, READMEs |
| GENERAL_COLLECTION | Mixed / unclassified | Cross-source queries |
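The table above maps naturally onto a lookup with a fallback. The sketch below assumes short source-type labels (`"pdf"`, `"web"`, `"repo"`) that are not defined in the source; only the collection names come from the table.

```python
# Collection names are taken from the table; the keys are assumed labels.
COLLECTIONS = {
    "pdf": "PDF_COLLECTION",
    "web": "WEB_COLLECTION",
    "repo": "REPO_COLLECTION",
}


def collection_for(source_type: str) -> str:
    """Pick the collection for a source type, falling back to the general one."""
    return COLLECTIONS.get(source_type, "GENERAL_COLLECTION")
```

The returned name would be passed as `table_name` when writing to or querying the vector store.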
A user asks a question through Gradio, Open WebUI, the REST API, or directly from the CLI. The system selects the appropriate collection based on context, then performs similarity search.
OracleVS.similarity_search_with_score() finds the top-K most relevant chunks using Euclidean distance. The similarity score is computed as:
Scores closer to 1.0 indicate near-perfect matches; scores closer to 0.0 mean the chunk is semantically distant from the query.
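One common way to turn a Euclidean distance into a score with this behaviour is 1 / (1 + d): a distance of zero maps to 1.0, and large distances approach 0.0. Whether the system uses exactly this mapping is an assumption; the sketch only shows a conversion that matches the description.

```python
def similarity_from_distance(distance: float) -> float:
    """Map a non-negative distance to a score in (0, 1].

    Assumed mapping: 1 / (1 + d). Other monotone mappings are possible.
    """
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return 1.0 / (1.0 + distance)
```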
# User query arrives from any interface
query = "How does the authentication system handle token refresh?"
# Select collection and retrieve top-K chunks
results = vector_store.similarity_search_with_score(
query=query,
k=5
)
# Each result: (Document, score)
for doc, score in results:
print(f"Score: {score:.4f} | Source: {doc.metadata['source']}")
# Score: 0.8721 | Source: auth_module.py
# Score: 0.8134 | Source: README.md
# Score: 0.7456 | Source: api_docs.pdf
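Conceptually, a top-K search ranks stored vectors by their distance to the query vector and keeps the closest K. The brute-force sketch below illustrates that idea on tiny in-memory vectors; it is not how OracleVS executes the search, and `top_k` is a hypothetical helper.

```python
import math


def top_k(query_vec: list[float], indexed: list[tuple[str, list[float]]], k: int):
    """Return the k (label, distance) pairs nearest to query_vec, closest first."""
    scored = [(label, math.dist(query_vec, vec)) for label, vec in indexed]
    return sorted(scored, key=lambda pair: pair[1])[:k]
```

Note that this ranks by raw distance, where smaller is better; any similarity score would be derived from that distance afterwards.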
The user selects a reasoning strategy — or it is chosen automatically via the Open WebUI "model" dropdown. With 9 base strategies available in both standalone and RAG-augmented modes, the system offers 18 total configurations.
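The count of 18 is simply 9 strategies times 2 modes. The sketch below enumerates the combinations; only `"cot"` and the `"rag"` mode appear by name in this walkthrough, so the other eight strategy names and the `strategy-mode` naming pattern are placeholders.

```python
from itertools import product

# Only "cot" is named in the text; the remaining eight are placeholders.
STRATEGIES = ["cot"] + [f"strategy_{i}" for i in range(2, 10)]
MODES = ["standalone", "rag"]

# Hypothetical naming pattern for the selectable configurations.
CONFIGURATIONS = [f"{s}-{m}" for s, m in product(STRATEGIES, MODES)]
```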
RAGReasoningEnsemble weaves the retrieved context into the user's query and hands it to the selected reasoning strategy. For Chain-of-Thought in agentic_rag, four specialized agents collaborate in a pipeline: planner, researcher, reasoner, and synthesizer.
from agentic_rag.ensemble import RAGReasoningEnsemble
ensemble = RAGReasoningEnsemble(
strategy="cot",
vector_store=vector_store,
llm=ollama_model,
top_k=5
)
result = ensemble.process(query="How does token refresh work?")
# Result structure
print(result.answer) # The final answer
print(result.reasoning_steps) # Step-by-step trace
print(result.sources) # Source documents used
print(result.execution_trace) # Full agent execution log
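The four-agent Chain-of-Thought flow can be pictured as a chain of prompts, each feeding the next. The sketch below is a simplified illustration and not the agentic_rag implementation: `run_cot_pipeline`, the prompt wording, and the one-call-per-agent structure are all assumptions, and the LLM is any callable that maps a prompt to text.

```python
from typing import Callable

LLM = Callable[[str], str]


def run_cot_pipeline(query: str, context: list[str], llm: LLM) -> dict:
    """Run planner -> researcher -> reasoner -> synthesizer, one LLM call each."""
    trace: list[str] = []

    def call(agent: str, prompt: str) -> str:
        trace.append(agent)  # record which agent ran, in order
        return llm(prompt)

    plan = call("planner", f"Break this question into steps: {query}")
    findings = call("researcher", f"Plan: {plan}\nContext: {' | '.join(context)}")
    reasoning = call("reasoner", f"Reason step by step over: {findings}")
    answer = call("synthesizer", f"Write a final answer from: {reasoning}")
    return {
        "answer": answer,
        "reasoning_steps": [plan, findings, reasoning],
        "agents_invoked": trace,
    }
```

Passing a stub such as `lambda prompt: prompt[:40]` as `llm` is enough to exercise the flow without a model.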
Every interaction is logged to Oracle Database across multiple event tables, creating a complete audit trail for observability and debugging.
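An event log of this kind can be sketched with a single table and an insert helper. The example uses SQLite from the standard library purely as a stand-in so that it runs anywhere; the table name `reasoning_events` and its columns are hypothetical and do not reflect the project's Oracle schema.

```python
import json
import sqlite3
import time


def open_event_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open a connection and create the (hypothetical) event table if missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reasoning_events ("
        "id INTEGER PRIMARY KEY, ts REAL, event_type TEXT, payload TEXT)"
    )
    return conn


def log_event(conn: sqlite3.Connection, event_type: str, payload: dict) -> None:
    """Append one event with a timestamp and a JSON-encoded payload."""
    conn.execute(
        "INSERT INTO reasoning_events (ts, event_type, payload) VALUES (?, ?, ?)",
        (time.time(), event_type, json.dumps(payload)),
    )
    conn.commit()
```

Storing the payload as JSON keeps the sketch to one table; a real deployment would likely split event types across dedicated tables, as the text describes.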
// Example response structure
{
"answer": "The authentication system handles token refresh through a background daemon that monitors JWT expiry timestamps. When a token reaches 80% of its TTL, the refresh cycle initiates...",
"reasoning_steps": [
"Identified 3 relevant code modules in REPO_COLLECTION",
"Cross-referenced with API documentation in WEB_COLLECTION",
"Synthesized token lifecycle from auth_handler.py (score: 0.89)"
],
"sources": [
{ "file": "auth_handler.py", "chunk": 7, "score": 0.8912 },
{ "file": "api_docs.pdf", "chunk": 23, "score": 0.8134 }
],
"execution_trace": {
"strategy": "cot",
"mode": "rag",
"agents_invoked": ["planner", "researcher", "reasoner", "synthesizer"],
"total_llm_calls": 6,
"elapsed_ms": 4821
}
}
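A client consuming a response like this one might parse the `sources` array into typed records and keep only the stronger matches. The sketch below assumes the field names shown in the example (`file`, `chunk`, `score`); `Source` and `parse_sources` are hypothetical names.

```python
import json
from dataclasses import dataclass


@dataclass
class Source:
    file: str
    chunk: int
    score: float


def parse_sources(raw: str, min_score: float = 0.0) -> list[Source]:
    """Parse the 'sources' array of a response and drop low-scoring entries."""
    data = json.loads(raw)
    sources = [Source(**entry) for entry in data["sources"]]
    return [s for s in sources if s.score >= min_score]
```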