Agent Reasoning Integration Blueprint

Layer 1 / Foundation

agent-reasoning Library

PyPI: agent-reasoning

BaseAgent Abstract Class

The root of all 9 reasoning strategies. Every agent inherits from BaseAgent, which provides an OllamaClient connection, structured logging, and the contract: implement run(), optionally override stream().

agent_reasoning/agents/base.py Python

from abc import ABC, abstractmethod
from typing import Generator  # needed for the stream() return annotation
from agent_reasoning.client import OllamaClient
from termcolor import colored

class BaseAgent(ABC):
    def __init__(self, model="gemma3:270m", base_url=None):
        self.client = OllamaClient(model=model, base_url=base_url)
        self.name = "BaseAgent"
        self.color = "white"

    def log_thought(self, message):
        print(colored(f"[{self.name}]: {message}", self.color))

    @abstractmethod
    def run(self, query) -> str:
        pass

    def stream(self, query) -> Generator[str, None, None]:
        """Default: yields the final result as one chunk."""
        result = self.run(query)
        if result:
            yield result
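
The run()/stream() contract can be tried out without a running Ollama server. The following is a minimal sketch, not the library's code: SketchBaseAgent is a hypothetical stand-in that mirrors only the contract shown above (no OllamaClient, no logging), and UppercaseAgent and WordStreamAgent are invented purely for illustration.

```python
from abc import ABC, abstractmethod
from typing import Generator


class SketchBaseAgent(ABC):
    """Hypothetical stand-in for BaseAgent: same run()/stream() contract only."""

    @abstractmethod
    def run(self, query: str) -> str:
        ...

    def stream(self, query: str) -> Generator[str, None, None]:
        # Default behaviour described in the source: one chunk with the final result.
        result = self.run(query)
        if result:
            yield result


class UppercaseAgent(SketchBaseAgent):
    """Implements run() only and inherits the single-chunk stream()."""

    def run(self, query: str) -> str:
        return query.upper()


class WordStreamAgent(UppercaseAgent):
    """Overrides stream() to emit one chunk per word instead of one chunk total."""

    def stream(self, query: str) -> Generator[str, None, None]:
        for word in self.run(query).split():
            yield word


chunks_default = list(UppercaseAgent().stream("hello world"))
chunks_words = list(WordStreamAgent().stream("hello world"))
```

A subclass that forgets to implement run() cannot be instantiated, which is what the @abstractmethod decorator is for.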

agent_reasoning/client.py Python

class OllamaClient:
    def __init__(self, model="gemma3:270m", base_url=None):
        if base_url is None:
            from agent_reasoning.config import get_ollama_host
            base_url = get_ollama_host()
        self.model = model
        self.base_url = base_url

    def generate(self, prompt, system=None, stream=True,
                 temperature=0.7, top_k=40, top_p=0.9,
                 num_predict=2048, stop=None):
        url = f"{self.base_url}/api/generate"
        # POSTs to Ollama, streams NDJSON chunks
        # Yields response text incrementally
        ...
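
The body of generate() is elided above. As a rough illustration of the "streams NDJSON chunks" comment, the sketch below parses a stream of newline-delimited JSON objects, each carrying a "response" text fragment and a "done" flag, which is the format Ollama's /api/generate endpoint emits when streaming. The helper name iter_response_text and the sample data are assumptions; how OllamaClient actually reads the HTTP response is not shown in the source.

```python
import json
from typing import Iterable, Iterator


def iter_response_text(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield the text fragment of each NDJSON line until a chunk reports done."""
    for raw in lines:
        if not raw.strip():
            continue  # ignore blank lines between chunks
        chunk = json.loads(raw)
        text = chunk.get("response", "")
        if text:
            yield text
        if chunk.get("done"):
            break


# Sample lines standing in for the HTTP response body.
sample_stream = [
    b'{"response": "Hel", "done": false}',
    b'{"response": "lo", "done": false}',
    b'{"response": "", "done": true}',
]
full_text = "".join(iter_response_text(sample_stream))
```

With the requests library, the lines argument would typically come from response.iter_lines() on a request made with stream=True.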

9 Agent Implementations

Each agent implements a distinct reasoning strategy. All inherit from BaseAgent and communicate with Ollama through OllamaClient. The AGENT_MAP provides both canonical and alias keys for flexible lookup.

StandardAgent Direct prompt → Direct response

CoTAgent Step detection via regex parsing

ToTAgent BFS with scoring (width=2, depth=3)

ReActAgent Thought-Action-Observation loop

SelfReflectionAgent Draft → Critique → Refine (max 5)

ConsistencyAgent k=5 samples + majority vote

DecomposedAgent Sub-tasks → Solve → Synthesize

LeastToMostAgent Easy → Hard with context accum.

RecursiveAgent Python codegen + exec (max 8 steps)

Class Inheritance Diagram

classDiagram
    class BaseAgent {
        <<abstract>>
        +OllamaClient client
        +str name
        +str color
        +log_thought(message)
        +run(query)* str
        +stream(query) Generator
    }
    class StandardAgent {
        +run(query) str
    }
    class CoTAgent {
        +run(query) str
        -_detect_steps(text) list
    }
    class ToTAgent {
        +int width = 2
        +int depth = 3
        +run(query) str
        -_bfs_explore(query) str
        -_score_node(node) float
    }
    class ReActAgent {
        +dict tools
        +run(query) str
        -_parse_action(text) tuple
        -_execute_tool(name, args) str
    }
    class SelfReflectionAgent {
        +int max_iterations = 5
        +run(query) str
        -_critique(draft) str
        -_refine(draft, critique) str
    }
    class ConsistencyAgent {
        +int k = 5
        +run(query) str
        -_majority_vote(samples) str
    }
    class DecomposedAgent {
        +run(query) str
        -_decompose(query) list
        -_solve(subtask) str
        -_synthesize(results) str
    }
    class LeastToMostAgent {
        +run(query) str
        -_order_tasks(tasks) list
        -_solve_with_context(task, ctx) str
    }
    class RecursiveAgent {
        +int max_steps = 8
        +run(query) str
        -_generate_code(query) str
        -_execute_code(code) str
    }
    BaseAgent <|-- StandardAgent
    BaseAgent <|-- CoTAgent
    BaseAgent <|-- ToTAgent
    BaseAgent <|-- ReActAgent
    BaseAgent <|-- SelfReflectionAgent
    BaseAgent <|-- ConsistencyAgent
    BaseAgent <|-- DecomposedAgent
    BaseAgent <|-- LeastToMostAgent
    BaseAgent <|-- RecursiveAgent

agent_reasoning/agents/__init__.py Python — AGENT_MAP

AGENT_MAP = {
    "standard":         StandardAgent,
    "cot":              CoTAgent,
    "chain_of_thought": CoTAgent,        # alias
    "tot":              ToTAgent,
    "tree_of_thoughts": ToTAgent,        # alias
    "react":            ReActAgent,
    "self_reflection":  SelfReflectionAgent,
    "reflection":       SelfReflectionAgent, # alias
    "consistency":      ConsistencyAgent,
    "self_consistency": ConsistencyAgent,    # alias
    "decomposed":       DecomposedAgent,
    "least_to_most":    LeastToMostAgent,
    "ltm":              LeastToMostAgent,    # alias
    "recursive":        RecursiveAgent,
    "rlm":              RecursiveAgent,      # alias
}
# 9 unique agents, 15 total keys (6 aliases)
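
One way to use such a map is a small lookup helper that tolerates case and hyphens. This is only a sketch: the placeholder classes and the trimmed-down map stand in for the real ones, and resolve_agent and canonical_keys are hypothetical names, not functions the library is known to export.

```python
class StandardAgent: ...
class CoTAgent: ...
class LeastToMostAgent: ...


# Trimmed-down stand-in for AGENT_MAP (canonical key listed before its aliases).
AGENT_MAP = {
    "standard": StandardAgent,
    "cot": CoTAgent,
    "chain_of_thought": CoTAgent,      # alias
    "least_to_most": LeastToMostAgent,
    "ltm": LeastToMostAgent,           # alias
}


def resolve_agent(name: str):
    """Return the agent class for a canonical key or alias."""
    key = name.strip().lower().replace("-", "_")
    try:
        return AGENT_MAP[key]
    except KeyError:
        raise ValueError(f"Unknown strategy {name!r}; known keys: {sorted(AGENT_MAP)}") from None


def canonical_keys(agent_map: dict) -> list:
    """First key registered for each class, relying on dict insertion order."""
    first_key = {}
    for key, cls in agent_map.items():
        first_key.setdefault(cls, key)
    return list(first_key.values())
```

Because aliases map to the same class object, counting distinct values of the map gives the number of unique agents regardless of how many aliases are added.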

ReasoningEnsemble (Base Orchestrator)

The ensemble orchestrates multiple reasoning strategies in parallel using a ThreadPoolExecutor (max 10 workers). Responses are clustered by semantic similarity using sentence-transformers embeddings, and the largest cluster wins via majority vote. Ties prefer Chain-of-Thought.

agent_reasoning/ensemble.py Python — Core Interface

import asyncio
from concurrent.futures import ThreadPoolExecutor
from typing import Dict

class ReasoningEnsemble:
    def __init__(self,
        model_name: str = "gemma3:270m",
        similarity_threshold: float = 0.85,
        embedding_model: str = "all-MiniLM-L6-v2"
    ):
        self._executor = ThreadPoolExecutor(max_workers=10)

    async def run(self, query, strategies, config=None) -> Dict:
        # Single strategy → direct return (no voting)
        # Multiple strategies → parallel execution + majority vote
        # Returns a dict shaped like this (schematic, not literal Python):
        #   "winner":            {strategy, response, vote_count}
        #   "all_responses":     [{strategy, response, duration_ms}, ...]
        #   "total_duration_ms": float
        #   "voting_details":    {clusters, threshold, total_responses}
        ...

    def _majority_vote(self, responses):
        # 1. Encode responses with SentenceTransformer
        # 2. Compute cosine similarity matrix
        # 3. Greedy clustering (threshold: 0.85)
        # 4. Largest cluster wins; ties prefer CoT
        ...

    def run_sync(self, query, strategies, config=None):
        return asyncio.run(self.run(query, strategies, config))
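
The four voting steps listed in _majority_vote can be sketched with plain Python vectors in place of SentenceTransformer embeddings. Several details here are assumptions rather than the library's behaviour: each response is compared against the first member of a cluster, the winning cluster is represented by its first member, and the tie-break simply prefers a tied cluster that contains the "cot" strategy.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def majority_vote(responses: List[Dict], embeddings: List[Sequence[float]],
                  threshold: float = 0.85, tie_break: str = "cot") -> Dict:
    if not responses:
        raise ValueError("no responses to vote on")

    # Greedy clustering: join the first cluster whose seed is similar enough,
    # otherwise start a new cluster.
    clusters: List[List[int]] = []
    for i, emb in enumerate(embeddings):
        for cluster in clusters:
            if cosine(emb, embeddings[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])

    # Largest cluster wins; among equally large clusters prefer one containing CoT.
    largest = max(len(c) for c in clusters)
    tied = [c for c in clusters if len(c) == largest]
    winner = next(
        (c for c in tied if any(responses[i]["strategy"] == tie_break for i in c)),
        tied[0],
    )
    rep = responses[winner[0]]  # assumption: first member represents the cluster
    return {"strategy": rep["strategy"], "response": rep["response"],
            "vote_count": len(winner)}


responses = [
    {"strategy": "standard", "response": "Paris"},
    {"strategy": "cot", "response": "The capital is Paris."},
    {"strategy": "tot", "response": "Lyon"},
]
toy_embeddings = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]  # made-up 2-D vectors
result = majority_vote(responses, toy_embeddings)
```

In the toy data the first two vectors are nearly parallel, so they form a two-member cluster that outvotes the third response.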

Layer 2 / Integration Bridge

RAGReasoningEnsemble

agentic_rag/reasoning/

Extending the Library with RAG

RAGReasoningEnsemble wraps the base ReasoningEnsemble from agent-reasoning and adds three capabilities: RAG context retrieval from Oracle AI Database, database event logging via OraDBEventLogger, and a streaming execution trace protocol for real-time UI updates.

agentic_rag/src/reasoning/rag_ensemble.py Python — Init + Composition

from agent_reasoning import ReasoningEnsemble
from agent_reasoning.agents import AGENT_MAP

class RAGReasoningEnsemble:
    def __init__(self,
        model_name: str = "gemma3:270m",
        vector_store=None,         # OraDBVectorStore
        event_logger=None,         # OraDBEventLogger
        similarity_threshold=0.85
    ):
        self.ensemble = ReasoningEnsemble(  # Composition, not inheritance
            model_name=model_name,
            similarity_threshold=similarity_threshold
        )
        self.vector_store = vector_store
        self.event_logger = event_logger

RAG Context Retrieval

_retrieve_context(query, collection) Flow

Query → _retrieve_context(query, collection)
  → collection_map: {
      "PDF":        "pdf_collection",
      "Web":        "web_collection",
      "Repository": "repo_collection",
      "General":    "general_knowledge"
    }
  → vector_store.query(query, collection_name, n_results=5)
  → Returns {
      chunks:    [{content, metadata, score}, ...],
      sources:   ["file.pdf", "page.html", ...],
      avg_score: float
    }
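
The retrieval flow above might look roughly like the following. This is a sketch under stated assumptions: FakeVectorStore is a hypothetical in-memory stand-in for OraDBVectorStore, the "source" metadata key is assumed, and an unknown collection label simply raises KeyError because the source does not say how that case is handled.

```python
from typing import Any, Dict, List

COLLECTION_MAP = {
    "PDF": "pdf_collection",
    "Web": "web_collection",
    "Repository": "repo_collection",
    "General": "general_knowledge",
}


class FakeVectorStore:
    """Hypothetical stand-in that returns canned rows instead of querying Oracle."""

    def query(self, query: str, collection_name: str, n_results: int = 5) -> List[Dict[str, Any]]:
        rows = [
            {"content": "chunk A", "metadata": {"source": "file.pdf"}, "score": 0.9},
            {"content": "chunk B", "metadata": {"source": "file.pdf"}, "score": 0.7},
        ]
        return rows[:n_results]


def retrieve_context(vector_store, query: str, collection: str = "PDF",
                     n_results: int = 5) -> Dict[str, Any]:
    collection_name = COLLECTION_MAP[collection]  # UI label -> collection name
    chunks = vector_store.query(query, collection_name, n_results=n_results)

    # De-duplicate sources while keeping their first-seen order.
    sources: List[str] = []
    for chunk in chunks:
        source = chunk["metadata"].get("source", "unknown")
        if source not in sources:
            sources.append(source)

    avg_score = sum(c["score"] for c in chunks) / len(chunks) if chunks else 0.0
    return {"chunks": chunks, "sources": sources, "avg_score": avg_score}


context = retrieve_context(FakeVectorStore(), "What is machine learning?", "PDF")
```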

Query Augmentation

_build_augmented_prompt(query, context) Template

_build_augmented_prompt(query, context):
  → """Use the following context to answer the question.
  If the context doesn't contain relevant information,
  use your general knowledge.

  Context:
  [Source: file.pdf]
  chunk text here...

  [Source: page.html]
  another chunk...

  Question: {query}

  Answer:"""
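
Filling that template from a retrieved context could be done as below. The function is a sketch: the chunk layout (content plus a metadata "source" key) is assumed, and returning the bare query when no chunks are available is an assumption about the no-context case.

```python
from typing import Any, Dict, Optional


def build_augmented_prompt(query: str, context: Optional[Dict[str, Any]]) -> str:
    if not context or not context.get("chunks"):
        return query  # assumption: nothing to augment with, pass the query through

    sections = []
    for chunk in context["chunks"]:
        source = chunk.get("metadata", {}).get("source", "unknown")
        sections.append(f"[Source: {source}]\n{chunk['content']}")
    context_text = "\n\n".join(sections)

    return (
        "Use the following context to answer the question.\n"
        "If the context doesn't contain relevant information,\n"
        "use your general knowledge.\n\n"
        f"Context:\n{context_text}\n\n"
        f"Question: {query}\n\n"
        "Answer:"
    )


prompt = build_augmented_prompt(
    "What is machine learning?",
    {"chunks": [
        {"content": "chunk text here...", "metadata": {"source": "file.pdf"}},
        {"content": "another chunk...", "metadata": {"source": "page.html"}},
    ]},
)
```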

Execution Flow

Sequence Diagram: Full Execution Path

sequenceDiagram
    participant U as User / API
    participant R as RAGReasoningEnsemble
    participant V as OraDBVectorStore
    participant O as Oracle AI Database
    participant E as ReasoningEnsemble
    participant A as BaseAgent Subclasses
    participant L as Ollama (LLM)
    participant D as OraDBEventLogger

    U->>R: run(query, strategies, use_rag=True, collection="PDF")
    Note right of R: log_event("start")

    rect rgb(13, 59, 59)
        R->>V: _retrieve_context(query, "PDF")
        V->>O: vector_store.query(query, "pdf_collection", n_results=5)
        O-->>V: [{content, metadata, score}, ...]
        V-->>R: {chunks, sources, avg_score}
        Note right of R: log_event("rag", chunks_count)
    end

    R->>R: _build_augmented_prompt(query, context)
    Note right of R: Augmented query = context + question

    rect rgb(26, 46, 46)
        R->>E: ensemble.run(augmented_query, strategies)
        Note right of E: ThreadPoolExecutor (10 workers)
        par Parallel Strategy Execution
            E->>A: StandardAgent.run(query)
            A->>L: OllamaClient.generate(prompt)
            L-->>A: streaming chunks
            A-->>E: response string
        and
            E->>A: CoTAgent.run(query)
            A->>L: OllamaClient.generate(prompt)
            L-->>A: streaming chunks
            A-->>E: response string
        and
            E->>A: ToTAgent.run(query)
            A->>L: OllamaClient.generate(prompt)
            L-->>A: streaming chunks
            A-->>E: response string
        end
        E->>E: _majority_vote(responses)
        Note right of E: SentenceTransformer embeddings
        Note right of E: Cosine similarity clustering
        Note right of E: Largest cluster wins
        E-->>R: {winner, all_responses, voting_details}
    end

    Note right of R: log_event("voting")
    Note right of R: log_event("complete")
    R->>D: log_reasoning_event(query, strategies, result, ...)
    D->>O: INSERT INTO REASONING_EVENTS
    R-->>U: ReasoningResult

ExecutionEvent Dataclass & Event Types

rag_ensemble.py — Data structures Python

@dataclass
class ExecutionEvent:
    timestamp: str        # "%H:%M:%S"
    event_type: str       # see table below
    message: str
    data: Optional[Dict] = None

@dataclass
class ReasoningResult:
    winner: Dict[str, Any]
    all_responses: List[Dict]
    execution_trace: List[ExecutionEvent]
    rag_context: Optional[Dict]
    total_duration_ms: float
    voting_details: Optional[Dict]

Event Type	Description	Data Payload
start	Query begins processing	Strategy count
rag	RAG context retrieved from Oracle	{chunks, score}
strategy_start	Individual strategy begins execution	Strategy icon + name
strategy_complete	Strategy finished with timing	{duration_ms}
voting	Majority voting / winner selection	Winner + vote count
complete	Full ensemble processing complete	Total duration
result	Final result (streaming mode only)	ReasoningResult
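
To make the trace format concrete, the sketch below builds a short execution trace using the ExecutionEvent fields shown above. The make_event helper and the payload key names (strategy_count, duration_ms and so on) are illustrative assumptions; only the event type names and the "%H:%M:%S" timestamp format come from the source.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class ExecutionEvent:
    timestamp: str        # "%H:%M:%S"
    event_type: str
    message: str
    data: Optional[Dict] = None


def make_event(event_type: str, message: str, data: Optional[Dict] = None) -> ExecutionEvent:
    """Hypothetical helper: stamps the event with the current wall-clock time."""
    return ExecutionEvent(
        timestamp=datetime.now().strftime("%H:%M:%S"),
        event_type=event_type,
        message=message,
        data=data,
    )


trace: List[ExecutionEvent] = [
    make_event("start", "Query begins processing", {"strategy_count": 2}),
    make_event("strategy_start", "🔗 cot"),
    make_event("strategy_complete", "cot finished", {"duration_ms": 812.4}),
    make_event("voting", "Winner selected", {"winner": "cot", "vote_count": 2}),
    make_event("complete", "Ensemble processing complete", {"total_duration_ms": 1530.0}),
]
event_types = [event.event_type for event in trace]
```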

Strategy Icons

STRATEGY_ICONS Mapping

STRATEGY_ICONS = {
    "standard":         "📝",    "cot":              "🔗",
    "tot":              "🌳",    "react":            "🛠️",
    "self_reflection":  "🪞",    "consistency":      "🔄",
    "decomposed":       "🧩",    "least_to_most":    "📈",
    "recursive":        "🔁",
}

OraDBEventLogger: Oracle DB Persistence

All reasoning events are persisted to Oracle AI Database. The logger maintains 6 event tables, with REASONING_EVENTS specifically tracking ensemble executions including strategy selections, vote counts, durations, and RAG context metadata.

OraDBEventLogger.log_reasoning_event() Oracle DB Schema

CREATE TABLE REASONING_EVENTS (
    event_id              VARCHAR2(100) PRIMARY KEY,
    timestamp             TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    query_text            CLOB,
    strategies_requested  CLOB,           -- JSON array
    winner_strategy       VARCHAR2(50),
    winner_response       CLOB,
    vote_count            NUMBER,
    total_strategies      NUMBER,
    all_responses         CLOB,           -- JSON array
    rag_enabled           NUMBER(1),
    collection_used       VARCHAR2(200),
    chunks_retrieved      NUMBER,
    total_duration_ms     NUMBER,
    parallel_execution    NUMBER(1),      -- always 1 for ensemble
    config_json           CLOB,
    status                VARCHAR2(50),
    error_message         CLOB
);

Layer 3 / User-Facing

API Surface

FastAPI + OpenAI compat

Strategies Exposed as "Models" in Open WebUI

The OpenAI-compatible /v1/models endpoint presents each reasoning strategy (and its RAG variant) as a selectable "model" in Open WebUI. That gives 9 base strategies × 2 (with/without RAG) = 18 model IDs. The REASONING_MODELS registry maps each model ID to its strategy key and RAG flag.

Model ID	Strategy	RAG	Display Name
standard	standard	--	Standard
standard-rag	standard	RAG	Standard + RAG
cot	cot	--	Chain of Thought
cot-rag	cot	RAG	Chain of Thought + RAG
tot	tot	--	Tree of Thoughts
tot-rag	tot	RAG	Tree of Thoughts + RAG
react	react	--	ReAct
react-rag	react	RAG	ReAct + RAG
self-reflection	self_reflection	--	Self-Reflection
self-reflection-rag	self_reflection	RAG	Self-Reflection + RAG
consistency	consistency	--	Self-Consistency
consistency-rag	consistency	RAG	Self-Consistency + RAG
decomposed	decomposed	--	Decomposed
decomposed-rag	decomposed	RAG	Decomposed + RAG
least-to-most	least_to_most	--	Least-to-Most
least-to-most-rag	least_to_most	RAG	Least-to-Most + RAG
recursive	recursive	--	Recursive
recursive-rag	recursive	RAG	Recursive + RAG
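
The table follows a regular pattern: model IDs use hyphens where strategy keys use underscores, and the RAG variant appends "-rag". A registry with that shape could be generated as in this sketch. The entry fields ("strategy", "rag", "name") and the helper names are assumptions, since the real REASONING_MODELS definition is not shown.

```python
from typing import Dict, Tuple

DISPLAY_NAMES = {
    "standard": "Standard",
    "cot": "Chain of Thought",
    "tot": "Tree of Thoughts",
    "react": "ReAct",
    "self_reflection": "Self-Reflection",
    "consistency": "Self-Consistency",
    "decomposed": "Decomposed",
    "least_to_most": "Least-to-Most",
    "recursive": "Recursive",
}


def build_reasoning_models(display_names: Dict[str, str]) -> Dict[str, Dict]:
    """Create a plain and a '-rag' entry for every strategy key."""
    registry: Dict[str, Dict] = {}
    for strategy, name in display_names.items():
        model_id = strategy.replace("_", "-")
        registry[model_id] = {"strategy": strategy, "rag": False, "name": name}
        registry[f"{model_id}-rag"] = {"strategy": strategy, "rag": True, "name": f"{name} + RAG"}
    return registry


REASONING_MODELS = build_reasoning_models(DISPLAY_NAMES)


def resolve_model(model_id: str) -> Tuple[str, bool]:
    """Map a model ID from the chat request to (strategy key, RAG flag)."""
    entry = REASONING_MODELS[model_id]
    return entry["strategy"], entry["rag"]
```

With this layout, a request for model "cot-rag" resolves to the "cot" strategy with RAG enabled, matching the routing described for Open WebUI.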

A2A Protocol Integration

The A2A (Agent-to-Agent) handler processes JSON-RPC 2.0 requests on POST /a2a. Reasoning methods are routed through the RAGReasoningEnsemble for strategy execution, while document and agent discovery methods operate through the standard A2A handler.

POST /a2a JSON-RPC 2.0

// Request
{
    "jsonrpc": "2.0",
    "method":  "reasoning.execute",
    "params":  {
        "query":    "What is machine learning?",
        "strategy": "cot-rag",
        "collection": "PDF"
    },
    "id": 1
}

// Additional reasoning methods:
//   reasoning.strategy  - Execute a specific strategy
//   reasoning.list      - List available strategies

// Other A2A methods:
//   document.query      - Query documents via RAG
//   document.upload     - Upload documents
//   agent.discover      - Discover agent capabilities
//   agent.register      - Register new agents
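
A JSON-RPC 2.0 dispatcher for such methods can be quite small. The sketch below is not the project's A2AHandler: handle_jsonrpc and the stub list_strategies handler are hypothetical, and only the envelope fields and the error codes (-32600, -32601, -32603) are taken from the JSON-RPC 2.0 specification.

```python
from typing import Any, Callable, Dict


def list_strategies(params: Dict[str, Any]) -> Dict[str, Any]:
    """Stub handler standing in for reasoning.list."""
    return {"strategies": ["standard", "cot"]}


def handle_jsonrpc(request: Dict[str, Any],
                   methods: Dict[str, Callable[[Dict[str, Any]], Any]]) -> Dict[str, Any]:
    req_id = request.get("id")

    def error(code: int, message: str) -> Dict[str, Any]:
        return {"jsonrpc": "2.0", "error": {"code": code, "message": message}, "id": req_id}

    if request.get("jsonrpc") != "2.0" or "method" not in request:
        return error(-32600, "Invalid Request")

    handler = methods.get(request["method"])
    if handler is None:
        return error(-32601, "Method not found")

    try:
        result = handler(request.get("params", {}))
    except Exception:
        return error(-32603, "Internal error")

    return {"jsonrpc": "2.0", "result": result, "id": req_id}


METHODS = {"reasoning.list": list_strategies}
ok = handle_jsonrpc({"jsonrpc": "2.0", "method": "reasoning.list", "id": 1}, METHODS)
missing = handle_jsonrpc({"jsonrpc": "2.0", "method": "agent.unknown", "id": 2}, METHODS)
```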

CoT Agent Factory (Separate Pipeline)

Independent from the reasoning ensemble, the Agent Factory provides a 4-stage Chain-of-Thought pipeline using LangChain-based agents. These use the LocalLLM wrapper (which calls Ollama) and are activated when use_cot=True is passed to LocalRAGAgent.

agents/agent_factory.py Python

def create_agents(llm, vector_store=None):
    """Create the set of specialized CoT agents."""
    return {
        "planner":     PlannerAgent(llm),                 # Query → 3-4 plan steps
        "researcher":  ResearchAgent(llm, vector_store),  # Steps → Source research
        "reasoner":    ReasoningAgent(llm),               # Findings → Conclusions
        "synthesizer": SynthesisAgent(llm),               # Steps → Final answer
    }

# Each agent uses LangChain ChatPromptTemplate
# and the LocalLLM wrapper that calls OllamaModelHandler
# Pipeline: Plan → Research → Reason → Synthesize
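
How the four stages chain together can be illustrated with stub agents. Everything below is a sketch: the stub classes return canned strings instead of calling an LLM, the method names follow the ones shown in the architecture diagram (plan, research, reason, synthesize), and the return types and the run_pipeline driver are assumptions.

```python
from typing import Dict, List


class StubPlanner:
    def plan(self, query: str, context=None) -> List[str]:
        return [f"Define the terms in: {query}", f"Answer: {query}"]


class StubResearcher:
    def research(self, query: str, step: str) -> List[str]:
        return [f"finding for '{step}'"]


class StubReasoner:
    def reason(self, query: str, step: str, findings: List[str]) -> str:
        return f"conclusion from {len(findings)} finding(s) on '{step}'"


class StubSynthesizer:
    def synthesize(self, query: str, steps: List[str]) -> str:
        return " | ".join(steps)


def run_pipeline(agents: Dict[str, object], query: str) -> str:
    """Plan, then research and reason per step, then synthesize one answer."""
    plan = agents["planner"].plan(query)
    conclusions = []
    for step in plan:
        findings = agents["researcher"].research(query, step)
        conclusions.append(agents["reasoner"].reason(query, step, findings))
    return agents["synthesizer"].synthesize(query, conclusions)


stub_agents = {
    "planner": StubPlanner(),
    "researcher": StubResearcher(),
    "reasoner": StubReasoner(),
    "synthesizer": StubSynthesizer(),
}
answer = run_pipeline(stub_agents, "What is RAG?")
```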

Two RAG Paths: Side by Side

LocalRAGAgent provides two distinct processing paths based on the use_cot flag. Both paths use the same underlying OraDBVectorStore for retrieval, but the reasoning pipeline differs significantly.

Path A: use_cot=False

RAGReasoningEnsemble

Path B: use_cot=True

CoT Agent Factory

Complete Stack

Full Integration Architecture

Architecture: Complete Request Flow

flowchart TB
    subgraph UI["USER-FACING INTERFACES"]
        direction LR
        OW["Open WebUI<br/>/v1/chat/completions"]
        GR["Gradio App<br/>gradio_app.py"]
        CLI["Agent CLI<br/>agent_cli.py"]
        A2A["A2A Protocol<br/>POST /a2a"]
        REST["REST API<br/>POST /query"]
    end

    subgraph MAIN["FASTAPI APPLICATION — main.py"]
        direction TB
        ROUTER["OpenAI-Compatible Router<br/>/v1/models + /v1/chat/completions"]
        A2AH["A2AHandler<br/>JSON-RPC 2.0 dispatch"]
        QUERY["Query Endpoint<br/>/query"]
    end

    subgraph LOCAL["LocalRAGAgent"]
        direction TB
        COT_CHECK{"use_cot?"}
        subgraph PATHB["Path B: CoT Agent Factory"]
            direction TB
            PLAN["PlannerAgent<br/>plan(query, context)"]
            RES["ResearchAgent<br/>research(query, step)"]
            REASON["ReasoningAgent<br/>reason(query, step, findings)"]
            SYNTH["SynthesisAgent<br/>synthesize(query, steps)"]
            PLAN --> RES --> REASON --> SYNTH
        end
    end

    subgraph RAGENSEMBLE["RAGReasoningEnsemble"]
        direction TB
        RAG_CTX["_retrieve_context()<br/>collection_map lookup"]
        AUG["_build_augmented_prompt()<br/>context + question template"]
        subgraph ENSEMBLE["ReasoningEnsemble (agent-reasoning)"]
            direction TB
            PARALLEL["Parallel Execution<br/>ThreadPoolExecutor, 10 workers"]
            AGENTS["9 BaseAgent Subclasses<br/>Standard, CoT, ToT, ReAct,<br/>SelfReflection, Consistency,<br/>Decomposed, LeastToMost, Recursive"]
            VOTE["Majority Vote<br/>SentenceTransformer embeddings<br/>cosine similarity clustering"]
            PARALLEL --> AGENTS --> VOTE
        end
        RAG_CTX --> AUG --> PARALLEL
    end

    subgraph INFRA["INFRASTRUCTURE"]
        direction LR
        OLLAMA["Ollama<br/>gemma3:270m"]
        ORADB[("Oracle AI Database 26ai<br/>Vector Store + Event Logger")]
    end

    OW --> ROUTER
    GR --> QUERY
    CLI --> QUERY
    A2A --> A2AH
    REST --> QUERY
    ROUTER --> RAGENSEMBLE
    A2AH --> RAGENSEMBLE
    QUERY --> LOCAL
    COT_CHECK -- "False" --> RAGENSEMBLE
    COT_CHECK -- "True" --> PLAN
    RAG_CTX --> ORADB
    AGENTS --> OLLAMA
    PATHB --> OLLAMA
    RAGENSEMBLE -.->|"log_reasoning_event()"| ORADB
    PATHB --> ORADB

    style UI fill:#0d2828,stroke:#ffffff44,color:#e8f0f0
    style MAIN fill:#0d2828,stroke:#ffffff44,color:#e8f0f0
    style LOCAL fill:#122424,stroke:#bfa76a44,color:#bfa76a
    style RAGENSEMBLE fill:#0d2222,stroke:#00e5ff44,color:#00e5ff
    style ENSEMBLE fill:#0a1a1a,stroke:#00e5ff66,color:#00e5ff
    style PATHB fill:#1a2a1a,stroke:#bfa76a44,color:#bfa76a
    style INFRA fill:#0d1a1a,stroke:#00e5ff44,color:#e8f0f0
    style COT_CHECK fill:#1a2a2a,stroke:#bfa76a,color:#bfa76a
    style OW fill:#0d3b3b,stroke:#00e5ff,color:#e8f0f0
    style GR fill:#0d3b3b,stroke:#00e5ff,color:#e8f0f0
    style CLI fill:#0d3b3b,stroke:#00e5ff,color:#e8f0f0
    style A2A fill:#0d3b3b,stroke:#00e5ff,color:#e8f0f0
    style REST fill:#0d3b3b,stroke:#00e5ff,color:#e8f0f0
    style OLLAMA fill:#1a2a1a,stroke:#7ec699,color:#7ec699
    style ORADB fill:#1a1a0a,stroke:#bfa76a,color:#bfa76a

Key Integration Points

Composition over Inheritance

RAGReasoningEnsemble wraps ReasoningEnsemble via composition (self.ensemble = ReasoningEnsemble(...)), not subclassing. This keeps the library layer clean and testable independently.

Design Pattern

Model ID Routing

When Open WebUI sends model="cot-rag", the router looks up REASONING_MODELS to extract strategy="cot" and rag=True, then dispatches to RAGReasoningEnsemble with the appropriate flags.

API Layer

Dual-Path RAG

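The REST /query endpoint goes through LocalRAGAgent, which selects the CoT Agent Factory when use_cot=True and RAGReasoningEnsemble when use_cot=False, while /v1/chat/completions calls RAGReasoningEnsemble directly. Both paths retrieve from OraDBVectorStore.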

Architecture

Event Persistence

Every reasoning execution, API call, document upload, and A2A interaction is logged to Oracle AI Database through OraDBEventLogger, creating a full audit trail across 6 event tables.

Observability

Async/Threading Boundary

FastAPI is async. ReasoningEnsemble uses ThreadPoolExecutor to run synchronous agent.run() calls in parallel without blocking the event loop. The bridge is loop.run_in_executor().
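
The bridge between the async layer and the synchronous agents can be sketched with the standard library alone. In the example below, blocking_agent_run is a hypothetical stand-in for a synchronous agent.run() call (it sleeps instead of calling Ollama), and run_strategies shows the loop.run_in_executor() plus asyncio.gather() pattern; the actual ensemble code may be organised differently.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


def blocking_agent_run(strategy: str, query: str) -> str:
    """Stand-in for a synchronous agent.run(); the sleep simulates LLM latency."""
    time.sleep(0.05)
    return f"{strategy}: {query}"


async def run_strategies(executor: ThreadPoolExecutor, query: str,
                         strategies: List[str]) -> List[str]:
    loop = asyncio.get_running_loop()
    # Each blocking call runs on a worker thread; the event loop stays free.
    futures = [
        loop.run_in_executor(executor, blocking_agent_run, strategy, query)
        for strategy in strategies
    ]
    # gather() returns results in the order the awaitables were passed in.
    return await asyncio.gather(*futures)


with ThreadPoolExecutor(max_workers=10) as executor:
    results = asyncio.run(run_strategies(executor, "2+2?", ["standard", "cot", "tot"]))
```

Because the three calls sleep on separate threads, the total wall time is close to one call's latency rather than the sum of all three.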

Concurrency

Semantic Majority Voting

Unlike simple string comparison, the ensemble uses SentenceTransformer (all-MiniLM-L6-v2) to embed responses, compute cosine similarity, and cluster by a 0.85 threshold. Largest cluster wins.

Consensus

Appendix

Dependency & Import Map

Import Dependencies Between Modules

flowchart LR
    subgraph LIB["agent-reasoning (PyPI)"]
        BASE["agents/base.py<br/>BaseAgent"]
        AGENTS["agents/__init__.py<br/>AGENT_MAP"]
        CLIENT["client.py<br/>OllamaClient"]
        ENS["ensemble.py<br/>ReasoningEnsemble"]
        INT["interceptor.py<br/>ReasoningInterceptor"]
    end

    subgraph RAG["agentic_rag"]
        RAGE["reasoning/rag_ensemble.py<br/>RAGReasoningEnsemble"]
        MAIN2["main.py<br/>FastAPI app"]
        OAI["openai_compat.py<br/>/v1 router"]
        A2AH2["a2a_handler.py<br/>A2AHandler"]
        LRAG["local_rag_agent.py<br/>LocalRAGAgent"]
        FACT["agents/agent_factory.py<br/>create_agents()"]
        STOR["OraDBVectorStore.py"]
        LOG["OraDBEventLogger.py"]
    end

    BASE --> AGENTS
    CLIENT --> BASE
    AGENTS --> ENS
    AGENTS --> INT
    ENS --> RAGE
    INT --> OAI
    AGENTS --> RAGE
    RAGE --> MAIN2
    RAGE --> OAI
    RAGE --> A2AH2
    LRAG --> MAIN2
    FACT --> LRAG
    STOR --> RAGE
    STOR --> LRAG
    STOR --> A2AH2
    LOG --> RAGE
    LOG --> MAIN2
    OAI --> MAIN2
    A2AH2 --> MAIN2

    style LIB fill:#0a1a1a,stroke:#00e5ff44,color:#00e5ff
    style RAG fill:#0d1a0a,stroke:#bfa76a44,color:#bfa76a