Technical Architecture

langchain-oracledb

Oracle AI Database Integration for LangChain — in-database embeddings, vector search, and server-side text processing.

Package langchain-oracledb
Embedding Model ALL_MINILM_L12_V2
Vector Dimensions 384
Distance Strategy COSINE

Section 01

Core Components

Three primitives form the foundation of the Oracle AI Database integration layer. Each operates inside the database, eliminating round-trips to external services.

01
OracleVS
Vector Store wrapping Oracle AI Vector Search. Each collection maps to a dedicated Oracle table with document content, embedding vectors, and structured metadata.
  • similarity_search_with_score()
  • add_texts()
  • delete()
Distance: DistanceStrategy.COSINE
Storage: One table per collection
Schema: text, embedding, metadata columns
02
OracleEmbeddings
In-database embedding generation. No external API calls — vectors are produced inside Oracle DB using ONNX models loaded into the database runtime.
  • embed_documents()
  • embed_query()
Provider: "database"
Model: ALL_MINILM_L12_V2
Dimensions: 384-dimensional vectors
03
OracleTextSplitter
Server-side text chunking via PL/SQL. Raw text is normalized and split into chunks directly inside the database, keeping data close to compute.
  • split_text()
  • create_chunks()
Normalization: "all" (whitespace + unicode)
Execution: PL/SQL inside Oracle DB
Pipeline: raw text → normalized chunks
Component Relationship Diagram
graph TD subgraph LANGCHAIN["LangChain Framework"] LC_VS["VectorStore Interface"] LC_EMB["Embeddings Interface"] LC_SPLIT["TextSplitter Interface"] end subgraph ORACLE_LAYER["langchain-oracledb"] OVS["OracleVS"] OEM["OracleEmbeddings"] OTS["OracleTextSplitter"] end subgraph ORACLE_DB["Oracle Database"] AIVS["AI Vector Search Engine"] ONNX["ONNX Runtime
ALL_MINILM_L12_V2"] PLSQL["PL/SQL Text Processing"] TABLES["Collection Tables"] end LC_VS --> OVS LC_EMB --> OEM LC_SPLIT --> OTS OVS -->|"COSINE similarity"| AIVS OVS -->|"store/retrieve"| TABLES OEM -->|"embed in-database"| ONNX OTS -->|"normalize + chunk"| PLSQL AIVS --> TABLES style LANGCHAIN fill:#fafafa,stroke:#e0e0e0,stroke-width:1px style ORACLE_LAYER fill:#fff,stroke:#C74634,stroke-width:2px style ORACLE_DB fill:#f5f5f5,stroke:#111,stroke-width:1px style OVS fill:#fff,stroke:#C74634,stroke-width:1.5px,color:#111 style OEM fill:#fff,stroke:#C74634,stroke-width:1.5px,color:#111 style OTS fill:#fff,stroke:#C74634,stroke-width:1.5px,color:#111

Section 02

End-to-End Data Flow

From raw document ingestion through to scored similarity search results. Every processing step occurs inside the Oracle Database.

Ingestion Pipeline
graph LR DOC["Raw Document"] SPLIT["OracleTextSplitter
normalize + chunk"] EMBED["OracleEmbeddings
384-dim vectors"] STORE["OracleVS
store to table"] TABLE[("Oracle Table
text | vector | metadata")] DOC -->|"text input"| SPLIT SPLIT -->|"chunks[]"| EMBED EMBED -->|"vectors[]"| STORE STORE -->|"INSERT"| TABLE style DOC fill:#f5f5f5,stroke:#111,stroke-width:1px,color:#111 style SPLIT fill:#fff,stroke:#C74634,stroke-width:1.5px,color:#111 style EMBED fill:#fff,stroke:#C74634,stroke-width:1.5px,color:#111 style STORE fill:#fff,stroke:#C74634,stroke-width:1.5px,color:#111 style TABLE fill:#fafafa,stroke:#111,stroke-width:2px,color:#111
Query Pipeline
graph LR QUERY["User Query"] QEMBED["OracleEmbeddings
embed query"] SEARCH["OracleVS
similarity_search"] AIVS["AI Vector Search
COSINE distance"] RESULTS["Results + Scores
score = 1/(1+d)"] QUERY -->|"query string"| QEMBED QEMBED -->|"query vector"| SEARCH SEARCH -->|"vector compare"| AIVS AIVS -->|"ranked matches"| RESULTS style QUERY fill:#f5f5f5,stroke:#111,stroke-width:1px,color:#111 style QEMBED fill:#fff,stroke:#C74634,stroke-width:1.5px,color:#111 style SEARCH fill:#fff,stroke:#C74634,stroke-width:1.5px,color:#111 style AIVS fill:#fff,stroke:#111,stroke-width:1.5px,color:#111 style RESULTS fill:#fafafa,stroke:#C74634,stroke-width:2px,color:#111
Ingestion Sequence (Detailed)
sequenceDiagram participant App as Application participant Wrapper as OraDBVectorStore participant TS as OracleTextSplitter participant EM as OracleEmbeddings participant VS as OracleVS participant DB as Oracle Database App->>Wrapper: ingest(documents, collection) Wrapper->>Wrapper: sanitize_metadata() Wrapper->>TS: split_text(raw_text) TS->>DB: DBMS_VECTOR_CHAIN.UTL_TO_CHUNKS() DB-->>TS: normalized chunks[] TS-->>Wrapper: chunks[] Wrapper->>VS: add_texts(chunks, metadatas) VS->>EM: embed_documents(chunks) EM->>DB: DBMS_VECTOR.UTL_TO_EMBEDDING() DB-->>EM: 384-dim vectors[] EM-->>VS: vectors[] VS->>DB: INSERT INTO collection_table DB-->>VS: rows inserted VS-->>Wrapper: document IDs Wrapper-->>App: success

Section 03

Component Specifications

Detailed configuration parameters and internal specifications for each component.

OracleEmbeddings Configuration

# In-database embedding configuration embedding_params = { "provider": "database", "model": "ALL_MINILM_L12_V2" } embeddings = OracleEmbeddings( conn=connection, params=embedding_params ) # Produces 384-dimensional vectors # Runs entirely inside Oracle DB # No external API calls required
ParameterValue
providerdatabase (in-DB ONNX runtime)
modelALL_MINILM_L12_V2
dimensions384
data_typeFLOAT32
normalizationL2 normalized

OracleTextSplitter Configuration

# Server-side text splitting splitter_params = { "normalize": "all", "split": "recursively", "language": "AMERICAN", "max": "500", "overlap": "50" } splitter = OracleTextSplitter( conn=connection, params=splitter_params )
ParameterValue
normalize"all" — whitespace + unicode
splitRecursive character splitting
executionPL/SQL (DBMS_VECTOR_CHAIN)
max_chunk500 tokens (configurable)
overlap50 tokens

Section 04

Vector Store Collections

Each collection maps 1:1 to an Oracle Database table. OracleVS manages table creation, document storage, and vector indexing independently per collection.

PDFCOLLECTION
PDF document chunks and embeddings
WEBCOLLECTION
Web-scraped content and embeddings
REPOCOLLECTION
Code repository content and embeddings
GENERALCOLLECTION
General-purpose document store
Collection Table Schema
erDiagram COLLECTION_TABLE { VARCHAR2 id PK "UUID primary key" CLOB text "Document chunk text" VECTOR embedding "384-dim FLOAT32 vector" JSON metadata "Sanitized key-value pairs" } ORACLEVS ||--o{ COLLECTION_TABLE : "manages" ORACLE_EMBEDDINGS ||--|| COLLECTION_TABLE : "generates vectors" PDFCOLLECTION ||--|| COLLECTION_TABLE : "instance" WEBCOLLECTION ||--|| COLLECTION_TABLE : "instance" REPOCOLLECTION ||--|| COLLECTION_TABLE : "instance" GENERALCOLLECTION ||--|| COLLECTION_TABLE : "instance"

Section 05

OraDBVectorStore Wrapper

The bridge class in the agentic RAG system. Manages multiple OracleVS instances, sanitizes metadata, and applies monkeypatches for compatibility.

Wrapper Architecture
graph TD subgraph WRAPPER["OraDBVectorStore"] INIT["__init__
Initialize 4 OracleVS instances"] SANITIZE["sanitize_metadata()
Filter to str/int/float/bool"] MONKEY["Monkeypatch
Dual-format metadata handler"] SEARCH_FN["similarity_search()
Delegates to correct OracleVS"] ADD_FN["add_documents()
Route by collection name"] end subgraph INSTANCES["OracleVS Instances"] PDF_VS["OracleVS
PDFCOLLECTION"] WEB_VS["OracleVS
WEBCOLLECTION"] REPO_VS["OracleVS
REPOCOLLECTION"] GEN_VS["OracleVS
GENERALCOLLECTION"] end INIT --> PDF_VS INIT --> WEB_VS INIT --> REPO_VS INIT --> GEN_VS SEARCH_FN --> INSTANCES ADD_FN --> SANITIZE SANITIZE --> INSTANCES MONKEY --> INSTANCES style WRAPPER fill:#fff,stroke:#C74634,stroke-width:2px style INSTANCES fill:#fafafa,stroke:#111,stroke-width:1px style PDF_VS fill:#fff,stroke:#e0e0e0,stroke-width:1px,color:#111 style WEB_VS fill:#fff,stroke:#e0e0e0,stroke-width:1px,color:#111 style REPO_VS fill:#fff,stroke:#e0e0e0,stroke-width:1px,color:#111 style GEN_VS fill:#fff,stroke:#e0e0e0,stroke-width:1px,color:#111
score = 1 / (1 + distance)
Cosine distance is converted to a similarity score in [0, 1] where 1 = identical

Metadata Sanitization

Oracle AI Vector Search requires metadata values to be primitive types. The sanitizer strips any complex nested structures before insertion.

  • str values — preserved as-is
  • int values — preserved as-is
  • float values — preserved as-is
  • bool values — preserved as-is
  • All other types — silently dropped from metadata
  • Nested dicts and lists — removed recursively

Monkeypatch: Dual-Format Metadata

OracleVS may return metadata as either a Python dict or a JSON string depending on the Oracle driver version. The monkeypatch normalizes this.

# Monkeypatch for metadata compatibility def _parse_metadata(raw): if isinstance(raw, str): return json.loads(raw) elif isinstance(raw, dict): return raw return {} # Applied at OraDBVectorStore init # Handles both oracledb driver formats

Section 06

Connection Configuration

Oracle Database connection with TLS encryption and wallet-based authentication for Autonomous Database deployments.

Connection Flow
graph LR subgraph APP["Application Layer"] CONFIG["config.yaml"] ORADB["OraDBVectorStore"] end subgraph AUTH["Authentication"] WALLET["Oracle Wallet
cwallet.sso + ewallet.p12"] TLS["TLS 1.2/1.3
mTLS for ADB"] end subgraph TARGET["Oracle Database"] DSN["TNS Listener
host:port/service"] ADB["Autonomous DB
wallet-based auth"] end CONFIG -->|"dsn, credentials"| ORADB ORADB -->|"oracledb.connect()"| DSN ORADB -->|"wallet_location"| WALLET WALLET --> TLS TLS --> ADB DSN --> ADB style APP fill:#fff,stroke:#e0e0e0,stroke-width:1px style AUTH fill:#fff,stroke:#C74634,stroke-width:1.5px style TARGET fill:#f5f5f5,stroke:#111,stroke-width:1.5px

config.yaml

oracle:
  dsn: "(description=(retry_count=20)
        (retry_delay=3)(address=
        (protocol=tcps)(port=1522)
        (host=adb.region.oraclecloud.com))
        (connect_data=(service_name=
        xxx_high.adb.oraclecloud.com))
        (security=(ssl_server_dn_match=yes)))"
  user: "ADMIN"
  wallet_location: "/opt/oracle/wallet"
  wallet_password: "${WALLET_PASSWORD}"

embeddings:
  provider: "database"
  model: "ALL_MINILM_L12_V2"

collections:
  - PDFCOLLECTION
  - WEBCOLLECTION
  - REPOCOLLECTION
  - GENERALCOLLECTION

Connection Parameters

ParameterDescription
dsnTNS descriptor with TLS (tcps)
userDatabase username (ADMIN for ADB)
wallet_locationPath to unzipped wallet directory
wallet_passwordWallet encryption password
retry_countConnection retry attempts (20)
retry_delaySeconds between retries (3)
protocoltcps (TLS-secured TCP)
port1522 (ADB default TLS port)

Section 07

Full System Architecture

Complete view of how all components interact, from the LangChain application layer down through the Oracle Database internals.

System Architecture Overview
graph TB subgraph APPLICATION["Application Layer"] direction LR AGENT["LangChain Agent"] CHAIN["RetrievalQA Chain"] LOADER["Document Loaders
PDF / Web / Git"] end subgraph BRIDGE["Bridge Layer --- OraDBVectorStore"] direction LR ROUTE["Collection Router"] SANITIZER["Metadata Sanitizer"] MONKEY2["Metadata Monkeypatch"] SCORER["Score Converter
1/(1+d)"] end subgraph LANGCHAIN_ORACLE["langchain-oracledb"] direction LR OVS2["OracleVS
x4 instances"] OEM2["OracleEmbeddings"] OTS2["OracleTextSplitter"] end subgraph ORACLE["Oracle Database Engine"] direction LR VECTOR_ENGINE["AI Vector Search"] ONNX2["ONNX Runtime"] PLSQL2["PL/SQL Engine"] TABLES2[("Collection Tables")] INDEX["IVF Vector Index"] end AGENT --> CHAIN CHAIN --> BRIDGE LOADER --> BRIDGE ROUTE --> OVS2 SANITIZER --> OVS2 MONKEY2 --> OVS2 SCORER --> CHAIN OVS2 --> VECTOR_ENGINE OVS2 --> TABLES2 OEM2 --> ONNX2 OTS2 --> PLSQL2 VECTOR_ENGINE --> INDEX INDEX --> TABLES2 style APPLICATION fill:#fafafa,stroke:#e0e0e0,stroke-width:1px style BRIDGE fill:#fff,stroke:#C74634,stroke-width:2px style LANGCHAIN_ORACLE fill:#fff,stroke:#C74634,stroke-width:1.5px style ORACLE fill:#f5f5f5,stroke:#111,stroke-width:2px