No sidecar service
The model runs inside the DB process. VECTOR_EMBEDDING(MODEL USING :text AS DATA) is a function call — no HTTP, no queue, no fanout.
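In practice that is an ordinary SQL expression. A minimal sketch, assuming a model loaded under the illustrative name doc_model and a docs table with a text column:

```sql
-- doc_model is an illustrative model name; use whatever name you loaded under.
SELECT id,
       VECTOR_EMBEDDING(doc_model USING body AS DATA) AS embedding
FROM   docs;
```

The embedding is computed in-process, row by row, like any other SQL function.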
onnx2oracle packages any HuggingFace sentence-embedding model as an augmented ONNX graph
and loads it directly into Oracle via DBMS_VECTOR.LOAD_ONNX_MODEL. No sidecar, no queue, no network hop.
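The load step reduces to a single PL/SQL call. A sketch, assuming the augmented ONNX file has been placed in a directory object; the directory, file, and model names here are illustrative, and the metadata JSON argument is omitted on the assumption that the augmented graph already embeds it:

```sql
BEGIN
  DBMS_VECTOR.LOAD_ONNX_MODEL(
    directory  => 'DM_DUMP',          -- directory object holding the ONNX file
    file_name  => 'all_minilm.onnx',  -- augmented graph produced by the tool
    model_name => 'DOC_MODEL'         -- name referenced later by VECTOR_EMBEDDING
  );
END;
/
```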
Most vector pipelines bolt an embedding microservice in front of the database. That adds a second system to deploy, secure, patch, and observe. Oracle 23ai and 26ai can run the model itself — if you can get it in.
Compliance teams like this. Your document column gets embedded without a byte crossing a network boundary.
Insert a row, embed it, index it — all inside the same transaction. No dual-writes, no eventual sync drift.
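A minimal sketch of that single-transaction flow, assuming a docs table with a VECTOR column and a loaded model named doc_model (both names illustrative):

```sql
INSERT INTO docs (id, body, embedding)
VALUES (:id, :body,
        VECTOR_EMBEDDING(doc_model USING :body AS DATA));
COMMIT;  -- row, embedding, and index maintenance commit together
```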
onnx2oracle presets lists what's on the shelf; onnx2oracle load <name> does the rest, including tokenizer wrapping and L2 normalization.
Each preset maps a HuggingFace repo to an Oracle mining-model name. Dimensions and pooling strategies match the model card — not a rewrite.
The default. Small, fast, good enough for English-only semantic search up to a few million rows.
Same geometry as L6, deeper transformer. Better recall on longer documents; ~30% slower.
The quality pick for English. Double the vector width, heavier index. Worth it when MRR matters.
Strong on BEIR benchmarks. Uses CLS pooling, not mean — the loader handles the difference.
Long-context (8192 tokens) with Matryoshka-friendly output. Biggest curated preset — plan the disk.
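The pooling difference above is worth making concrete. A minimal numpy sketch of mean pooling versus CLS pooling, plus L2 normalization; this is illustrative only, since the shipped presets perform these steps inside the augmented ONNX graph:

```python
import numpy as np

def mean_pool(token_embs, mask):
    # token_embs: (seq_len, dim); mask: (seq_len,) of 0/1, zeros mark padding
    m = mask[:, None].astype(float)
    return (token_embs * m).sum(axis=0) / m.sum()

def cls_pool(token_embs, mask):
    # CLS pooling simply takes the first token's vector
    return token_embs[0]

def l2_normalize(v, eps=1e-12):
    # scale to unit length so dot product equals cosine similarity
    return v / max(np.linalg.norm(v), eps)

# toy example: 3 tokens, dim 4, last token is padding
embs = np.array([[1., 0., 0., 0.],
                 [0., 2., 0., 0.],
                 [9., 9., 9., 9.]])
mask = np.array([1, 1, 0])

vec = l2_normalize(mean_pool(embs, mask))  # padding row is excluded
print(vec)  # unit-length mean-pooled vector
```

Unit-length output is why the index can use dot product and cosine distance interchangeably.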
Each page assumes the previous one worked. If something breaks mid-way, the troubleshooting reference maps ORA-codes to fixes.
01 Docker up, load the default model, verify the round trip in about four minutes.
02 The augmented ONNX pipeline — tokenizer, pool, L2-norm — explained with a diagram.
03 Decision tree across the six presets: English vs. multilingual, 384 vs. 768, speed vs. recall.
04 Point the loader at Autonomous Database. Walletless mTLS is shortest; classic wallet still works.
05 The --from-huggingface escape hatch with worked examples, including CLS-pooled models.
06 Every flag, every subcommand, including DSN resolution precedence.