No sidecar service
The model runs inside the DB process. VECTOR_EMBEDDING(MODEL USING :text AS DATA) is a function call — no HTTP, no queue, no fanout.
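In practice that is an ordinary SQL expression. A minimal sketch, assuming a model loaded under the illustrative name doc_model and a docs table with a text column:

```sql
-- doc_model is an illustrative model name; use whatever name you loaded under.
SELECT id,
       VECTOR_EMBEDDING(doc_model USING body AS DATA) AS embedding
FROM   docs;
```

The embedding is computed in-process, row by row, like any other SQL function.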
onnx2oracle packages any HuggingFace sentence-embedding model as an augmented ONNX graph
and loads it directly into Oracle via DBMS_VECTOR.LOAD_ONNX_MODEL. No sidecar, no queue, no network hop.
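The load step reduces to a single PL/SQL call. A sketch, assuming the augmented ONNX file has been placed in a directory object; the directory, file, and model names here are illustrative, and the metadata JSON argument is omitted on the assumption that the augmented graph already embeds it:

```sql
BEGIN
  DBMS_VECTOR.LOAD_ONNX_MODEL(
    directory  => 'DM_DUMP',          -- directory object holding the ONNX file
    file_name  => 'all_minilm.onnx',  -- augmented graph produced by the tool
    model_name => 'DOC_MODEL'         -- name referenced later by VECTOR_EMBEDDING
  );
END;
/
```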
Most vector pipelines bolt an embedding microservice in front of the database. That adds a second system to deploy, secure, patch, and observe. Oracle 23ai and 26ai can run the model itself — if you can get it in.
Compliance teams like this. Your document column gets embedded without a byte crossing a network boundary.
Insert a row, embed it, index it — all inside the same transaction. No dual-writes, no eventual sync drift.
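A minimal sketch of that single-transaction flow, assuming a docs table with a VECTOR column and a loaded model named doc_model (both names illustrative):

```sql
INSERT INTO docs (id, body, embedding)
VALUES (:id, :body,
        VECTOR_EMBEDDING(doc_model USING :body AS DATA));
COMMIT;  -- row, embedding, and index maintenance commit together
```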
onnx2oracle presets lists what's on the shelf; onnx2oracle load <name> does the rest, including tokenizer wrapping and L2 normalization.
Each preset maps a HuggingFace repo to an Oracle mining-model name. Dimensions and pooling strategies match the model card — not a rewrite.
The default. Small, fast, good enough for English-only semantic search up to a few million rows.
Same geometry as L6, deeper transformer. Better recall on longer documents; ~30% slower.
The quality pick for English. Double the vector width, heavier index. Worth it when MRR matters.
Strong on BEIR benchmarks. Uses CLS pooling, not mean — the loader handles the difference.
Long-context (8192 tokens) with Matryoshka-friendly output. Biggest curated preset — plan the disk.
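The pooling difference above is worth making concrete. A minimal numpy sketch of mean pooling versus CLS pooling, plus L2 normalization; this is illustrative only, since the shipped presets perform these steps inside the augmented ONNX graph:

```python
import numpy as np

def mean_pool(token_embs, mask):
    # token_embs: (seq_len, dim); mask: (seq_len,) of 0/1, zeros mark padding
    m = mask[:, None].astype(float)
    return (token_embs * m).sum(axis=0) / m.sum()

def cls_pool(token_embs, mask):
    # CLS pooling simply takes the first token's vector
    return token_embs[0]

def l2_normalize(v, eps=1e-12):
    # scale to unit length so dot product equals cosine similarity
    return v / max(np.linalg.norm(v), eps)

# toy example: 3 tokens, dim 4, last token is padding
embs = np.array([[1., 0., 0., 0.],
                 [0., 2., 0., 0.],
                 [9., 9., 9., 9.]])
mask = np.array([1, 1, 0])

vec = l2_normalize(mean_pool(embs, mask))  # padding row is excluded
print(vec)  # unit-length mean-pooled vector
```

Unit-length output is why the index can use dot product and cosine distance interchangeably.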
Each page assumes the previous one worked. If something breaks mid-way, the troubleshooting reference maps ORA-codes to fixes.
01 Docker up, load the default model, verify the round trip in about four minutes.
02 The augmented ONNX pipeline — tokenizer, pool, L2-norm — explained with a diagram.
03 Decision tree across the six presets: English vs. multilingual, 384 vs. 768, speed vs. recall.
04 Point the loader at Autonomous Database. Walletless mTLS is shortest; classic wallet still works.
05 The --from-huggingface escape hatch with worked examples, including CLS-pooled models.
06 Every flag, every subcommand, including DSN resolution precedence.