For Coding Agents
This page teaches a coding agent how to load any sentence-transformer ONNX model into an Oracle AI Database 23ai or 26ai instance using raw Python and SQL. Paste this URL into your agent, give it a target model plus DSN, and it'll write the code itself.
Prompt block
Paste this into Claude, Codex, Gemini, or any other coding agent. Fill in the bracketed target details before sending.
You are loading an ONNX embedding model into an Oracle AI Database
26ai instance.
Read the manual at:
https://jasperan.github.io/onnx2oracle/for-agents.html
Target: <describe your model (e.g. "sentence-transformers/all-MiniLM-L6-v2")
and DSN (user/pw@host:port/service)>.
Write the code yourself following the manual. Don't add external CLI
dependencies. Use python-oracledb, onnx, onnxruntime, onnxruntime-extensions,
transformers, and huggingface_hub only. Run the verification query at the
end and report its output back.
Prerequisites
Confirm each of these before writing any code. If a check fails, stop and tell the user exactly which one.
- Oracle AI Database 23ai or 26ai. The `DBMS_VECTOR` package and `VECTOR_EMBEDDING` landed in 23ai. Check with `SELECT BANNER FROM v$version;`.
- DB user has `CREATE MINING MODEL` and `EXECUTE ON DBMS_VECTOR`. Check with:
  `SELECT * FROM user_sys_privs WHERE privilege IN ('CREATE MINING MODEL');`
  `SELECT * FROM user_tab_privs WHERE table_name = 'DBMS_VECTOR';`
- Python 3.10 or newer.
- Packages installed: `pip install onnx onnxruntime onnxruntime-extensions transformers huggingface-hub oracledb`
- About 2 GB of free RAM during augmentation. The final ONNX is usually 90 to 540 MB depending on the model.
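These checks can be scripted up front. A minimal sketch reusing the queries above; USER, PASSWORD, HOST, PORT, and SERVICE are placeholders as in the Load section, and note that privileges granted only through a role won't appear in `user_sys_privs`:

```python
import oracledb

conn = oracledb.connect(user=USER, password=PASSWORD, dsn=f"{HOST}:{PORT}/{SERVICE}")
cur = conn.cursor()

# Version: DBMS_VECTOR and VECTOR_EMBEDDING landed in 23ai.
cur.execute("SELECT banner FROM v$version")
banner = cur.fetchone()[0]
assert "23ai" in banner or "26ai" in banner, f"unsupported release: {banner}"

# Direct grants only; role-based grants won't show up in these views.
cur.execute("SELECT 1 FROM user_sys_privs WHERE privilege = 'CREATE MINING MODEL'")
assert cur.fetchone(), "missing CREATE MINING MODEL"

cur.execute("SELECT 1 FROM user_tab_privs WHERE table_name = 'DBMS_VECTOR'")
assert cur.fetchone(), "missing EXECUTE ON DBMS_VECTOR"
```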
From an onnx2oracle checkout, run
scripts/run_real_db_integration.sh when you need proof against a real Oracle 26ai
Free container. It starts Docker, records the database banner and image id, loads MiniLM,
verifies VECTOR_EMBEDDING, and runs the live integration test. Use
ORACLE_PWD, ORACLE_PORT, ORACLE_IMAGE, and
ONNX2ORACLE_WAIT_TIMEOUT to adapt the run without editing files.
Before loading manually, run onnx2oracle preflight --target local to check version,
package, SQL function, privilege, and catalog readiness.
To test a preset and write machine-readable evidence, run
scripts/check_model_compatibility.py all-MiniLM-L6-v2.
Mental model of the task
The goal is a single self-contained ONNX graph whose input is a raw string
and whose output is a normalized float32 vector. Oracle invokes the graph through
DBMS_VECTOR.LOAD_ONNX_MODEL and queries it with
VECTOR_EMBEDDING(MODEL USING :text AS DATA).
The core problem: HuggingFace ships a Python tokenizer. Oracle needs that tokenizer compiled into the ONNX graph itself. This page bridges the gap.
Figure: one input (`pre_text`), one output (`embedding`), four internal stages fused into a single ONNX graph.
Build the augmented ONNX
Seven stages build the graph in memory. Two more sections after this one (Load, Verify) ship it to Oracle and confirm it works. MiniLM-L6 is the worked example; parametrize wherever an agent should generalize. Every block is copy-pastable.
Stage 1: download the core transformer ONNX
from huggingface_hub import hf_hub_download, list_repo_files
HF_REPO = "sentence-transformers/all-MiniLM-L6-v2"
core_path = hf_hub_download(HF_REPO, "onnx/model.onnx")
# Some repos store weights in a sidecar file next to model.onnx (e.g. model.onnx_data
# or similar). Pull any external-data files in the onnx/ folder so onnx.load finds them.
for f in list_repo_files(HF_REPO):
if f.startswith("onnx/") and f != "onnx/model.onnx":
hf_hub_download(HF_REPO, f)
Some repos don't ship a pre-exported ONNX at all. In that case, export via
transformers.onnx or the optimum exporter first, then point
core_path at the resulting file.
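If you take that route, a hedged sketch of the programmatic export, assuming a recent optimum release where main_export is the Python equivalent of optimum-cli export onnx (optimum is an extra dependency used only for this fallback; the "feature-extraction" task name is an assumption for plain encoders):

```python
from optimum.exporters.onnx import main_export

# Writes model.onnx (plus any external-data files) into the output directory.
main_export(HF_REPO, output="exported_onnx", task="feature-extraction")
core_path = "exported_onnx/model.onnx"
```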
Stage 2: wrap the tokenizer as ONNX ops
from transformers import AutoTokenizer
from onnxruntime_extensions import gen_processing_models
tokenizer = AutoTokenizer.from_pretrained(HF_REPO)
pre_model, _ = gen_processing_models(
    tokenizer, pre_kwargs={}, post_kwargs=None, opset=18,  # match the core opset in Stage 3; merge_models rejects mismatched domain versions
)
For some tokenizers (SentencePiece-based ones in particular), gen_processing_models produces ragged outputs that Oracle's BertTokenizer op can't consume. Skip those models, or convert them offline with a custom tokenizer op.
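Before moving on, print the tokenizer graph's interface; Stages 3b and 3c key off these exact names:

```python
# Expect a single string input (typically "text"); outputs vary by family
# (RoBERTa-style tokenizers omit token_type_ids).
print("inputs: ", [i.name for i in pre_model.graph.input])
print("outputs:", [o.name for o in pre_model.graph.output])
```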
Stage 3: align opsets and merge
import onnx
from onnx import compose, version_converter
core_model = onnx.load(core_path)
core_model = version_converter.convert_version(core_model, 18)
core_model.ir_version = pre_model.ir_version
# Copy custom opset domains from pre-model to core
core_domains = {o.domain for o in core_model.opset_import}
for o in pre_model.opset_import:
if o.domain not in core_domains:
new_o = core_model.opset_import.add()
new_o.domain = o.domain
new_o.version = o.version
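A quick sanity check before merging, since compose.merge_models rejects models whose shared domains carry different opset versions:

```python
# The empty domain is the default ONNX domain; both dicts should agree on it.
print({(o.domain or "ai.onnx"): o.version for o in core_model.opset_import})
print({(o.domain or "ai.onnx"): o.version for o in pre_model.opset_import})
```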
Stage 3b: unsqueeze tokenizer outputs from [seq_len] to [1, seq_len]
The tokenizer emits 1-D tensors. The transformer expects 2-D (batch + sequence). Insert an
Unsqueeze(axis=0) on each relevant output.
import numpy as np
from onnx import helper, numpy_helper, TensorProto
# Only touch tensor names that actually exist in the pre-model outputs.
# (MPNet-style encoders have no token_type_ids, and neither do RoBERTa-style ones.)
pre_output_names = {o.name for o in pre_model.graph.output}
core_input_names = {i.name for i in core_model.graph.input}
bert_tensor_names = [
n for n in ("input_ids", "token_type_ids", "attention_mask")
if n in pre_output_names
]
# Add a shared axes=[0] initializer for the Unsqueeze ops.
axes_0 = numpy_helper.from_array(np.array([0], dtype=np.int64), name="unsqueeze_axes_0")
pre_model.graph.initializer.append(axes_0)
for name in bert_tensor_names:
flat_name = f"{name}_flat"
# Rename the tokenizer's raw output so we can feed it into Unsqueeze.
for node in pre_model.graph.node:
for i, out in enumerate(node.output):
if out == name:
node.output[i] = flat_name
# Emit Unsqueeze(flat_name, axes=[0]) -> original name.
pre_model.graph.node.append(
helper.make_node("Unsqueeze", [flat_name, "unsqueeze_axes_0"], [name])
)
# Update the graph-output shape declaration to [1, sequence_length].
for out in pre_model.graph.output:
if out.name == name:
shape = out.type.tensor_type.shape
while len(shape.dim) > 0:
shape.dim.pop()
shape.dim.add().dim_value = 1
shape.dim.add().dim_param = "sequence_length"
Stage 3c: merge pre + core via compose
io_map = [
("input_ids", "input_ids"),
("attention_mask", "attention_mask"),
("token_type_ids", "token_type_ids"),
]
# Keep only tensors present in both sides. RoBERTa-style tokenizers skip
# token_type_ids; MPNet cores skip it too.
io_map = [(a, b) for a, b in io_map
if a in pre_output_names and b in core_input_names]
merged = compose.merge_models(
pre_model, core_model, io_map=io_map, prefix1="pre_", prefix2="core_",
)
# Strip existing outputs; we'll declare the embedding output explicitly later.
while len(merged.graph.output) > 0:
merged.graph.output.pop()
The prefix1="pre_" argument matters: it determines the final input tensor name.
In this example, the merged graph's input becomes pre_text. Oracle's metadata
JSON must reference that exact name.
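Confirm the name now, before writing the metadata JSON in the Load section:

```python
print([i.name for i in merged.graph.input])  # expect ['pre_text']
```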
Stage 4: add pooling
Two options, depending on the model. Mean pooling for MiniLM, MPNet, and Nomic:
import numpy as np
from onnx import helper, numpy_helper, TensorProto
pool_axes = numpy_helper.from_array(np.array([1], dtype=np.int64), name="pool_axes_1")
merged.graph.initializer.append(pool_axes)
merged.graph.node.append(
helper.make_node(
"ReduceMean", ["core_last_hidden_state", "pool_axes_1"],
["pooled"], keepdims=0,
),
)
CLS pooling for BGE-style models:
cls_idx = numpy_helper.from_array(np.array([0], dtype=np.int64), name="cls_idx")
sq_axes = numpy_helper.from_array(np.array([1], dtype=np.int64), name="cls_sq_axes")
merged.graph.initializer.append(cls_idx)
merged.graph.initializer.append(sq_axes)
merged.graph.node.append(
helper.make_node(
"Gather", ["core_last_hidden_state", "cls_idx"],
["pooled_gathered"], axis=1,
),
)
merged.graph.node.append(
helper.make_node("Squeeze", ["pooled_gathered", "cls_sq_axes"], ["pooled"]),
)
Stage 5: L2 normalize
pow_exp = numpy_helper.from_array(np.array(2.0, dtype=np.float32), name="pow_exp")
axes_neg1 = numpy_helper.from_array(np.array([-1], dtype=np.int64), name="axes_neg1")
eps = numpy_helper.from_array(np.array(1e-12, dtype=np.float32), name="eps_val")
merged.graph.initializer.extend([pow_exp, axes_neg1, eps])
merged.graph.node.append(helper.make_node("Pow", ["pooled", "pow_exp"], ["squared"]))
merged.graph.node.append(
helper.make_node("ReduceSum", ["squared", "axes_neg1"], ["sum_sq"], keepdims=1),
)
merged.graph.node.append(helper.make_node("Sqrt", ["sum_sq"], ["l2_norm"]))
merged.graph.node.append(helper.make_node("Max", ["l2_norm", "eps_val"], ["l2_safe"]))
merged.graph.node.append(helper.make_node("Div", ["pooled", "l2_safe"], ["emb_2d"]))
Stage 6: squeeze the batch dim and declare the final output
sq_final = numpy_helper.from_array(np.array([0], dtype=np.int64), name="sq_final_axes")
merged.graph.initializer.append(sq_final)
merged.graph.node.append(
helper.make_node("Squeeze", ["emb_2d", "sq_final_axes"], ["embedding"]),
)
EMBEDDING_DIMS = 384 # change to match your model (768 for MPNet/Nomic, etc.)
merged.graph.output.append(
helper.make_tensor_value_info("embedding", TensorProto.FLOAT, [EMBEDDING_DIMS]),
)
# Fix leftover dynamic batch dims so Oracle's runtime can infer shapes.
for vi in list(merged.graph.value_info) + list(merged.graph.input):
for dim in vi.type.tensor_type.shape.dim:
if dim.dim_param == "batch_size":
dim.ClearField("dim_param")
dim.dim_value = 1
onnx.checker.check_model(merged)
augmented_bytes = merged.SerializeToString()
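If you want to inspect the graph in a viewer such as Netron, or keep an artifact across sessions, writing it to disk is one line (the filename is arbitrary):

```python
onnx.save_model(merged, "augmented_model.onnx")
```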
Stage 7 (optional): sanity-check on CPU before uploading
Run the graph locally with onnxruntime to confirm shape and L2 norm before sending ~100 MB over the network.
import onnxruntime as ort
from onnxruntime_extensions import get_library_path
so = ort.SessionOptions()
so.register_custom_ops_library(get_library_path())
sess = ort.InferenceSession(
augmented_bytes, sess_options=so, providers=["CPUExecutionProvider"],
)
vec = sess.run(None, {"pre_text": np.array(["hello world"])})[0]
assert vec.shape == (EMBEDDING_DIMS,)
assert 0.99 < float(np.linalg.norm(vec)) < 1.01
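As an optional parity check, compare the fused graph against the original core ONNX driven by the Python tokenizer. A sketch assuming mean pooling and a standard export whose first output is last_hidden_state; tokenizer, core_path, and vec come from earlier stages. With a single short sentence there is no padding, so plain mean pooling matches attention-masked mean:

```python
core_sess = ort.InferenceSession(core_path, providers=["CPUExecutionProvider"])
enc = tokenizer("hello world", return_tensors="np")
feeds = {
    i.name: enc[i.name].astype(np.int64)  # exports typically expect int64 ids
    for i in core_sess.get_inputs()
    if i.name in enc
}
hidden = core_sess.run(None, feeds)[0]  # [1, seq_len, dims]
ref = hidden.mean(axis=1)[0]
ref = ref / np.linalg.norm(ref)
cos = float(np.dot(vec, ref))  # both vectors are unit length, so dot = cosine
assert cos > 0.999, f"fused graph diverges from reference (cos={cos:.4f})"
```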
Load into Oracle
Oracle's DBMS_VECTOR.LOAD_ONNX_MODEL has an overload that takes the model as a
raw BLOB argument. No need to write the file to disk first.
import json
import oracledb
conn = oracledb.connect(
user=USER, password=PASSWORD,
dsn=f"{HOST}:{PORT}/{SERVICE}",
tcp_connect_timeout=30,
)
# Oracle identifier: uppercase, letters/digits/underscores, starts with letter or _.
ORACLE_NAME = "ALL_MINILM_L6_V2"
metadata = json.dumps({
"function": "embedding",
"embeddingOutput": "embedding",
# The input tensor name comes from prefix1="pre_" plus the original
# tokenizer input name ("text"). Match it exactly.
"input": {"pre_text": ["DATA"]},
})
cur = conn.cursor()
cur.execute("""
BEGIN
DBMS_VECTOR.LOAD_ONNX_MODEL(
model_name => :model_name,
model_data => :model_data,
metadata => JSON(:metadata)
);
END;
""", {
"model_name": ORACLE_NAME,
"model_data": augmented_bytes,
"metadata": metadata,
})
conn.commit()
If a model named `ORACLE_NAME` already exists, Oracle raises ORA-20000. Drop the old one first:
cur.execute("""
BEGIN DBMS_VECTOR.DROP_ONNX_MODEL(model_name => :n, force => TRUE); END;
""", {"n": ORACLE_NAME})
conn.commit()
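After a successful load, the model should appear in the mining-model catalog. The expected values in the comment are what ONNX embedding models typically report; treat them as an assumption, not a contract:

```python
cur.execute(
    "SELECT model_name, mining_function, algorithm FROM user_mining_models "
    "WHERE model_name = :n",
    {"n": ORACLE_NAME},
)
print(cur.fetchone())  # typically ('ALL_MINILM_L6_V2', 'EMBEDDING', 'ONNX')
```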
Verify
Three checks: shape, L2 norm, and semantic sanity. Run all three and report the output.
import math
import json
# Note: VECTOR_EMBEDDING takes the model name as a SQL token, not a bind
# variable. The ORACLE_NAME must be validated (see Security section below).
cur.execute(f"""
SELECT VECTOR_EMBEDDING({ORACLE_NAME} USING :t AS DATA) FROM dual
""", {"t": "hello world"})
vec = cur.fetchone()[0]
v = list(vec) if not isinstance(vec, str) else json.loads(vec)
print(f"dims={len(v)}") # expect 384 for MiniLM
norm = math.sqrt(sum(x * x for x in v))
print(f"norm={norm:.6f}") # expect ~1.0
def embed(text):
cur.execute(
f"SELECT VECTOR_EMBEDDING({ORACLE_NAME} USING :t AS DATA) FROM dual",
{"t": text},
)
r = cur.fetchone()[0]
return list(r) if not isinstance(r, str) else json.loads(r)
def cosine(a, b):
return sum(x * y for x, y in zip(a, b, strict=True)) / (
math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
)
vk, vq, vb = embed("king"), embed("queen"), embed("banana")
assert cosine(vk, vq) > cosine(vk, vb), "semantic similarity is broken"
print("verification passed.")
Security: guard the model name against SQL injection
VECTOR_EMBEDDING takes the model identifier as a SQL token, not a bind. So the
name must be validated before any f-string interpolation.
import re
_SAFE = re.compile(r"^[A-Z_][A-Z0-9_]{0,127}$")
def safe_oracle_name(name: str) -> str:
if not _SAFE.match(name):
raise ValueError(f"invalid Oracle identifier: {name!r}")
return name
# Apply to every model name before interpolation.
ORACLE_NAME = safe_oracle_name(ORACLE_NAME)
All other values (the input text, model blob, metadata) go through bind variables already. The model name is the one place where a raw SQL identifier is unavoidable.
Troubleshooting
- ORA-00942: table or view does not exist when querying `user_mining_models`. The DB user lacks `SELECT ANY DICTIONARY`, or the instance is pre-23ai.
- ORA-20000: LOAD_ONNX_MODEL failed. Two causes. First: opset mismatch; rebuild with `onnx>=1.16` and opset 18. Second: the metadata JSON's input tensor name doesn't match the graph; it must equal whatever the `prefix1=` argument produced (e.g. `pre_text`).
- ORA-01031: insufficient privileges. As SYSDBA, run: `GRANT EXECUTE ON DBMS_VECTOR TO <user>; GRANT CREATE MINING MODEL TO <user>;`
- ORA-40XXX or BertTokenizer errors at load. The tokenizer's vocabulary doesn't contain what Oracle expects (usually a SentencePiece model). Swap to a WordPiece-based model, e.g. `microsoft/Multilingual-MiniLM-L12-H384` for multilingual use cases.
- First-call slowness. Oracle compiles the ONNX on the first call per session. Expect 100 to 500 ms cold, then under 20 ms for subsequent calls.
Model cheatsheet
Five models known to work end-to-end with this exact pipeline. All are WordPiece. Pick by the constraint that matters (speed, quality, long docs, language).
| HF repo | Dims | Size | Pooling |
|---|---|---|---|
| sentence-transformers/all-MiniLM-L6-v2 | 384 | ~90 MB | mean |
| sentence-transformers/all-MiniLM-L12-v2 | 384 | ~130 MB | mean |
| sentence-transformers/all-mpnet-base-v2 | 768 | ~420 MB | mean |
| BAAI/bge-small-en-v1.5 | 384 | ~130 MB | cls |
| nomic-ai/nomic-embed-text-v1 | 768 | ~540 MB | mean |
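The same table as data, for agents that parametrize the pipeline; dims and pooling are copied straight from the rows above:

```python
PRESETS = {
    "sentence-transformers/all-MiniLM-L6-v2":  {"dims": 384, "pooling": "mean"},
    "sentence-transformers/all-MiniLM-L12-v2": {"dims": 384, "pooling": "mean"},
    "sentence-transformers/all-mpnet-base-v2": {"dims": 768, "pooling": "mean"},
    "BAAI/bge-small-en-v1.5":                  {"dims": 384, "pooling": "cls"},
    "nomic-ai/nomic-embed-text-v1":            {"dims": 768, "pooling": "mean"},
}
```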
Appendix: if you'd rather just pip install something
This entire manual is packaged as a CLI if you don't need to customize the pipeline:
pip install onnx2oracle
onnx2oracle load all-MiniLM-L6-v2 --target local
The onnx2oracle CLI wraps everything on this page into one command. Use it when
you want the shortest path. Use the manual above when you need to customize pooling, swap the
tokenizer, or target an unusual model architecture.