Pick a model
Five presets, five questions. If you don't know which to use, all-MiniLM-L6-v2 is the right default for English text below ~5M rows.
The decision tree
Is any of your text non-English?
onnx2oracle's English-only WordPiece presets will happily embed Spanish or Japanese, but the cosine distances will be junk across languages. For truly multilingual retrieval, see the "Not supported" note below and consider a WordPiece-based multilingual model via --from-huggingface.
If every input is English (or can be translated to English upstream), continue.
How many rows are you embedding?
- < 1M rows, latency-sensitive — all-MiniLM-L6-v2. Smallest model, fastest inference.
- 1M – 20M rows — all-MiniLM-L12-v2 or bge-small-en-v1.5. Same 384-d footprint, better recall on longer passages.
- Quality matters more than speed — all-mpnet-base-v2. 768-d, noticeably better on ambiguous queries, ~2× the index size.
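The branch above is simple enough to encode directly. A minimal sketch (the function name and signature are illustrative, not part of onnx2oracle):

```python
def pick_preset(rows: int, quality_over_speed: bool = False) -> str:
    """Map corpus size and quality preference to a preset name,
    mirroring the decision tree above."""
    if quality_over_speed:
        return "all-mpnet-base-v2"      # 768-d, ~2x the index size
    if rows < 1_000_000:
        return "all-MiniLM-L6-v2"       # smallest, fastest
    return "all-MiniLM-L12-v2"          # or bge-small-en-v1.5; same 384-d footprint
```
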
How long are your documents?
Four of the five presets truncate at 512 tokens. If you regularly feed multi-page documents in one shot, nomic-embed-text-v1's 8192-token window saves you from chunking logic — at the cost of more compute per embedding and a 540 MB model.
For most uses, chunking and using a cheaper 512-token model is the better trade. Nomic wins when the chunking itself hurts retrieval quality (e.g. legal contracts, research papers).
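If you go the chunking route, the logic can stay very small. A naive word-window sketch (the ~350-word budget is a rough heuristic for staying under a 512-token limit, not a guarantee; real token counts depend on the tokenizer):

```python
def chunk_words(text: str, max_words: int = 350, overlap: int = 50) -> list[str]:
    """Split text into overlapping word windows. Overlap reduces the chance
    that a relevant sentence is cut at a chunk boundary."""
    words = text.split()
    if not words:
        return []
    step = max_words - overlap
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), step)]
```
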
How much disk can you spend on the vector column?
Vector storage dominates for large tables. At FP32:
- 384-d vector ≈ 1.5 KB per row — 1M rows ≈ 1.5 GB.
- 768-d vector ≈ 3.0 KB per row — 1M rows ≈ 3.0 GB.
Indexes (HNSW, IVF_FLAT) roughly double that. If the 2× jump from 384-d to 768-d hurts, stay on 384-d.
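The arithmetic is just dims × 4 bytes × rows, plus an index multiplier. A back-of-envelope helper (the 2.0 index factor is the rough estimate from above, not a measured value):

```python
def vector_storage_gb(dims: int, rows: int, index_factor: float = 2.0) -> float:
    """Estimate FP32 vector storage in GB. index_factor=2.0 approximates
    column + ANN index; use 1.0 for the bare column."""
    bytes_per_row = dims * 4  # 4 bytes per FP32 component
    return bytes_per_row * rows * index_factor / 1e9
```
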
Do you have a benchmark?
If you have eval queries and gold passages, run both candidates through VECTOR_DISTANCE and measure nDCG@10 yourself. Public benchmark rankings are a starting point — your data is the actual answer.
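nDCG@10 itself is a few lines; only the retrieval runs need the database. A self-contained sketch that scores one ranked result list given per-rank relevance labels (1.0 for a gold passage, 0.0 otherwise):

```python
import math

def ndcg_at_k(relevances: list[float], k: int = 10) -> float:
    """nDCG@k for one query: DCG of the actual ranking divided by the
    DCG of the ideal (sorted) ranking."""
    def dcg(rels: list[float]) -> float:
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```

Average this over your eval queries for each candidate model, and pick the winner on your data rather than on a public leaderboard.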
Shortcut table
When you don't want the tree:
| Situation | Pick |
|---|---|
| English default, no special requirements | all-MiniLM-L6-v2 |
| English, quality matters, can spend the disk | all-mpnet-base-v2 |
| BEIR-style benchmarks, English | bge-small-en-v1.5 |
| Long documents, no chunking | nomic-embed-text-v1 |
| Same geometry as the default, more recall | all-MiniLM-L12-v2 |
VECTOR_EMBEDDING(MODEL_A USING ...) and VECTOR_EMBEDDING(MODEL_B USING ...) coexist. Store two vector columns and A/B test query quality on your real traffic.
Something weirder?
If none of the presets match — domain-specific models like BioBERT, instruction-tuned retrievers, custom fine-tunes — the --from-huggingface escape hatch handles it. You supply pooling, dims, and max-length; the loader handles everything else.
Not supported: SentencePiece-based multilingual models
Models like intfloat/multilingual-e5-small use SentencePiece/Unigram tokenizers. These can't be expressed as the BertTokenizer ONNX op Oracle needs. If you try to load one, onnx2oracle raises a clear NotImplementedError pointing you at WordPiece-based multilingual alternatives (e.g. microsoft/Multilingual-MiniLM-L12-H384).