word2vec, But for Food: Ingredient Embeddings You Can Do Math On

Researchers trained word2vec-style embeddings on 4.1 million recipes and found that culinary concepts (cuisine, nutrients, even basic tastes) fall out as linear directions you can navigate with vector arithmetic. The same trick that gave language models “king - man + woman = queen” works on food.

The system, called Epicure, comes from KAIKAKU.AI. Its more interesting contribution is a control knob: by varying only how the training random-walks traverse the ingredient graph, the team produced three sibling embeddings that slide from pure recipe co-occurrence toward shared flavor chemistry, exposing chemistry-versus-context as a design axis rather than a fixed property.

Built from 4,135,189 recipes across 11 sources and 7 languages, normalized to 1,790 canonical ingredients
Cuisine regions separate cleanly: Cohen’s d̄ of 2.43 to 3.07 across the three variants (by the usual convention, 0.8 already counts as a “large” effect)
Ingredient vectors predict USDA macronutrients at Spearman ρ = 0.41 to 0.49 and held-out basic tastes at ρ = 0.32 to 0.47, with no nutrition labels in training
Each model surfaces 150 to 200 named “culinary modes”, emergent clusters like flavor families and regional staples, all linearly recoverable

The Chemistry-vs-Context Knob

All three embeddings share the same architecture, dimensions, and hyperparameters. The only difference is the walk template over a graph that links ingredients to each other and, optionally, to 2,247 flavor-compound nodes from FlavorDB.

Three sibling embeddings, one design axis

Cooc — pure co-occurrence

Ingredient-to-ingredient walks only. Captures “what gets cooked together.”

Core — mixed

Co-occurrence with 10x ingredient injection plus compound paths. The balanced middle.

Chem — compound-mediated

Walks routed through shared flavor compounds. Captures “what tastes related.”
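To make the knob concrete, here is a minimal sketch of how a single walk parameter could slide between the two extremes. The article does not spell out Epicure's actual walk templates, so everything here is an assumption for illustration: the toy graph, the placeholder compound names, and the use of one probability, `p_compound`, as a stand-in for the template choice.

```python
import random

# Toy graph. Every node and edge below is made up for illustration.
INGREDIENT_EDGES = {
    "tomato": ["basil", "garlic"],
    "basil": ["tomato", "garlic"],
    "garlic": ["tomato", "basil", "soy sauce"],
    "soy sauce": ["garlic"],
}
# Placeholder compound links; "garlic" deliberately has none, mirroring
# the fact that not every ingredient connects to a compound node.
COMPOUNDS_OF = {
    "tomato": ["compound_a", "compound_b"],
    "basil": ["compound_b"],
    "soy sauce": ["compound_a"],
}
# Invert the mapping: compound -> ingredients that contain it.
INGREDIENTS_WITH = {}
for ingredient, compounds in COMPOUNDS_OF.items():
    for compound in compounds:
        INGREDIENTS_WITH.setdefault(compound, []).append(ingredient)


def walk(start, length, p_compound, rng):
    """Return a sequence of `length` ingredients from one random walk.

    With probability `p_compound` a step is routed through a shared
    compound (ingredient -> compound -> ingredient); otherwise it follows
    a co-occurrence edge. Ingredients without compounds always fall back
    to co-occurrence. Only ingredients are recorded (an assumption).
    """
    path = [start]
    current = start
    while len(path) < length:
        compounds = COMPOUNDS_OF.get(current, [])
        if compounds and rng.random() < p_compound:
            compound = rng.choice(compounds)
            current = rng.choice(INGREDIENTS_WITH[compound])
        else:
            current = rng.choice(INGREDIENT_EDGES[current])
        path.append(current)
    return path


rng = random.Random(0)
for label, p in [("cooc-like", 0.0), ("core-like", 0.5), ("chem-like", 1.0)]:
    print(label, walk("tomato", 6, p, rng))
```

The resulting sequences would then be fed to an ordinary skip-gram trainer as if they were sentences. The point of the sketch is only that the model stays fixed while the corpus of walks changes.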

Sliding toward the chemistry end produced the strongest directional signals on supervised tasks. Chem led on cuisine separation (d̄ 3.07 vs 2.43 for Cooc) and nutrient correlation (ρ 0.49 vs 0.41), suggesting flavor-compound structure carries more of the signal that culinary concepts ride on than raw recipe pairing does.

Concepts Come Out Linear

The headline property is that you do not need a classifier to read these concepts off. They are directions.

Cuisine separation by variant (Cohen’s d)

Macro-region separability. Higher = cleaner linear split.

Chem: 3.07
Core: 2.70
Cooc: 2.43
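For readers who want to see what “separable along a direction” means numerically, here is a minimal sketch: take the difference of two group centroids as the direction, project every vector onto it, and compute Cohen's d on the projections. The data is synthetic, and the article does not say exactly how Epicure fits its directions or averages d̄ across regions, so treat the function names and the centroid-difference choice as assumptions.

```python
import math
import random
from statistics import mean, variance


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [mean(column) for column in zip(*vectors)]


def concept_direction(group_a, group_b):
    """Unit vector pointing from the centroid of group_b to that of group_a."""
    diff = [a - b for a, b in zip(mean_vector(group_a), mean_vector(group_b))]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]


def project(vectors, direction):
    """Scalar projection of each vector onto the direction."""
    return [sum(v_i * d_i for v_i, d_i in zip(v, direction)) for v in vectors]


def cohens_d(xs, ys):
    """Difference of means divided by the pooled standard deviation."""
    nx, ny = len(xs), len(ys)
    pooled_var = ((nx - 1) * variance(xs) + (ny - 1) * variance(ys)) / (nx + ny - 2)
    return (mean(xs) - mean(ys)) / math.sqrt(pooled_var)


# Synthetic 8-dimensional "embeddings": two groups offset along the first axis.
rng = random.Random(0)
group_a = [[rng.gauss(1.5, 1.0)] + [rng.gauss(0.0, 1.0) for _ in range(7)]
           for _ in range(200)]
group_b = [[rng.gauss(-1.5, 1.0)] + [rng.gauss(0.0, 1.0) for _ in range(7)]
           for _ in range(200)]

direction = concept_direction(group_a, group_b)
d = cohens_d(project(group_a, direction), project(group_b, direction))
print(f"Cohen's d along the fitted direction: {d:.2f}")
```

One caution: this sketch fits and scores the direction on the same points, which flatters the number. A fairer estimate would fit the direction on one split and compute d on another.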

On top of the supervised directions, an ICA decomposition pulled out 20 stable factors per model, and GMM clustering named 150 to 200 emergent “modes.” Mode coherence ran well above random baselines (for example, Core scored 0.833 against a 0.348 random floor), meaning these clusters are real structure, not artifacts.

Epicure then exposes navigation operators: nearest-neighbor pairing lookups, and SLERP “direction arithmetic” that continuously rotates an ingredient toward a cuisine, nutrient, or taste pole by an adjustable angle. In practice, that is a substitution engine: push “soy sauce” toward a Mediterranean pole and read off what sits nearby.
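The rotate-then-look-up idea can be sketched in a few lines. This is not Epicure's code (none has been released). It is a generic spherical rotation in the plane spanned by an ingredient vector and a pole, which matches SLERP with interpolation fraction t = angle / (angle between the two vectors). The three-dimensional vectors, the pole, and the helper names `rotate_toward` and `nearest` are all made up for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def rotate_toward(vec, pole, angle):
    """Rotate the unit form of `vec` by `angle` radians toward `pole`.

    The rotation stays in the plane spanned by the two vectors, so the
    result is always unit length.
    """
    u = normalize(vec)
    p = normalize(pole)
    cos_omega = max(-1.0, min(1.0, dot(u, p)))
    # Part of the pole orthogonal to u: the direction we rotate into.
    ortho = [p_i - cos_omega * u_i for p_i, u_i in zip(p, u)]
    ortho_norm = math.sqrt(dot(ortho, ortho))
    if ortho_norm < 1e-12:
        return u  # parallel or antiparallel: rotation plane is undefined
    e = [x / ortho_norm for x in ortho]
    return [math.cos(angle) * u_i + math.sin(angle) * e_i
            for u_i, e_i in zip(u, e)]


def nearest(query, vocab, exclude=(), k=1):
    """Names of the k vocabulary entries with highest cosine similarity."""
    q = normalize(query)
    scored = [(dot(q, normalize(vec)), name)
              for name, vec in vocab.items() if name not in exclude]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Toy 3-d vectors, invented for this example only.
vocab = {
    "soy sauce": [1.0, 0.0, 0.1],
    "fish sauce": [0.9, 0.1, 0.2],
    "olive oil": [0.1, 1.0, 0.0],
}
mediterranean_pole = [0.0, 1.0, 0.0]  # hypothetical cuisine direction

rotated = rotate_toward(vocab["soy sauce"], mediterranean_pole, math.radians(80))
print(nearest(rotated, vocab, exclude={"soy sauce"}))
```

Working with an angle rather than a raw vector offset keeps the result on the unit sphere, so cosine-based neighbor lookups stay comparable as you turn the dial.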

Caveats

The corpus is lopsided. East Asian recipes make up 1.55M of the 4.14M total, so smaller regions carry wider confidence intervals and the cuisine geometry is best-supported where the data is densest. The chemistry signal is also thinner than it looks: only 523 of 1,790 ingredients connect directly to FlavorDB compounds, and the rest reach chemistry only indirectly through the graph.

Two more practical limits. The vocabulary was canonicalized with Claude Opus under deterministic decoding, so the ingredient set inherits whatever the LLM consolidation decided (the embeddings themselves are LLM-free). And the authors state plainly that code and trained artifacts are not being released, so for now this is a result to learn from rather than a model to download.

Still, the core takeaway is clean: with nothing but recipe graphs and a walk schema, food becomes a navigable vector space where taste, nutrition, and cuisine are directions you can dial. The chemistry-vs-context knob is the part worth stealing: it turns “what does this embedding capture” from a fixed outcome into a tuning decision.