The skill map

The skills top AI roles test. Graded, not discussed.

Not puzzles. This is the actual AI-engineering skill map: the things you get interviewed and judged on, almost none of which is tested deterministically anywhere. Here, every cell is a challenge that a hidden oracle grades in a real sandbox. Write the spec, ship it, and reality tells you if you were right.

5 live · 0 building · 8 on the roadmap

System Design

How AI systems are wired: pipelines, retrieval, memory, orchestration.

Agent orchestration

live

Direct a team of agents to ship a task. Graded on how you route and verify, not on one clever prompt.

The Shift · The Closer · The Conductor · Day One

Retrieval / RAG

live

Tune a chunk-to-search pipeline until it surfaces buried answers. Hidden queries grade recall on docs you can't see while tuning.

The Stacks

Tool routing

roadmap

Route a request to the right tool. Hidden cases catch the misroutes and the ambiguous intents.

Context management

roadmap

Fit the right context in a token budget without dropping the answer to the hidden question.

Cost & latency

roadmap

Build a cheap-first, escalate-when-needed routing policy, graded on a hidden cost-versus-quality frontier.
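
A cheap-first, escalate-when-needed policy can be sketched roughly as follows. This is a minimal illustration, not the challenge's actual interface: the `Tier` class, the per-call `cost` figures, the self-reported confidence score, and the `0.8` threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Tier:
    """A hypothetical model tier: a name, a cost per call, and an answer function."""
    name: str
    cost: float
    # Returns (answer_text, confidence in [0, 1]). The confidence signal is assumed.
    answer: Callable[[str], Tuple[str, float]]


def route(query: str, tiers: List[Tier], threshold: float = 0.8) -> Tuple[str, str, float]:
    """Try tiers cheapest-first; stop at the first answer that meets the threshold.

    Returns (answer_text, tier_name_used, total_cost_spent). If no tier meets
    the threshold, the most expensive tier's answer is returned.
    """
    spent = 0.0
    text, used = "", ""
    for tier in sorted(tiers, key=lambda t: t.cost):
        text, confidence = tier.answer(query)
        spent += tier.cost
        used = tier.name
        if confidence >= threshold:
            break
    return text, used, spent


# Stub tiers for illustration only: the small tier is confident on short queries.
small = Tier("small", 1.0, lambda q: ("short answer", 0.9 if len(q) < 10 else 0.3))
large = Tier("large", 10.0, lambda q: ("long answer", 0.95))

easy = route("hi", [large, small])
hard = route("a much longer query", [large, small])
```

The design point is that an escalated query pays for both calls, so the threshold trades total cost against answer quality. That trade-off is what a cost-versus-quality frontier measures.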

Prompt Engineering

Getting exact, machine-checkable behavior out of a model.

Structured prompting

live

Write a spec precise enough that hidden tests pass. Vague prompts confidently ship the bug.

Mini Checkout + Promo Code Engine · Rate Limiter · Fixed-Block Pool Allocator

Prompt injection & security

live

Harden a service against a real attack corpus. The gauntlet finds the gap a vague prompt leaves open.

Harden the Gauntlet · Seal the Vault

Context compression

roadmap

Compress a long context to a budget and still answer the hidden questions it has to cover.

Eval & Reliability

Proving a system actually works. The discipline PromptGolf is built on.

Build an eval harness

live

You're handed unlabeled AI outputs and a spec; you build the judge that catches the defects. We grade YOUR evaluator against a hidden labeled set.

Build the Eval
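
One way to picture grading an evaluator against a labeled set is sketched below. The page does not say which metric is used, so precision and recall on defect detection are an assumption here, and `score_judge`, the toy judge, and the sample data are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def score_judge(judge: Callable[[str], bool],
                labeled: List[Tuple[str, bool]]) -> Dict[str, float]:
    """Compare a judge's defect flags with ground-truth labels.

    `judge(output)` returns True when it flags the output as defective.
    `labeled` holds (output, is_defective) pairs. Metric choice is an assumption.
    """
    tp = fp = fn = 0
    for output, is_defective in labeled:
        flagged = judge(output)
        if flagged and is_defective:
            tp += 1          # defect caught
        elif flagged and not is_defective:
            fp += 1          # false alarm
        elif not flagged and is_defective:
            fn += 1          # defect missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"precision": precision, "recall": recall}


# Toy judge and labels, for illustration only.
def toy_judge(output: str) -> bool:
    return "TODO" in output


labeled_set = [
    ("all good", False),
    ("TODO: fix rounding", True),
    ("silently wrong total", True),
    ("TODO is a valid product name", False),
]
scores = score_judge(toy_judge, labeled_set)
```

The toy judge shows both failure modes a labeled set exposes: it misses a defect that carries no marker, and it raises a false alarm on a harmless string.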

Data Engineering

The data behind retrieval: chunking, embeddings, search, cleaning.

Chunking

roadmap

Chunk documents so the answer survives the split. Graded on downstream recall.

Vector search

roadmap

Index and query embeddings for recall@k against a hidden query set.
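
For reference, recall@k is the fraction of a query's relevant documents that appear in the top k results. A minimal sketch follows. The function names and document IDs are illustrative, and averaging across queries is one common convention, not necessarily how the hidden grader aggregates.

```python
from typing import List, Sequence, Set, Tuple


def recall_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of relevant documents found in the first k ranked results."""
    if not relevant_ids:
        raise ValueError("relevant_ids must not be empty")
    top_k = set(ranked_ids[:k])
    return len(top_k & relevant_ids) / len(relevant_ids)


def mean_recall_at_k(runs: List[Tuple[Sequence[str], Set[str]]], k: int) -> float:
    """Average recall@k over (ranked_ids, relevant_ids) pairs, one per query."""
    return sum(recall_at_k(ranked, relevant, k) for ranked, relevant in runs) / len(runs)


# Hypothetical search results for a single query.
ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}

r2 = recall_at_k(ranked, relevant, k=2)  # only d1 is in the top 2
r4 = recall_at_k(ranked, relevant, k=4)  # d1 and d2 are both in the top 4
```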

Data cleaning

roadmap

Clean a messy dataset up to a hidden quality bar.

Synthetic data

roadmap

Generate data that matches a hidden target distribution.
