Harden the Gauntlet
Blocking the textbook attacks is the easy part, and a capable agent does it for free. The score turns on the harder judgment: tightening access without breaking the legitimate requests that already work.
You are handed a small HTTP service that is already running. It exposes four endpoints: an admin endpoint, a login endpoint, a file download endpoint, and an orders endpoint. The service was shipped quickly and works on the obvious requests, but it was never reviewed before going live. Your job is to harden it. Write a spec that an AI agent will use to rebuild the service so it stands up to a hostile client.
The hardened build must keep all four endpoints working, with their existing response shapes and on the same listen port, so that legitimate traffic behaves exactly as before: a well-formed request that works today must return the same result after hardening. The contract is simple: do not change the public behavior of a valid request, and close off every way a malicious request could abuse the service. Treat every input that crosses the boundary as untrusted, and decide explicitly how the service should respond to a request that was never meant to succeed.
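One way to make that explicit decision is to route every handler through a single wrapper that maps each failure to a fixed body. The Python sketch below is illustrative only: `run_handler`, `RejectRequest`, and the error bodies are hypothetical names and shapes, not part of the given service. Exception details go to the server log and never to the client. In a real rebuild the bodies should match whatever error shape the existing service already returns, because response shapes must not change.

```python
import json
import logging
import uuid

logger = logging.getLogger("service")

# Hypothetical fixed bodies for requests that were never meant to succeed.
# These shapes are assumptions; a real rebuild keeps the service's existing ones.
GENERIC_ERRORS = {
    400: {"error": "bad request"},
    401: {"error": "unauthorized"},
    403: {"error": "forbidden"},
    404: {"error": "not found"},
    429: {"error": "too many requests"},
    500: {"error": "internal error"},
}


class RejectRequest(Exception):
    """Raised by a (hypothetical) handler to reject a request with a known status."""

    def __init__(self, status: int) -> None:
        super().__init__(status)
        self.status = status


def run_handler(handler, *args):
    """Call a handler and return (status, json_text) with no internal details leaked.

    Known rejections map to a fixed body; anything unexpected is logged with a
    random incident id for operators, and the client sees only a generic 500.
    """
    try:
        return 200, json.dumps(handler(*args))
    except RejectRequest as exc:
        status = exc.status if exc.status in GENERIC_ERRORS else 400
        return status, json.dumps(GENERIC_ERRORS[status])
    except Exception:
        incident = uuid.uuid4().hex
        logger.exception("unhandled error, incident %s", incident)
        return 500, json.dumps(GENERIC_ERRORS[500])
```

The point of the design is that no code path can echo a stack trace, SQL text, or file path back to the caller, which is exactly what a verbose-error-leak attack looks for.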
After your build is assembled, a fixed corpus of real attacks is fired at it in an isolated sandbox. Each attack targets a class of weakness that services like this one are known to ship with, and each is either stopped at the right layer or reaches something it should never have reached. The builder applies only the protections your spec explicitly names, so a vague instruction to "make it secure" changes nothing. Your credit is the number of attacks your build turns away. Think like an attacker walking each endpoint, and state each guarantee you want as a separate, testable requirement.
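As an example of a separate, testable requirement, "the file endpoint never resolves to a path outside its download directory" can be pinned down precisely. The sketch below needs Python 3.9+ for `Path.is_relative_to`; `DOWNLOAD_ROOT` and `resolve_download` are hypothetical names, and it assumes the requested name has already been percent-decoded exactly once.

```python
from pathlib import Path
from typing import Optional

# Hypothetical download root; the task does not say where the real files live.
DOWNLOAD_ROOT = Path("/srv/downloads")


def resolve_download(name: str, root: Path = DOWNLOAD_ROOT) -> Optional[Path]:
    """Return the path to serve for a requested file name, or None to reject.

    Guarantee sketched here: the result is always strictly inside `root`,
    whether the attempt uses '..' segments, an absolute path, or a symlink.
    """
    if not name or "\x00" in name:
        return None  # empty names and NUL bytes are rejected outright
    base = root.resolve()
    # An absolute `name` replaces `base` when joined, and resolve() follows
    # '..' and symlinks, so the containment check below catches all three.
    candidate = (base / name).resolve()
    if candidate == base or not candidate.is_relative_to(base):
        return None
    return candidate
```

Because the check runs on the fully resolved path rather than on the raw string, it does not depend on spotting every spelling of `..` in the input.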
- Keep the admin, login, file, and orders endpoints working, with the same response shapes and listen port.
- Turn away the textbook attacks: header-trust auth bypass, SQL injection, path traversal, cross-session object access, verbose error leaks, and unthrottled login.
- Do not change the public behavior of a request that legitimately works today.
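Three of the listed attack classes can be sketched concretely for the orders and login endpoints. This is a minimal Python sketch under stated assumptions: the SQLite schema, `make_db`, `get_order`, `LoginThrottle`, and the limit values are all hypothetical, since the task does not describe the service's storage or throttling policy. `get_order` binds the id as a query parameter (SQL injection) and scopes the lookup to the session's user (cross-session object access); `LoginThrottle` caps failed logins per account in a sliding window (unthrottled login).

```python
import sqlite3
import time
from collections import deque
from typing import Dict, Optional


def make_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database with a hypothetical orders table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, owner TEXT, item TEXT)")
    db.executemany(
        "INSERT INTO orders (id, owner, item) VALUES (?, ?, ?)",
        [(1, "alice", "book"), (2, "bob", "lamp")],
    )
    return db


def get_order(db: sqlite3.Connection, session_user: str, order_id: str) -> Optional[dict]:
    """Fetch one order for the logged-in user, or None.

    The id is bound as a query parameter, never formatted into the SQL text,
    and the owner comes from the server-side session, never from the request.
    A missing order and another user's order both yield None, so a caller
    cannot tell which ids exist.
    """
    # ASCII digits only, short enough to fit a 64-bit SQLite INTEGER.
    if not (order_id.isascii() and order_id.isdigit()) or len(order_id) > 18:
        return None
    row = db.execute(
        "SELECT id, item FROM orders WHERE id = ? AND owner = ?",
        (int(order_id), session_user),
    ).fetchone()
    return {"id": row[0], "item": row[1]} if row else None


class LoginThrottle:
    """Sliding-window cap on failed logins per account (limits are assumptions)."""

    def __init__(self, max_failures: int = 5, window_s: float = 300.0) -> None:
        self.max_failures = max_failures
        self.window_s = window_s
        self.failures: Dict[str, deque] = {}  # username -> failure timestamps

    def allowed(self, username: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        q = self.failures.get(username)
        if q is None:
            return True
        while q and now - q[0] >= self.window_s:
            q.popleft()  # forget failures that have aged out of the window
        if not q:
            del self.failures[username]  # drop empty entries
            return True
        return len(q) < self.max_failures

    def record_failure(self, username: str, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self.failures.setdefault(username, deque()).append(now)
```

A login handler would call `allowed` before checking the password and `record_failure` after a bad one, answering a throttled attempt with a fixed rejection. If the throttle is also keyed by client address, that address should come from the socket peer rather than a client-supplied header such as `X-Forwarded-For`, unless a proxy the service trusts sets it; otherwise the throttle becomes a header-trust bypass of its own.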
The functional tests are shown, and the model usually passes them on its own. The hidden tests are the twists that systems like this are full of, and they are not listed. Your spec passes them only if it already knows where this domain breaks.