Prompt Preflight – catch vague AI-agent prompts before they burn tokens
Prompt Preflight is a local Codex plugin and standalone CLI that detects vague prompts before model execution, avoiding costly retry loops. It uses deterministic Python rules with no network requests or API calls, and it responds with targeted clarification questions and a domain-aware example of a stronger prompt. It is aimed at software development and image generation work, especially expensive tasks such as repository-wide changes, migrations, and deployments.
Catch underspecified requests before they become expensive model turns.
Prompt Preflight is a local Codex plugin and standalone CLI that checks whether a prompt is specific enough to act on. When ambiguity and the cost of being wrong are both high, it pauses the request and gives the user:
Their original prompt.
A domain-aware example of a stronger prompt.
Up to three questions that fill the most important gaps.
The check uses deterministic Python rules. It makes no network requests and calls no model.
Demo
Prompt Preflight catches a vague Codex prompt before model work begins:
The demo shows the core loop:
User submits a vague request → Prompt Preflight runs locally → Codex gets blocked before spending a model turn → the user receives a stronger prompt template and targeted questions
Why this exists
A vague prompt often creates an expensive loop:
Vague request → model reads project context → model produces the wrong interpretation → user corrects it → model reads the expanded conversation → model does the work again
The wasted cost is not limited to the first answer. The retry also carries the earlier prompt, output, corrections, and additional context.
Prompt Preflight moves clarification before that loop:
Vague request → local preflight check → targeted clarification → one stronger request → useful model work
It does not reduce the price per token. It reduces avoidable model input, unwanted output, repeated tool work, and correction turns.
Where token savings come from
Without preflight, the approximate cost of a failed attempt and retry is:
failed input + failed output + correction context + replacement input + replacement output
With preflight, the local check consumes zero model tokens. The intended path becomes:
clarified input + useful output
The potential tokens avoided are therefore approximately:
failed input + failed output + correction context + duplicated work
Actual savings depend on prompt quality, model behavior, context size, and task complexity. Prompt Preflight does not currently claim a fixed savings percentage; measured token telemetry is future work.
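The accounting above can be made concrete with a small worked example. This is only a sketch: every token count below is a made-up placeholder rather than a measurement, and the class and function names are hypothetical, not part of Prompt Preflight.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attempt:
    """Token counts for one model turn (illustrative numbers only)."""
    input_tokens: int
    output_tokens: int


def cost_without_preflight(failed: Attempt, correction_context: int,
                           replacement: Attempt) -> int:
    """failed input + failed output + correction context
    + replacement input + replacement output."""
    return (failed.input_tokens + failed.output_tokens + correction_context
            + replacement.input_tokens + replacement.output_tokens)


def cost_with_preflight(clarified: Attempt) -> int:
    """clarified input + useful output; the local check uses no model tokens."""
    return clarified.input_tokens + clarified.output_tokens


# Placeholder figures chosen only to show the arithmetic.
failed = Attempt(input_tokens=8_000, output_tokens=1_500)
replacement = Attempt(input_tokens=10_000, output_tokens=1_500)
clarified = Attempt(input_tokens=8_200, output_tokens=1_500)

without = cost_without_preflight(failed, correction_context=300,
                                 replacement=replacement)
with_check = cost_with_preflight(clarified)
print(without, with_check, without - with_check)
```

With these placeholder inputs the difference is dominated by the failed turn and the larger replacement input, which matches the point that the retry carries the earlier conversation with it.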
The largest benefit is expected on tasks where a wrong interpretation is costly, such as repository-wide changes, migrations, deployments, architecture work, or iterative image generation.
Example: image generation
User prompt:
Create a car image
Prompt Preflight responds before image generation begins:
Your prompt: "Create a car image"
Try asking: "Create a [photorealistic/illustrated/3D] image of a car with [key colors, materials, and distinctive details], in [setting/background], viewed from [camera angle/composition], with [lighting/mood], in [aspect ratio]."
Fill in the brackets by answering:
- What should the car look like?
- What visual style and mood do you want?
- What setting, composition, lighting, and aspect ratio should it use?
This prevents an arbitrary first image followed by several rounds of visual corrections.
Example output after the prompt is clarified:
Example: software work
User prompt:
Make the dashboard better
Prompt Preflight suggests:
Improve the dashboard in [specific page/component] so [observable outcome]. Keep [important behavior or design constraints] unchanged. Verify with [tests or acceptance criteria].
The model receives a target, outcome, boundaries, and definition of done before it reads files or edits code.
Key features
Runs before a Codex model turn through UserPromptSubmit.
Uses no model, API key, network access, or external service.
Routes prompts by domain before selecting feedback.
Includes software and image-generation feedback profiles.
Shows a tailored rewrite instead of only saying “be more specific.”
Asks at most three high-value questions.
Lets clear prompts and conversational follow-ups pass through.
Supports a one-time [preflight:skip] bypass.
Supports configurable block and nudge modes.
Fails open if hook input is malformed.
Provides structured JSON for evaluation and debugging.
How the decision works
Prompt Preflight estimates three things:
Intent: What kind of work is being requested?
Ambiguity: Which domain-specific details are missing?
Impact: How expensive would a wrong interpretation be?
It interrupts only when the prompt is actionable and both ambiguity and impact cross the configured threshold. This prevents the plugin from interrogating users about simple questions, confirmations, or already-specific work.
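The gate described above can be sketched as a single boolean check. This is an illustration under stated assumptions, not the plugin's actual code: the `Analysis` fields, the 0 to 100 score scale, and the use of `>=` for "cross the threshold" are all hypothetical, and the default of 45 is simply the value shown in the example configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    actionable: bool  # does the prompt ask for work at all?
    ambiguity: int    # assumed 0-100 score: how much detail is missing
    impact: int       # assumed 0-100 score: cost of a wrong interpretation


def should_interrupt(analysis: Analysis, threshold: int = 45) -> bool:
    """Interrupt only when the prompt is actionable AND both scores
    reach the threshold (assumed comparison: >=)."""
    return (analysis.actionable
            and analysis.ambiguity >= threshold
            and analysis.impact >= threshold)


# A vague, high-impact request is interrupted.
print(should_interrupt(Analysis(actionable=True, ambiguity=80, impact=70)))
# A vague but cheap request passes through.
print(should_interrupt(Analysis(actionable=True, ambiguity=80, impact=10)))
# A conversational follow-up is never interrupted.
print(should_interrupt(Analysis(actionable=False, ambiguity=90, impact=90)))
```

Requiring both scores to pass is what keeps simple questions and already-specific work from being interrupted.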
Current domain profiles include:
Software builds and changes
Bug fixes
Optimization
Deployment and migration
Image generation
Unsupported domains use a conservative fallback rather than receiving software-specific questions.
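Domain routing with a conservative fallback might look roughly like the sketch below. The profile names and keyword lists are invented for illustration, and the real analyzer's rules are not shown in this README. Matching is prefix-based, so `migrat` would also match "migrations"; a production rule set would need to be more careful.

```python
import re

# Hypothetical keyword lists, checked in insertion order.
PROFILES = {
    "image_generation": ("image", "picture", "illustration", "render", "logo"),
    "deployment_migration": ("deploy", "migrat", "rollout"),
    "bug_fix": ("fix", "bug", "crash", "broken"),
    "optimization": ("optimiz", "speed up", "faster", "performance"),
    "software_change": ("build", "implement", "refactor", "update", "add"),
}


def route(prompt: str) -> str:
    """Return the first profile whose keyword starts a word in the prompt,
    or "fallback" when no profile matches."""
    text = prompt.lower()
    for profile, keywords in PROFILES.items():
        if any(re.search(rf"\b{re.escape(word)}", text) for word in keywords):
            return profile
    return "fallback"


print(route("Render a house"))
print(route("Fix checkout"))
print(route("Write a poem about autumn"))
```

The point of the fallback branch is that an unrecognized request gets generic questions instead of questions about files, components, or platform stack.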
Quick local test
Requires Python 3.10 or later.
python3 scripts/prompt_preflight.py "Create a car image"
A prompt requiring clarification exits with status 2. A prompt ready to send exits with status 0:
python3 scripts/prompt_preflight.py \
  "Create a photorealistic image of a red 1967 Ford Mustang on a wet Tokyo street at night, low camera angle, cinematic lighting, 16:9."
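The exit statuses make the CLI easy to call from other tooling. The wrapper below is a minimal sketch: `describe_status` and `run_preflight` are hypothetical helper names, and treating any status other than 0 or 2 as an error is an assumption, since the README documents only those two values.

```python
import subprocess
import sys

READY = 0                # documented: prompt is ready to send
NEEDS_CLARIFICATION = 2  # documented: prompt requires clarification


def describe_status(returncode: int) -> str:
    """Map a CLI exit status to a label (other codes assumed to be errors)."""
    if returncode == READY:
        return "ready"
    if returncode == NEEDS_CLARIFICATION:
        return "needs clarification"
    return "error"


def run_preflight(prompt: str,
                  script: str = "scripts/prompt_preflight.py") -> str:
    """Run the CLI from the repository root and classify its exit status."""
    result = subprocess.run([sys.executable, script, prompt],
                            capture_output=True, text=True)
    return describe_status(result.returncode)
```

From the repository root, `run_preflight("Create a car image")` would be expected to return `"needs clarification"`, matching the first command above.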
Inspect the full analysis:
python3 scripts/prompt_preflight.py --json "Rewrite the whole project"
Structured output includes the detected intent, ambiguity score, impact score, reasons, questions, and suggested prompt.
Benchmark vague-prompt detection
Prompt Preflight includes a fixed benchmark of 100 intentionally vague prompts across software work, bug fixes, deployment, migration, optimization, and image generation.
Run it locally:
python3 scripts/benchmark_vague_prompts.py
Save complete results as JSON:
python3 scripts/benchmark_vague_prompts.py \
  --min-block-rate 0.90 \
  --json-output benchmark-results.json
The benchmark reports:
Number of vague prompts blocked before model work
Missed prompts that should be reviewed
Average ambiguity, impact, and clarification scores
Results grouped by detected intent
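The headline number in that report is the block rate, which can be compared with `--min-block-rate`. The sketch below assumes the option acts as a pass/fail floor on `blocked / total`; the README does not spell out how the script applies it, and the function names are hypothetical.

```python
def block_rate(blocked: int, total: int) -> float:
    """Fraction of vague prompts that were stopped before model work."""
    if total <= 0:
        raise ValueError("total must be positive")
    return blocked / total


def passes_gate(blocked: int, total: int, min_block_rate: float = 0.90) -> bool:
    """Assumed meaning of --min-block-rate: fail when the rate drops below it."""
    return block_rate(blocked, total) >= min_block_rate


# Example counts: 98 of 100 vague prompts blocked.
print(block_rate(98, 100), passes_gate(98, 100))
```

A floor like this is what turns the benchmark into a regression guard: a scoring change that lets more vague prompts through would fail the check.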
What the first benchmark taught us
The first 100-prompt run exposed exactly the kind of regression risk this project is meant to catch. Early scoring was too lenient on short action prompts such as:
Update the API
Fix checkout
Integrate analytics
Implement caching
Those prompts look actionable, but they omit the target behavior, constraints, and acceptance criteria. Acting on them can easily trigger a costly loop: the model guesses, the user corrects it, and the model repeats the work with more conversation history in context.
The benchmark also exposed a domain-routing issue. A prompt like:
Render a house
should receive image-generation feedback, not software-project feedback. The analyzer now treats common visual render prompts as image-generation requests so the user gets questions about style, composition, lighting, and output format instead of files, components, or platform stack.
With the current default threshold, the benchmark catches:
98 / 100 vague prompts
10 / 10 image-generation prompts
The two current misses are:
Fix the flaky tests
Generate more tests
These misses are useful calibration cases. They show why the benchmark is not just a vanity metric: it gives maintainers concrete prompts to discuss, tune, and convert into regression tests when the desired behavior is clear.
This is a regression guard, not a token-savings guarantee. The benchmark consumes zero model tokens and helps catch changes that would let vague, costly prompts slip through.
The repository also includes a GitHub Actions workflow at .github/workflows/benchmark.yml. It runs the unit tests and the 100-prompt benchmark on pushes, pull requests, and manual workflow dispatch.
Install in Codex
Automatic install:
python3 scripts/install_codex_plugin.py
The installer copies the plugin to ~/plugins/prompt-preflight, creates or updates the personal marketplace at ~/.agents/plugins/marketplace.json, and attempts to run codex plugin add prompt-preflight@personal.
If the Codex CLI is not on your shell PATH, the installer still completes the file and marketplace setup, then prints the command and Codex app link needed to finish installation.
Preview the changes without writing files:
python3 scripts/install_codex_plugin.py --dry-run
See the external setup guide for:
macOS, Linux, and Windows installation
Personal marketplace configuration
Installer options and manual fallback steps
Hook review and trust
End-to-end Codex tests
Updating and uninstalling
Troubleshooting
After installation, restart Codex, open a new thread, and review the hook with /hooks.
Configuration
Create .prompt-preflight.json in the project where Codex runs:
{
  "enabled": true,
  "mode": "block",
  "threshold": 45,
  "max_questions": 3
}
mode "block": stop the vague submission before model work.
mode "nudge": allow the turn while instructing Codex to clarify first.
threshold: raise it to interrupt less often.
max_questions: limit the number of clarification questions; accepts 1 to 5.
enabled: set to false to disable Prompt Preflight for a project.
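A loader for these options could be sketched as follows. The function names are hypothetical, and the defaults are copied from the example file above; they are an assumption, not confirmed built-in defaults. Falling back to those defaults on unreadable or invalid values is likewise a design choice made for this sketch.

```python
import json
from pathlib import Path

# Assumed defaults, taken from the example configuration.
DEFAULTS = {"enabled": True, "mode": "block", "threshold": 45, "max_questions": 3}


def normalize_config(raw: dict) -> dict:
    """Merge known keys over the defaults and clamp invalid values."""
    config = dict(DEFAULTS)
    config.update({key: value for key, value in raw.items() if key in DEFAULTS})
    if config["mode"] not in ("block", "nudge"):
        config["mode"] = DEFAULTS["mode"]
    questions = config["max_questions"]
    if isinstance(questions, bool) or not isinstance(questions, int):
        questions = DEFAULTS["max_questions"]
    config["max_questions"] = max(1, min(5, questions))  # documented range: 1 to 5
    return config


def load_config(project_dir: str = ".") -> dict:
    """Read .prompt-preflight.json if present; otherwise use the defaults."""
    path = Path(project_dir) / ".prompt-preflight.json"
    try:
        raw = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        raw = {}
    return normalize_config(raw if isinstance(raw, dict) else {})


print(normalize_config({"mode": "nudge", "max_questions": 9}))
```

Keeping normalization separate from file reading makes the clamping rules easy to cover with unit tests.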
Bypass one request without changing configuration:
Create a car image [preflight:skip]
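Detecting the bypass marker is a simple substring check. The helper below is a hypothetical sketch; in particular, whether the real hook strips the marker before the prompt reaches the model is an assumption.

```python
SKIP_TOKEN = "[preflight:skip]"


def split_bypass(prompt: str) -> tuple[bool, str]:
    """Return (bypass requested, prompt with the marker removed)."""
    if SKIP_TOKEN in prompt:
        return True, prompt.replace(SKIP_TOKEN, "").strip()
    return False, prompt


print(split_bypass("Create a car image [preflight:skip]"))
print(split_bypass("Create a car image"))
```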
Privacy and security
Prompt text is analyzed locally. Prompt Preflight does not:
Send prompt text to a server
Store prompt history
Require an API key
Invoke a cheaper model to decide whether an expensive model should run
Modify files during prompt analysis
As with any local plugin, review .codex-plugin/plugin.json, hooks/hooks.json, and scripts/prompt_preflight_hook.py before trusting the hook.
Limitations
Rule-based intent routing cannot understand every phrasing.
Domain coverage is intentionally narrow and high-precision today.
Clarification can add friction when the user prefers the model to make assumptions.
Token savings are task-dependent and are not yet measured automatically.
Prompts may use [preflight:skip] when interruption is not worthwhile.
Incorrect classifications should become regression tests. Run a questionable prompt with --json and capture its detected intent, reasons, and questions.
Development
Run the test suite:
python3 -m unittest discover -s tests -v
Smoke-test the Codex hook contract:
python3 scripts/prompt_preflight_hook.py <<'EOF'
{"hook_event_name":"UserPromptSubmit","prompt":"Create a car image"}
EOF
The project currently has regression coverage for vague and detailed prompts, domain routing, bypass behavior, nudge mode, and malformed hook input.
Roadmap
Token and retry savings telemetry
More domain profiles, including writing, research, data analysis, and presentations
User-defined terminology and intent rules
Per-domain thresholds
Claude Code and other agent adapters
False-positive feedback capture and calibration reports
Public launch checklist
Before making the repository public:
Publish the contents of this prompt-preflight folder as the GitHub repository root so this README appears on the landing page.
Add GitHub topics such as codex, ai-agents, prompt-engineering, llm, developer-tools, token-cost, python, hooks, and productivity.
Confirm the demo recording does not show secrets, private repo names, customer data, or personal notifications.
Run python3 scripts/install_codex_plugin.py --dry-run before tagging a release.
Run python3 -m unittest discover -s tests -q and python3 scripts/benchmark_vague_p
[truncated for AI cost control]