Verification Discipline (VD)
Weight: 20% — the second-highest weight. Unverified AI output is the #1 source of downstream bugs.
Review (0–10)
Do you review AI-generated code before accepting it?
High review:
- Reads diffs before accepting changes
- Asks clarifying questions about the approach
- Catches issues before they compound
- Reviews generated tests for correctness
Low review:
- Accepts all changes without reading
- Never questions the approach
- Bugs surface later because changes weren’t reviewed
Validation (0–10)
Do you verify that changes work correctly?
High validation:
- Runs tests after changes (`npm test`, `cargo test`)
- Checks types (`tsc --noEmit`, `cargo check`)
- Manually verifies behavior (browser, API call, CLI output)
- Runs linting to catch style issues
Low validation:
- Never runs tests
- Assumes changes work without verification
- Skips type-checking
- Ships without any manual verification
Examples:
| Score | Pattern |
|---|---|
| 9 | “Run pnpm test to verify the changes” → reviews output → “Fix the failing assertion in auth.test.ts” |
| 5 | Occasionally asks to run tests, but not consistently |
| 2 | Never mentions testing, linting, or verification |
Improving VD
- Run tests after every change: make it a habit (implement → test → commit)
- Read the diff: even a quick scan catches obvious issues
- Add verification prompts: end your prompt with “then run the tests”
- Use type-checking: “run `tsc --noEmit` and fix any errors”
- Manual spot-check: for UI changes, look at the result; for APIs, call the endpoint
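The implement → test → commit habit can be made mechanical with a small pre-commit helper. The following is a minimal sketch, not a prescribed tool: the names `run_checks`, `DEFAULT_CHECKS`, and `main` are hypothetical, and the command list assumes a TypeScript project with `npm` and `tsc` available on the PATH.

```python
import subprocess
import sys
from typing import List, Sequence, Tuple


def run_checks(commands: Sequence[Sequence[str]]) -> List[Tuple[str, int]]:
    """Run every command in order; return (command, exit code) for each failure."""
    failures: List[Tuple[str, int]] = []
    for cmd in commands:
        label = " ".join(cmd)
        try:
            # check=False so one failing check does not hide the others.
            result = subprocess.run(cmd, check=False)
        except FileNotFoundError:
            # Executable is missing; 127 mirrors the shell's "command not found".
            failures.append((label, 127))
            continue
        if result.returncode != 0:
            failures.append((label, result.returncode))
    return failures


# Assumed check list (hypothetical); for a Rust project this might instead be
# ["cargo", "test"] and ["cargo", "check"].
DEFAULT_CHECKS = [
    ["npm", "test"],
    ["tsc", "--noEmit"],
]


def main() -> int:
    """Run the default checks and report failures; return a process exit code."""
    failed = run_checks(DEFAULT_CHECKS)
    for label, code in failed:
        print(f"FAILED (exit {code}): {label}")
    return 1 if failed else 0
```

Calling `sys.exit(main())` before each commit gives a single pass/fail signal. The script only covers automated validation, so reading the diff and manual spot-checks remain separate steps.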