Two-Tier Scoring
Prism uses two scoring methods to balance accuracy and cost.
LLM scorer (primary)
The LLM scorer reads the full session transcript and evaluates each of the 10 metrics against a detailed rubric.
| Property | Value |
|---|---|
| Model | Claude Sonnet (Anthropic) |
| Accuracy | ~90% |
| Cost | ~$0.0008 per session |
| Latency | 2–5 seconds |
| Used when | API key available, rate limit not exceeded |
How it works:
- Session transcript is formatted with turn boundaries, tool calls, and file edits
- The rubric for all 10 metrics is embedded in the system prompt
- The LLM evaluates each metric on the 0–10 scale
- Coaching notes are generated for the weakest dimension
- Results are structured as JSON and persisted to Postgres
Advantages: understands nuance, context, and intent. Can detect subtle patterns like scope creep or missed verification opportunities.
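The formatting and result-handling steps listed above can be sketched roughly as follows. This is a minimal illustration in Python, not Prism's actual implementation: the turn layout (`role`, `text`, `tool_calls`, `file_edits`), the JSON reply shape (`scores`, `coaching_notes`), and both function names are assumptions made for the example. The model call itself and the Postgres write are left out.

```python
import json


def format_transcript(turns):
    """Render turns with explicit boundaries, tool calls, and file edits.

    The dict keys used here are hypothetical, chosen for illustration.
    """
    lines = []
    for i, turn in enumerate(turns, start=1):
        lines.append(f"--- turn {i} ({turn['role']}) ---")
        lines.append(turn["text"])
        for call in turn.get("tool_calls", []):
            lines.append(f"[tool call] {call}")
        for path in turn.get("file_edits", []):
            lines.append(f"[file edit] {path}")
    return "\n".join(lines)


def parse_scores(raw_json, metric_names):
    """Parse the model's JSON reply and check each metric is on the 0-10 scale."""
    data = json.loads(raw_json)
    scores = data["scores"]
    for name in metric_names:
        value = scores[name]  # raises KeyError if the model skipped a metric
        if not 0 <= value <= 10:
            raise ValueError(f"{name} out of range: {value}")
    # Coaching targets the weakest dimension, so record which one that is.
    weakest = min(metric_names, key=lambda n: scores[n])
    return {
        "scores": scores,
        "weakest": weakest,
        "coaching_notes": data.get("coaching_notes", ""),
    }
```

Validating the range before persisting keeps a malformed model reply from reaching storage; what to do on failure (retry, or fall back to the heuristic scorer) is a separate design decision.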
Heuristic scorer (fallback)
The heuristic scorer uses regex patterns, keyword matching, and structural analysis for free, instant scoring.
| Property | Value |
|---|---|
| Implementation | Rust-native |
| Accuracy | ~70% |
| Cost | Free |
| Latency | <10ms |
| Used when | LLM unavailable, rate-limited, or as real-time PQ scoring in the plugin |
How it works:
- Prompt text is analyzed for specificity markers (file paths, function names, etc.)
- Decomposition is measured by verb count, bundling phrases, and list items
- Session-level patterns are detected (retry storms, correction cascades)
- Point values are assigned per signal and summed to a 0–10 scale
- Coaching notes are generated from templates based on the lowest-scoring areas
Advantages: instant, free, works offline. Used for real-time PQ coaching in the UserPromptSubmit hook.
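As an illustration of the signal-and-sum approach described above, here is a small Python sketch covering only the specificity step. The real scorer is Rust-native; the patterns, point values, and base score below are hypothetical and chosen only to show the shape of the technique.

```python
import re

# Hypothetical signals and point values, for illustration only.
SIGNALS = [
    (re.compile(r"\b[\w./-]+\.(?:rs|py|ts|js|md|toml)\b"), 3),       # file path
    (re.compile(r"\b\w+\(\)"), 2),                                    # function name, e.g. parse_config()
    (re.compile(r"\bline \d+\b", re.IGNORECASE), 2),                  # line reference
    (re.compile(r"\b(?:error|panic|traceback)\b", re.IGNORECASE), 1), # failure keyword
]
BASE_POINTS = 2  # assumed floor for any non-empty prompt


def specificity_score(prompt):
    """Sum the points of every signal found in the prompt, capped at 10."""
    points = BASE_POINTS
    for pattern, value in SIGNALS:
        if pattern.search(prompt):
            points += value
    return min(points, 10)
```

Under these assumed values, a prompt such as `fix it` stays at the base score, while naming a file and a function raises it. Each signal counts once per prompt, so repeating a file path does not inflate the result.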
When each is used
| Context | Scorer | Reason |
|---|---|---|
| UserPromptSubmit hook | Heuristic | Must be instant (<100ms), runs on every prompt |
| /prism:advisor command | Heuristic | Interactive, needs instant feedback |
| /prism:score command | Reads from Postgres | Displays pre-computed scores |
| Background scoring worker | LLM (primary) | Accuracy matters, async processing |
| Background scoring worker (fallback) | Heuristic | LLM unavailable or rate-limited |
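The routing in the table above can be expressed as a small selection function. This is a sketch in Python under stated assumptions: the context strings, the return labels (`"heuristic"`, `"llm"`, `"stored"`), and the parameter names are hypothetical and not taken from Prism's code.

```python
def choose_scorer(context, api_key_available=False, rate_limited=False):
    """Return which scoring path handles a given context.

    Context names and return labels are illustrative assumptions.
    """
    if context in ("UserPromptSubmit", "/prism:advisor"):
        # Interactive paths must respond instantly, so they never call the LLM.
        return "heuristic"
    if context == "/prism:score":
        # Displays pre-computed scores; nothing is scored at call time.
        return "stored"
    if context == "background_worker":
        if api_key_available and not rate_limited:
            return "llm"
        return "heuristic"  # fallback when the LLM is unavailable or rate-limited
    raise ValueError(f"unknown context: {context}")
```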
Feedback adjustments
After scoring, both methods apply adjustments based on session-level signals:
| Signal | Adjustment |
|---|---|
| Correction turns detected | Penalty to IE (Recovery) |
| Retry storm detected | Penalty to IE (Convergence) |
| Single-turn session with good output | Bonus to PQ and IE |
| No verification prompts in session | Penalty to VD |
| CLAUDE.md present in project | Bonus to AF (Configuration) |
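A sketch of how such adjustments might be applied, in Python. The source does not state the size of each penalty or bonus, nor how a dimension-level bonus (PQ, IE) maps onto individual metrics, so the magnitudes and signal names below are hypothetical and each target from the table is treated as a plain key.

```python
# Hypothetical magnitudes; the documentation does not specify them.
ADJUSTMENTS = {
    "correction_turns": {"IE (Recovery)": -1.0},
    "retry_storm": {"IE (Convergence)": -1.0},
    "single_turn_success": {"PQ": 0.5, "IE": 0.5},
    "no_verification": {"VD": -1.0},
    "claude_md_present": {"AF (Configuration)": 0.5},
}


def apply_adjustments(scores, signals):
    """Return a copy of scores with each detected signal's adjustment applied.

    Results are clamped so every score stays on the 0-10 scale.
    """
    adjusted = dict(scores)
    for signal in signals:
        for target, delta in ADJUSTMENTS.get(signal, {}).items():
            if target in adjusted:
                adjusted[target] = max(0.0, min(10.0, adjusted[target] + delta))
    return adjusted
```

Clamping after each adjustment keeps a penalty on an already low score, or a bonus on an already high one, from leaving the 0–10 range.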
Score storage
All scores are stored in the prism.prism_scores Postgres table:
- Session ID, timestamp, org ID, developer ID
- All 10 metric scores (0–10)
- Composite PRISM score (weighted average)
- Scoring method (LLM or heuristic)
- Coaching notes (text)
- Anti-patterns detected (JSON array)
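To make the stored fields concrete, the following Python sketch models one score record and computes a weighted-average composite. The field names, the `ScoreRecord` type, and the weights passed to `composite_score` are assumptions for illustration; the actual column names and metric weights are not given here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


def composite_score(scores, weights):
    """Weighted average of metric scores; weights are hypothetical inputs."""
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


@dataclass
class ScoreRecord:
    """One stored score row; field names are illustrative, not the real columns."""
    session_id: str
    org_id: str
    developer_id: str
    scores: dict            # metric name -> score on the 0-10 scale
    composite: float        # weighted average of the metric scores
    method: str             # "llm" or "heuristic"
    coaching_notes: str = ""
    anti_patterns: list = field(default_factory=list)  # stored as a JSON array
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
```

Recording the scoring method alongside each row makes it possible to tell LLM-scored sessions apart from heuristic fallbacks when comparing results later.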