
Scoring Overview

PRISM scoring evaluates the quality of your AI-assisted coding sessions across five dimensions and ten metrics. The end-to-end flow:

  1. You code with Claude Code (Prism plugin active)
  2. Telemetry flows to S3 via the ingest pipeline
  3. The scoring worker picks up unscored sessions
  4. Each session is scored on 5 dimensions (10 metrics total)
  5. Scores persist to Postgres with coaching notes
  6. You view results via /prism:score, /prism:report, or the dashboard
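The worker step (3–5 above) can be sketched as a simple pass over unscored sessions. This is a minimal in-memory illustration; the class and function names are invented here, and the real pipeline reads from S3 and persists to Postgres rather than a Python dict.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch of the scoring-worker loop (steps 3-5 above).
# Names are illustrative, not Prism's actual API.

@dataclass
class Session:
    id: str
    transcript: str
    scores: Optional[dict] = None  # None means "not yet scored"

class InMemoryStore:
    """Stand-in for the real S3 + Postgres storage."""
    def __init__(self, sessions):
        self.sessions = {s.id: s for s in sessions}

    def fetch_unscored(self):
        return [s for s in self.sessions.values() if s.scores is None]

    def persist(self, session_id, scores):
        self.sessions[session_id].scores = scores

def run_scoring_pass(store, score_fn):
    # Score every session that has no scores yet, then persist results.
    for session in store.fetch_unscored():
        store.persist(session.id, score_fn(session.transcript))
```

Each pass drains the unscored backlog, so re-running the worker is idempotent: already-scored sessions are skipped.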

PRISM uses two scoring methods:

| Tier | Method | Accuracy | Cost | When used |
| --- | --- | --- | --- | --- |
| Primary | LLM scorer (Anthropic Sonnet) | ~90% | ~$0.0008/session | Default when API key available |
| Fallback | Heuristic scorer (Rust-native) | ~70% | Free | When LLM unavailable or rate-limited |

The LLM scorer reads the full session transcript and evaluates each metric against a detailed rubric. The heuristic scorer uses regex patterns and keyword matching for fast, free scoring.
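As an illustration of the heuristic approach, here is a minimal regex-based scorer for a single metric. The patterns and the linear hit-to-score mapping are invented for this sketch; Prism's actual heuristic scorer is Rust-native and its rubric is not shown on this page.

```python
import re

# Invented example signals for the Specificity metric: a specific prompt
# tends to name files, line numbers, or include code.
SPECIFICITY_PATTERNS = [
    re.compile(r"\bfile\s+\S+"),   # names a concrete file
    re.compile(r"\bline\s+\d+"),   # references a line number
    re.compile(r"```"),            # includes a code snippet
]

def score_specificity(prompt: str) -> float:
    """Map the number of matched signals onto a 0-10 score."""
    hits = sum(1 for p in SPECIFICITY_PATTERNS if p.search(prompt))
    return min(10.0, hits * (10 / len(SPECIFICITY_PATTERNS)))
```

Matching is cheap and deterministic, which is why the heuristic tier is free and fast, and also why it is less accurate than a rubric-driven LLM read of the full transcript.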

See Two-Tier Scoring for implementation details.
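The tier selection itself amounts to a try-the-LLM-first dispatch. The sketch below assumes a callable LLM scorer and a catch-all fallback policy; both the names and the error-handling details are assumptions, not Prism's real implementation.

```python
def heuristic_score(transcript: str) -> float:
    # Placeholder stand-in for the Rust-native heuristic scorer.
    return 5.0

def score_session(transcript, llm_scorer=None):
    """Return (score, tier): LLM when available, heuristic otherwise."""
    if llm_scorer is not None:
        try:
            return llm_scorer(transcript), "llm"
        except Exception:  # e.g. rate limit, timeout, malformed response
            pass
    return heuristic_score(transcript), "heuristic"
```

Because the fallback never raises, every session gets scored even when the API key is missing or the LLM tier is rate-limited.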

Every dimension has two metrics, each scored 0–10:

| Dimension | Metric 1 | Metric 2 |
| --- | --- | --- |
| Prompt Quality (PQ) | Specificity | Decomposition |
| Iteration Efficiency (IE) | Convergence | Recovery |
| Verification Discipline (VD) | Review | Validation |
| Tool Use (TU) | Selection | Context |
| Advanced Features (AF) | Delegation | Configuration |

The overall PRISM score for a session uses recency-weighted averaging: later turns count more than earlier ones, so improvement over the course of a session is reflected in the final score.
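A recency-weighted average can be computed by giving turn *i* a larger weight than turn *i − 1*. The linear weight scheme below is an assumption for illustration; this page does not document Prism's exact weights.

```python
def recency_weighted_score(turn_scores):
    """Weighted mean where turn i (1-based) gets weight i."""
    weights = range(1, len(turn_scores) + 1)
    total_w = sum(weights)
    return sum(w * s for w, s in zip(weights, turn_scores)) / total_w
```

For turn scores [4, 6, 8] this gives (1·4 + 2·6 + 3·8) / 6 ≈ 6.67, above the plain mean of 6, because the session improved as it went on.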

Every scored session includes coaching notes — specific, actionable tips for the weakest dimension. These appear in:

  • /prism:score command output
  • Dashboard PRISM insights page
  • /prism:report review
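Selecting the weakest dimension reduces to taking the minimum of the per-dimension metric averages. In this sketch the tip text and the fallback message are invented; Prism's real coaching notes are generated per session.

```python
# Illustrative tips keyed by dimension code; not Prism's actual copy.
TIPS = {
    "PQ": "State the target file and acceptance criteria up front.",
    "VD": "Ask Claude to run the test suite before accepting a change.",
}

def weakest_dimension(metric_scores):
    """metric_scores maps a dimension code to its two metric scores."""
    return min(metric_scores, key=lambda d: sum(metric_scores[d]) / 2)

def coaching_note(metric_scores):
    dim = weakest_dimension(metric_scores)
    return dim, TIPS.get(dim, "Focus practice on this dimension.")
```

Ties and the exact tip wording are implementation details; the point is that the note always targets the single lowest-scoring dimension.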