
Scoring Overview

PRISM scoring evaluates the quality of your AI-assisted coding sessions across five dimensions and ten metrics. The end-to-end flow:

  1. You code with Claude Code (Prism plugin active)
  2. Telemetry flows to S3 via the ingest pipeline
  3. The scoring worker picks up unscored sessions
  4. Each session is scored on 5 dimensions (10 metrics total)
  5. Scores persist to Postgres with coaching notes
  6. You view results via /prism:score, /prism:report, or the dashboard
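The worker step (3–5 above) can be sketched as a simple pass over unscored sessions. This is a minimal in-memory illustration; the class and function names are invented here, and the real pipeline reads from S3 and persists to Postgres rather than a Python dict.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch of the scoring-worker loop (steps 3-5 above).
# Names are illustrative, not Prism's actual API.

@dataclass
class Session:
    id: str
    transcript: str
    scores: Optional[dict] = None  # None means "not yet scored"

class InMemoryStore:
    """Stand-in for the real S3 + Postgres storage."""
    def __init__(self, sessions):
        self.sessions = {s.id: s for s in sessions}

    def fetch_unscored(self):
        return [s for s in self.sessions.values() if s.scores is None]

    def persist(self, session_id, scores):
        self.sessions[session_id].scores = scores

def run_scoring_pass(store, score_fn):
    # Score every session that has no scores yet, then persist results.
    for session in store.fetch_unscored():
        store.persist(session.id, score_fn(session.transcript))
```

Each pass drains the unscored backlog, so re-running the worker is idempotent: already-scored sessions are skipped.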

PRISM uses two scoring methods:

| Tier | Method | Accuracy | Cost | When used |
| --- | --- | --- | --- | --- |
| Primary | LLM scorer (Anthropic Sonnet) | ~90% | ~$0.0008/session | Default when API key available |
| Fallback | Heuristic scorer (Rust-native) | ~70% | Free | When LLM unavailable or rate-limited |

The LLM scorer reads the full session transcript and evaluates each metric against a detailed rubric. The heuristic scorer uses regex patterns and keyword matching for fast, free scoring.
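As an illustration of the heuristic approach, here is a minimal regex-based scorer for a single metric. The patterns and the linear hit-to-score mapping are invented for this sketch; Prism's actual heuristic scorer is Rust-native and its rubric is not shown on this page.

```python
import re

# Invented example signals for the Specificity metric: a specific prompt
# tends to name files, line numbers, or include code.
SPECIFICITY_PATTERNS = [
    re.compile(r"\bfile\s+\S+"),   # names a concrete file
    re.compile(r"\bline\s+\d+"),   # references a line number
    re.compile(r"```"),            # includes a code snippet
]

def score_specificity(prompt: str) -> float:
    """Map the number of matched signals onto a 0-10 score."""
    hits = sum(1 for p in SPECIFICITY_PATTERNS if p.search(prompt))
    return min(10.0, hits * (10 / len(SPECIFICITY_PATTERNS)))
```

Matching is cheap and deterministic, which is why the heuristic tier is free and fast, and also why it is less accurate than a rubric-driven LLM read of the full transcript.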

See Two-Tier Scoring for implementation details.
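The tier selection itself amounts to a try-the-LLM-first dispatch. The sketch below assumes a callable LLM scorer and a catch-all fallback policy; both the names and the error-handling details are assumptions, not Prism's real implementation.

```python
def heuristic_score(transcript: str) -> float:
    # Placeholder stand-in for the Rust-native heuristic scorer.
    return 5.0

def score_session(transcript, llm_scorer=None):
    """Return (score, tier): LLM when available, heuristic otherwise."""
    if llm_scorer is not None:
        try:
            return llm_scorer(transcript), "llm"
        except Exception:  # e.g. rate limit, timeout, malformed response
            pass
    return heuristic_score(transcript), "heuristic"
```

Because the fallback never raises, every session gets scored even when the API key is missing or the LLM tier is rate-limited.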

Every dimension has two metrics, each scored 0–10:

| Dimension | Metric 1 | Metric 2 |
| --- | --- | --- |
| Prompt Quality (PQ) | Specificity | Decomposition |
| Iteration Efficiency (IE) | Convergence | Recovery |
| Verification Discipline (VD) | Review | Validation |
| Tool Use (TU) | Selection | Context |
| Advanced Features (AF) | Delegation | Configuration |

The overall PRISM score for a session uses recency-weighted averaging: later turns count more than earlier ones, so improvement over the course of a session is reflected in the final score.
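A recency-weighted average can be computed by giving turn *i* a larger weight than turn *i − 1*. The linear weight scheme below is an assumption for illustration; this page does not document Prism's exact weights.

```python
def recency_weighted_score(turn_scores):
    """Weighted mean where turn i (1-based) gets weight i."""
    weights = range(1, len(turn_scores) + 1)
    total_w = sum(weights)
    return sum(w * s for w, s in zip(weights, turn_scores)) / total_w
```

For turn scores [4, 6, 8] this gives (1·4 + 2·6 + 3·8) / 6 ≈ 6.67, above the plain mean of 6, because the session improved as it went on.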

Every scored session includes coaching notes — specific, actionable tips for the weakest dimension. These appear in:

  • /prism:score command output
  • Dashboard PRISM insights page
  • /prism:report review
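Selecting the weakest dimension reduces to taking the minimum of the per-dimension metric averages. In this sketch the tip text and the fallback message are invented; Prism's real coaching notes are generated per session.

```python
# Illustrative tips keyed by dimension code; not Prism's actual copy.
TIPS = {
    "PQ": "State the target file and acceptance criteria up front.",
    "VD": "Ask Claude to run the test suite before accepting a change.",
}

def weakest_dimension(metric_scores):
    """metric_scores maps a dimension code to its two metric scores."""
    return min(metric_scores, key=lambda d: sum(metric_scores[d]) / 2)

def coaching_note(metric_scores):
    dim = weakest_dimension(metric_scores)
    return dim, TIPS.get(dim, "Focus practice on this dimension.")
```

Ties and the exact tip wording are implementation details; the point is that the note always targets the single lowest-scoring dimension.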