Screening A[i]gent Blueprint: Hybrid Rubric & Candidate Evaluation
1. Problem & Mission
1.1 Structural Constraint
High-volume recruiting environments operate under structural constraints. A single open role can receive 800–1,000 applications before closing. Even if only 10% meet baseline qualification criteria, that still produces 80–100 viable candidates requiring thoughtful evaluation.
Recruiter and hiring manager interview capacity rarely scales at the same rate as application volume. Recruiters may be able to screen 25–50 candidates per role; hiring managers may have bandwidth for 10–15 interviews.
When evaluation capacity is constrained, screening decisions compress into seconds per resume. Under these conditions, the bottleneck is not applicant supply — it is structured evaluation capacity.
In a typical funnel:
- ~1,000 applications received
- ~100 viable candidates
- ~25–50 recruiter screens
- ~10–15 hiring manager interviews
This means a significant portion of qualified candidates may never reach structured, consistent review. The system quietly optimizes for speed over judgment.
1.2 The Risk
When review time is compressed, evaluation becomes discretionary and inconsistent. Reasons for rejection are rarely structured or auditable. Bias risk increases under time pressure. Funnel data becomes too noisy to diagnose missed talent or fairness drift later.
The problem is not recruiter intent. It is infrastructure.
1.3 Mission
Screening A[i]gent exists to introduce structured, rubric-driven evaluation at the earliest screening stage — before interview bandwidth becomes the bottleneck.
The objective is not to replace recruiter judgment. It is to standardize and document it.
- Ensure every inbound application receives structured evaluation.
- Preserve human decision authority.
- Create auditable, explainable screening outputs.
- Improve calibration, fairness, and downstream analytics.
1.4 Outcomes We’re Aiming For
- Every inbound application receives structured, rubric-driven evaluation.
- Clear, explainable recommendations for recruiters and hiring managers.
- Reduced time spent on “obvious no” reviews.
- Structured data that feeds downstream workflow and analytics.
- Improved calibration and fairness monitoring over time.
2. Scope & Design Principles
2.1 In-Scope
- Inbound application / resume evaluation for defined roles.
- Transformation of JDs and HM requirements into role-specific rubrics.
- Hybrid scoring across experience, tenure, employer fit, skills, education, and soft indicators.
- Risk flagging (e.g., job hopping, gaps, inconsistent titles).
- Recommendations: Advance / HM Review / Do Not Advance.
- Ask-next prompts for recruiter screens.
- Structured outputs into ATS fields and Metrics A[i]gent.
2.2 Out-of-Scope (v1)
- Live interview workflows (owned by Workflow A[i]gent).
- Offer, onboarding, or HRIS provisioning.
- Sourcing, campaign management, or nurture flows.
- Full ML-based prediction of performance (future extension, grounded + validated).
2.3 Design Principles
- Rubric-first, model-second. The rubric is the contract; AI is the assistant.
- Hybrid scoring. Every category has a human-readable band and a numeric score under the hood.
- Explainable. Every recommendation can be traced to specific signals and weights.
- Human-in-the-loop. Recruiters must approve or override the agent’s recommendation.
- Bias-aware. Flags, not vetoes; calibration over time, not hard-coded stereotypes.
2.4 Global Rules
- No fully automated rejection: a human must confirm “Do Not Advance.”
- Every override requires a short note, creating an audit trail.
- Rubrics and weights are documented and discoverable by TA & HMs.
- The agent does not infer or store protected characteristic data.
3. Roles & Responsibilities
| Role | Responsibilities |
|---|---|
| Recruiter | Reviews output, approves/overrides recommendations, adds notes, and owns candidate communication. |
| Hiring Manager | Aligns on rubric and thresholds, reviews structured summaries, and provides feedback on edge cases. |
| TA Ops / ATS Admin | Configures fields + mappings; maintains rubric definitions and scoring weights. |
| People Analytics / Metrics A[i]gent Owner | Builds dashboards; monitors fairness, overrides, and funnel quality. |
| Owner | Diane Wilkinson – design, implementation, and continuous improvement of Screening A[i]gent. |
4. System Overview
At a high level, Screening A[i]gent runs the following loop:
1. Inputs: JD + HM priorities + resume + metadata (location, source, etc.).
2. Rubric assembly: Convert requirements into structured rubric components.
3. Evidence extraction: Parse for roles, companies, tenure, skills, education, signals.
4. Scoring: Apply hybrid scoring across categories, with risk penalties.
5. Recommendation: Generate disposition and ask-next prompts.
6. Output: Write scores, bands, tags, and recommendation back to ATS fields.
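A minimal sketch of this loop in Python, with stubbed helpers standing in for real parsing, scoring, and ATS writes (function and field names are illustrative, not a fixed interface):
```python
def assemble_rubric(jd: dict, hm_priorities: dict) -> dict:
    # Step 2 placeholder: JD + HM priorities -> weighted categories (see section 5).
    return {"experience_depth": 30, "tenure_stability": 15, "competitor_fit": 15,
            "skills_match": 25, "education": 5, "soft_indicators": 10}

def extract_evidence(resume_text: str, metadata: dict) -> dict:
    # Step 3 placeholder: real parsing would pull roles, companies, tenure, skills, education.
    return {"years_relevant": 4, "avg_tenure_years": 2.5, "skills": ["salesforce", "meddic"]}

def score_candidate(rubric: dict, evidence: dict) -> tuple[int, list[str]]:
    # Step 4 placeholder: hybrid scoring across categories with risk penalties (see section 6).
    return 82, []

def run_screening(jd: dict, hm_priorities: dict, resume_text: str, metadata: dict) -> dict:
    rubric = assemble_rubric(jd, hm_priorities)            # 2. rubric assembly
    evidence = extract_evidence(resume_text, metadata)     # 3. evidence extraction
    score, risk_flags = score_candidate(rubric, evidence)  # 4. hybrid scoring
    recommendation = "Advance" if score >= 80 and not risk_flags else "HM Review"  # 5. simplified
    return {                                               # 6. structured output -> ATS fields
        "screening_score_overall": score,
        "screening_risk_flags": risk_flags,
        "screening_recommendation": recommendation,
    }

print(run_screening({}, {}, "resume text...", {"candidate_id": "c-123"}))
```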
4.1 Technical Architecture
Under the hood, Screening A[i]gent runs on a modular stack: a reasoning layer for rubric scoring, a light orchestration layer to sequence steps, and ATS-native integrations (e.g., Greenhouse custom fields) to write structured outputs back into the system of record.
5. Rubric & Signal Library
The rubric is broken out into categories with both band-level interpretation and numeric scores. Risk flags act as negative adjustments to the overall score.
5.1 Rubric Categories
| Category | Max Points | Summary |
|---|---|---|
| Experience Depth | 30 | Relevancy of prior roles to this job’s scope and level. |
| Tenure Stability | 15 | Consistency and average time-in-role, tuned for tech. |
| Competitor / Industry Fit | 15 | Employer pedigree across competitors, adjacencies, and relevant tech. |
| Skills & Tools Match | 25 | Coverage of must-have and nice-to-have skills; transferability. |
| Education & Credentials | 5 | Baseline requirements and relevant advanced credentials. |
| Soft Indicators | 10 | Signals like clarity, detail, follow-through, and polish. |
| Risk Flags (penalty) | -20 | Moderate negative weighting for meaningful risk patterns. |
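These weights can live as a small, versioned configuration object so changes stay auditable; one possible sketch (keys and shape are illustrative, not a fixed schema):
```python
# Illustrative rubric configuration mirroring the table above.
RUBRIC_V1 = {
    "experience_depth": {"max": 30, "summary": "Relevance of prior roles to scope and level"},
    "tenure_stability": {"max": 15, "summary": "Consistency and average time-in-role, tech-tuned"},
    "competitor_fit":   {"max": 15, "summary": "Employer pedigree across competitors and adjacencies"},
    "skills_match":     {"max": 25, "summary": "Must-have / nice-to-have coverage and transferability"},
    "education":        {"max": 5,  "summary": "Baseline requirements and relevant credentials"},
    "soft_indicators":  {"max": 10, "summary": "Clarity, evidence of results, consistency"},
}
MAX_RISK_PENALTY = -20  # risk flags subtract at most 20 points from the overall score

assert sum(c["max"] for c in RUBRIC_V1.values()) == 100
```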
5.2 Experience Depth – Signals
- High band: 3+ years in highly relevant roles, progressive scope.
- Medium band: 1–3 years relevant, plus adjacent/transferable work.
- Low band: mostly unrelated roles, unclear match to level/domain.
5.3 Tenure Stability – Tech-Tuned
- High band: average 2–3 years per role, reasonable moves.
- Medium band: average ~1.5–2 years, coherent trajectory.
- Low band: repeated sub-12-month roles, unexplained gaps, frequent laterals.
5.4 Competitor / Industry Fit – Generous Model
- High band: direct competitors, adjacencies, strong tech brands.
- Medium band: general tech or relevant adjacent industries.
- Low band: low-signal employers for this role/domain.
5.5 Skills & Tools Match
- High band: all must-haves demonstrated; several nice-to-haves.
- Medium band: most must-haves; transferable skills; some gaps.
- Low band: missing critical skills or only adjacent experience.
5.6 Education & Credentials
- High band: baseline + directly relevant advanced credentials.
- Medium band: baseline/equivalent; relevant coursework.
- Low band: missing true business-critical requirements.
5.7 Soft Indicators
- Clarity and structure of resume content.
- Evidence of results (metrics, impact, ownership).
- Consistency between roles, responsibilities, and claimed achievements.
5.8 Risk Flags (Moderate Penalty)
Risk flags do not automatically disqualify candidates; they trigger ask-next questions and moderate penalties.
- Repeated short tenures (< 12 months) without context.
- Unexplained multi-year gaps.
- Title inflation vs scope (e.g., “VP” for an IC-level role).
- Buzzword-heavy content with little evidence of outcomes.
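A sketch of how flags might be detected and converted into the moderate penalty, assuming roles are represented as simple dicts with months in seat; the thresholds shown are illustrative defaults, not fixed rules:
```python
def detect_risk_flags(roles: list[dict]) -> list[str]:
    """Return risk flags; flags drive ask-next questions, never automatic rejection."""
    flags = []
    short_stints = [r for r in roles if r["months"] < 12 and not r.get("context")]
    if len(short_stints) >= 2:
        flags.append("repeated_short_tenures")
    if any(r.get("gap_months_after", 0) >= 24 for r in roles):
        flags.append("unexplained_multi_year_gap")
    return flags

def risk_penalty(flags: list[str]) -> int:
    """0 for no flags, -5 for a single mild flag, -10 to -20 as flags stack up."""
    if not flags:
        return 0
    if len(flags) == 1:
        return -5
    return max(-20, -10 * (len(flags) - 1))

roles = [{"title": "AE", "months": 8},
         {"title": "SDR", "months": 10, "gap_months_after": 30}]
flags = detect_risk_flags(roles)
print(flags, risk_penalty(flags))  # ['repeated_short_tenures', 'unexplained_multi_year_gap'] -10
```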
6. Hybrid Scoring & Recommendations
The hybrid model combines banded scores (explainability) with numeric ranges (analytics + tuning).
6.1 Category Bands & Points
| Category | Band | Points (example) |
|---|---|---|
| Experience Depth | High / Med / Low | 24–30 / 16–23 / 0–15 |
| Tenure Stability | High / Med / Low | 12–15 / 8–11 / 0–7 |
| Competitor Fit | High / Med / Low | 12–15 / 7–11 / 0–6 |
| Skills Match | High / Med / Low | 20–25 / 12–19 / 0–11 |
| Education | High / Med / Low | 4–5 / 2–3 / 0–1 |
| Soft Indicators | High / Med / Low | 8–10 / 4–7 / 0–3 |
| Risk Flags | None / Mild / Significant | 0 / -5 / -10 to -20 |
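One way to derive a category band from its points is to reuse the Appendix A percentage cut-offs (≈ 80% for High, ≈ 50% for Medium); a sketch, with thresholds tunable per category:
```python
def category_band(points: int, max_points: int) -> str:
    """Map category points to a band using the Appendix A percentage cut-offs."""
    ratio = points / max_points
    if ratio >= 0.80:
        return "High"
    if ratio >= 0.50:
        return "Medium"
    return "Low"

print(category_band(24, 30))  # High   (Experience Depth, 24-30 range)
print(category_band(9, 15))   # Medium (Tenure Stability, 8-11 range)
```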
6.2 Overall Score Bands
- 90–100: Strong Match
- 75–89: Solid Match
- 60–74: Partial Match
- < 60: Weak Match
6.3 Recommendation Logic
- Advance: score ≥ 80, must-haves met, no significant risk flags.
- HM Review: score ~65–79, or mixed signals, or unusual-but-promising paths.
- Do Not Advance: missing non-negotiable + below threshold, or very low score with significant risk.
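A sketch of the disposition logic combining the 6.2 band cut-offs with the rules above; the "must-haves met" input and the exact thresholds are illustrative placeholders:
```python
def overall_band(score: int) -> str:
    """Overall band cut-offs from section 6.2."""
    if score >= 90:
        return "Strong Match"
    if score >= 75:
        return "Solid Match"
    if score >= 60:
        return "Partial Match"
    return "Weak Match"

def recommend(score: int, must_haves_met: bool, significant_risk: bool) -> str:
    """Disposition only; a recruiter still confirms any 'Do Not Advance' (section 2.4)."""
    if score >= 80 and must_haves_met and not significant_risk:
        return "Advance"
    if (not must_haves_met and score < 65) or (score < 60 and significant_risk):
        return "Do Not Advance"
    return "HM Review"

print(overall_band(83), "->", recommend(83, True, False))  # Solid Match -> Advance
print(overall_band(55), "->", recommend(55, False, True))  # Weak Match -> Do Not Advance
```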
7. Ask-Next Prompts & HM Summaries
7.1 Ask-Next for Recruiter Screens
- Experience depth: “Walk me through your work on X; what were you directly responsible for?”
- Tenure: “I noticed a few shorter roles in [years]. Can you share the context?”
- Skills: “Tell me about a recent project where you used [tool/skill] end-to-end.”
- Risk flags: “I see a gap between [year] and [year]. What were you focused on then?”
7.2 HM Preview Summary
- 1–2 sentence overview of level + scope.
- Top 3 strengths relative to the rubric.
- Top 1–2 concerns / open questions.
- Overall band and score.
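To keep previews consistent across candidates, the summary can be emitted as a small structured object rather than free text; one possible shape (field names are illustrative):
```python
from dataclasses import dataclass, field

@dataclass
class HMPreview:
    overview: str                                        # 1-2 sentences on level + scope
    strengths: list[str] = field(default_factory=list)   # top 3 strengths vs the rubric
    concerns: list[str] = field(default_factory=list)    # top 1-2 concerns / open questions
    band: str = "Partial Match"                          # overall band
    score: int = 0                                       # 0-100 hybrid score
```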
8. Calibration & Governance
8.1 Shadow Mode
- Run silently alongside manual decisions.
- Compare agent recommendations vs actual outcomes.
- Collect examples where humans disagree.
8.2 Override Tracking
- Every override requires a short note.
- Track override rate by role, recruiter, and segment.
- Clusters of overrides signal rubric misalignment or training gaps.
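A sketch of how override rates might be computed from screening records, assuming each record carries the recruiter and the override flag:
```python
from collections import defaultdict

def override_rates(records: list[dict]) -> dict[str, float]:
    """Override rate per recruiter; clusters well above the norm suggest rubric misalignment."""
    totals, overridden = defaultdict(int), defaultdict(int)
    for rec in records:
        totals[rec["recruiter"]] += 1
        overridden[rec["recruiter"]] += int(rec["screening_override_flag"])
    return {name: overridden[name] / totals[name] for name in totals}

records = [
    {"recruiter": "alex", "screening_override_flag": True},
    {"recruiter": "alex", "screening_override_flag": False},
    {"recruiter": "sam",  "screening_override_flag": False},
]
print(override_rates(records))  # {'alex': 0.5, 'sam': 0.0}
```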
8.3 Rubric Reviews
- Quarterly reviews with TA Ops + HMs to tune thresholds and weights.
- Use real candidate samples where the rubric under/over-scored.
- Update documentation and communicate changes.
8.4 Fairness & Bias Monitoring
- Monitor pass-through and override patterns across segments.
- Use findings to refine rubrics and ask-next prompts — not hard-code stereotypes.
9. ATS Integration & Outputs
Screening A[i]gent is ATS-agnostic; the core requirement is a handful of stable fields.
9.1 Example ATS Fields
| Field Name | Type | Description |
|---|---|---|
| screening_score_overall | Number | 0–100 hybrid score. |
| screening_band_overall | Picklist | Strong / Solid / Partial / Weak. |
| screening_band_experience | Picklist | High / Medium / Low. |
| screening_band_tenure | Picklist | High / Medium / Low. |
| screening_band_competitor_fit | Picklist | High / Medium / Low. |
| screening_band_skills | Picklist | High / Medium / Low. |
| screening_risk_flags | Multi-select | Short tenures, gaps, title mismatch, etc. |
| screening_recommendation | Picklist | Advance / HM Review / Do Not Advance. |
| screening_override_flag | Boolean | Yes if recruiter overrode recommendation. |
| screening_decision_at | Datetime | Timestamp of final decision. |
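An illustrative write-back payload using these fields; the transport depends on the ATS API, and the values shown are placeholders:
```python
# Written only after recruiter confirmation (see 9.2), e.g. via ATS custom fields.
screening_payload = {
    "screening_score_overall": 83,
    "screening_band_overall": "Solid",
    "screening_band_experience": "High",
    "screening_band_tenure": "Medium",
    "screening_band_competitor_fit": "High",
    "screening_band_skills": "High",
    "screening_risk_flags": ["repeated_short_tenures"],
    "screening_recommendation": "Advance",
    "screening_override_flag": False,
    "screening_decision_at": "2025-06-02T17:40:00Z",
}
```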
9.2 Implementation Notes
- Start minimal; add fields after adoption.
- Hide internal-only fields from HMs if they add noise.
- Write fields only after recruiter confirmation.
10. Metrics & Metrics A[i]gent Integration
10.1 Key Screening Metrics
- Application → Screen pass-through (by source, role, recruiter).
- Distribution of screening bands.
- Override rate and direction (Advance vs Do Not Advance overrides).
- Time from application to screening decision.
- Downstream performance: do Strong/Solid candidates reach offers?
10.2 Event Mapping
| Screening Event | Metrics Dictionary Field | Used For |
|---|---|---|
| screening_started_at | screening_start_time | Turnaround time. |
| screening_decision_at | screening_decision_time | Lead time to decision. |
| screening_recommendation | screening_recommendation | Quality/fairness analysis. |
| screening_override_flag | screening_override_flag | Override patterns. |
| screening_score_overall | screening_score_overall | Score distribution + correlation. |
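This mapping can be kept as a small lookup used when emitting events to Metrics A[i]gent; a sketch mirroring the table:
```python
# Screening event -> Metrics Dictionary field.
EVENT_TO_METRICS_FIELD = {
    "screening_started_at":     "screening_start_time",
    "screening_decision_at":    "screening_decision_time",
    "screening_recommendation": "screening_recommendation",
    "screening_override_flag":  "screening_override_flag",
    "screening_score_overall":  "screening_score_overall",
}

def to_metrics_event(screening_record: dict) -> dict:
    """Rename screening fields to their Metrics Dictionary names; drop anything unmapped."""
    return {EVENT_TO_METRICS_FIELD[key]: value
            for key, value in screening_record.items() if key in EVENT_TO_METRICS_FIELD}
```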
Appendix A – Bands & Point Ranges
| Band | Description | Typical Points (per category) |
|---|---|---|
| High | Clear, strong alignment with rubric expectations. | ≈ 80–100% of category max. |
| Medium | Good but not perfect; some gaps or trade-offs. | ≈ 50–79% of category max. |
| Low | Limited alignment or missing ingredients. | ≈ 0–49% of category max. |
| No Risk | No meaningful risk flags detected. | 0 penalty. |
| Mild Risk | One or two flags with plausible explanations. | ≈ -5 penalty. |
| Significant Risk | Multiple/severe flags warrant caution. | ≈ -10 to -20 penalty. |
Exact ranges can be tuned per company and role. The key is stable, documented bands and transparent weights.
Appendix B – Example Role Profiles
B.1 AE / Account Executive
- Experience: quota-carrying SaaS sales, similar ACV and cycle.
- Tenure: 2–3 year stints; some startup volatility acceptable.
- Skills: MEDDIC/BANT, CRM expertise, full-funnel sales motion.
B.2 SDR / BDR
- Experience: outbound prospecting, high-volume outreach.
- Signals: activity metrics + conversion to meetings/pipeline.
- Fit: similar ICP and sales environment preferred.
B.3 Recruiter / Talent Partner
- Experience: end-to-end recruiting for similar roles and stakeholders.
- Skills: sourcing, stakeholder management, ATS hygiene, calibration.
B.4 Software Engineer
- Experience: languages/frameworks/systems aligned to role.
- Signals: shipped features, ownership, depth vs breadth.
- Evidence: projects, OSS, measurable impact.
Let's Connect
Open to roles in People Analytics, Talent Intelligence, People Ops, and Recruiting Operations — especially teams building internal AI capabilities.