Why You Need a Vendor Evaluation Rubric
In 2026, AI vendors ship new capabilities faster than most procurement teams can build evaluation processes to assess them. A rubric makes the evaluation consistent across vendors, makes the decision auditable, and gives stakeholders a shared language. This template is structured for a typical 4-6 week enterprise AI vendor evaluation.
Category 1: Capability (25%)
- Does it solve the specific use case in our environment?
- How does it benchmark against alternatives on the metrics we care about?
- Does it support our data formats, languages, and regional requirements?
- Does it scale to our expected volume?
- Is there a free trial or pilot programme?
Category 2: Security and Compliance (20%)
- SOC 2 Type II report.
- ISO 27001 certification.
- GDPR / CCPA / regional privacy posture.
- Data residency options (US, EU, APAC).
- Does the vendor train on customer data by default? Opt-out path?
- Encryption at rest and in transit.
- Penetration testing cadence and recent findings.
- Subprocessor list and review cadence.
Category 3: Data Handling (15%)
- Where is data stored? Where is it processed?
- Data retention policy.
- Deletion guarantees and verification.
- PII handling specifics.
- Logging of prompts and responses, and access controls on logs.
Category 4: Reliability and Operations (10%)
- Published SLA.
- Incident history (past 12 months).
- Status page transparency.
- Escalation paths and named support contacts.
- Response and resolution time commitments.
Category 5: Total Cost of Ownership (15%)
- Listed pricing for our expected usage volume.
- Hidden fees (support tier, premium features, overage).
- Implementation cost (internal + vendor professional services).
- Year-over-year price escalation clauses.
- Switching cost if we leave.
Category 6: Vendor Stability (10%)
- Funding stage, runway, and recent investors.
- Public revenue or growth indicators.
- Customer reference list (named, willing to talk).
- Employee headcount and retention.
- Public communications about the company's strategy and roadmap.
Category 7: AI-Specific Risk (5%)
- Published responsible AI policy.
- Bias testing methodology and results.
- Incident response plan for AI-specific failures.
- Transparency about model versions and updates.
- Model deprecation policy and migration support.
Scoring Sheet
For each criterion, score 1-5:
1 = Does not meet
2 = Partially meets, significant gaps
3 = Meets minimum standard
4 = Meets fully
5 = Exceeds expectations
Apply the category weights, then sum to a 100-point total; a worked scoring sketch follows the decision thresholds below.
Decision thresholds (typical):
- >= 85: Strong fit, move to contract
- 70-84: Workable with negotiated gaps
- 55-69: Weak fit, evaluate alternatives
- < 55: Do not proceed
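To make the arithmetic explicit, here is a minimal scoring sketch in Python. It assumes criterion scores are averaged within each category before the weight is applied (the rubric does not prescribe a specific aggregation, so treat that as one reasonable choice); the weights are the ones listed above, and the example vendor scores are purely illustrative.

```python
# Category weights from the rubric above; they sum to 100.
CATEGORY_WEIGHTS = {
    "capability": 25,
    "security_compliance": 20,
    "data_handling": 15,
    "reliability_operations": 10,
    "total_cost_of_ownership": 15,
    "vendor_stability": 10,
    "ai_specific_risk": 5,
}

def weighted_total(scores_by_category: dict[str, list[int]]) -> float:
    """Average the 1-5 criterion scores within each category, scale by the
    category weight, and sum. A perfect 5 in every category yields 100."""
    total = 0.0
    for category, weight in CATEGORY_WEIGHTS.items():
        criterion_scores = scores_by_category[category]
        category_avg = sum(criterion_scores) / len(criterion_scores)  # 1-5
        total += (category_avg / 5) * weight
    return total

def decision(total: float) -> str:
    """Map the 100-point total to the typical decision thresholds above."""
    if total >= 85:
        return "Strong fit, move to contract"
    if total >= 70:
        return "Workable with negotiated gaps"
    if total >= 55:
        return "Weak fit, evaluate alternatives"
    return "Do not proceed"

# Illustrative vendor: mostly 4s, with a weak AI-specific risk story.
example = {
    "capability": [4, 4, 5, 4, 3],
    "security_compliance": [4, 4, 4, 3, 4, 4, 3, 4],
    "data_handling": [4, 4, 3, 4, 4],
    "reliability_operations": [4, 3, 4, 4, 3],
    "total_cost_of_ownership": [3, 4, 3, 4, 3],
    "vendor_stability": [4, 4, 3, 4, 4],
    "ai_specific_risk": [2, 2, 3, 3, 2],
}
score = weighted_total(example)
print(f"{score:.1f} -> {decision(score)}")  # 73.8 -> Workable with negotiated gaps
```

The example lands at 73.8: workable, but the low AI-specific risk scores become a named negotiated gap rather than something lost in an overall impression.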
Red Flags
- Vendor will not provide SOC 2 report under NDA.
- Vendor trains on customer data by default with no opt-out.
- No published model deprecation policy.
- No named technical contact during evaluation.
- Incident history hidden or sanitised.
- Reluctance to provide customer references.
- "AI" is a marketing label on a non-AI product.
- Pricing only available after a long sales process.
- Unilateral terms of service that override security requirements.
- Aggressive year-over-year escalation clauses (>15%/year) without performance commitments.
Evaluation Workflow
- Week 1: Use this rubric to score 4-6 candidate vendors based on public information and initial sales calls.
- Week 2: Shortlist 2-3 and request SOC 2, security questionnaire, and customer references.
- Weeks 3-4: Pilot with the top 1-2 candidates on a real use case.
- Week 5: Rescore based on pilot, finalise selection.
- Week 6: Contract negotiation, including any negotiated gaps from the rubric.