During modernization, legacy and modern code coexist. Changes carry more risk than in a stable codebase — a refactored service might break an integration point nobody documented, or a migrated module might subtly change behavior that downstream systems depend on.
Quality gates create checkpoints that catch these problems before they reach production.
Every pull request over 200 lines of code should include a structured header. This is not bureaucracy — it is communication. Reviewers spend less time understanding the change and more time evaluating it.
## PR Contract
**Intent:** Migrate invoice PDF generation from legacy template engine to modern renderer
**Proof:** 14 tests — 8 unit (renderer logic), 4 integration (PDF output), 2 parity (legacy vs modern output)
**Risk:** Medium — PDF layout differences could affect downstream print workflows
**Review Focus:** Parity test assertions in test/parity/invoice-pdf.test.ts
Each field in the contract serves a purpose:

| Field | Description | Example |
| --- | --- | --- |
| Intent | What the change accomplishes, in one sentence | "Extract payment processing into standalone service" |
| Proof | Test count and what they verify | "12 tests: 6 unit, 4 integration, 2 regression" |
| Risk | Low/Medium/High with an explanation | "High — changes database schema with rollback migration" |
| Review Focus | Where human reviewers should spend time | "Business logic in src/services/payment.ts lines 45-120" |
| Related PRs | Dependency chain or "Standalone" | "#201, #203 (must merge in order)" |
The contract helps reviewers prioritize. A "Low risk, Standalone" PR with 90% test coverage needs a lighter touch than a "High risk" PR with schema changes and a dependency chain.
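To make the Proof and Review Focus entries concrete, here is a minimal sketch of what a parity test might look like. Only the file path comes from the example contract above; the test runner (vitest) and the renderer entry points, PDF helpers, and fixtures (`renderInvoiceLegacy`, `renderInvoiceModern`, `extractPdfText`, `countPdfPages`, `sampleInvoices`) are hypothetical names used for illustration.

```typescript
// test/parity/invoice-pdf.test.ts (sketch)
import { describe, it, expect } from "vitest";

// Hypothetical helpers: entry points for both renderers, PDF inspection, and fixtures.
import { renderInvoiceLegacy, renderInvoiceModern } from "../helpers/renderers";
import { extractPdfText, countPdfPages } from "../helpers/pdf";
import { sampleInvoices } from "../fixtures/invoices";

describe("invoice PDF parity: legacy vs modern renderer", () => {
  for (const invoice of sampleInvoices) {
    it(`produces equivalent output for invoice ${invoice.id}`, async () => {
      const legacyPdf = await renderInvoiceLegacy(invoice);
      const modernPdf = await renderInvoiceModern(invoice);

      // Byte-for-byte equality is too strict (timestamps, object ordering),
      // so compare the properties the downstream print workflows depend on.
      expect(await countPdfPages(modernPdf)).toBe(await countPdfPages(legacyPdf));
      expect(await extractPdfText(modernPdf)).toEqual(await extractPdfText(legacyPdf));
    });
  }
});
```

Parity tests like this are what make a "replace the engine" PR reviewable: the reviewer's job is to check that the asserted properties are the ones downstream systems actually depend on.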
PRs over 500 lines of changed code should be split into a chain. Each PR in the chain targets the previous PR's branch rather than the main branch, and each covers one bounded concern with its own tests.
The chain follows dependency direction: types first (no dependencies), then infrastructure (depends on types), then services (depends on infrastructure), then activation (depends on everything).
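One way to enforce the 200-line and 500-line thresholds is a small CI check. The sketch below assumes a GitHub-hosted repository and the @octokit/rest client; the script name, field list, and failure messages are illustrative, not a prescribed implementation.

```typescript
// scripts/pr-gate.ts (illustrative) - fail the build when a large PR lacks the contract header.
import { Octokit } from "@octokit/rest";

const REQUIRED_FIELDS = ["Intent:", "Proof:", "Risk:", "Review Focus:"];

export async function checkPullRequest(owner: string, repo: string, pullNumber: number): Promise<void> {
  const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });
  const { data: pr } = await octokit.rest.pulls.get({ owner, repo, pull_number: pullNumber });

  const changedLines = pr.additions + pr.deletions;
  const body = pr.body ?? "";

  // Over 200 changed lines: require the structured contract header in the PR description.
  if (changedLines > 200) {
    const missing = REQUIRED_FIELDS.filter((field) => !body.includes(field));
    if (missing.length > 0) {
      throw new Error(`PR contract incomplete; missing: ${missing.join(", ")}`);
    }
  }

  // Over 500 changed lines: expect a chain, declared under Related PRs.
  if (changedLines > 500 && !body.includes("Related PRs:")) {
    throw new Error("PR exceeds 500 changed lines; split it into a chain and list Related PRs.");
  }
}
```

Run from the pull request workflow with the repository details from the event payload, a thrown error blocks the merge the same way a failing test does.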
Automated review and human review catch different classes of problems, so split the work accordingly.

**AI tools check:**

- Security patterns (injection, auth bypass, secrets in code)
- Code quality (naming, dead code, unused imports)
- Pattern compliance (project conventions)
- Common bugs (off-by-one, null dereference, race conditions)

**Humans evaluate:**

- Intent: Is this the right thing to build?
- Architecture: Does it fit the system's direction?
- Risk: What could go wrong in production?
- Business logic: Does the domain behavior match reality?
- Test quality: Are the tests testing the right things?
AI review tools are good at finding things that are objectively wrong. Humans are needed for things that require judgment about what should be built and how it fits the bigger picture.
When reviewing, categorize findings by type and severity. This creates a shared vocabulary for the team and sets clear expectations about what blocks a merge.
| Category | Severity | Examples | Blocks Merge? |
| --- | --- | --- | --- |
| Bug | Critical / High | Logic error, race condition, unhandled edge case | Yes |
| Security | Critical / High | Auth bypass, injection, secrets exposure, SSRF | Yes |
| Architecture | Medium / High | Boundary violation, tight coupling, DDD drift | Yes (if High) |
| Performance | Medium | N+1 queries, memory leak, unbounded collection | Depends |
| Pattern | Low / Medium | Style inconsistency, convention violation | No |
| Quality | Low | Missing test, unclear name, dead code | No |
Severity threshold: Critical and High findings must be resolved before merge. Medium findings should be addressed or tracked. Low findings are suggestions.
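If the team wants this taxonomy in tooling rather than on a wiki page, it can be encoded directly. The sketch below is illustrative TypeScript, not an existing API; the names and the merge rule simply mirror the table and the severity threshold above.

```typescript
// Sketch: one shared vocabulary for review findings, mirroring the table above.
type Category = "bug" | "security" | "architecture" | "performance" | "pattern" | "quality";
type Severity = "critical" | "high" | "medium" | "low";

interface Finding {
  category: Category;
  severity: Severity;
  file: string;
  summary: string;
}

// Critical and High block the merge; Medium is addressed or tracked; Low is a suggestion.
function blocksMerge(finding: Finding): boolean {
  return finding.severity === "critical" || finding.severity === "high";
}

function disposition(finding: Finding): "resolve before merge" | "address or track" | "suggestion" {
  if (blocksMerge(finding)) return "resolve before merge";
  return finding.severity === "medium" ? "address or track" : "suggestion";
}
```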
The thresholds above are starting points; adjust the targets to match your team's velocity and risk tolerance. The point is to measure: without metrics, "quality" is just a feeling.