# Runtime Evidence Discovery
Static analysis tells you what the code can do. Runtime evidence tells you what it actually does. Legacy systems accumulate dead code, rarely-used features, and hot paths that are not obvious from reading source files. Runtime evidence grounds your modernization decisions in observed reality rather than assumptions.
## Why Runtime Evidence Matters

| Question | Static Analysis | Runtime Evidence |
|---|---|---|
| Is this function used? | "It's imported somewhere" | "It was called 47,000 times yesterday" |
| What's the hot path? | "This file has high complexity" | "This function handles 73% of all requests" |
| Are there N+1 queries? | "The ORM could generate them" | "This endpoint generates 312 queries per request" |
| Is this code dead? | "No references found" | "Zero executions in 90 days of production logs" |
| What's the execution order? | "These hooks are registered" | "Hook A fires before Hook C, Hook B is skipped for this entity type" |
Static analysis is necessary but insufficient. Runtime evidence fills the gaps that static analysis cannot reach, especially in framework-driven systems where execution flow is determined by configuration, conventions, and runtime state.
## Evidence Hierarchy

Not all evidence carries equal weight. Rank evidence by proximity to production reality:
| Tier | Source | Confidence | Cost to Obtain |
|---|---|---|---|
| 1. Production traces | APM tools, distributed tracing | Highest | Low (if instrumented) |
| 2. Production logs | Structured application logs | High | Low (if logging exists) |
| 3. Staging traces | Same code, synthetic traffic | Medium-High | Medium |
| 4. Integration test traces | Test suite execution profiles | Medium | Medium |
| 5. Unit test coverage | Code coverage reports | Low-Medium | Low |
| 6. Static inference | Call graph analysis, type analysis | Lowest | Low |
Always prefer higher-tier evidence. When Tier 1 data is available, it overrides inferences from lower tiers.
### When Lower Tiers Are All You Have

Not every legacy system has production tracing. When instrumentation is absent:
- Add lightweight instrumentation — structured logging at entry/exit of key functions
- Run in staging with realistic traffic patterns for a defined period
- Use integration test profiles as a proxy for production behavior
- Document the evidence tier so downstream decisions account for confidence level
Record the evidence tier in complexity.json or domains.json so AI agents and reviewers know how much to trust the data.
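The first item above — lightweight entry/exit instrumentation — can be as small as a logging decorator. A minimal sketch in Python; the `traced` decorator, logger name, and `post_invoice` function are illustrative, not part of any existing codebase:

```python
import functools
import json
import logging
import time

logger = logging.getLogger("runtime_evidence")

def traced(func):
    """Log a structured entry/exit record for each call, tagged with its evidence tier."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            logger.info(json.dumps({
                "event": "function_call",
                "function": f"{func.__module__}.{func.__qualname__}",
                "duration_ms": round((time.perf_counter() - start) * 1000, 2),
                "evidenceTier": "staging-traces",  # record which tier this data represents
            }))
    return wrapper

@traced
def post_invoice(invoice_id):  # hypothetical hot-path function
    ...
```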
## Hot Path Identification

A hot path is the code that handles the majority of production traffic. The Pareto principle applies: typically 20% of functions handle 80% of requests.
### Discovery Methods

#### APM Traces

Application Performance Monitoring (APM) tools such as Datadog, New Relic, and OpenTelemetry collect execution traces continuously. Extract hot paths from:
- Top endpoints by request volume — which API routes handle the most traffic
- Top functions by execution time — which functions consume the most CPU
- Top database queries by frequency — which queries run most often
- Critical path analysis — the slowest chain of function calls in a typical request
This is the gold standard when available.
#### Log Analysis

Parse structured logs to reconstruct traffic patterns:
- Count log entries per endpoint/function per time period
- Identify seasonal patterns (end-of-month spikes, daily peaks)
- Correlate error rates with specific code paths
- Build frequency tables for different request types
Less precise than APM but widely available.
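For example, a rough frequency table can be built from JSON-lines application logs. A sketch assuming each log line carries an `endpoint` field — the field name and log format are assumptions about your logging setup:

```python
import json
from collections import Counter

def endpoint_frequencies(log_path):
    """Count requests per endpoint from a JSON-lines log file."""
    counts = Counter()
    with open(log_path) as log_file:
        for line in log_file:
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip unstructured lines
            endpoint = entry.get("endpoint")
            if endpoint:
                counts[endpoint] += 1
    return counts

# Top 20 endpoints by request volume — candidates for the hot path
for endpoint, count in endpoint_frequencies("app.log").most_common(20):
    print(f"{count:>10}  {endpoint}")
```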
#### Database Query Logs

Enable slow query logging and general query logging (temporarily) to see actual database access patterns:
- Which tables are queried most frequently
- Which queries are slowest
- Which queries are repeated (potential N+1)
- Which indexes are actually used
Database-level evidence often reveals performance characteristics invisible in application code.
### Applying Hot Path Data to Migration

| Hot Path Status | Migration Priority | Reasoning |
|---|---|---|
| High traffic, simple code | Extract early | Maximum user impact, low extraction effort |
| High traffic, complex code | Extract mid-phase | High value but needs careful handling |
| Low traffic, simple code | Extract late or skip | Low ROI for extraction effort |
| Low traffic, complex code | Consider not migrating | High effort, low user impact — challenge the requirement |
Record hot path data in complexity.json as runtimeMetrics.requestVolume and runtimeMetrics.executionTime per component.
## N+1 Query Detection

Legacy ORMs are notorious for generating N+1 query patterns — one query to fetch a list, then N additional queries to fetch related data for each item. These are invisible in code review but devastating in production.
### What N+1 Looks Like

```python
# Visible in code (looks fine):
invoices = get_all_invoices(filters)
for invoice in invoices:
    customer = get_customer(invoice.customer_id)  # N queries
    items = get_invoice_items(invoice.id)         # N more queries
```

```sql
# Actual queries generated:
SELECT * FROM invoice WHERE ...                      -- 1 query
SELECT * FROM customer WHERE name = 'ABC'            -- query 2
SELECT * FROM invoice_item WHERE parent = 'INV-001'  -- query 3
SELECT * FROM customer WHERE name = 'DEF'            -- query 4
SELECT * FROM invoice_item WHERE parent = 'INV-002'  -- query 5
... (continues for every invoice)
```

For 100 invoices, this generates 201 queries instead of 3 (with proper joins).
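The fix is to batch the related lookups (or join them into a single query). A minimal sketch of the batched approach; `get_customers_by_ids` and `get_items_for_invoices` are hypothetical batched variants of whatever data-access layer the legacy code uses:

```python
# 3 queries instead of 2N + 1, regardless of invoice count
invoices = get_all_invoices(filters)                                # query 1

customer_ids = {inv.customer_id for inv in invoices}
customers = {c.id: c for c in get_customers_by_ids(customer_ids)}  # query 2: WHERE id IN (...)

items = get_items_for_invoices([inv.id for inv in invoices])       # query 3: WHERE parent IN (...)
items_by_invoice = {}
for item in items:
    items_by_invoice.setdefault(item.parent, []).append(item)

for invoice in invoices:
    customer = customers[invoice.customer_id]
    invoice_items = items_by_invoice.get(invoice.id, [])
```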
### Detection Methods

| Method | Precision | Setup Effort |
|---|---|---|
| Query count per request | High | Add middleware that counts queries |
| Query pattern grouping | High | Log queries, group by template, count |
| ORM query logging | Medium | Enable ORM debug logging |
| Database slow query log | Medium | Enable at database level |
| Static analysis of loops | Low | AST scan for DB calls inside loops |
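The first two rows can be combined in a small piece of request middleware. A sketch for a WSGI application, assuming the data layer can be wired to call a per-query callback — the `query_recorder` hand-off is an assumption; most ORMs offer an equivalent, such as engine-level event listeners:

```python
import logging
from collections import Counter

logger = logging.getLogger("n_plus_one")

class QueryCountingMiddleware:
    """Count queries per request and flag suspiciously high counts."""

    def __init__(self, app, threshold=30):
        self.app = app
        self.threshold = threshold

    def __call__(self, environ, start_response):
        templates = Counter()

        def record_query(sql_template):
            # Called once per executed query with its normalized template
            templates[sql_template] += 1

        environ["query_recorder"] = record_query  # data layer picks this up (assumed wiring)
        try:
            return self.app(environ, start_response)
        finally:
            total = sum(templates.values())
            if total > self.threshold:
                logger.warning("Possible N+1: %s issued %d queries; top templates: %s",
                               environ.get("PATH_INFO"), total, templates.most_common(3))
```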
### Recording for the Spec

N+1 patterns should be recorded in complexity.json as a complexity factor:
{ "component": "invoice-list", "runtimeMetrics": { "queriesPerRequest": { "p50": 47, "p95": 312, "p99": 891 }, "n1Patterns": [ { "loop": "invoice iteration", "queries": ["customer lookup", "item fetch"], "estimatedExtraQueries": "2N where N = invoice count" } ] }}When migrating, the new implementation should fix N+1 patterns — this is one of the few areas where intentionally deviating from legacy behavior is correct.
## Medallion Data Quality

Borrowed from data engineering, the medallion architecture applies Bronze/Silver/Gold quality tiers to runtime evidence.
### Bronze

Raw, unfiltered runtime data:
- Raw profiler output files
- Unstructured log dumps
- Database query logs with no context
- Coverage reports without analysis
Useful for archival. Not directly actionable.
### Silver

Deduplicated, correlated, enriched:
- Query patterns grouped by template (not individual queries)
- Request traces correlated with business operations
- Error rates per module per time period
- Coverage mapped to business capabilities (not just files)
Actionable for prioritization. The minimum quality for migration decisions.
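Grouping query patterns by template (the first Silver bullet) typically means replacing literal values with placeholders before counting. A rough sketch — the regular expressions only cover quoted strings and plain integers, so treat it as a starting point rather than a complete normalizer:

```python
import re
from collections import Counter

def normalize(sql):
    """Collapse literal values so structurally identical queries share one template."""
    sql = re.sub(r"'[^']*'", "?", sql)  # quoted string literals -> ?
    sql = re.sub(r"\b\d+\b", "?", sql)  # numeric literals -> ?
    return re.sub(r"\s+", " ", sql).strip()

def template_counts(raw_queries):
    return Counter(normalize(q) for q in raw_queries)

# Bronze: individual queries; Silver: one row per template with a count
counts = template_counts([
    "SELECT * FROM customer WHERE name = 'ABC'",
    "SELECT * FROM customer WHERE name = 'DEF'",
    "SELECT * FROM invoice_item WHERE parent = 'INV-001'",
])
print(counts.most_common())
```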
### Gold

Validated, business meaning attached, decision-ready:
- “The Invoicing module handles 47,000 requests/day with a P95 latency of 230ms”
- “The Tax Calculator has 3 N+1 patterns affecting P99 by 400ms”
- “The GL Posting path executes 15 hooks in sequence; 4 are no-ops for the common case”
- Directly maps to complexity.json and domains.json fields
This is what feeds the specification and guides extraction decisions.
### Quality Progression

Collect Bronze → transform to Silver → derive Gold. Document which tier your evidence sits at so decisions account for confidence.
## Dead Code Discovery

Code that exists in the repository but never executes in production. Migrating dead code wastes effort and adds complexity to the new system for zero value.
### Discovery Techniques

| Technique | Coverage | Confidence |
|---|---|---|
| Production code coverage (continuous profiling) | Definitive | Very High — 90+ days of production data |
| Feature flag analysis | Good for flagged code | High — flag off for 6+ months = dead |
| Log-based detection | Entry points only | Medium — absence of evidence is not evidence of absence |
| Static unreachability | Call graph dead ends | Low — dynamic dispatch and reflection may reach “unreachable” code |
| Git blame age | Correlation only | Very Low — old code may still be critical |
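Log-based detection can be approximated by cross-checking the functions that appear in structured logs against the functions the repository defines. A rough sketch that assumes the entry/exit instrumentation shown earlier; `all_functions` would come from static analysis (for example, an AST walk), and the caveat from the table still applies — absence from logs is not proof of death:

```python
import json

def functions_seen(log_paths):
    """Collect every function name observed in the structured call logs."""
    seen = set()
    for path in log_paths:
        with open(path) as log_file:
            for line in log_file:
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    continue
                if entry.get("event") == "function_call":
                    seen.add(entry["function"])
    return seen

def never_executed(all_functions, log_paths):
    """Functions defined in the codebase but never observed in the covered log window."""
    return sorted(set(all_functions) - functions_seen(log_paths))
```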
### Decision Framework

| Evidence | Action |
|---|---|
| Zero production executions in 90+ days | Do not migrate. Mark as dead in complexity.json |
| Executions only during specific events (year-end, onboarding) | Investigate the event. Migrate if the event is needed |
| Low but non-zero executions | Migrate but deprioritize (late phase in extraction-plan.json) |
| High executions | Migrate in an appropriate phase based on priority |
### Recording Dead Code

```json
{
  "component": "legacy-report-builder",
  "runtimeMetrics": {
    "lastExecution": null,
    "executionsLast90Days": 0,
    "evidenceTier": "production-traces",
    "recommendation": "do-not-migrate"
  }
}
```

## Implicit Behavior Mapping
Framework-driven systems execute code through hooks, events, conventions, and middleware — paths that are invisible in static call graphs. Runtime tracing reveals the actual execution order.
### Common Implicit Patterns

| Pattern | Framework Examples | Detection |
|---|---|---|
| Lifecycle hooks | on_create, validate, on_submit | Trace hook dispatch at runtime |
| Middleware chains | Express middleware, Django middleware | Log middleware execution order |
| Event listeners | EventEmitter.on('invoice.created') | Trace event dispatch |
| Convention-based routing | InvoiceController#show from /invoices/:id | Trace request → handler mapping |
| Dynamic dispatch | getattr(doc, method_name)() | Runtime method resolution tracing |
| Plugin systems | Plugin registry, service providers | Log plugin load and invocation order |
### Building the Execution Map

- Instrument the framework's dispatch mechanism — add tracing at the point where hooks/events/middleware are invoked (a sketch follows this list)
- Run representative traffic — a mix of common and edge-case operations
- Record the execution chain for each operation type
- Compare to static analysis — identify gaps where runtime behavior differs from call graph expectations
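A minimal sketch of dispatch-level instrumentation, assuming a hypothetical `dispatch_hook(doc, hook_name)` function as the framework's single choke point — the function name and signature are assumptions, and the real one depends on the framework:

```python
import functools
import json
import logging
import time

logger = logging.getLogger("execution_map")

def trace_dispatch(dispatch_hook):
    """Wrap the framework's hook dispatcher so every hook invocation is recorded in order."""
    @functools.wraps(dispatch_hook)
    def traced_dispatch(doc, hook_name, *args, **kwargs):
        start = time.perf_counter()
        result = dispatch_hook(doc, hook_name, *args, **kwargs)
        logger.info(json.dumps({
            "event": "hook_dispatch",
            "doctype": type(doc).__name__,
            "hook": hook_name,
            "duration_ms": round((time.perf_counter() - start) * 1000, 2),
        }))
        return result
    return traced_dispatch

# framework.dispatch_hook = trace_dispatch(framework.dispatch_hook)  # install at startup (assumed wiring)
```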
### ERPNext Example

ERPNext's hooks system means that validate() and on_submit() methods fire implicitly when a document is saved or submitted. For a Sales Invoice submission:
```
User clicks "Submit"
  │
  ▼
frappe.handler.submit()
  │
  ├──▶ SalesInvoice.validate()
  │      ├──▶ AccountsController.validate()
  │      │      ├──▶ validate_posting_time()
  │      │      ├──▶ validate_party()
  │      │      ├──▶ validate_currency()
  │      │      ├──▶ calculate_taxes_and_totals()   ← 15 sub-calls
  │      │      └──▶ validate_accounts()
  │      └──▶ SalesInvoice.validate_specific()
  │
  ├──▶ SalesInvoice.on_submit()
  │      ├──▶ make_gl_entries()           ← GL posting
  │      ├──▶ update_stock_ledger()       ← Stock impact
  │      ├──▶ update_billing_status()     ← Cross-doc update
  │      └──▶ send_notification()         ← Side effect
  │
  └──▶ after_submit hooks (from hooks.py)
         └──▶ custom app hooks (if any)
```

Static analysis can find that validate() exists on AccountsController. It cannot determine that calculate_taxes_and_totals() makes 15 sub-calls or that update_stock_ledger() fires during invoice submission. Runtime tracing reveals the complete chain.
This execution map is critical for extraction planning: you cannot extract Invoicing without accounting for its runtime dependency on Stock Ledger updates and GL Entry posting.
## Feeding the Spec

Runtime evidence enriches multiple spec files:
| Evidence Type | Spec File | Specific Fields |
|---|---|---|
| Hot path data | complexity.json | runtimeMetrics.requestVolume, executionTime |
| N+1 patterns | complexity.json | runtimeMetrics.queriesPerRequest, n1Patterns |
| Dead code flags | complexity.json | runtimeMetrics.executionsLast90Days, recommendation |
| Evidence tier | complexity.json | runtimeMetrics.evidenceTier |
| Execution chains | domains.json | contexts[].implicitDependencies[] |
| Cross-boundary calls | domains.json | relationships[] (runtime-validated coupling) |
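One way to wire Gold-tier evidence into the spec is a small enrichment script. A sketch assuming complexity.json holds a top-level `components` array keyed by a `component` name — the exact shape should follow the complexity.json specification, so treat this structure as an assumption:

```python
import json

def attach_runtime_metrics(spec_path, metrics_by_component):
    """Merge observed runtime metrics into each matching component entry."""
    with open(spec_path) as spec_file:
        spec = json.load(spec_file)

    for component in spec.get("components", []):
        metrics = metrics_by_component.get(component.get("component"))
        if metrics:
            component.setdefault("runtimeMetrics", {}).update(metrics)

    with open(spec_path, "w") as spec_file:
        json.dump(spec, spec_file, indent=2)

attach_runtime_metrics("complexity.json", {
    "invoice-list": {
        "requestVolume": 47000,
        "executionTime": {"p95_ms": 230},
        "evidenceTier": "production-traces",
    },
})
```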
## See Also

- Codebase Analysis — Static analysis that runtime evidence validates
- complexity.json Specification — Schema reference for runtime metrics
- Domain Decomposition — How runtime evidence refines domain boundaries
- Complexity Heatmap — Interactive visualization that can overlay runtime data