Runtime Evidence Discovery

Static analysis tells you what the code can do. Runtime evidence tells you what it actually does. Legacy systems accumulate dead code, rarely-used features, and hot paths that are not obvious from reading source files. Runtime evidence grounds your modernization decisions in observed reality rather than assumptions.

| Question | Static Analysis | Runtime Evidence |
|---|---|---|
| Is this function used? | “It’s imported somewhere” | “It was called 47,000 times yesterday” |
| What’s the hot path? | “This file has high complexity” | “This function handles 73% of all requests” |
| Are there N+1 queries? | “The ORM could generate them” | “This endpoint generates 312 queries per request” |
| Is this code dead? | “No references found” | “Zero executions in 90 days of production logs” |
| What’s the execution order? | “These hooks are registered” | “Hook A fires before Hook C, Hook B is skipped for this entity type” |

Static analysis is necessary but insufficient. Runtime evidence fills the gaps that static analysis cannot reach, especially in framework-driven systems where execution flow is determined by configuration, conventions, and runtime state.

Not all evidence carries equal weight. Rank evidence by proximity to production reality:

| Tier | Source | Confidence | Cost to Obtain |
|---|---|---|---|
| 1. Production traces | APM tools, distributed tracing | Highest | Low (if instrumented) |
| 2. Production logs | Structured application logs | High | Low (if logging exists) |
| 3. Staging traces | Same code, synthetic traffic | Medium-High | Medium |
| 4. Integration test traces | Test suite execution profiles | Medium | Medium |
| 5. Unit test coverage | Code coverage reports | Low-Medium | Low |
| 6. Static inference | Call graph analysis, type analysis | Lowest | Low |

Always prefer higher-tier evidence. When Tier 1 data is available, it overrides inferences from lower tiers.

Not every legacy system has production tracing. When instrumentation is absent:

  1. Add lightweight instrumentation — structured logging at entry/exit of key functions (see the sketch after this list)
  2. Run in staging with realistic traffic patterns for a defined period
  3. Use integration test profiles as a proxy for production behavior
  4. Document the evidence tier so downstream decisions account for confidence level
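
A minimal sketch of step 1, assuming a Python codebase and stdlib logging; the traced decorator and logger name are illustrative, not an existing utility:

import functools
import json
import logging
import time

log = logging.getLogger("runtime-evidence")

def traced(func):
    # Emit one structured log record per call: what was called and for how long.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            log.info(json.dumps({
                "event": "function_call",
                "function": f"{func.__module__}.{func.__qualname__}",
                "duration_ms": round((time.perf_counter() - start) * 1000, 2),
            }))
    return wrapper

@traced
def submit_invoice(invoice_id):
    ...  # existing legacy logic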

Record the evidence tier in complexity.json or domains.json so AI agents and reviewers know how much to trust the data.

A hot path is the code that handles the majority of production traffic. The Pareto principle applies: typically 20% of functions handle 80% of requests.

Application Performance Monitoring tools (Datadog, New Relic, OpenTelemetry) collect execution traces continuously. Extract hot paths from:

  • Top endpoints by request volume — which API routes handle the most traffic
  • Top functions by execution time — which functions consume the most CPU
  • Top database queries by frequency — which queries run most often
  • Critical path analysis — the slowest chain of function calls in a typical request

This is the gold standard when available.
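
When the system is not yet instrumented, adding spans around suspected hot paths is usually a small first step. A sketch using the OpenTelemetry Python API; the tracer name, span name, and attribute are illustrative:

from opentelemetry import trace

tracer = trace.get_tracer("invoicing")  # tracer name is illustrative

def submit_invoice(invoice_id):
    # The APM backend aggregates these spans into request volume,
    # execution time, and critical-path views per operation.
    with tracer.start_as_current_span("invoice.submit") as span:
        span.set_attribute("invoice.id", invoice_id)
        ...  # existing legacy logic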

| Hot Path Status | Migration Priority | Reasoning |
|---|---|---|
| High traffic, simple code | Extract early | Maximum user impact, low extraction effort |
| High traffic, complex code | Extract mid-phase | High value but needs careful handling |
| Low traffic, simple code | Extract late or skip | Low ROI for extraction effort |
| Low traffic, complex code | Consider not migrating | High effort, low user impact — challenge the requirement |

Record hot path data in complexity.json as runtimeMetrics.requestVolume and runtimeMetrics.executionTime per component.
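
A sketch of how that recording might look, assuming complexity.json holds a list of per-component entries shaped like the examples later on this page; the "components" key and the percentile layout are assumptions:

import json

def record_hot_path(path, component, request_volume, execution_time_ms):
    # Attach observed hot-path metrics to one component entry in complexity.json.
    with open(path) as f:
        spec = json.load(f)
    for entry in spec["components"]:  # top-level "components" list is an assumption
        if entry["component"] == component:
            metrics = entry.setdefault("runtimeMetrics", {})
            metrics["requestVolume"] = request_volume
            metrics["executionTime"] = execution_time_ms
    with open(path, "w") as f:
        json.dump(spec, f, indent=2)

record_hot_path("complexity.json", "invoice-list",
                request_volume=47000,
                execution_time_ms={"p50": 120, "p95": 850})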

Legacy ORMs are notorious for generating N+1 query patterns — one query to fetch a list, then N additional queries to fetch related data for each item. These are invisible in code review but devastating in production.

# Visible in code (looks fine):
invoices = get_all_invoices(filters)
for invoice in invoices:
    customer = get_customer(invoice.customer_id)  # N queries
    items = get_invoice_items(invoice.id)         # N more queries

# Actual queries generated:
SELECT * FROM invoice WHERE ...                      -- 1 query
SELECT * FROM customer WHERE name = 'ABC'            -- query 2
SELECT * FROM invoice_item WHERE parent = 'INV-001'  -- query 3
SELECT * FROM customer WHERE name = 'DEF'            -- query 4
SELECT * FROM invoice_item WHERE parent = 'INV-002'  -- query 5
... (continues for every invoice)

For 100 invoices, this generates 201 queries instead of 3 (with proper joins).

| Method | Precision | Setup |
|---|---|---|
| Query count per request | High | Add middleware that counts queries |
| Query pattern grouping | High | Log queries, group by template, count |
| ORM query logging | Medium | Enable ORM debug logging |
| Database slow query log | Medium | Enable at database level |
| Static analysis of loops | Low | AST scan for DB calls inside loops |
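
A sketch of the first two methods for a SQLAlchemy-backed application: count every statement issued while a representative request is replayed, grouping by a crude template with literals stripped. The engine URL and the request-replay wiring are assumptions:

import re
from collections import Counter
from sqlalchemy import create_engine, event

engine = create_engine("sqlite:///legacy.db")  # placeholder; point at the legacy database
query_counts = Counter()

def normalize(statement):
    # Collapse literals so "WHERE id = 7" and "WHERE id = 9" fall into one template.
    return re.sub(r"('[^']*'|\b\d+\b)", "?", statement)

@event.listens_for(engine, "before_cursor_execute")
def count_query(conn, cursor, statement, parameters, context, executemany):
    query_counts[normalize(statement)] += 1

# Replay a representative request, then inspect the top templates;
# a template whose count tracks the result-set size is an N+1 candidate:
# for template, count in query_counts.most_common(10):
#     print(count, template)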

N+1 patterns should be recorded in complexity.json as a complexity factor:

{
  "component": "invoice-list",
  "runtimeMetrics": {
    "queriesPerRequest": {
      "p50": 47,
      "p95": 312,
      "p99": 891
    },
    "n1Patterns": [
      {
        "loop": "invoice iteration",
        "queries": ["customer lookup", "item fetch"],
        "estimatedExtraQueries": "2N where N = invoice count"
      }
    ]
  }
}

When migrating, the new implementation should fix N+1 patterns — this is one of the few areas where intentionally deviating from legacy behavior is correct.
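
A sketch of the corrected shape for the earlier example: batch the related lookups instead of querying inside the loop. get_customers_by_ids and get_items_for_invoices are hypothetical batch helpers standing in for a join or an IN (...) query:

invoices = get_all_invoices(filters)                                  # 1 query
customers = get_customers_by_ids({i.customer_id for i in invoices})   # 1 query, mapping customer_id -> customer
items = get_items_for_invoices([i.id for i in invoices])              # 1 query

items_by_invoice = {}
for item in items:
    items_by_invoice.setdefault(item.parent, []).append(item)

for invoice in invoices:
    customer = customers[invoice.customer_id]
    invoice_items = items_by_invoice.get(invoice.id, [])
    # ... same per-invoice logic as before, now 3 queries total regardless of N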

Borrowed from data engineering, the medallion architecture applies Bronze/Silver/Gold quality tiers to runtime evidence.

Bronze is the raw, unfiltered runtime data:

  • Raw profiler output files
  • Unstructured log dumps
  • Database query logs with no context
  • Coverage reports without analysis

Useful for archival. Not directly actionable.

Collect Bronze → transform to Silver → derive Gold. Document which tier your evidence sits at so decisions account for confidence.

Dead code is code that exists in the repository but never executes in production. Migrating it wastes effort and adds complexity to the new system for zero value.

| Technique | Coverage | Confidence |
|---|---|---|
| Production code coverage (continuous profiling) | Definitive | Very High — 90+ days of production data |
| Feature flag analysis | Good for flagged code | High — flag off for 6+ months = dead |
| Log-based detection | Entry points only | Medium — absence of evidence is not evidence of absence |
| Static unreachability | Call graph dead ends | Low — dynamic dispatch and reflection may reach “unreachable” code |
| Git blame age | Correlation only | Very Low — old code may still be critical |

| Evidence | Action |
|---|---|
| Zero production executions in 90+ days | Do not migrate. Mark as dead in complexity.json |
| Executions only during specific events (year-end, onboarding) | Investigate the event. Migrate if the event is needed |
| Low but non-zero executions | Migrate but deprioritize (late phase in extraction-plan.json) |
| High executions | Migrate in an appropriate phase based on priority |

An example dead-code record in complexity.json:

{
  "component": "legacy-report-builder",
  "runtimeMetrics": {
    "lastExecution": null,
    "executionsLast90Days": 0,
    "evidenceTier": "production-traces",
    "recommendation": "do-not-migrate"
  }
}

Framework-driven systems execute code through hooks, events, conventions, and middleware — paths that are invisible in static call graphs. Runtime tracing reveals the actual execution order.

| Pattern | Framework Examples | Detection |
|---|---|---|
| Lifecycle hooks | on_create, validate, on_submit | Trace hook dispatch at runtime |
| Middleware chains | Express middleware, Django middleware | Log middleware execution order |
| Event listeners | EventEmitter.on('invoice.created') | Trace event dispatch |
| Convention-based routing | InvoiceController#show from /invoices/:id | Trace request → handler mapping |
| Dynamic dispatch | getattr(doc, method_name)() | Runtime method resolution tracing |
| Plugin systems | Plugin registry, service providers | Log plugin load and invocation order |

  1. Instrument the framework’s dispatch mechanism — add tracing at the point where hooks/events/middleware are invoked (see the sketch after this list)
  2. Run representative traffic — a mix of common and edge-case operations
  3. Record the execution chain for each operation type
  4. Compare to static analysis — identify gaps where runtime behavior differs from call graph expectations
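
A generic sketch of step 1: wrap the framework’s dispatcher so every hook invocation is logged in execution order. run_hooks is a hypothetical dispatcher name; the real injection point depends on the framework (Frappe’s hook runner, an Express middleware chain, an event emitter):

import functools
import logging

log = logging.getLogger("hook-trace")

def trace_dispatch(dispatch):
    # Wrap a hook/event dispatcher so each invocation is logged with its event name,
    # preserving the order in which the framework actually fires handlers.
    @functools.wraps(dispatch)
    def wrapper(event, *args, **kwargs):
        log.info("hook start: %s", event)
        try:
            return dispatch(event, *args, **kwargs)
        finally:
            log.info("hook end: %s", event)
    return wrapper

# framework.run_hooks = trace_dispatch(framework.run_hooks)  # hypothetical injection point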

ERPNext’s hooks system means that validate() and on_submit() methods fire implicitly when a document is saved or submitted. For a Sales Invoice submission:

User clicks "Submit"
frappe.handler.submit()
├──▶ SalesInvoice.validate()
│    ├──▶ AccountsController.validate()
│    │    ├──▶ validate_posting_time()
│    │    ├──▶ validate_party()
│    │    ├──▶ validate_currency()
│    │    ├──▶ calculate_taxes_and_totals()   ← 15 sub-calls
│    │    └──▶ validate_accounts()
│    └──▶ SalesInvoice.validate_specific()
├──▶ SalesInvoice.on_submit()
│    ├──▶ make_gl_entries()            ← GL posting
│    ├──▶ update_stock_ledger()        ← Stock impact
│    ├──▶ update_billing_status()      ← Cross-doc update
│    └──▶ send_notification()          ← Side effect
└──▶ after_submit hooks (from hooks.py)
     └──▶ custom app hooks (if any)

Static analysis can find that validate() exists on AccountsController. It cannot determine that calculate_taxes_and_totals() makes 15 sub-calls or that update_stock_ledger() fires during invoice submission. Runtime tracing reveals the complete chain.

This execution map is critical for extraction planning: you cannot extract Invoicing without accounting for its runtime dependency on Stock Ledger updates and GL Entry posting.

Runtime evidence enriches multiple spec files:

| Evidence Type | Spec File | Specific Fields |
|---|---|---|
| Hot path data | complexity.json | runtimeMetrics.requestVolume, executionTime |
| N+1 patterns | complexity.json | runtimeMetrics.queriesPerRequest, n1Patterns |
| Dead code flags | complexity.json | runtimeMetrics.executionsLast90Days, recommendation |
| Evidence tier | complexity.json | runtimeMetrics.evidenceTier |
| Execution chains | domains.json | contexts[].implicitDependencies[] |
| Cross-boundary calls | domains.json | relationships[] (runtime-validated coupling) |