Runtime Evidence Discovery

Static analysis tells you what the code can do. Runtime evidence tells you what it actually does. Legacy systems accumulate dead code, rarely-used features, and hot paths that are not obvious from reading source files. Runtime evidence grounds your modernization decisions in observed reality rather than assumptions.

| Question | Static Analysis | Runtime Evidence |
|---|---|---|
| Is this function used? | “It’s imported somewhere” | “It was called 47,000 times yesterday” |
| What’s the hot path? | “This file has high complexity” | “This function handles 73% of all requests” |
| Are there N+1 queries? | “The ORM could generate them” | “This endpoint generates 312 queries per request” |
| Is this code dead? | “No references found” | “Zero executions in 90 days of production logs” |
| What’s the execution order? | “These hooks are registered” | “Hook A fires before Hook C, Hook B is skipped for this entity type” |

Static analysis is necessary but insufficient. Runtime evidence fills the gaps that static analysis cannot reach, especially in framework-driven systems where execution flow is determined by configuration, conventions, and runtime state.

Not all evidence carries equal weight. Rank evidence by proximity to production reality:

| Tier | Source | Confidence | Cost to Obtain |
|---|---|---|---|
| 1. Production traces | APM tools, distributed tracing | Highest | Low (if instrumented) |
| 2. Production logs | Structured application logs | High | Low (if logging exists) |
| 3. Staging traces | Same code, synthetic traffic | Medium-High | Medium |
| 4. Integration test traces | Test suite execution profiles | Medium | Medium |
| 5. Unit test coverage | Code coverage reports | Low-Medium | Low |
| 6. Static inference | Call graph analysis, type analysis | Lowest | Low |

Always prefer higher-tier evidence. When Tier 1 data is available, it overrides inferences from lower tiers.

Not every legacy system has production tracing. When instrumentation is absent:

  1. Add lightweight instrumentation — structured logging at entry/exit of key functions (see the sketch after this list)
  2. Run in staging with realistic traffic patterns for a defined period
  3. Use integration test profiles as a proxy for production behavior
  4. Document the evidence tier so downstream decisions account for confidence level
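
A minimal sketch of step 1, assuming a Python codebase and stdlib logging; the traced decorator and logger name are illustrative, not an existing utility:

import functools
import json
import logging
import time

log = logging.getLogger("runtime-evidence")

def traced(func):
    # Emit one structured log record per call: what was called and for how long.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            log.info(json.dumps({
                "event": "function_call",
                "function": f"{func.__module__}.{func.__qualname__}",
                "duration_ms": round((time.perf_counter() - start) * 1000, 2),
            }))
    return wrapper

@traced
def submit_invoice(invoice_id):
    ...  # existing legacy logic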

Record the evidence tier in complexity.json or domains.json so AI agents and reviewers know how much to trust the data.

A hot path is the code that handles the majority of production traffic. The Pareto principle applies: typically 20% of functions handle 80% of requests.

Application Performance Monitoring tools (Datadog, New Relic, OpenTelemetry) collect execution traces continuously. Extract hot paths from:

  • Top endpoints by request volume — which API routes handle the most traffic
  • Top functions by execution time — which functions consume the most CPU
  • Top database queries by frequency — which queries run most often
  • Critical path analysis — the slowest chain of function calls in a typical request

This is the gold standard when available.
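
When the system is not yet instrumented, adding spans around suspected hot paths is usually a small first step. A sketch using the OpenTelemetry Python API; the tracer name, span name, and attribute are illustrative:

from opentelemetry import trace

tracer = trace.get_tracer("invoicing")  # tracer name is illustrative

def submit_invoice(invoice_id):
    # The APM backend aggregates these spans into request volume,
    # execution time, and critical-path views per operation.
    with tracer.start_as_current_span("invoice.submit") as span:
        span.set_attribute("invoice.id", invoice_id)
        ...  # existing legacy logic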

| Hot Path Status | Migration Priority | Reasoning |
|---|---|---|
| High traffic, simple code | Extract early | Maximum user impact, low extraction effort |
| High traffic, complex code | Extract mid-phase | High value but needs careful handling |
| Low traffic, simple code | Extract late or skip | Low ROI for extraction effort |
| Low traffic, complex code | Consider not migrating | High effort, low user impact — challenge the requirement |

Record hot path data in complexity.json as runtimeMetrics.requestVolume and runtimeMetrics.executionTime per component.
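
A sketch of how that recording might look, assuming complexity.json holds a list of per-component entries shaped like the examples later on this page; the "components" key and the percentile layout are assumptions:

import json

def record_hot_path(path, component, request_volume, execution_time_ms):
    # Attach observed hot-path metrics to one component entry in complexity.json.
    with open(path) as f:
        spec = json.load(f)
    for entry in spec["components"]:  # top-level "components" list is an assumption
        if entry["component"] == component:
            metrics = entry.setdefault("runtimeMetrics", {})
            metrics["requestVolume"] = request_volume
            metrics["executionTime"] = execution_time_ms
    with open(path, "w") as f:
        json.dump(spec, f, indent=2)

record_hot_path("complexity.json", "invoice-list",
                request_volume=47000,
                execution_time_ms={"p50": 120, "p95": 850})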

Legacy ORMs are notorious for generating N+1 query patterns — one query to fetch a list, then N additional queries to fetch related data for each item. These are invisible in code review but devastating in production.

# Visible in code (looks fine):
invoices = get_all_invoices(filters)
for invoice in invoices:
    customer = get_customer(invoice.customer_id)  # N queries
    items = get_invoice_items(invoice.id)         # N more queries

# Actual queries generated:
SELECT * FROM invoice WHERE ...                      -- 1 query
SELECT * FROM customer WHERE name = 'ABC'            -- query 2
SELECT * FROM invoice_item WHERE parent = 'INV-001'  -- query 3
SELECT * FROM customer WHERE name = 'DEF'            -- query 4
SELECT * FROM invoice_item WHERE parent = 'INV-002'  -- query 5
... (continues for every invoice)

For 100 invoices, this generates 201 queries instead of 3 (with proper joins).

| Method | Precision | Setup |
|---|---|---|
| Query count per request | High | Add middleware that counts queries |
| Query pattern grouping | High | Log queries, group by template, count |
| ORM query logging | Medium | Enable ORM debug logging |
| Database slow query log | Medium | Enable at database level |
| Static analysis of loops | Low | AST scan for DB calls inside loops |
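
A sketch of the first two methods for a SQLAlchemy-backed application: count every statement issued while a representative request is replayed, grouping by a crude template with literals stripped. The engine URL and the request-replay wiring are assumptions:

import re
from collections import Counter
from sqlalchemy import create_engine, event

engine = create_engine("sqlite:///legacy.db")  # placeholder; point at the legacy database
query_counts = Counter()

def normalize(statement):
    # Collapse literals so "WHERE id = 7" and "WHERE id = 9" fall into one template.
    return re.sub(r"('[^']*'|\b\d+\b)", "?", statement)

@event.listens_for(engine, "before_cursor_execute")
def count_query(conn, cursor, statement, parameters, context, executemany):
    query_counts[normalize(statement)] += 1

# Replay a representative request, then inspect the top templates;
# a template whose count tracks the result-set size is an N+1 candidate:
# for template, count in query_counts.most_common(10):
#     print(count, template)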

N+1 patterns should be recorded in complexity.json as a complexity factor:

{
  "component": "invoice-list",
  "runtimeMetrics": {
    "queriesPerRequest": {
      "p50": 47,
      "p95": 312,
      "p99": 891
    },
    "n1Patterns": [
      {
        "loop": "invoice iteration",
        "queries": ["customer lookup", "item fetch"],
        "estimatedExtraQueries": "2N where N = invoice count"
      }
    ]
  }
}

When migrating, the new implementation should fix N+1 patterns — this is one of the few areas where intentionally deviating from legacy behavior is correct.
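
A sketch of the corrected shape for the earlier example: batch the related lookups instead of querying inside the loop. get_customers_by_ids and get_items_for_invoices are hypothetical batch helpers standing in for a join or an IN (...) query:

invoices = get_all_invoices(filters)                                  # 1 query
customers = get_customers_by_ids({i.customer_id for i in invoices})   # 1 query, mapping customer_id -> customer
items = get_items_for_invoices([i.id for i in invoices])              # 1 query

items_by_invoice = {}
for item in items:
    items_by_invoice.setdefault(item.parent, []).append(item)

for invoice in invoices:
    customer = customers[invoice.customer_id]
    invoice_items = items_by_invoice.get(invoice.id, [])
    # ... same per-invoice logic as before, now 3 queries total regardless of N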

Borrowed from data engineering, the medallion architecture applies Bronze/Silver/Gold quality tiers to runtime evidence.

Bronze is the raw, unfiltered runtime data:

  • Raw profiler output files
  • Unstructured log dumps
  • Database query logs with no context
  • Coverage reports without analysis

Useful for archival. Not directly actionable.

Collect Bronze → transform to Silver → derive Gold. Document which tier your evidence sits at so decisions account for confidence.

Dead code is code that exists in the repository but never executes in production. Migrating it wastes effort and adds complexity to the new system for zero value.

| Technique | Coverage | Confidence |
|---|---|---|
| Production code coverage (continuous profiling) | Definitive | Very High — 90+ days of production data |
| Feature flag analysis | Good for flagged code | High — flag off for 6+ months = dead |
| Log-based detection | Entry points only | Medium — absence of evidence is not evidence of absence |
| Static unreachability | Call graph dead ends | Low — dynamic dispatch and reflection may reach “unreachable” code |
| Git blame age | Correlation only | Very Low — old code may still be critical |

| Evidence | Action |
|---|---|
| Zero production executions in 90+ days | Do not migrate. Mark as dead in complexity.json |
| Executions only during specific events (year-end, onboarding) | Investigate the event. Migrate if the event is needed |
| Low but non-zero executions | Migrate but deprioritize (late phase in extraction-plan.json) |
| High executions | Migrate in an appropriate phase based on priority |

An example dead-code record in complexity.json:

{
  "component": "legacy-report-builder",
  "runtimeMetrics": {
    "lastExecution": null,
    "executionsLast90Days": 0,
    "evidenceTier": "production-traces",
    "recommendation": "do-not-migrate"
  }
}

Framework-driven systems execute code through hooks, events, conventions, and middleware — paths that are invisible in static call graphs. Runtime tracing reveals the actual execution order.

| Pattern | Framework Examples | Detection |
|---|---|---|
| Lifecycle hooks | on_create, validate, on_submit | Trace hook dispatch at runtime |
| Middleware chains | Express middleware, Django middleware | Log middleware execution order |
| Event listeners | EventEmitter.on('invoice.created') | Trace event dispatch |
| Convention-based routing | InvoiceController#show from /invoices/:id | Trace request → handler mapping |
| Dynamic dispatch | getattr(doc, method_name)() | Runtime method resolution tracing |
| Plugin systems | Plugin registry, service providers | Log plugin load and invocation order |

  1. Instrument the framework’s dispatch mechanism — add tracing at the point where hooks/events/middleware are invoked (see the sketch after this list)
  2. Run representative traffic — a mix of common and edge-case operations
  3. Record the execution chain for each operation type
  4. Compare to static analysis — identify gaps where runtime behavior differs from call graph expectations
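
A generic sketch of step 1: wrap the framework’s dispatcher so every hook invocation is logged in execution order. run_hooks is a hypothetical dispatcher name; the real injection point depends on the framework (Frappe’s hook runner, an Express middleware chain, an event emitter):

import functools
import logging

log = logging.getLogger("hook-trace")

def trace_dispatch(dispatch):
    # Wrap a hook/event dispatcher so each invocation is logged with its event name,
    # preserving the order in which the framework actually fires handlers.
    @functools.wraps(dispatch)
    def wrapper(event, *args, **kwargs):
        log.info("hook start: %s", event)
        try:
            return dispatch(event, *args, **kwargs)
        finally:
            log.info("hook end: %s", event)
    return wrapper

# framework.run_hooks = trace_dispatch(framework.run_hooks)  # hypothetical injection point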

ERPNext’s hooks system means that validate() and on_submit() methods fire implicitly when a document is saved or submitted. For a Sales Invoice submission:

User clicks "Submit"
frappe.handler.submit()
├──▶ SalesInvoice.validate()
│    ├──▶ AccountsController.validate()
│    │    ├──▶ validate_posting_time()
│    │    ├──▶ validate_party()
│    │    ├──▶ validate_currency()
│    │    ├──▶ calculate_taxes_and_totals()   ← 15 sub-calls
│    │    └──▶ validate_accounts()
│    └──▶ SalesInvoice.validate_specific()
├──▶ SalesInvoice.on_submit()
│    ├──▶ make_gl_entries()            ← GL posting
│    ├──▶ update_stock_ledger()        ← Stock impact
│    ├──▶ update_billing_status()      ← Cross-doc update
│    └──▶ send_notification()          ← Side effect
└──▶ after_submit hooks (from hooks.py)
     └──▶ custom app hooks (if any)

Static analysis can find that validate() exists on AccountsController. It cannot determine that calculate_taxes_and_totals() makes 15 sub-calls or that update_stock_ledger() fires during invoice submission. Runtime tracing reveals the complete chain.

This execution map is critical for extraction planning: you cannot extract Invoicing without accounting for its runtime dependency on Stock Ledger updates and GL Entry posting.

Runtime evidence enriches multiple spec files:

| Evidence Type | Spec File | Specific Fields |
|---|---|---|
| Hot path data | complexity.json | runtimeMetrics.requestVolume, executionTime |
| N+1 patterns | complexity.json | runtimeMetrics.queriesPerRequest, n1Patterns |
| Dead code flags | complexity.json | runtimeMetrics.executionsLast90Days, recommendation |
| Evidence tier | complexity.json | runtimeMetrics.evidenceTier |
| Execution chains | domains.json | contexts[].implicitDependencies[] |
| Cross-boundary calls | domains.json | relationships[] (runtime-validated coupling) |