ModernizeSpec is an evolving specification. We are running real-world experiments to validate and refine it, and we update the spec as we learn.
PearlThoughts has deployed 7 intern teams working in parallel on ERPNext modernization. Each team takes a different approach: RAG-assisted code understanding, AST-based transpilation, graph-augmented retrieval, and direct extraction with parity testing.
The goal is not to produce a finished migration. The goal is to learn what information AI agents actually need when working with a large legacy codebase, and to encode those needs into the specification.
Structured targets matter
Teams that had sequenced extraction targets (which module first, what depends on what) progressed faster than teams exploring freely. This shaped extraction-plan.json.
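As a rough illustration of what "sequenced with dependency awareness" can mean in practice, the sketch below orders modules so that each one comes after everything it depends on. The module names and the dictionary shape are hypothetical and are not taken from the actual extraction-plan.json schema; the ordering uses Python's standard `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: module -> modules it depends on.
# Illustrative only; this is NOT the real extraction-plan.json schema.
plan = {
    "accounts": ["core"],
    "stock": ["core", "accounts"],
    "core": [],
}


def extraction_order(dependencies):
    """Return modules ordered so every dependency precedes its dependents."""
    return list(TopologicalSorter(dependencies).static_order())


order = extraction_order(plan)
print(order)
```

With this sample data the only valid order is `core`, then `accounts`, then `stock`. A cyclic dependency would raise `graphlib.CycleError`, which is itself a useful signal that two modules may need to be extracted together.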
Parity needs baselines
Self-reported accuracy without baseline snapshots creates false confidence. This led to parity-tests.json requiring captured behavior, not just claims.
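To make the difference between captured behavior and a claim concrete, here is a minimal sketch of a parity check. It replays inputs recorded from the legacy system against a new implementation and reports every mismatch. The `cases`, `input`, and `expected` field names, and the `line_total` function, are assumptions made for illustration and do not reflect the actual parity-tests.json format.

```python
import json

# Hypothetical baseline captured from the legacy system.
# Field names are illustrative, NOT the real parity-tests.json schema.
BASELINE = json.dumps({
    "cases": [
        {"input": {"qty": 2, "rate": 50}, "expected": 100},
        {"input": {"qty": 3, "rate": 10}, "expected": 30},
    ]
})


def check_parity(baseline_json, new_impl):
    """Replay captured inputs against new_impl and collect mismatches."""
    failures = []
    for case in json.loads(baseline_json)["cases"]:
        actual = new_impl(**case["input"])
        if actual != case["expected"]:
            failures.append({**case, "actual": actual})
    return failures


def line_total(qty, rate):
    """Hypothetical re-implementation of a legacy calculation."""
    return qty * rate


failures = check_parity(BASELINE, line_total)
print("parity failures:", failures)
```

An empty failure list here means the new code reproduced every recorded case, which is a stronger statement than a team reporting that the port "looks right".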
Progress tracking is essential
Without shared state, multiple teams converged on the same module unknowingly. This drove migration-state.json with per-context progress and blockers.
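The kind of overlap described above can be detected mechanically once claims live in one shared place. The following sketch groups claims by context and flags any context claimed by more than one team. The `team` and `context` keys and the team names are hypothetical and are not drawn from the actual migration-state.json schema.

```python
from collections import defaultdict

# Hypothetical claim records; keys are illustrative,
# NOT the real migration-state.json schema.
claims = [
    {"team": "team-a", "context": "accounts"},
    {"team": "team-b", "context": "accounts"},
    {"team": "team-c", "context": "stock"},
]


def find_conflicts(claim_records):
    """Map each context claimed by several teams to the sorted team names."""
    teams_by_context = defaultdict(set)
    for claim in claim_records:
        teams_by_context[claim["context"]].add(claim["team"])
    return {
        context: sorted(teams)
        for context, teams in teams_by_context.items()
        if len(teams) > 1
    }


conflicts = find_conflicts(claims)
print(conflicts)
```

For the sample data this reports that `accounts` is claimed by both `team-a` and `team-b`, while `stock` has a single owner and is left out.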
Multiple approaches coexist
RAG, AST parsing, direct extraction, and hybrid approaches all produced useful output. The spec is deliberately approach-agnostic: domains.json works with any toolchain.
Every lesson gets encoded:
| What We Observe | How It Shapes ModernizeSpec |
|---|---|
| Teams need to know where to start | complexity.json scores hotspots so agents prioritize correctly |
| Teams duplicate effort without coordination | migration-state.json tracks who is working on what |
| Teams need to verify behavioral equivalence | parity-tests.json defines test structure with baseline snapshots |
| Teams need extraction order | extraction-plan.json sequences phases with dependency awareness |
| Different AI toolchains need the same context | The spec is tool-agnostic JSON — any agent reads it |
The experiments are ongoing, and we will keep updating the spec as we gather more data.
The specification is v0.1 precisely because we believe real-world usage should shape the standard, not theoretical design. We ship, learn, and iterate.