
Domain Decomposition

Domain decomposition is the process of dividing a monolithic system into bounded contexts — self-contained areas with distinct data ownership, vocabulary, and lifecycle. The output feeds directly into ModernizeSpec’s domains.json, which AI agents use to understand where business boundaries fall in the legacy code.

A bounded context is a region of the codebase where a particular domain model applies consistently. Inside the boundary, terms have precise meanings. Across boundaries, the same word may mean different things.

| Signal | What to Look For | Tool |
| --- | --- | --- |
| Distinct data ownership | Tables/entities not shared with other areas | Entity relationship atlas |
| Separate vocabulary | Terms used only within one area of the code | Keyword frequency analysis |
| Independent lifecycle | Code that changes together but independently of other areas | Git co-change analysis |
| Minimal cross-boundary calls | Few function calls or imports crossing the proposed boundary | Dependency graph |
| Separate UI sections | Distinct pages, menus, or navigation groups | UI inventory |

  1. Identify candidate boundaries using the signals above
  2. Score coupling between candidates — low coupling confirms the boundary, high coupling suggests merging or different splitting
  3. Validate with domain experts — boundaries should match how the business thinks, not how the code is organized
  4. Record in domains.json with entities, capabilities, and coupling scores
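
The scoring step above can be sketched as follows. This is a minimal illustration, assuming the dependency graph is available as `(source_file, target_file)` edges and that each file has already been assigned to a candidate context. The ratio used as the score (crossing edges divided by all edges touching either context) is an assumption made for the example, not a formula defined by ModernizeSpec, and the file names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def coupling_scores(edges, context_of):
    """Return {(ctx_a, ctx_b): score} for every pair of candidate contexts.

    score = edges crossing between a and b / all edges touching a or b.
    The ratio is an illustrative choice, not a prescribed formula.
    """
    cross = Counter()     # unordered context pair -> number of crossing edges
    touching = Counter()  # context -> edges with at least one end inside it
    for src, dst in edges:
        a, b = context_of[src], context_of[dst]
        touching[a] += 1
        if a != b:
            touching[b] += 1
            cross[frozenset((a, b))] += 1

    scores = {}
    for a, b in combinations(sorted(touching), 2):
        shared = cross[frozenset((a, b))]
        # Crossing edges were counted once for each side, so subtract them once.
        total = touching[a] + touching[b] - shared
        scores[(a, b)] = shared / total if total else 0.0
    return scores


# Hypothetical file-to-context assignment and import edges.
context_of = {
    "invoice.py": "invoicing",
    "invoice_item.py": "invoicing",
    "gl_entry.py": "general-ledger",
    "ledger.py": "general-ledger",
}
edges = [
    ("invoice.py", "invoice_item.py"),  # internal to invoicing
    ("invoice.py", "gl_entry.py"),      # crosses the boundary
    ("gl_entry.py", "ledger.py"),       # internal to general-ledger
]
scores = coupling_scores(edges, context_of)
# One crossing edge out of three edges touching either context.
```

A low score supports keeping the two candidates separate; a high score is a prompt to merge them or to look for a different split.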

Splitting by Layer

Separating “all controllers” from “all services” from “all repositories” creates distributed monoliths. Split by business capability, not technical layer.

Too Many Contexts

Not every entity needs its own bounded context. Group related entities that share a lifecycle and consistency boundary.

Ignoring Coupling Data

Drawing boundaries on a whiteboard without measuring actual code coupling produces aspirational architecture, not actionable extraction plans.

An aggregate is a cluster of entities treated as a single unit for data changes. The aggregate root is the entry point — all modifications to the cluster go through it. Aggregates define transaction boundaries in the new system.

Look for parent-child table relationships where children cannot exist without the parent:

  • Parent table = aggregate root candidate
  • Child tables (FK to parent, cascade delete) = aggregate members
  • Reference tables (FK but independent lifecycle) = separate aggregate

Example: Invoice (root) owns InvoiceLineItem (member) and TaxDetail (member), but references Customer (separate aggregate).
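
The rules above can be sketched against foreign-key metadata. This is a minimal illustration, assuming the schema has already been reduced to a list of foreign keys with a cascade-delete flag; the `ForeignKey` record and `group_aggregates` function are hypothetical names, and nested cascades (a member that owns its own children) are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForeignKey:
    child: str            # table that holds the FK column
    parent: str           # table the FK points to
    cascade_delete: bool  # True if child rows are deleted with the parent


def group_aggregates(foreign_keys):
    """Map each aggregate root candidate to its members and references."""
    aggregates = {}
    for fk in foreign_keys:
        if fk.cascade_delete:
            # The child cannot outlive the parent: it is a member of the
            # parent's aggregate.
            entry = aggregates.setdefault(
                fk.parent, {"members": [], "references": []}
            )
            entry["members"].append(fk.child)
        else:
            # Independent lifecycle: the FK holder only references the target,
            # which belongs to a separate aggregate.
            entry = aggregates.setdefault(
                fk.child, {"members": [], "references": []}
            )
            entry["references"].append(fk.parent)
    return aggregates


# The Invoice example from the text, expressed as foreign keys.
aggregates = group_aggregates([
    ForeignKey("InvoiceLineItem", "Invoice", cascade_delete=True),
    ForeignKey("TaxDetail", "Invoice", cascade_delete=True),
    ForeignKey("Invoice", "Customer", cascade_delete=False),
])
```

The output is only a list of candidates. Cascade rules in legacy schemas are often missing or enforced in application code, so the result still needs review.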

| Size | Typical | Risk |
| --- | --- | --- |
| 1-3 entities | Healthy | None |
| 4-7 entities | Acceptable | Monitor for unnecessary coupling |
| 8-15 entities | Large | Consider splitting into sub-aggregates |
| 16+ entities | Too large | Almost certainly hiding multiple concerns |
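
The size thresholds above are simple enough to apply automatically when reviewing candidate aggregates. A minimal sketch, where `aggregate_size_rating` is a hypothetical helper name:

```python
def aggregate_size_rating(entity_count):
    """Return the size rating for an aggregate with `entity_count` entities."""
    if entity_count < 1:
        raise ValueError("an aggregate contains at least its root entity")
    if entity_count <= 3:
        return "Healthy"
    if entity_count <= 7:
        return "Acceptable"
    if entity_count <= 15:
        return "Large"
    return "Too large"


# Invoice + InvoiceLineItem + TaxDetail from the earlier example.
rating = aggregate_size_rating(3)
```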

Capability-Driven vs Page-Driven Organization


Legacy systems, especially web applications, are often organized around UI pages or screens. This creates modules that mix multiple business capabilities because a single page may touch invoicing, payments, customer management, and reporting.

┌─────────────────────────────────────────────────┐
│ "Invoice Page" Module │
│ │
│ ┌───────────┐ ┌───────────┐ ┌────────────────┐ │
│ │ Invoicing │ │ Payments │ │ Customer Info │ │
│ │ Logic │ │ Logic │ │ Display │ │
│ └───────────┘ └───────────┘ └────────────────┘ │
│ ┌───────────┐ ┌───────────┐ │
│ │ Tax Calc │ │ Reporting │ │
│ │ │ │ Summary │ │
│ └───────────┘ └───────────┘ │
└─────────────────────────────────────────────────┘

Five capabilities in one module. Extracting “Invoicing” requires untangling it from four other concerns.

┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Invoicing │ │ Payments │ │ Taxation │
│ │ │ │ │ │
│ Create │ │ Record │ │ Calculate │
│ Submit │ │ Allocate │ │ Apply rules │
│ Cancel │ │ Reconcile │ │ Report │
└──────────────┘ └──────────────┘ └──────────────┘
┌──────────────┐ ┌──────────────┐
│ Customers │ │ Reporting │
│ │ │ │
│ Profile │ │ Generate │
│ Credit limit │ │ Export │
│ History │ │ Schedule │
└──────────────┘ └──────────────┘

Each capability becomes a bounded context with clear data ownership. The “Invoice Page” in the new system assembles data from multiple capabilities at the UI layer.

  1. List every business action the system supports (create invoice, submit payment, apply tax rule)
  2. Group actions by data ownership — which entity does each action primarily modify?
  3. Name the capability using business language, not technical terms
  4. Map existing code to capabilities — a single file may contain code for multiple capabilities
  5. Record the mapping in domains.json as contexts[].capabilities[]
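
Steps 1 to 3 and step 5 can be sketched as a small grouping routine. This is an illustration only: the action inventory and the entity-to-capability names below are hypothetical, and the output uses just the `id` and `capabilities` fields named in this document.

```python
from collections import defaultdict

# Step 1: hypothetical inventory of (business action, entity it primarily modifies).
ACTIONS = [
    ("create-invoice", "SalesInvoice"),
    ("submit-invoice", "SalesInvoice"),
    ("cancel-invoice", "SalesInvoice"),
    ("record-payment", "PaymentEntry"),
    ("allocate-payment", "PaymentEntry"),
    ("apply-tax-rule", "TaxRule"),
]

# Step 3: capability names chosen in business language (assumed for the example).
CAPABILITY_OF_ENTITY = {
    "SalesInvoice": "invoicing",
    "PaymentEntry": "payments",
    "TaxRule": "taxation",
}


def group_capabilities(actions, capability_of_entity):
    """Step 2: group actions by the capability that owns the modified entity."""
    grouped = defaultdict(list)
    for action, entity in actions:
        grouped[capability_of_entity[entity]].append(action)
    # Step 5: shape the result like contexts[].capabilities[].
    return [{"id": cid, "capabilities": caps} for cid, caps in grouped.items()]


contexts = group_capabilities(ACTIONS, CAPABILITY_OF_ENTITY)
```

Step 4, mapping existing code to capabilities, is deliberately left out: it depends on the classification strategies for the codebase at hand.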

For large codebases, manually classifying every file is impractical. Use automated classification to assign each code artifact to a business domain, then refine manually.

| Strategy | How It Works | Accuracy | Effort |
| --- | --- | --- | --- |
| Namespace/path | Use directory structure as initial classification | Low-Medium | Automated |
| Keyword matching | Match function/class names to domain glossaries | Medium | Semi-automated |
| Import clustering | Group files that import each other heavily | Medium-High | Automated |
| Co-change analysis | Files that change together in git belong together | High | Automated (needs history) |
| AI-assisted | LLM reads code and assigns domain labels | Medium-High | Semi-automated |

  1. Start with namespace classification: accounts/*.py likely belongs to the Accounts domain
  2. Refine with import clustering — files in accounts/ that import primarily from stock/ may actually belong to the Stock domain
  3. Validate with co-change — if accounts/tax_calculator.py always changes alongside selling/invoice.py, they may be the same bounded context
  4. Flag cross-domain files — files imported by 3+ domains are candidates for a Shared Kernel or need splitting
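
Three of the steps above (namespace classification, co-change validation, and cross-domain flagging) can be sketched with the standard library alone. The sketch assumes the import graph and the per-commit file lists have already been extracted by other tooling. The function names and sample paths are hypothetical, and counting the importing file's own domain toward the 3+ threshold is a choice made for the example.

```python
from collections import Counter, defaultdict
from itertools import combinations
from pathlib import PurePosixPath


def classify_by_namespace(path):
    """Step 1: use the top-level directory as the initial domain label."""
    return PurePosixPath(path).parts[0]


def co_change_counts(commits):
    """Step 3: count how often each pair of files appears in the same commit."""
    pairs = Counter()
    for files in commits:
        for a, b in combinations(sorted(set(files)), 2):
            pairs[(a, b)] += 1
    return pairs


def flag_cross_domain(imports, min_domains=3):
    """Step 4: list files imported from at least `min_domains` distinct domains.

    `imports` is an iterable of (importing_file, imported_file) pairs.
    """
    importing_domains = defaultdict(set)
    for src, dst in imports:
        importing_domains[dst].add(classify_by_namespace(src))
    return sorted(
        path
        for path, domains in importing_domains.items()
        if len(domains) >= min_domains
    )


# Hypothetical inputs.
commits = [
    ["accounts/tax_calculator.py", "selling/invoice.py"],
    ["selling/invoice.py", "accounts/tax_calculator.py"],
    ["stock/bin.py"],
]
imports = [
    ("accounts/a.py", "utils/money.py"),
    ("stock/b.py", "utils/money.py"),
    ("selling/c.py", "utils/money.py"),
    ("accounts/a.py", "stock/b.py"),
]

pairs = co_change_counts(commits)
shared = flag_cross_domain(imports)
```

Raw co-change counts favor files that change often, so in practice they are usually normalized by each file's total commit count before being compared.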

ERPNext’s 521 doctypes are organized into 21 modules, but module boundaries do not align with domain boundaries:

| Module Boundary | Domain Reality |
| --- | --- |
| accounts/ contains GL Entry, Tax Rule, Payment Entry | These are 3 distinct domains: General Ledger, Taxation, Payments |
| stock/ contains Stock Entry, Warehouse, Valuation | Valuation crosses into Accounts (it posts GL entries) |
| hr/ contains Payroll Entry | Payroll crosses into Accounts (it creates journal entries) |
| selling/ and buying/ are separate modules | Both use the same transaction controller (accounts_controller.py) |

Although ERPNext ships as 21 modules, analysis by actual data ownership and coupling patterns revealed approximately 35 distinct bounded contexts.

In framework-heavy systems, controller classes accumulate cross-cutting concerns through inheritance. A single controller file may contain logic for validation, calculation, persistence, authorization, and event handling — all mixed together.

  1. Extract the inheritance chain — from the most derived class up to the framework base
  2. Classify each method by concern (validation, calculation, persistence, event handling, framework hook)
  3. Identify cross-domain methods — methods that should belong to a different bounded context
  4. Mark framework methods — methods that exist only because the framework requires them (lifecycle hooks, serialization, etc.)
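
Steps 1, 2, and 4 can be sketched with Python's `inspect` module. The classes below are toy stand-ins that only borrow method names mentioned in this document; they are not the real Frappe or ERPNext classes. The name-based rules are assumptions for the example, and anything they miss is reported as `unclassified` for manual review.

```python
import inspect

# Assumed hook names; a real list would come from the framework's documentation.
FRAMEWORK_HOOKS = {"validate", "on_submit", "on_cancel"}

# Assumed prefix rules for the remaining methods.
CONCERN_PREFIXES = [
    ("validate_", "validation"),
    ("calculate_", "calculation"),
]


def classify_method(name):
    """Steps 2 and 4: assign a concern label from the method name alone."""
    if name in FRAMEWORK_HOOKS:
        return "framework hook"
    for prefix, concern in CONCERN_PREFIXES:
        if name.startswith(prefix):
            return concern
    return "unclassified"


def methods_by_level(cls):
    """Step 1: walk the inheritance chain from most derived class to base."""
    for klass in cls.__mro__:
        if klass is object:
            continue
        # vars() returns only what this class defines, not inherited members.
        for name, member in vars(klass).items():
            if inspect.isfunction(member) and not name.startswith("__"):
                yield klass.__name__, name, classify_method(name)


# Toy stand-ins for a three-level controller hierarchy.
class Document:
    def validate(self): ...
    def on_submit(self): ...


class AccountsController(Document):
    def calculate_taxes_and_totals(self): ...
    def update_stock_ledger(self): ...


class SalesInvoice(AccountsController):
    def validate_invoice_dates(self): ...


rows = list(methods_by_level(SalesInvoice))
```

Step 3, identifying cross-domain methods, is not something a name rule can decide. Here `update_stock_ledger` lands in `unclassified`, which is the cue to inspect what it reads and writes.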

The controller hierarchy for a Sales Invoice spans four levels:

| Level | Class | Lines | Methods | Concern |
| --- | --- | --- | --- | --- |
| 1 | Document (Frappe) | ~2,000 | ~80 | Framework ORM, permissions, workflow |
| 2 | TransactionBase | ~600 | ~25 | Shared transaction logic |
| 3 | AccountsController | 4,412 | 168 | Financial calculations, GL posting, tax |
| 4 | SalesInvoice | ~1,800 | ~60 | Invoice-specific behavior |

Methods in AccountsController (level 3) serve every financial transaction type — Sales Invoice, Purchase Invoice, Payment Entry, Journal Entry. This means extracting “Invoicing” as a bounded context requires untangling shared methods from invoice-specific ones.

| Method Category | Extraction Strategy |
| --- | --- |
| Domain-specific (e.g., validate_invoice_dates) | Move to the target bounded context directly |
| Shared calculation (e.g., calculate_taxes_and_totals) | Extract as a shared domain service, referenced by multiple contexts |
| Framework hook (e.g., on_submit, validate) | Replace with domain events in the new system |
| Cross-cutting (e.g., update_stock_ledger) | Extract as an integration event between bounded contexts |

Domain decomposition produces the core data for domains.json:

| Analysis Output | Spec Field | Example |
| --- | --- | --- |
| Bounded context list | contexts[] | { "id": "invoicing", "name": "Invoicing" } |
| Entity grouping | contexts[].entities[] | ["SalesInvoice", "SalesInvoiceItem", "SalesTaxesAndCharges"] |
| Capability mapping | contexts[].capabilities[] | ["create-invoice", "submit-invoice", "cancel-invoice"] |
| Coupling scores | contexts[].coupling[] | { "target": "general-ledger", "score": 0.72 } |
| Cross-domain dependencies | relationships[] | { "from": "invoicing", "to": "taxation", "type": "depends-on" } |
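
Putting the field examples together, a domains.json document could be assembled as sketched below. The top-level shape (one object with `contexts` and `relationships` keys) is inferred from the field paths in the table and is an assumption; the actual ModernizeSpec schema may require additional fields. `undeclared_context_ids` is a hypothetical sanity check, not part of the spec.

```python
import json


def build_domains_spec(contexts, relationships):
    """Combine analysis outputs into one document (assumed top-level shape)."""
    return {"contexts": contexts, "relationships": relationships}


def undeclared_context_ids(spec):
    """List context ids that are referenced but never declared in contexts[]."""
    declared = {ctx["id"] for ctx in spec["contexts"]}
    referenced = set()
    for ctx in spec["contexts"]:
        referenced.update(link["target"] for link in ctx.get("coupling", []))
    for rel in spec["relationships"]:
        referenced.update((rel["from"], rel["to"]))
    return sorted(referenced - declared)


spec = build_domains_spec(
    contexts=[
        {
            "id": "invoicing",
            "name": "Invoicing",
            "entities": ["SalesInvoice", "SalesInvoiceItem", "SalesTaxesAndCharges"],
            "capabilities": ["create-invoice", "submit-invoice", "cancel-invoice"],
            "coupling": [{"target": "general-ledger", "score": 0.72}],
        }
    ],
    relationships=[{"from": "invoicing", "to": "taxation", "type": "depends-on"}],
)

domains_json = json.dumps(spec, indent=2)
missing = undeclared_context_ids(spec)
```

In this partial example `missing` lists the two contexts that are referenced but not yet declared, which makes it a useful completeness check while the analysis is still in progress.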