How we rebuilt a backend without migrating the data

Keeping the data still

The most dangerous way to rebuild a live system is to change the code and the data at the same time. We did not. When we rebuilt the CORE Home backend under live traffic, we kept one shared data layer underneath both systems and held off on every schema change until after the last tenant had moved.

CORE Home is a multi-tenant real estate platform we rebuilt without taking a single tenant offline, after first stabilizing the system we inherited. The full migration story covers all four decisions that kept tenants running. This post is about one of them, the data layer, because it is the decision that made the whole migration verifiable, the one teams most often get backwards, and the one with the sharpest risks if you get it wrong.

The instinct on a rebuild is to do everything at once: new code, new schema, new data model, all shipped together. It feels efficient. It is the source of most of the risk. Holding the data still while the code changes is what let us prove the rebuild was correct before it carried anyone, and the rest of this post is about why that works, where it is dangerous, and when it does not apply.

Key Takeaways

Changing the code and the data at once gives a failure two possible causes and no way to tell them apart.
Point the old and rebuilt backends at one shared data layer, and read parity between them is automatic.
The same shared layer widens the blast radius for write bugs, so the new backend needs guardrails the old one already had.
Defer schema changes until the cutover is done and a single backend owns the data, then make them with the usual online care.

Why is changing code and data at once so risky?

Because it gives every failure two possible causes. When the new backend returns something wrong and the data shape changed in the same release, you cannot tell whether the bug is in the code or in the migrated data. You are debugging two systems at once, live, with customers on them. That is the position most rebuilds put themselves in, and it is avoidable.

Data migration is the part with the worst track record. Reshaping data and swapping the application in one move is where rebuilds slip their timelines, because the surprises only show up in production, against real records, after the old code is already gone. Keeping the data unchanged removes that whole category of surprise.

Rollback gets worse too. If a release changed the schema and a tenant breaks, undoing the code is not enough, because the data has already moved to a new shape. Now the rollback has to reverse a data change as well, under pressure, often by hand. A migration you cannot cleanly reverse is a migration you are afraid to run, which is how teams end up stalling.

What does sharing one data layer buy you?

Pointing both backends at the same data layer buys you read parity for free. On CORE Home, the legacy and rebuilt backends ran in parallel over one copy of the data, in its existing structure. Because they read the same records, the two systems saw identical inputs on every read path by construction, with nothing to reconcile between them, so any difference in their answers came from the code.

This is an established migration technique, sometimes called a parallel run with reconciliation: you exercise the old and new implementations over the same inputs and compare the results, and the comparison is what builds confidence in the new code before it carries anyone.

During the migration, both backends share one unchanged data layer, so parity comes for free. The schema changes only once a single backend is left to own it.

Read parity comes by construction

When there is one copy of the data, there is no second copy to drift. The usual way to run an old and new system side by side is to copy data into the new one and keep the two in sync, which adds a whole class of bugs: lag, conflicts, records that disagree. Sharing the layer removes that class entirely. The rebuilt backend only ever had to be correct for one tenant at a time, against data that was already true.

Reads compare cleanly, writes are verified before the flip

Verification splits by path, and the split matters. On a read, you can run the rebuilt backend against the same records the live system uses and compare the answers, because a read changes nothing. A difference on a read points at the new code. This is where most of the confidence comes from.
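The read-side comparison can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not CORE Home's code: `compare_read`, the handler signature, and the dict standing in for the shared data layer are all hypothetical. The customer gets the legacy answer; the rebuilt read runs over the same records, and any difference is logged.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Hypothetical handler shape: (tenant_id, query) -> JSON-like result.
ReadHandler = Callable[[str, Dict[str, Any]], Any]


@dataclass
class Mismatch:
    tenant_id: str
    query: Dict[str, Any]
    legacy: Any
    rebuilt: Any


def compare_read(tenant_id: str, query: Dict[str, Any], legacy: ReadHandler,
                 rebuilt: ReadHandler, mismatches: List[Mismatch]) -> Any:
    """Serve the legacy answer, run the rebuilt read on the same records, log any difference."""
    legacy_result = legacy(tenant_id, query)
    try:
        rebuilt_result = rebuilt(tenant_id, query)
    except Exception as exc:  # a crash in the shadow read must never reach the customer
        rebuilt_result = f"error: {exc!r}"
    if rebuilt_result != legacy_result:
        mismatches.append(Mismatch(tenant_id, query, legacy_result, rebuilt_result))
    return legacy_result


# Demo: a dict stands in for the shared data layer that both backends read.
SHARED = {"t1": [{"id": 1, "price": 300}, {"id": 2, "price": 100}]}


def legacy_listings(tenant_id: str, query: Dict[str, Any]) -> List[int]:
    return sorted(r["id"] for r in SHARED[tenant_id] if r["price"] <= query["max_price"])


def rebuilt_listings(tenant_id: str, query: Dict[str, Any]) -> List[int]:
    # Planted bug: "<" instead of "<=". Same records, so a mismatch can only be the code.
    return sorted(r["id"] for r in SHARED[tenant_id] if r["price"] < query["max_price"])
```

Run inline like this, the shadow read adds latency to every request, so in practice it would usually run off the request path. A write that lands between the two reads can also produce a false mismatch, so it is worth re-running a logged query before blaming the code.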

A write is different. You cannot run both backends against the same write at once without applying it twice, so writes are not compared live. We verified a tenant against the rebuilt backend before flipping it, and the feature flag then sent that tenant's reads and writes to exactly one backend. There was never a moment when two writers touched the same tenant's rows, which is what kept the shared layer safe to write through.
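Per-tenant, single-writer routing can be sketched as a flag lookup that picks one handler per request. The names here (`TenantRouter`, `mark_verified`, `flip`) are hypothetical, and a real flag would live in a shared store rather than in process memory. What the sketch shows is that no code path calls both backends, and that a tenant cannot be flipped before it has been verified.

```python
from enum import Enum
from typing import Callable, Dict, Set


class Backend(Enum):
    LEGACY = "legacy"
    REBUILT = "rebuilt"


Handler = Callable[[str, dict], object]


class TenantRouter:
    """Per-tenant flag: all of a tenant's reads and writes go to exactly one backend."""

    def __init__(self, handlers: Dict[Backend, Handler]) -> None:
        self._handlers = handlers
        self._flags: Dict[str, Backend] = {}
        self._verified: Set[str] = set()

    def mark_verified(self, tenant_id: str) -> None:
        # Recorded once the tenant has passed verification on the rebuilt backend.
        self._verified.add(tenant_id)

    def flip(self, tenant_id: str) -> None:
        if tenant_id not in self._verified:
            raise RuntimeError(f"tenant {tenant_id} has not been verified on the rebuilt backend")
        self._flags[tenant_id] = Backend.REBUILT

    def backend_for(self, tenant_id: str) -> Backend:
        # Unflipped tenants stay on the legacy backend by default.
        return self._flags.get(tenant_id, Backend.LEGACY)

    def handle(self, tenant_id: str, request: dict) -> object:
        # One lookup picks the single writer; no code path calls both handlers.
        return self._handlers[self.backend_for(tenant_id)](tenant_id, request)
```

The sketch ignores requests already in flight when a flag flips. A real cutover has to drain them, or a legacy write that started before the flip could land after it.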

What are the risks of sharing a data layer?

The property that makes parity cheap, one copy of the data, is also the one that concentrates risk. Both backends write to the single source of truth, so the newer, less proven one can damage data that tenants still on the old backend depend on. A shared data layer is worth it, but only with eyes open about what it exposes.

First, the framing. Two applications permanently sharing one database is a known anti-pattern, the integration database, because the schema becomes a point of coupling that no one team owns. What makes our case different is that it is temporary, one team owns both backends and the schema, and it ends on a fixed date. The dual-write problem does not apply either, because there is one transactional store rather than two that can drift. Single-writer-per-tenant routing then removes the last race, two code paths mutating the same rows at once. With those cleared, the real risks are about what the new backend can write.

The new backend can write data the old one rejects

Parity by construction is true for the bytes, not for what they mean. If the old system enforces a rule in application code rather than a database constraint, the database will accept a write from the new backend that breaks the rule, and the old backend will then read invalid data. The same trap hides in interpretation: timezones, enum and string mappings, nullability, soft-delete conventions, and derived columns kept in sync by code. Two backends can read identical bytes and mean different things by them.
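A small SQLite sketch shows the first trap. The `listing` table, the status vocabulary, and both write functions are hypothetical. The legacy rule lives only in application code, so the database accepts the rebuilt backend's `'ACTIVE'`, and the legacy query for active listings silently skips the row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE listing (id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL, status TEXT NOT NULL)"
)

LEGACY_STATUSES = {"draft", "active", "sold"}


def legacy_write(tenant_id: str, status: str) -> int:
    # The rule lives only in legacy application code; the schema accepts any text.
    if status not in LEGACY_STATUSES:
        raise ValueError(f"invalid status {status!r}")
    return conn.execute(
        "INSERT INTO listing (tenant_id, status) VALUES (?, ?)", (tenant_id, status)
    ).lastrowid


def rebuilt_write(tenant_id: str, status: str) -> int:
    # The rebuilt backend uses its own spelling, and nothing in the database objects.
    return conn.execute(
        "INSERT INTO listing (tenant_id, status) VALUES (?, ?)", (tenant_id, status)
    ).lastrowid


def legacy_active_ids(tenant_id: str) -> list:
    rows = conn.execute(
        "SELECT id FROM listing WHERE tenant_id = ? AND status = 'active' ORDER BY id",
        (tenant_id,),
    )
    return [row[0] for row in rows]


legacy_write("t1", "active")
new_id = rebuilt_write("t1", "ACTIVE")  # accepted by the database
```

Nothing errors here, which is the problem: the row is valid bytes and invisible to the legacy query.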

The defense is to make the invariants explicit. Promote the ones that matter into database constraints, so both backends are bound by them, and round-trip test the rest: write a record with each backend and confirm the other reads it the same way. This is the real work behind the phrase "differences point at the code," and it is only true once you have done it.
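Both defenses fit in one sketch, again with a hypothetical schema and hypothetical backend functions. The `CHECK` constraints bind both backends to the same vocabulary and range, and `round_trip_ok` writes a record with each backend and reads it back with the other, comparing what the values mean rather than their raw bytes.

```python
import sqlite3
from enum import Enum
from typing import Tuple

SCHEMA = """
CREATE TABLE listing (
    id INTEGER PRIMARY KEY,
    tenant_id TEXT NOT NULL,
    status TEXT NOT NULL CHECK (status IN ('draft', 'active', 'sold')),
    price_cents INTEGER NOT NULL CHECK (price_cents >= 0)
)
"""


class Status(Enum):  # the rebuilt backend's internal type (hypothetical)
    DRAFT = "draft"
    ACTIVE = "active"
    SOLD = "sold"


INSERT = "INSERT INTO listing (tenant_id, status, price_cents) VALUES (?, ?, ?)"
SELECT = "SELECT tenant_id, status, price_cents FROM listing WHERE id = ?"


def legacy_write(conn, tenant_id: str, status: str, price_cents: int) -> int:
    return conn.execute(INSERT, (tenant_id, status, price_cents)).lastrowid


def legacy_read(conn, listing_id: int) -> Tuple[str, str, int]:
    return conn.execute(SELECT, (listing_id,)).fetchone()


def rebuilt_write(conn, tenant_id: str, status: Status, price_cents: int) -> int:
    return conn.execute(INSERT, (tenant_id, status.value, price_cents)).lastrowid


def rebuilt_read(conn, listing_id: int) -> Tuple[str, Status, int]:
    tenant_id, status, price_cents = conn.execute(SELECT, (listing_id,)).fetchone()
    return tenant_id, Status(status), price_cents


def round_trip_ok(conn) -> bool:
    """Write with each backend, read with the other, and compare meanings, not bytes."""
    a = legacy_write(conn, "t1", "active", 2_500_000)
    b = rebuilt_write(conn, "t1", Status.SOLD, 1_000_000)
    return (rebuilt_read(conn, a) == ("t1", Status.ACTIVE, 2_500_000)
            and legacy_read(conn, b) == ("t1", "sold", 1_000_000))
```

With the constraint in place, a write that spells a status `'ACTIVE'` fails with `sqlite3.IntegrityError` instead of landing quietly.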

A write bug travels further than a read bug

Per-tenant routing contains behavior and reads, but it does not contain corruption of shared state. A localized write bug stays with one tenant. A bad write to a global table, a wrong cascade delete, or a violated cross-tenant invariant reaches tenants who never left the old backend, because they read the same data. The shared layer is exactly what removes that bulkhead.

So the new backend gets guardrails the old one earned over years: a deliberately small write surface, point-in-time backups, extra care around destructive and global operations, and a few canary tenants before any broad rollout. The blast radius for reads is one tenant. For writes to shared data, it is the platform, and the plan has to respect that.
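A deliberately small write surface can be enforced as a check in the rebuilt backend's data-access layer. This is a minimal sketch: the table names, the `WriteOp` shape, and the policy (no writes to global tables, every write tenant-scoped, deletes only with an explicit opt-in) are assumptions chosen to illustrate these guardrails, not a prescribed rule set.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical write surface for the rebuilt backend; table names are illustrative.
TENANT_SCOPED_TABLES = frozenset({"listing", "inquiry"})
GLOBAL_TABLES = frozenset({"region", "plan"})  # read by every tenant, on either backend


class WriteRejected(Exception):
    pass


@dataclass(frozen=True)
class WriteOp:
    table: str
    kind: str                 # "insert", "update" or "delete"
    tenant_id: Optional[str]


def check_write(op: WriteOp, allow_delete: bool = False) -> None:
    """Raise WriteRejected unless the rebuilt backend may perform this write."""
    if op.table in GLOBAL_TABLES:
        raise WriteRejected(f"{op.table} is shared by all tenants; not writable from the rebuilt backend")
    if op.table not in TENANT_SCOPED_TABLES:
        raise WriteRejected(f"{op.table} is outside the rebuilt backend's write surface")
    if op.tenant_id is None:
        raise WriteRejected("tenant-scoped write without a tenant id")
    if op.kind == "delete" and not allow_delete:
        raise WriteRejected("deletes need an explicit opt-in")


def is_allowed(op: WriteOp, allow_delete: bool = False) -> bool:
    try:
        check_write(op, allow_delete)
    except WriteRejected:
        return False
    return True
```

Failing closed means a table nobody listed is rejected rather than written, and widening the surface becomes a reviewed code change.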

The two systems still share machinery

One database means one connection pool, one set of locks, and one I/O budget. The rebuilt backend's query patterns can degrade the old backend that is still serving live tenants: a missing index, a long transaction, or a heavy job is enough. Database-side logic is the other quiet coupling, in the form of triggers, stored procedures, computed columns, and cascade rules that the old system relies on and the new code may not know about.
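An inventory of database-side logic can start with the catalog. The sketch here uses SQLite, where `sqlite_master` lists triggers and `PRAGMA foreign_key_list` reports each foreign key's `ON DELETE` rule; the schema is hypothetical. In PostgreSQL, `information_schema.triggers` and `information_schema.referential_constraints` hold comparable information.

```python
import sqlite3
from typing import Dict, List


def inventory_db_logic(conn: sqlite3.Connection) -> Dict[str, List]:
    """List triggers and non-default ON DELETE rules defined in the database itself."""
    triggers = [name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'trigger' ORDER BY name")]
    tables = [name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    on_delete_rules = []
    for table in tables:
        quoted = table.replace("'", "''")
        # Row layout: id, seq, table, from, to, on_update, on_delete, match
        for row in conn.execute(f"PRAGMA foreign_key_list('{quoted}')"):
            if row[6] != "NO ACTION":
                on_delete_rules.append((table, row[3], row[2], row[6]))
    return {"triggers": triggers, "on_delete_rules": on_delete_rules}


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tenant (id TEXT PRIMARY KEY);
    CREATE TABLE listing (
        id INTEGER PRIMARY KEY,
        tenant_id TEXT REFERENCES tenant(id) ON DELETE CASCADE,
        price_cents INTEGER
    );
    CREATE TABLE audit (listing_id INTEGER, changed_at TEXT);
    CREATE TRIGGER listing_audit AFTER UPDATE ON listing
    BEGIN
        INSERT INTO audit VALUES (NEW.id, datetime('now'));
    END;
""")
report = inventory_db_logic(conn)
```

Each entry is something the rebuilt backend must either rely on knowingly or stop duplicating in application code.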

The mitigations are ordinary operations work: separate or capped connection pools, statement and lock timeouts, index parity so the new backend does not surprise the database, and an inventory of the existing database-side logic so the new code neither duplicates nor ignores it. None of it is exotic. All of it has to be deliberate, because the shared layer means one system's mistake is felt by the other.
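Capping the rebuilt backend's connections is simple to sketch. `CappedPool` is a hypothetical, minimal fixed-size pool: when every connection is in use, a caller waits at most `acquire_timeout_s` and then fails instead of queueing more work onto the shared database. The `timeout` argument to `sqlite3.connect` is SQLite's lock wait; in PostgreSQL the comparable per-session guards are the `statement_timeout` and `lock_timeout` settings.

```python
import queue
import sqlite3
from contextlib import contextmanager
from typing import Callable, Iterator


class CappedPool:
    """Fixed-size pool: the rebuilt backend never holds more than `size` connections."""

    def __init__(self, connect: Callable[[], sqlite3.Connection], size: int,
                 acquire_timeout_s: float) -> None:
        self._idle: "queue.Queue[sqlite3.Connection]" = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())
        self._acquire_timeout_s = acquire_timeout_s

    @contextmanager
    def connection(self) -> Iterator[sqlite3.Connection]:
        try:
            conn = self._idle.get(timeout=self._acquire_timeout_s)
        except queue.Empty:
            # Fail fast instead of piling more work onto the database the legacy backend needs.
            raise TimeoutError("rebuilt backend's pool is exhausted") from None
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always return the connection, even if the caller raised


# timeout= is sqlite3's lock wait in seconds, set shorter than the default of 5.0.
pool = CappedPool(lambda: sqlite3.connect(":memory:", timeout=2.0), size=2,
                  acquire_timeout_s=0.05)
```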

Why not copy the data into the new system?

Because a copy is a second source of truth, and keeping two of them in sync is its own project. The common alternative to a shared layer is to stand up a fresh datastore for the rebuilt system and replicate into it. It works, but it adds the exact failure modes a careful migration is trying to avoid.

Keeping a copy current usually means a change-data-capture pipeline and the operational weight that comes with it: replication lag, write conflicts, and reconciliation jobs to catch the records that disagree. During a migration that weight is heavier than usual, because a sync bug looks exactly like a parity bug. You are back to a failure with two possible causes, unsure whether the new code is wrong or the copy is stale.

Sharing one layer removes the question. There is no copy, so there is nothing to fall behind, conflict, or reconcile. The rebuilt backend reads the same records the live system wrote a second ago, and any disagreement between the two backends on a read is the code. We took on a new backend without also taking on a data-sync system to keep it fed.

Copying is sometimes unavoidable. If the rebuild moves to a different database engine, or a new store is the actual goal, you cannot share the old layer and you take the sync work on with eyes open. That was not the case on CORE Home, where the existing data was sound and the work was in the code above it, so the cheaper and safer path was to leave the data where it was.

Why defer the schema changes?

Because a schema change is only simple once a single system owns the data. While the old and rebuilt backends both read and write the same structures, any change to those structures has to keep working for both of them at the same time. Every column you add, rename, or drop becomes a negotiation between two readers that understand the data differently.

We wanted schema changes. The rebuild was, in part, about a cleaner data model. We just refused to make those changes while two systems depended on the old shape. So the changes waited. Once the last tenant moved to the rebuilt backend and the legacy stack was retired, exactly one system was left reading and writing the data, and the schema was free to change on our own timeline, without coordinating a second reader.

Free of coordination is not the same as free of cost. A large table still does not change instantly, even with one owner: the wrong DDL can lock it or trigger a full rewrite, so real schema changes use online techniques and batched backfills rather than a single blocking statement. The win from deferring is that you do this hard, irreversible work after the risky part is over, against one backend you fully control, instead of during a live cutover with customers exposed.
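A batched backfill, sketched against SQLite with a hypothetical `title_normalized` column derived from `title`, looks like this. Each batch is a short transaction over a bounded id range, so no single statement holds locks on the whole table. On a production database the column addition itself may also need an online technique; the sketch covers only the batching.

```python
import sqlite3


def backfill_in_batches(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Fill listing.title_normalized a batch at a time, committing after each batch."""
    last_id, batches = 0, 0
    while True:
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM listing WHERE id > ? AND title_normalized IS NULL "
            "ORDER BY id LIMIT ?", (last_id, batch_size))]
        if not ids:
            return batches
        conn.execute(
            "UPDATE listing SET title_normalized = lower(trim(title)) "
            "WHERE id BETWEEN ? AND ? AND title_normalized IS NULL", (ids[0], ids[-1]))
        conn.commit()  # short transactions: locks are released between batches
        last_id, batches = ids[-1], batches + 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listing (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO listing (title) VALUES (?)",
                 [(f"  Flat {i}  ",) for i in range(2500)])
conn.commit()
# Expand step: a nullable column with no default.
conn.execute("ALTER TABLE listing ADD COLUMN title_normalized TEXT")
batches = backfill_in_batches(conn, batch_size=1000)
```

Because the loop advances by id rather than re-scanning from the start, it can be stopped and resumed, and rows whose `title` is NULL do not trap it in a loop.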

What is safe to change while the data is shared?

Mostly additive changes, with a caveat. A new table the legacy system never touches is safe. A new column is usually safe too, as long as the old backend keeps working without it, though "additive" is not automatically free: depending on the database and version, adding a column with a default or a not-null constraint, or building an index the naive way, can still lock or rewrite the table. The changes that clearly have to wait are the breaking ones, renaming or dropping a column, splitting a table, changing a type, because each alters what the old backend reads. The working rule is additive now, breaking later, and breaking changes still go through an expand-and-contract sequence even when one backend is left.
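The additive-now, breaking-later rule can be written down as a review policy. This sketch is hypothetical in every detail (the operation names, the three verdicts, and where each change lands), and it encodes the caveats conservatively: a not-null column with no default is deferred because legacy inserts that omit it would fail, and defaults or index builds are flagged for review because their locking behavior depends on the database and version.

```python
from dataclasses import dataclass
from typing import Optional

# Changes that alter what the legacy backend reads: wait until one backend owns the data.
BREAKING_OPS = frozenset({"rename_column", "drop_column", "change_type", "split_table"})


@dataclass(frozen=True)
class Change:
    op: str                        # hypothetical op names, e.g. "add_column"
    nullable: bool = True
    default: Optional[str] = None  # SQL default expression, if any


def classify(change: Change) -> str:
    """Return "safe", "review" or "defer" for a change made while the data is shared."""
    if change.op in BREAKING_OPS:
        return "defer"
    if change.op == "create_table":
        return "safe"  # a table the legacy system never touches
    if change.op == "add_column":
        if not change.nullable and change.default is None:
            return "defer"   # legacy inserts that omit the column would fail
        if change.default is not None:
            return "review"  # may lock or rewrite the table, depending on database and version
        return "safe"
    if change.op == "add_index":
        return "review"  # build it online, e.g. CREATE INDEX CONCURRENTLY in PostgreSQL
    raise ValueError(f"unknown change {change.op!r}")
```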

When can you not share the data layer?

Sharing the data layer works as long as the new system can live with the old data shape for the length of the migration. That covers most rebuilds, where the problem is the code, the architecture, or the ability to scale, and the existing data is basically sound. There the data layer is the stable ground you build the migration on.

It breaks down when the rebuild exists precisely because the old data model cannot carry the product any longer. If the new system needs a different shape from its first day, you cannot defer the change, and sharing one layer is off the table. Then the honest path is an expand-and-contract migration: every schema change stays backward compatible in both directions, the old and new shapes coexist for a while, and you remove the old one only after everything reads the new.

Expand-and-contract runs in three moves. You expand the schema so it holds both the old and the new shape at once. You migrate, backfilling the new shape in batches and writing to both shapes while the systems cut over. Then you contract, removing the old shape once nothing reads it any more. Every step is backward compatible, which is what keeps it safe, and also what makes it slow: you are evolving the data in place under live traffic rather than leaving it still.
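The three moves can be sketched on SQLite with a hypothetical rename of `addr` to `address_line`. Expand adds the new column, migrate dual-writes and backfills (unbatched here for brevity), and contract rebuilds the table without the old column. The rebuild is used because SQLite versions before 3.35 have no `DROP COLUMN`; other databases can simply drop the column once nothing reads it.

```python
import sqlite3


def expand(conn: sqlite3.Connection) -> None:
    # Move 1: the schema holds both shapes; the legacy column stays untouched.
    conn.execute("ALTER TABLE listing ADD COLUMN address_line TEXT")


def write_listing(conn: sqlite3.Connection, listing_id: int, address: str) -> None:
    # Move 2, during the cutover: write both shapes so either reader sees the same value.
    conn.execute("UPDATE listing SET addr = ?, address_line = ? WHERE id = ?",
                 (address, address, listing_id))


def backfill(conn: sqlite3.Connection) -> None:
    # Move 2, continued: copy rows nobody has rewritten yet (batch this on a large table).
    conn.execute("UPDATE listing SET address_line = addr WHERE address_line IS NULL")


def contract(conn: sqlite3.Connection) -> None:
    # Move 3, only once nothing reads `addr`: rebuild the table without it.
    conn.executescript("""
        BEGIN;
        CREATE TABLE listing_new (id INTEGER PRIMARY KEY, address_line TEXT);
        INSERT INTO listing_new (id, address_line) SELECT id, address_line FROM listing;
        DROP TABLE listing;
        ALTER TABLE listing_new RENAME TO listing;
        COMMIT;
    """)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listing (id INTEGER PRIMARY KEY, addr TEXT)")
conn.executemany("INSERT INTO listing (id, addr) VALUES (?, ?)",
                 [(1, "1 Main St"), (2, "2 Oak Ave")])
conn.commit()
expand(conn)
write_listing(conn, 2, "2 Oak Avenue")  # a live write during the cutover
backfill(conn)
conn.commit()
contract(conn)
```

Each move leaves a schema that both old and new readers can use, which is why the moves can be spread over days or weeks instead of one release.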

That path is real work, and it is harder than what we did on CORE Home, so the first thing to settle is whether you actually need it. Most teams assume they do and are wrong. The data is usually fine. The honest test is whether a single tenant can run on the rebuilt backend against the existing schema. If yes, share the layer and defer the changes. If no, you are in expand-and-contract, and you should know that going in.

What we tell a team rebuilding a live system

Treat the data layer as the thing you do not move. The architecture decides where you end up, but the data is where the risk concentrates, because data is the part you cannot cleanly undo once customers have touched it. Keep it still while the code changes, and you keep the one variable that makes a rebuild provable.

Three rules carry most of it. Run the old and new systems over one shared data layer, so read parity is automatic and verification is a comparison. Give the new backend the guardrails the old one already had, because a write bug on shared data can reach every tenant, including the ones that never moved. And defer every schema change until a single backend owns the data, so the irreversible work happens after the risky part is done.

The discipline is in resisting the urge to change everything in one heroic release, and in being honest that a shared data layer trades a sync problem for a blast-radius problem. On a sound data model, with one team owning both sides and a fixed end date, that is the right trade.

Keeping the data still is one decision inside the larger job of taking over a live system at scale. This is the work we do when we rebuild and scale an existing product, and it usually grows into a longer engagement. For the screens, the stack, and the numbers behind the CORE Home rebuild, the case study has the full detail.

Frequently asked questions

Should you migrate the data and rebuild the code at the same time?

It is the riskiest way to do it. Two moving parts means a bad result could come from the new code or the new data shape, and you cannot tell which. Keeping the data layer unchanged while you swap the backend leaves one variable, so any difference points straight at the code.

How do you get parity between an old and a new backend?

Point both at the same data layer and compare them on the read paths, where it is safe to run both against the same records. A mismatch on a read points straight at the new code, because the data is identical. Write paths are verified per tenant before the cutover, never by running both writers at once.

When should you change the database schema during a rewrite?

After the cutover, not during it. While two systems share the data, every schema change has to be safe for both, which is slow and error-prone. Once the last tenant has moved and a single backend owns the data, you can change the schema on your own timeline without coordinating two readers, though large tables still call for online techniques and batched backfills.

When does a shared data layer not work?

When the rebuild exists because the old data model itself cannot carry the product. If the new system needs a different shape from day one, you cannot defer it, and you fall back to an expand-and-contract migration where every change stays backward compatible in both directions. That is harder, so confirm you actually need it.

Is sharing one database between two systems an anti-pattern?

Permanently sharing a database across applications is, because the schema becomes coupling no one owns. A migration is different: it is temporary, one team owns both backends and the schema, and it ends on a fixed date. With a single transactional store and one writer per tenant, you get the upside without the long-term coupling.