Sharan, Aakash

Event Sourcing: Store What Happened, Not What Is

July 15, 2023 #Distributed Systems

Concentric tree growth rings spreading outward, a symbol of event sourcing: history is preserved layer by layer and the present state is derived from all of it.

Most systems store the present tense.

A row in a table holds the current balance, the current status, the current address. When something changes, you overwrite the old value with the new one. The previous truth is gone. You kept the answer and threw away the question.

Event sourcing inverts that.

Instead of storing the current state, you store the sequence of events that produced it. The deposit, the withdrawal, the correction, the reversal. Current state is no longer the thing you save. It is the thing you derive, by replaying the events from the beginning.

Store what happened, not what is.

That one decision changes the shape of the whole system, and most of the benefits and most of the pain follow from it.

Why the log is a better source of truth

A current-state database is lossy by design. It answers "what is the balance now" and forgets everything else. The moment you need "what was the balance on the day of the dispute," or "how did we arrive at this status," or "replay these decisions against a new rule," you are reconstructing history from logs and prayers.

An event log does not forget. The events are facts, and facts are append-only. Nothing is updated in place; nothing is deleted. You only ever add the next thing that happened.

That gives you properties a mutable table cannot:

A complete audit trail, because the log is the history, not a side-effect bolted on later.
Temporal queries: the state at any past point is just a replay up to that point.
New read models from the same source of truth: when a new question arrives, you build a new projection by replaying the existing events, with no migration of the source of truth. Rebuildable, not free: a projection still costs compute, replay time, and backfill care.
Debuggability: you can replay the exact sequence that produced a bug instead of guessing.
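
Two of those properties, temporal queries and new read models, are easy to see in a few lines. This is a minimal sketch, not a production design: the Event shape, the event kinds, and the sample log are all hypothetical, and the log is assumed to be ordered by date.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Event:
    day: date
    kind: str    # "deposit" or "withdrawal" (hypothetical event kinds)
    amount: int  # cents

# Hypothetical, already-ordered event log for one account.
LOG = [
    Event(date(2023, 3, 1), "deposit", 10_000),
    Event(date(2023, 3, 9), "withdrawal", 4_000),
    Event(date(2023, 4, 2), "deposit", 1_500),
]

def balance_as_of(log, cutoff: date) -> int:
    """Temporal query: replay only the events on or before the cutoff."""
    balance = 0
    for e in log:
        if e.day > cutoff:
            break  # the log is ordered, so nothing later can apply
        balance += e.amount if e.kind == "deposit" else -e.amount
    return balance

def withdrawals_per_month(log) -> dict:
    """A read model added later, built from the same untouched log."""
    counts = {}
    for e in log:
        if e.kind == "withdrawal":
            key = (e.day.year, e.day.month)
            counts[key] = counts.get(key, 0) + 1
    return counts

print(balance_as_of(LOG, date(2023, 3, 15)))  # 6000
print(withdrawals_per_month(LOG))             # {(2023, 3): 1}
```

Neither function required a change to the stored events. The second one answers a question nobody had asked when the events were written.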

This is the same insight Jay Kreps wrote up in "The Log": an ordered, append-only log is a remarkably general backbone for data systems, because every other representation can be derived from it. Pat Helland reached the same place from the database side. Once you treat data as immutable, the accounting changes: you stop overwriting truth and start accumulating it.

Event sourcing is that idea applied to a single domain entity. The aggregate's life is a stream of events, and its current state is a fold over that stream.
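
"A fold over that stream" is meant literally. Here is a minimal sketch, assuming a hypothetical bank account with two event types; the class names and amounts are illustrative only.

```python
from dataclasses import dataclass
from functools import reduce

@dataclass(frozen=True)
class Deposited:
    amount: int  # cents, to avoid float rounding

@dataclass(frozen=True)
class Withdrawn:
    amount: int

def apply(balance: int, event) -> int:
    """Pure transition: previous state plus one event gives the next state."""
    if isinstance(event, Deposited):
        return balance + event.amount
    if isinstance(event, Withdrawn):
        return balance - event.amount
    raise ValueError(f"unknown event: {event!r}")

def current_balance(events) -> int:
    # Current state is never stored; it is a left fold over the stream.
    return reduce(apply, events, 0)

stream = [Deposited(10_000), Withdrawn(2_500), Deposited(500)]
print(current_balance(stream))  # 8000
```

Because apply is a pure function of the previous state and one event, replaying the same stream always yields the same state.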

The write side and the read side are different systems

Here is the part that trips up teams who adopt event sourcing as if it were just a different way to save rows.

You almost never query the event log directly to answer a user's question. Replaying every event of every entity to render a list view would be absurd. So event sourcing pulls naturally toward CQRS, the separation Greg Young named and Martin Fowler later wrote down: the model you write through is not the model you read from.

The write side is the event-sourced aggregate. It loads its history, validates a command against its current derived state, and appends new events. That is the source of truth, and it is optimized for consistency and correctness, one entity at a time.
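
The load, validate, append cycle can be sketched as follows. Everything here is hypothetical: InMemoryJournal stands in for a real journal, and the expected-version check is one assumed way to keep a single entity consistent under concurrent writers, not something any particular library prescribes.

```python
from dataclasses import dataclass, field

class InsufficientFunds(Exception):
    pass

class ConcurrencyConflict(Exception):
    pass

@dataclass
class InMemoryJournal:
    """Hypothetical stand-in for a real event journal."""
    streams: dict = field(default_factory=dict)

    def load(self, stream_id):
        return list(self.streams.get(stream_id, []))

    def append(self, stream_id, expected_version, events):
        stream = self.streams.setdefault(stream_id, [])
        # Assumed optimistic check: reject if someone appended since we loaded.
        if len(stream) != expected_version:
            raise ConcurrencyConflict(stream_id)
        stream.extend(events)

def fold(events):
    """Derive the current balance from (kind, amount) events."""
    balance = 0
    for kind, amount in events:
        balance += amount if kind == "Deposited" else -amount
    return balance

def withdraw(journal, account_id, amount):
    history = journal.load(account_id)   # 1. load the history
    if amount > fold(history):           # 2. validate against derived state
        raise InsufficientFunds(account_id)
    # 3. append the new fact; nothing is updated in place
    journal.append(account_id, len(history), [("Withdrawn", amount)])

journal = InMemoryJournal()
journal.append("acct-1", 0, [("Deposited", 100)])
withdraw(journal, "acct-1", 30)
print(fold(journal.load("acct-1")))  # 70
```

A rejected command appends nothing, so the stream only ever contains facts that passed validation.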

The read side is a set of projections. A projection consumes the event stream and maintains a query-friendly view: a SQL table, a denormalized document, a search index, a counter. When you need a new view, you add a new projection and replay the events into it. In the Akka world this is exactly what Akka Persistence (the write-side journal) and Akka Projections (the read-side consumers) divide between them, with the projection tracking its own offset into the stream so it can stop, restart, and scale without losing its place. That offset is also where flow control belongs: a projection that cannot keep up with the stream has to slow its intake rather than fall over, the same backpressure discipline any stream consumer needs.
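
Stripped of any framework, a projection with its own offset looks roughly like this. The sketch is plain Python and does not use or imitate the Akka Projections API; the class name, the (account, delta) event shape, and the batch size are assumptions. Pulling a bounded batch is the crudest form of the flow control described above.

```python
class BalanceProjection:
    """Hypothetical read-side consumer that remembers its place in the log."""

    def __init__(self):
        self.offset = 0     # index of the next event to read
        self.balances = {}  # the query-friendly view

    def catch_up(self, log, batch_size=2):
        """Apply at most batch_size events; return how many were applied."""
        batch = log[self.offset:self.offset + batch_size]
        for account, delta in batch:
            self.balances[account] = self.balances.get(account, 0) + delta
        # In a real store the view and the offset would be saved together,
        # so a restart resumes exactly where the view left off.
        self.offset += len(batch)
        return len(batch)

log = [("a", 100), ("b", 50), ("a", -30)]
projection = BalanceProjection()
projection.catch_up(log)  # applies the first two events
projection.catch_up(log)  # resumes at offset 2 and applies the third
print(projection.balances, projection.offset)  # {'a': 70, 'b': 50} 3
```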

The consequence is the thing nobody can design away: the read side is eventually consistent with the write side. The event is committed before the projection catches up. For most views that lag is invisible. For "show the user the thing they just created," it is a support ticket. You either design the UI to tolerate the lag, or you read the freshly derived state from the write side for that one case. What you do not get to do is pretend the gap is not there.
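
The second option, reading from the write side for that one case, can be sketched like this. All names and data are hypothetical; the idea assumed here is that the caller knows which version it just wrote and passes it as min_version.

```python
def apply(state, event):
    # Fold step for a hypothetical order: events are (field, value) pairs.
    return {**state, event[0]: event[1]}

# Write side: two events committed. Read side: the projection has seen one.
journal = {"order-1": [("status", "created"), ("status", "paid")]}
read_model = {"order-1": {"status": "created"}}
projected_version = {"order-1": 1}

def read_order(order_id, min_version=0):
    """Serve from the projection unless the caller needs a newer version."""
    if projected_version.get(order_id, 0) >= min_version:
        return read_model.get(order_id)
    # Fallback for the "just wrote it" case: derive from the write side.
    state = {}
    for event in journal[order_id]:
        state = apply(state, event)
    return state

print(read_order("order-1"))                 # {'status': 'created'}
print(read_order("order-1", min_version=2))  # {'status': 'paid'}
```

The first call shows the gap itself: an ordinary reader sees the stale view until the projection catches up.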

The pitfalls that actually bite

The benefits are easy to sell. The pitfalls are where event-sourced systems get into trouble years later, and they are worth naming plainly.

Events are forever, so schema evolution is the real tax. A mutable row can be migrated. An event that you wrote three years ago will still be replayed tomorrow, exactly as it was written. The moment your event shape changes, you own every old version of it forever: upcasting old events to new shapes, versioning the schema, tolerating fields that did not exist yet. This is the single most underestimated cost of event sourcing. Decide your event versioning strategy before you have a million events, not after.
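
Upcasting is simpler to show than to describe. In this sketch the event type, the field names, the three schema versions, and the "USD" default are all invented for illustration; the point is only the chain of small, one-step converters applied at read time.

```python
CURRENT_VERSION = 3

def _v1_to_v2(event):
    # Hypothetical change: v1 had one "name" field; v2 split it in two.
    first, _, last = event["name"].partition(" ")
    out = {k: v for k, v in event.items() if k != "name"}
    out.update(first_name=first, last_name=last, version=2)
    return out

def _v2_to_v3(event):
    # Hypothetical change: v3 added "currency". Old events predate it,
    # so an assumed default is filled in.
    return {**event, "currency": "USD", "version": 3}

UPCASTERS = {1: _v1_to_v2, 2: _v2_to_v3}

def upcast(stored):
    """Bring a stored event of any known version up to the current shape."""
    event = dict(stored)            # never mutate the stored record
    event.setdefault("version", 1)  # the oldest events carried no version
    while event["version"] < CURRENT_VERSION:
        event = UPCASTERS[event["version"]](event)
    return event

old = {"type": "AccountOpened", "name": "Ada Lovelace"}
print(upcast(old))
```

The stored event is never rewritten. The fold only ever sees the current shape, and each new version costs one more converter rather than a migration.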

Snapshots are an optimization, not a feature. Replaying an aggregate with a long history on every load gets expensive. The answer is snapshots: periodically persist the derived state so replay can start from the latest snapshot instead of the beginning. Useful, and a quiet source of bugs, because a snapshot is derived data. If your fold logic changes, old snapshots may encode state your new code would never produce. Snapshots are a cache of the replay, and caches need invalidation discipline.
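
One possible form of that invalidation discipline is to stamp each snapshot with the version of the fold logic that produced it. This is a sketch under that assumption; FOLD_VERSION, the snapshot fields, and the integer-delta events are hypothetical.

```python
FOLD_VERSION = 2  # assumed convention: bump whenever the fold logic changes

def fold(state, events):
    for delta in events:
        state += delta
    return state

def take_snapshot(events):
    return {
        "state": fold(0, events),
        "seq": len(events),            # number of events the snapshot covers
        "fold_version": FOLD_VERSION,  # which logic produced this state
    }

def load(events, snapshot=None):
    """Rebuild state, using the snapshot only if current logic produced it."""
    if snapshot and snapshot["fold_version"] == FOLD_VERSION:
        return fold(snapshot["state"], events[snapshot["seq"]:])
    return fold(0, events)  # missing or stale snapshot: full replay

events = [5, 10, -3]
snap = take_snapshot(events)
events.append(4)
print(load(events, snap))  # 16, replaying only the one event after the snapshot

stale = {**snap, "fold_version": 1, "state": 999}
print(load(events, stale))  # 16, the stale snapshot is ignored
```

The events stay authoritative. A snapshot that cannot be trusted costs a slower load, never a wrong answer.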

The dual-write problem will find you. The cardinal rule is that appending the event and updating the world must not be two separate, non-atomic writes. If you write the event to the journal and then publish to a message broker as a second step, a crash between them leaves the two permanently disagreeing. The event log has to be the single atomic write, and everything else (projections, integration messages) must be derived from it after the fact. This is the same reasoning behind the transactional outbox pattern and behind Helland's argument that you build reliable systems from idempotent work and local transactions, not distributed ones.
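
A sketch of that shape using SQLite from the Python standard library. The table names, the relay function, and the publish callback are hypothetical. This is a variant in the spirit of the transactional outbox, in which the event table itself plays the role of the outbox.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (
        seq INTEGER PRIMARY KEY AUTOINCREMENT,
        payload TEXT NOT NULL
    );
    CREATE TABLE relay_offset (
        id INTEGER PRIMARY KEY CHECK (id = 1),
        seq INTEGER NOT NULL
    );
    INSERT INTO relay_offset VALUES (1, 0);
""")

def append_event(payload: dict) -> None:
    # The only write on the command path: one local transaction.
    with conn:
        conn.execute("INSERT INTO events (payload) VALUES (?)",
                     (json.dumps(payload),))

def relay(publish) -> int:
    """Publish events past the stored offset; return how many were sent."""
    (last,) = conn.execute(
        "SELECT seq FROM relay_offset WHERE id = 1").fetchone()
    rows = conn.execute(
        "SELECT seq, payload FROM events WHERE seq > ? ORDER BY seq",
        (last,)).fetchall()
    for seq, payload in rows:
        publish(json.loads(payload))  # a crash here means a resend later
        with conn:
            conn.execute("UPDATE relay_offset SET seq = ? WHERE id = 1",
                         (seq,))
    return len(rows)

sent = []
append_event({"type": "OrderPlaced", "order_id": "o-1"})
relay(sent.append)
relay(sent.append)  # nothing new, so nothing is published
print(sent)         # [{'type': 'OrderPlaced', 'order_id': 'o-1'}]
```

If publishing fails, the offset does not move and the event is sent again on the next run. The broker can lag the log but can never permanently disagree with it, at the price of possible duplicates.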

Idempotency is not optional on the read side. Projections and downstream consumers will, under failure and restart, see the same event more than once. At-least-once delivery is the honest default; exactly-once is mostly a story we tell about an at-least-once system with idempotent handlers and tracked offsets. If applying an event twice corrupts your read model, the read model is wrong, not the delivery.
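
A minimal sketch of an idempotent handler, assuming every event carries a monotonically increasing sequence number and arrives in order within its stream. The class and event names are hypothetical.

```python
class OrderCountProjection:
    """Hypothetical read model that tolerates redelivery."""

    def __init__(self):
        self.count = 0
        self.last_seq = 0  # highest sequence number already applied

    def handle(self, seq, event):
        """Apply the event once; return False for a duplicate delivery."""
        if seq <= self.last_seq:
            return False  # already applied: applying again would double count
        if event == "OrderPlaced":
            self.count += 1
        self.last_seq = seq
        return True

projection = OrderCountProjection()
deliveries = [
    (1, "OrderPlaced"),
    (2, "OrderPlaced"),
    (2, "OrderPlaced"),  # redelivered after a simulated restart
    (3, "OrderPlaced"),
]
for seq, event in deliveries:
    projection.handle(seq, event)
print(projection.count)  # 3
```

For this to hold across a crash, the view and last_seq have to be persisted together. Otherwise the duplicate check protects only the process that is currently running.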

Immutability collides with the right to be forgotten. An append-only log that never deletes is a beautiful audit trail and a compliance problem the first time someone invokes a deletion right over personal data. You cannot rewrite history without breaking the model. A common architectural approach is to keep the personal data out of the events and behind a key you can destroy, so destroying the key leaves the event in place but makes its personal data unrecoverable. Treat that as a design pattern, not legal guidance; what actually satisfies a given regulation is a question for counsel, not for your journal. Plan for it at design time regardless, because retrofitting forgetting into an immutable log is genuinely hard.
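
One reading of "behind a key you can destroy" is an opaque reference: the event carries only a random key, and the personal data sits in a separate, deletable store. The sketch below assumes that reading, with hypothetical names throughout. An encrypted variant would instead keep ciphertext in the event and destroy the decryption key; the plain lookup here is only to stay within the standard library.

```python
import uuid

pii_vault = {}  # deletable side store: subject key -> personal data
event_log = []  # append-only; holds only the opaque key

def register_customer(name, email):
    subject_key = str(uuid.uuid4())
    # Vault first: a crash between the two writes leaves an orphan vault
    # entry, never an event that points at data which was not stored.
    pii_vault[subject_key] = {"name": name, "email": email}
    event_log.append({"type": "CustomerRegistered", "subject": subject_key})
    return subject_key

def forget(subject_key):
    # Destroy the key's data; the event log itself is left untouched.
    pii_vault.pop(subject_key, None)

def describe(event):
    person = pii_vault.get(event["subject"])
    return person["name"] if person else "[erased]"

key = register_customer("Ada Lovelace", "ada@example.com")
print(describe(event_log[0]))  # Ada Lovelace
forget(key)
print(describe(event_log[0]))  # [erased]
print(len(event_log))          # 1
```

Replay still works after erasure, but every projection has to cope with a subject whose data is gone, which is the part worth designing up front.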

When not to do it

Event sourcing is not a default. It is a deliberate trade: more power and history in exchange for more moving parts and more discipline.

If a domain has no meaningful history, no audit requirement, no temporal questions, and no need for multiple independent read models, a boring CRUD table is the correct and honest answer. Event-sourcing a settings page is how you turn a one-day feature into a three-week architecture.

The places it earns its cost are the ones where the history is the product: ledgers, orders, trading, inventory, anything where "how did we get here" is a question someone will eventually need answered under pressure. There, the append-only log is not overhead. It is the asset.

The discipline is to apply it where history matters and resist it everywhere else.

The takeaway

Event sourcing is not a storage trick. It is a decision about where truth lives.

When you store current state, the database is the truth and history is a side-effect you reconstruct if you are lucky. When you store events, the log is the truth and current state is a view you can always rebuild. The first is cheaper to start. The second is the one that still has the answers when someone asks a question you did not anticipate.

Keep the events. The state can always be derived. The history cannot be recovered once it is overwritten.

Sources

Martin Fowler, "Event Sourcing" (martinfowler.com, 2005): https://martinfowler.com/eaaDev/EventSourcing.html
Martin Fowler, "CQRS" (martinfowler.com, 2011): https://martinfowler.com/bliki/CQRS.html
Greg Young, "CQRS Documents" (2010).
Jay Kreps, "The Log: What every software engineer should know about real-time data's unifying abstraction" (LinkedIn Engineering, 2013): https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
Pat Helland, "Immutability Changes Everything" (ACM Queue, 2015): https://queue.acm.org/detail.cfm?id=2884038
Pat Helland, "Life beyond Distributed Transactions: an Apostate's Opinion" (CIDR 2007; ACM Queue 2016).
Chris Richardson, "Pattern: Transactional Outbox" (microservices.io): https://microservices.io/patterns/data/transactional-outbox.html
Akka Persistence documentation (Akka core, Event Sourcing; since 2.6.0, 2019): https://doc.akka.io/libraries/akka-core/current/typed/persistence.html
Akka Projections documentation (a separate Akka library; since 1.0.0, 2020): https://doc.akka.io/libraries/akka-projection/current/overview.html