
Change Data Capture + Event-Driven Architecture


Learn more about Software Architecture & Design.
Join thousands of developers getting weekly updates to increase your understanding of software architecture and design concepts.


Change Data Capture (CDC) is a way to monitor and capture changes in data so other systems can react to those changes or stay up-to-date with the data. It isn’t new, but it’s been gaining popularity around event-driven architecture. Let me explain so you don’t shoot yourself in the foot.

YouTube

Check out my YouTube channel, where I post all kinds of content accompanying my posts, including this video showing everything in this post.

Change Data Capture

Change Data Capture (CDC) is a way to monitor and capture changes in the data in your database, often in real-time. Lately, combining CDC tools with an event-driven architecture has been a way to distribute these data changes. While this may sound good, it has some pitfalls, specifically around coupling.

Without CDC, you’d typically poll your database (a pull-based model) for changes on some interval, using a scheduled batch job. Common use cases are data warehousing and ETL.

The common problem with this approach is that each process polls the database separately, which increases the load on your database (or read replica). Another disadvantage of polling is that you don’t get the data changes in near real time; you only see them on the next run.
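To make the polling approach concrete, here is a minimal sketch. It assumes a hypothetical `products` table with an `updated_at` column and uses an in-memory SQLite database as a stand-in for your real one. Each poller has to track its own high-water mark and rerun the query on every interval.

```python
import sqlite3


def poll_changes(conn, last_seen):
    """Return rows changed since `last_seen` plus the new high-water mark."""
    rows = conn.execute(
        "SELECT id, name, updated_at FROM products "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    ).fetchall()
    new_mark = rows[-1][2] if rows else last_seen
    return rows, new_mark


# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)"
)
conn.execute("INSERT INTO products VALUES (1, 'Widget', 100)")
conn.execute("INSERT INTO products VALUES (2, 'Gadget', 105)")

first_batch, mark = poll_changes(conn, last_seen=0)

# A change lands between polls; nobody sees it until the next scheduled run.
conn.execute("UPDATE products SET name = 'Widget v2', updated_at = 110 WHERE id = 1")
second_batch, mark = poll_changes(conn, last_seen=mark)
```

Every additional process that needs the changes repeats this query against the same database, and a plain timestamp check like this one never sees deleted rows.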

With CDC tools, this turns into a push-based model that is much closer to real time. The tool monitors the database for changes and publishes them to consumers as they happen.

Tooling Example

The database you’re using determines which CDC tool you can use. For example, Debezium is a CDC tool that is becoming fairly popular and has a variety of connectors that capture changes and then produce events.

A toolchain might look like Debezium getting the data changes from the binlog of MySQL in real-time and then publishing those events to a Solace topic. Many consumers could then consume that topic.
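To give a feel for what lands on the topic, here is a simplified sketch of a row-level change event. The `before`, `after`, `source`, `op`, and `ts_ms` fields follow the envelope Debezium documents for its connectors. The `products` table, its columns, and the values are hypothetical, and a real event carries more metadata than shown here.

```python
import json

# Simplified payload of an update event; table, columns, and values are made up.
raw_event = json.dumps({
    "before": {"id": 1, "name": "Widget", "qty_on_hand": 10},
    "after": {"id": 1, "name": "Widget", "qty_on_hand": 7},
    "source": {"db": "inventory", "table": "products"},
    "op": "u",  # c = create, u = update, d = delete, r = snapshot read
    "ts_ms": 1700000000000,
})

OPERATIONS = {"c": "insert", "u": "update", "d": "delete", "r": "snapshot"}


def describe(raw):
    """Summarize a change event as '<operation> on <db>.<table>'."""
    event = json.loads(raw)
    source = event["source"]
    return f"{OPERATIONS[event['op']]} on {source['db']}.{source['table']}"


summary = describe(raw_event)
```

Notice that the payload mirrors the table: the field names are the column names.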

Inside vs. Outside

This is all good, but where are you publishing these data change events, and for whom? I mentioned data warehousing, which can be a good use case. However, because you can publish events to a message broker or an event log, this is typically used to integrate with other logical boundaries. This is where you can shoot yourself in the foot.

If you’re publishing data change events that are consumed by another logical boundary (service), that boundary is coupled to the internal implementation detail of your database schema. Typically, events derived from a CDC tool mirror your database schema: its tables, columns, and types. If you publish these events and other services consume them, those services know about your database schema.

We don’t want this coupling, and we don’t want other services to know about our internal database implementation. There’s no difference between leaking your schema via events and another service reaching out directly and querying your database. Not a good thing.
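Here is a small sketch of how that coupling bites. It assumes a hypothetical consumer in another boundary that reads a `qty_on_hand` column straight out of a simplified change event.

```python
def available_stock(change_event):
    """Consumer in another boundary, written against the producer's column names."""
    return change_event["after"]["qty_on_hand"]


# Event shape before the owning service touches its schema.
event_v1 = {"op": "u", "after": {"id": 1, "qty_on_hand": 7}}

# The owning service renames the column; the CDC event changes with it.
event_v2 = {"op": "u", "after": {"id": 1, "quantity": 7}}

stock = available_stock(event_v1)

try:
    available_stock(event_v2)
    consumer_broke = False
except KeyError:
    # A purely internal refactor just broke a consumer in another service.
    consumer_broke = True
```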

Translation

To avoid this, you want to have some translation to create an event to be exposed to other logical boundaries. You want to create an integration event, or as I often call them, Outside Events. Outside events are contracts you can independently version from any of your internals. This allows you to change your underlying database schema without breaking consumers.
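One possible shape for that translation, sketched under assumptions: the column names, the `ProductChangedV1` contract, and the `to_outside_event` function are all hypothetical, and the input is a simplified row-level change event with `op` and `after` fields.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class ProductChangedV1:
    """Outside event: a contract versioned independently of the table."""
    product_id: int
    name: str
    quantity_available: int


def to_outside_event(change_event: dict) -> Optional[ProductChangedV1]:
    """Map an internal row-level change to the public contract."""
    if change_event["op"] not in ("c", "u"):
        return None  # deletes would need their own outside event
    row = change_event["after"]
    return ProductChangedV1(
        product_id=row["id"],
        name=row["name"],
        # Only this mapping changes if the internal column is renamed.
        quantity_available=row["qty_on_hand"],
    )


internal = {
    "op": "u",
    "after": {"id": 1, "name": "Widget", "qty_on_hand": 7, "row_version": 42},
}
outside = to_outside_event(internal)
payload = asdict(outside)
```

Internal-only columns such as `row_version` never reach consumers, and a column rename is absorbed inside the mapping instead of breaking another service.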

One important aspect of events is their purpose. If you’re using CDC to generate events, you’re likely thinking about data, not behavior. If you want to expose behavior-driven events, you must infer what happened. It can be very difficult to understand why data changed how it did. For example, you’ll likely end up with a ProductChanged event rather than an InventoryAdjusted event.
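A sketch of what that inference looks like, assuming a hypothetical `qty_on_hand` column and simplified `before` and `after` row images. The event names come from the example above; the function itself is only illustrative.

```python
def infer_event(before, after):
    """Guess a behavior-level event name from two row images."""
    changed = {col for col in after if before.get(col) != after[col]}
    if changed == {"qty_on_hand"}:
        # We can see that the quantity moved, but not why it moved
        # (a sale, a recount, damaged stock).
        return "InventoryAdjusted", after["qty_on_hand"] - before["qty_on_hand"]
    return "ProductChanged", None


before = {"id": 1, "name": "Widget", "qty_on_hand": 10}

adjusted = infer_event(before, {"id": 1, "name": "Widget", "qty_on_hand": 7})
generic = infer_event(before, {"id": 1, "name": "Widget Pro", "qty_on_hand": 7})
```

Even in the best case, the diff tells you the quantity dropped by three, not whether that was a sale or a recount. As soon as more than one column changes, you are back to a generic ProductChanged.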

Another question you need to ask yourself is, why am I distributing data? Often it’s because another service wants to query that data or needs it for UI purposes. Check out my post The Challenge of Microservices: UI Composition.

Join!

Developer-level members of my YouTube channel or Patreon get access to a private Discord server to chat with other developers about Software Architecture and Design and access to source code for any working demo application I post on my blog or YouTube. Check out my Patreon or YouTube Membership for more info.
