Agile Continuous Delivery of Databases (2)

{Previous Article in Series}

Most applications include a database, and the goal of the project I’m working on is to make Agile continuous delivery possible in the database as well as in procedural code. The peculiar problem with databases is that when a deployment fails and we have to roll back, we risk losing data that has been added since the release.

The “blue-green” deployment model is a way of dealing with this. When the new release fails, we cut over to an “old-version” database: its code and schema pre-date the release, but it has received all the data changes made since the release.

How can this be? It’s achieved by flowing client data not just to the new-version database but also to the old one. Does the application have to maintain two connections? No. The parallelism is handled on the server. There’s also a third destination: a “journal”. This is a complete record of the raw data received from the client.
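
As a rough sketch of that server-side fan-out (the Journal class and the two writer callables below are hypothetical stand-ins for whatever drivers the real server would hold, not part of the actual project code):

import json
from typing import Any, Callable, Dict, List

class Journal:
    # Stand-in for the journal store: keeps the raw client payloads as JSON text.
    def __init__(self) -> None:
        self.entries: List[str] = []

    def append(self, payload: Dict[str, Any]) -> None:
        self.entries.append(json.dumps(payload))

def handle_client_write(payload: Dict[str, Any],
                        write_new_schema: Callable[[Dict[str, Any]], None],
                        write_old_schema: Callable[[Dict[str, Any]], None],
                        journal: Journal) -> None:
    # Journal first, so the raw input survives even if a schema write rejects it.
    journal.append(payload)
    write_new_schema(payload)   # maps the payload onto the new-version schema
    write_old_schema(payload)   # maps the payload onto the old-version schema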

How does the journal help? Well, perhaps the new database version was deployed because the structure of client data had changed – the old version of the application couldn’t store it. We’re rolling back to that version. So now our new data has nowhere to go. Is it at least saved anywhere? Yes, the journal is designed to accept data in any shape, and it will preserve the inputs while we decide what to do with them. This scenario shouldn’t happen if refactoring best practices are followed, but in an imperfect world it’s good to have a safety net.

As a full record of inputs, the journal might also be useful in testing. Where the type of inputs received varies, you may want to play back selected entries from the journal when testing future versions of the database.
It might also help the business to meet regulatory requirements, although it probably isn’t a replacement for audit records in the relational database.
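
A playback helper might look something like this; the “type” field used for filtering is an assumption about how journal entries are tagged, and send stands in for whatever pushes an input into the version under test:

import json
from typing import Any, Callable, Dict, Iterable, Optional

def replay(journal_entries: Iterable[str],
           send: Callable[[Dict[str, Any]], None],
           input_type: Optional[str] = None) -> None:
    # Walk the raw journal and resubmit selected inputs to the system under test.
    for raw in journal_entries:
        payload = json.loads(raw)
        if input_type is None or payload.get("type") == input_type:
            send(payload)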

What’s the best technology for the journal? Relational databases have an XML data type which could cope with variations in data structure, but XML is verbose and doesn’t match the JSON the client application already works with. A JSON data type would fit much better, but not all relational databases have a native one. NoSQL databases do handle JSON documents, and are designed for heavy loads. A Key-Value store or a Document Database might be a good choice for the journal.

To decide between these two types of database, we should consider what transactional support is required. Will the data arrive one record at a time, or will it be batched? If it’s batched, and we want the whole write to succeed or fail as one unit, a Key-Value database is not suitable, so we’ll choose a Document Database. And unlike a Key-Value database, where the data in the Value cannot be queried, a Document Database gives us access to the interior of the document, which could be useful.
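
For example, if the journal were a MongoDB collection (just one possible Document Database, not a decision the project has made), a batched write could be wrapped in a transaction and the document interiors queried afterwards. MongoDB transactions require a replica-set deployment, and the field names below are invented for illustration:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumes a replica-set deployment
journal = client["delivery"]["journal"]

batch = [
    {"type": "order", "order": {"customer_id": 42, "lines": 3}},
    {"type": "order", "order": {"customer_id": 43, "lines": 1}},
]

# The batch succeeds or fails as one unit.
with client.start_session() as session:
    with session.start_transaction():
        journal.insert_many(batch, session=session)

# Unlike a Key-Value store, we can query inside the stored documents.
for doc in journal.find({"order.customer_id": 42}):
    print(doc)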

I hope these notes are useful: I’ll continue to write them up as I develop the design of the project, and I’ll soon be posting on GitHub some of the code I’ve been writing.
